Principle 1
Every claim links to a quote in a moment in time
Published 2026-05-02 · ~4 minute read
The chain
Back-traceability is the structural chain from a claim in an output to the stage_run that produced it. The chain is:
claim → finding → quote → transcript line → participant → conducted-at → video timestamp → stage_run.
Every link is a foreign key on a row in Postgres. Every link is required. A claim with a missing link cannot be rendered — the tool refuses to ship it. The chain is not aspirational and it is not a feature added at the end of a project; it is the spine the schema is built around.
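The chain-as-foreign-keys idea can be sketched in a few tables. This is a hypothetical, minimal schema — the table and column names are illustrative assumptions, not the tool's actual Postgres schema — using SQLite only so the sketch is self-contained (Postgres enforces foreign keys by default; SQLite needs the pragma):

```python
import sqlite3

# Illustrative sketch: every link in the chain is a NOT NULL foreign key,
# so a claim with a missing link cannot even be inserted, let alone rendered.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite opts in; Postgres enforces FKs by default
db.executescript("""
CREATE TABLE stage_run (id INTEGER PRIMARY KEY);
CREATE TABLE participant (id INTEGER PRIMARY KEY, conducted_at TEXT NOT NULL);
CREATE TABLE transcript_line (
    id INTEGER PRIMARY KEY,
    participant_id INTEGER NOT NULL REFERENCES participant(id),
    video_timestamp_ms INTEGER NOT NULL,
    text TEXT NOT NULL
);
CREATE TABLE quote (
    id INTEGER PRIMARY KEY,
    transcript_line_id INTEGER NOT NULL REFERENCES transcript_line(id),
    stage_run_id INTEGER NOT NULL REFERENCES stage_run(id)
);
CREATE TABLE finding (
    id INTEGER PRIMARY KEY,
    quote_id INTEGER NOT NULL REFERENCES quote(id),
    stage_run_id INTEGER NOT NULL REFERENCES stage_run(id)
);
CREATE TABLE claim (
    id INTEGER PRIMARY KEY,
    finding_id INTEGER NOT NULL REFERENCES finding(id)
);
""")

# A claim pointing at a finding that does not exist is rejected at insert time.
try:
    db.execute("INSERT INTO claim (id, finding_id) VALUES (1, 999)")
except sqlite3.IntegrityError as e:
    print("refused:", e)
```

The point of the sketch is that the refusal happens in the database, not in application code that someone can forget to call.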
What goes wrong without it
Three failure modes recur in qualitative analysis tools that do not enforce the chain:
- Findings drift from their evidence. A finding edited in the report no longer matches the quotes it cites. A stakeholder reads the new wording and the original quotes and sees a mismatch nobody can explain.
- Quotes get reattributed. A speaker label gets edited; the quote stays the same; the participant the quote is attached to changes silently. The finding now has the wrong cohort signal.
- Numbers nobody can reconstruct. “Most participants” appears in the deck. The deck is six months old. Nobody can produce the count it came from. The claim is unverifiable, which means it is unusable in any rigorous follow-up.
Each of these is a recoverable mistake at the gate where the error happens, and an unrecoverable embarrassment three stages later. The chain prevents the recoverable mistake from propagating.
What it costs to keep
The cost is real and worth paying. Specifically:
- A foreign key on every model-derived row, pointing at the stage_run that produced it. The schema reads as more verbose than it would otherwise be; that verbosity is the audit trail.
- A versioned artefact at every stage gate. Each stage writes a row with the prompt template version, the model used, the token counts, the cost, and the raw output. Stages cannot stomp earlier versions.
- A “why did the tool say that?” expander on every finding card. The expander is not a developer-mode hidden panel; it is on the surface, by default, for the researcher using the tool.
- A discipline against editing quotes after they are pulled. Quote text is what was said, character offsets included. Cleanups go on a separate display field; the original text is immutable.
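The versioned-artefact discipline above amounts to an append-only log at each stage gate. A minimal sketch, assuming a simple in-memory list stands in for the artefact table — field names here are illustrative, not the tool's actual columns:

```python
import time

# Hypothetical append-only stage-gate log: each run writes a new versioned
# row; earlier versions are never overwritten ("stages cannot stomp earlier versions").
def record_stage_run(log, stage, prompt_version, model,
                     input_tokens, output_tokens, cost_usd, raw_output):
    row = {
        "stage": stage,
        # version is derived from how many runs of this stage already exist
        "version": sum(1 for r in log if r["stage"] == stage) + 1,
        "prompt_version": prompt_version,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": cost_usd,
        "raw_output": raw_output,
        "ran_at": time.time(),
    }
    log.append(row)  # append only: no update, no delete
    return row

log = []
record_stage_run(log, "coding", "v3", "claude-sonnet", 1200, 400, 0.02, "...")
record_stage_run(log, "coding", "v4", "claude-sonnet", 1300, 420, 0.02, "...")
assert [r["version"] for r in log] == [1, 2]  # both versions survive
```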
The cost is paid once in the schema design and once in the UI affordances. After that it is free — the chain maintains itself because the schema does not allow it to be broken.
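The split between immutable verbatim text and an editable display field can be sketched with a frozen dataclass. The names here are hypothetical, not the tool's actual model:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch: the verbatim quote is frozen at capture time;
# any cleanup lives on a separate display field and never touches the original.
@dataclass(frozen=True)
class VerbatimQuote:
    text: str           # exactly what was said, character offsets preserved
    start_offset: int
    end_offset: int

@dataclass
class QuoteRecord:
    verbatim: VerbatimQuote              # immutable original
    display_text: Optional[str] = None   # editable cleanup, separate field

    def rendered(self) -> str:
        # Prefer the cleaned-up text for display; the original remains on disk.
        return self.display_text if self.display_text is not None else self.verbatim.text

q = QuoteRecord(VerbatimQuote("um, it just... it never saves", 120, 149))
q.display_text = "it just never saves"  # cleanup is allowed here
# q.verbatim.text = "..."  # would raise FrozenInstanceError: the original is immutable
```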
What the auditor sees
The reader opens the finding card. The expander opens. The expander shows:
- The prompt sent to the model, verbatim.
- The model used (Sonnet, Opus, or Haiku) and the version.
- The token counts (input and output).
- The raw output the model returned.
- The stage version that ran.
- The timestamp the stage ran at.
- The supporting and counter quotes the stage cited.
- The transcript lines each quote spans.
- The participant ID and the conducted-at date.
- The video timestamp where the quote was spoken.
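The expander payload above can be written down as a single record type. This is a hypothetical shape — every field name and example value is an illustrative assumption, not the tool's actual API:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple

# Hypothetical trace record backing the "why did the tool say that?" expander.
@dataclass(frozen=True)
class TraceRecord:
    prompt: str                       # verbatim prompt sent to the model
    model: str                        # model name and version
    input_tokens: int
    output_tokens: int
    raw_output: str                   # raw model output, unedited
    stage_version: str
    ran_at: datetime
    supporting_quote_ids: Tuple[int, ...]
    counter_quote_ids: Tuple[int, ...]
    transcript_line_spans: Tuple[Tuple[int, int, int], ...]  # (quote_id, first_line, last_line)
    participant_id: str
    conducted_at: datetime
    video_timestamp_ms: int

trace = TraceRecord(
    prompt="...", model="Sonnet (version hypothetical)", input_tokens=1200,
    output_tokens=350, raw_output="...", stage_version="v4",
    ran_at=datetime(2026, 4, 30, 14, 5), supporting_quote_ids=(17, 21),
    counter_quote_ids=(9,), transcript_line_spans=((17, 40, 42),),
    participant_id="P07", conducted_at=datetime(2026, 3, 12),
    video_timestamp_ms=812_000,
)
```

Freezing the record mirrors the audit-trail discipline: the trace the auditor sees is the trace that was written, not a reconstruction.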
The trace stops only at “the model said this” — and at that boundary, the auditor has the prompt, the model, and the raw output. The judgement step is recorded with everything needed to scrutinise it.
Why this matters for AI-assisted analysis specifically
Large language models are prone to confident hallucination. A well-formed quote attributed to a participant who never said it looks identical, on the page, to a real one. The defence is not to stop using the models — they are too useful for first-pass coding and pattern detection — and it is not to add a disclaimer.
The defence is to refuse to render any claim whose evidence is not on disk. If the model produces a quote that is not in the transcript, the chain breaks; the claim cannot reach the deck. If the model produces a finding whose supporting quotes do not span the transcript lines they reference, the chain breaks; the claim cannot reach the report. The schema is the safety net.
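The render-time gate can be sketched as a verbatim-substring check against the transcript lines a quote claims to span. A minimal sketch, assuming a transcript is a mapping from line id to verbatim text and multi-line quotes join with a single space — both assumptions, not the tool's actual logic:

```python
# Hypothetical render-time gate: a quote must appear verbatim in the
# transcript lines it references, or the claim cannot reach the deck.
def verify_quote(quote_text, line_ids, transcript):
    """transcript maps line_id -> verbatim line text."""
    try:
        span = " ".join(transcript[i] for i in line_ids)
    except KeyError:
        return False  # quote references transcript lines that do not exist
    return quote_text in span  # quote must be a verbatim substring of its span

transcript = {1: "I tried exporting twice.", 2: "It failed both times."}
assert verify_quote("It failed both times.", [2], transcript)
assert not verify_quote("It never failed.", [2], transcript)  # hallucinated quote breaks the chain
```

A hallucinated quote and a real one look identical on the page; this check is where they stop looking identical to the machine.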
An auditor scrutinising an AI-assisted finding gets the same chain a human-only finding would carry. That is the promise.
Read more
Why a methodology-first tool is different from a tag-and-quote tool — the trade-off between breadth and defensibility, named honestly.