Practical AI Dev

Architecture

Designing RAG around a truth contract

What “grounded” means when three parties disagree: user, corpus, model.

Retrieval-augmented generation is marketed as grounding, but grounding is not a boolean flag. It is a relationship between the user’s question, the documents you indexed, and the completion policy you give the model. When any of those is implicit, you get confident nonsense—often indistinguishable from success on a quick read. That is the core of the truth vs. fluency tension in product reviews.

What goes into the contract

At minimum: which corpora are authoritative for which intents; how conflicts between chunks resolve (freshness, jurisdiction, owner); what the assistant does when retrieval returns nothing or too much; and whether speculation is allowed or must route to a human, which ties directly to oversight design. If you cannot test a violation, it is not a requirement yet.
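A minimal sketch of what "testable" can mean here: the contract expressed as data rather than prose. All names (`TruthContract`, `OnEmpty`, the `refund_policy` intent) are hypothetical, chosen only to illustrate the shape.

```python
from dataclasses import dataclass
from enum import Enum

class OnEmpty(Enum):
    REFUSE = "refuse"      # say "I don't know"; never invent
    ESCALATE = "escalate"  # route to a human

@dataclass
class TruthContract:
    """Hypothetical sketch: one intent's truth contract as checkable config."""
    intent: str                       # e.g. "refund_policy"
    authoritative_corpora: list[str]  # which indexes may answer this intent
    precedence: list[str]             # tie-break order: freshness, jurisdiction, owner
    on_empty_retrieval: OnEmpty      # behavior when nothing relevant is found
    speculation_allowed: bool = False  # if False, unsupported claims must refuse or escalate

refunds = TruthContract(
    intent="refund_policy",
    authoritative_corpora=["policies_v12"],
    precedence=["freshness", "jurisdiction"],
    on_empty_retrieval=OnEmpty.REFUSE,
)
```

Once the contract is a value like this, a violation is something a test suite can assert against, not a sentence in a design doc.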

Indexing is part of the contract

Version your index and log retrieval fingerprints. Many “model regressions” are ingest bugs—see structured logging for how to record chunk IDs without leaking PII. During exploration, validate contracts against messy real documents, not only clean FAQs.
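One way to make "retrieval fingerprints" concrete, assuming a JSON log pipeline: hash the query and keep only chunk IDs and the index version, so the record is diffable across re-ingests without storing user text. `INDEX_VERSION` and the chunk-ID format are illustrative.

```python
import hashlib
import json
import logging

INDEX_VERSION = "2024-06-03.1"  # hypothetical: bumped on every re-ingest

def retrieval_fingerprint(query: str, chunk_ids: list[str]) -> dict:
    """Log-safe retrieval record: query hash plus chunk IDs, no document text."""
    return {
        "index_version": INDEX_VERSION,
        "query_sha256": hashlib.sha256(query.encode()).hexdigest()[:16],
        "chunk_ids": sorted(chunk_ids),  # sorted so identical sets compare equal
    }

fp = retrieval_fingerprint("how do refunds work in the EU?", ["pol-12#c7", "pol-12#c3"])
logging.info(json.dumps(fp))
```

When a "regression" appears, the first question becomes mechanical: did `index_version` or the chunk-ID set change between the good run and the bad one?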

Downstream: evals and cost

Contracts should surface in offline suites as explicit scenarios—e.g., “no relevant chunk” must yield refusal, not invention. They also interact with routing: larger contexts and rerankers cost money; budgets should reflect negotiated truth boundaries, not accidental sprawl.
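The "no relevant chunk" scenario can be a few lines in an offline suite. The refusal markers and the stub assistant below are placeholders; a real suite would use a classifier or a structured refusal signal rather than string matching.

```python
REFUSAL_MARKERS = ("i don't know", "cannot find", "no supporting document")

def is_refusal(answer: str) -> bool:
    """Crude marker check; stands in for a proper refusal classifier."""
    return any(m in answer.lower() for m in REFUSAL_MARKERS)

def no_chunk_scenario_passes(answer_fn) -> bool:
    """Contract scenario: given an empty retrieval set, the answer must refuse."""
    answer = answer_fn(question="What is our policy on X?", chunks=[])
    return is_refusal(answer)

# Stub assistant that honors the contract
def honest(question: str, chunks: list) -> str:
    return "I don't know; no supporting document was retrieved."

assert no_chunk_scenario_passes(honest)
```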

Chunking and precedence in practice

When two chunks disagree, someone must own precedence: effective date, jurisdiction, or document authority. Encode that logic next to ingest—not only in prompt text—or retrieval will surface the wrong chunk first and the model will sound authoritative anyway. This is where fluency risk spikes: the model reads smoothly even when the corpus is internally inconsistent.
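A sketch of precedence encoded next to ingest, assuming chunks carry `jurisdiction` and `effective` metadata (both hypothetical field names): prefer a jurisdiction match, then the newest effective date.

```python
from datetime import date

def resolve(chunks: list[dict], jurisdiction: str) -> dict:
    """Pick the winning chunk: jurisdiction match first, then newest effective date."""
    eligible = [c for c in chunks if c["jurisdiction"] in (jurisdiction, "global")]
    # Tuple key: an exact jurisdiction match outranks any date difference
    return max(eligible, key=lambda c: (c["jurisdiction"] == jurisdiction, c["effective"]))

chunks = [
    {"id": "pol-old", "jurisdiction": "global", "effective": date(2024, 1, 1)},
    {"id": "pol-eu",  "jurisdiction": "EU",     "effective": date(2023, 6, 1)},
]
winner = resolve(chunks, "EU")
```

Note the EU chunk wins here despite being older; that is a deliberate, owned decision, not an accident of embedding similarity.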

From contract to customer-facing copy

Help and marketing should restate the same boundaries users see in the assistant—especially around medical, legal, or financial topics. Misaligned copy creates support load and erodes trust; align drafts with what you test in hardening and validate during pilots.

Reranking, MMR, and “too many chunks”

Retrieval returns a candidate set; rerankers and maximal marginal relevance pick what the model reads. Those choices are part of the contract: if reranking suppresses a disconfirming chunk, the assistant may sound confident while omitting nuance. Log rerank scores and final chunk IDs, and add regression tests for known multi-chunk questions where order matters.
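For reference, a toy maximal marginal relevance loop, with precomputed similarity tables standing in for real embedding math. The point is that `lam` is a contract decision: lower values trade query relevance for diversity, which can pull a disconfirming chunk back into the context window.

```python
def mmr(query_sim: dict, pairwise_sim: dict, candidates: list, k: int, lam: float = 0.7):
    """Greedy MMR: balance relevance to the query against redundancy with picks so far."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        best = max(
            pool,
            key=lambda c: lam * query_sim[c]
            - (1 - lam) * max((pairwise_sim[(c, s)] for s in selected), default=0.0),
        )
        selected.append(best)
        pool.remove(best)
    return selected

# Toy data: "a" and "b" are near-duplicates; "c" disagrees with both
qs = {"a": 0.9, "b": 0.85, "c": 0.5}
ps = {("a", "b"): 0.95, ("b", "a"): 0.95, ("a", "c"): 0.1,
      ("c", "a"): 0.1, ("b", "c"): 0.1, ("c", "b"): 0.1}
picked = mmr(qs, ps, ["a", "b", "c"], k=2)  # "c" displaces the redundant "b"
```

With pure relevance ranking, "b" would ride along with "a" and "c" would be dropped; logging `picked` alongside rerank scores is what makes such omissions auditable.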

ACLs on documents, not only on APIs

Enterprise search must respect per-user permissions. The contract should state what happens when a user asks about a document they cannot access: refuse, partial answer, or route to an admin—never a silent omission that looks like ignorance. That behavior intersects policy and audit logs for access denials.
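One way to keep the denial explicit rather than silent, assuming group-based ACLs on chunks (field names hypothetical): return the readable set plus a denied count, so the assistant can refuse or route instead of answering as if the document did not exist.

```python
def retrieve_with_acl(chunks: list[dict], user_groups: set) -> tuple[list[dict], int]:
    """Filter by per-chunk ACL, surfacing how many results were withheld.

    The denied count lets the caller distinguish 'nothing exists' from
    'something exists that you cannot see', and log the denial for audit."""
    readable = [c for c in chunks if c["acl"] & user_groups]
    denied = len(chunks) - len(readable)
    return readable, denied

chunks = [
    {"id": "hr-1",  "acl": {"hr"}},
    {"id": "eng-1", "acl": {"eng", "all"}},
]
readable, denied = retrieve_with_acl(chunks, user_groups={"eng"})
```

If `denied > 0` and `readable` is empty, the contract decides what the user hears: a refusal, a partial answer, or a pointer to an admin; never a confident "no such policy exists."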

Lifecycle of a corpus change

New PDFs, wiki edits, and ticket exports continuously change facts. Define SLAs for ingest latency, backfill strategy, and user-visible "as of" timestamps. When marketing announces a policy update before the index catches up, you need a communications bridge; otherwise the model "sounds current" while citing stale chunks.
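A small sketch of the user-visible "as of" behavior, with a hypothetical six-hour ingest SLA: stamp every answer with the index's freshness, and flag when the SLA is breached rather than letting the answer imply currency.

```python
from datetime import datetime, timedelta, timezone

INGEST_SLA = timedelta(hours=6)  # hypothetical SLA for this corpus

def as_of_banner(last_ingest: datetime, now: datetime) -> str:
    """User-visible freshness line; flags answers served from a stale index."""
    stamp = f"Answers reflect documents as of {last_ingest:%Y-%m-%d %H:%M} UTC."
    if now - last_ingest > INGEST_SLA:
        stamp += " (index update in progress)"
    return stamp

last = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
fresh = as_of_banner(last, last + timedelta(hours=1))
stale = as_of_banner(last, last + timedelta(hours=12))
```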

A truth contract is not a PDF—it is the set of behaviors you can test, log, and explain when something goes wrong.

Related notes