Practical AI Dev

Field note

Rules as code, not as meetings

Making policy legible to engineering—and testable in CI.

Safety and legal stakeholders are not obstacles; they are owners of constraints that models will violate by default if left implicit. The failure mode is serial review: every prompt change waits for a committee, while production runs an older, unmaintained configuration. The alternative is to co-design truth contracts and their automated checks: same artifact, two audiences.

Testable constraints beat slide decks

When a rule can be expressed as “never claim X without source Y,” it belongs beside schema validators and tool guards, which we discuss in layered evals. Policy teams review the suite and sampling strategy, not every diff to adjectives in a system prompt.
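A "never claim X without source Y" rule can live in CI as an ordinary test. The sketch below is illustrative only: the claim patterns and the `[source: …]` citation format are assumptions, not any real product's policy.

```python
import re

# Hypothetical claim patterns a policy team might flag; real rules
# would come from the truth contract, not this file.
CLAIM_PATTERNS = [
    re.compile(r"\bguaranteed\b", re.IGNORECASE),
    re.compile(r"\bFDA[- ]approved\b", re.IGNORECASE),
]
# Assumed citation marker format for grounded claims.
CITATION = re.compile(r"\[source:\s*\S+\]")

def violates_grounding(answer: str) -> bool:
    """True if the answer makes a regulated claim with no citation."""
    makes_claim = any(p.search(answer) for p in CLAIM_PATTERNS)
    return makes_claim and not CITATION.search(answer)

def test_grounding_contract():
    # Runs beside schema validators on every prompt change.
    assert violates_grounding("Results are guaranteed.")
    assert not violates_grounding("Results are guaranteed. [source: warranty-v2]")
    assert not violates_grounding("Results vary by patient.")
```

Policy reviews the patterns and the sampling of transcripts fed through them; engineering reviews nothing extra, because the suite already runs on every diff.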

Speed without silent drift

Product velocity recovers when engineers know which changes require re-approval. Pair that clarity with fluency discipline so marketing language does not promise guarantees the contract forbids.

For customer-facing pilots, parallel runs give policy teams empirical evidence—overrides, escalations, time-to-resolution—instead of hypothetical risk workshops.
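Those pilot metrics can be rolled up mechanically. A minimal sketch, assuming a pilot logs per-case override, escalation, and resolution-time fields (the names here are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class PilotCase:
    override: bool            # a human overrode the model's answer
    escalated: bool           # case went to a specialist
    resolution_minutes: float # time to close the case

def summarize(cases: list[PilotCase]) -> dict:
    """Roll pilot transcripts into the figures policy teams review."""
    n = len(cases)
    return {
        "override_rate": sum(c.override for c in cases) / n,
        "escalation_rate": sum(c.escalated for c in cases) / n,
        "median_resolution_min": sorted(
            c.resolution_minutes for c in cases
        )[n // 2],
    }
```

The point is not the arithmetic; it is that the review artifact is a table computed from real runs, not a risk matrix filled in by hand.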

Shared language across functions

Misunderstandings between legal and engineering often come from vocabulary, not intent. Publish a short glossary: what “grounding,” “hallucination,” and “human review” mean in your product—aligned with how you test in eval suites. That document becomes the bridge for fluency discussions when marketing drafts new copy.

Incident response as policy feedback

When production breaks a rule, capture whether the fix belongs in model behavior, retrieval scope, or UI disclosure. Feed those outcomes into the same backlog as feature work—otherwise policy becomes a rotating set of slide decks while the deployed system drifts. Tie lessons to ownership so runbooks and contracts stay in sync.
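One way to keep that routing honest is to make the three fix scopes an explicit enum with named owners. A sketch under assumptions: the team names and ticket shape are illustrative, not a canonical taxonomy.

```python
from enum import Enum

class FixScope(Enum):
    MODEL_BEHAVIOR = "model_behavior"
    RETRIEVAL_SCOPE = "retrieval_scope"
    UI_DISCLOSURE = "ui_disclosure"

# Hypothetical owning teams; the mapping is the point, not the names.
OWNERS = {
    FixScope.MODEL_BEHAVIOR: "prompt-and-eval",
    FixScope.RETRIEVAL_SCOPE: "search-infra",
    FixScope.UI_DISCLOSURE: "product-design",
}

def file_ticket(incident_id: str, scope: FixScope) -> dict:
    """One backlog item per incident, tagged with the owning team."""
    return {
        "id": incident_id,
        "scope": scope.value,
        "owner": OWNERS[scope],
        "backlog": "feature-work",  # same queue as features, by design
    }
```

Forcing every incident through `file_ticket` means no rule violation dies in a retro doc; each one lands in a queue someone already drains.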

Regulatory snapshots vs. continuous deploys

Some industries require filing a “frozen” description of behavior. If you ship weekly prompt changes, reconcile your compliance story: either automate diff summaries for filings or maintain a labeled stable channel for regulated users. That decision intersects velocity vs. verification—you cannot promise both maximum agility and maximum regulatory stability without naming the trade-off.
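"Automate diff summaries" can start as a few lines of stdlib. A minimal sketch, assuming the filed snapshot and the deployed prompt are both available as text; the snapshot names and filing format are assumptions.

```python
import difflib

def filing_diff(frozen: str, deployed: str, version: str) -> str:
    """Compare the filed prompt snapshot against the deployed one and
    emit a human-readable changelog entry for the compliance record."""
    lines = difflib.unified_diff(
        frozen.splitlines(), deployed.splitlines(),
        fromfile="filed-snapshot", tofile=version, lineterm="",
    )
    body = "\n".join(lines)
    # An empty diff is itself evidence for the stable channel.
    return body or f"{version}: no behavioral-prompt changes since filing"
```

Running this on every deploy turns "what changed since the filing?" from an archaeology project into a generated artifact.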

Working sessions that actually work

Replace generic “AI safety” seminars with concrete scenarios: walk through five real transcripts, mark where the contract would forbid an answer, and assign owners for each gap. End each session with test cases added to CI—otherwise the meeting was theater.

Vendor and third-party models

When you depend on external APIs, policy must cover subprocessors, data retention, and acceptable use. Engineering should surface which fields leave your boundary in traces so legal review matches runtime—not an outdated architecture diagram.
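Surfacing which fields leave the boundary can be enforced, not just documented. A hedged sketch: the approved-field list and payload keys below are hypothetical, standing in for whatever legal actually signed off on.

```python
# Fields legal has approved to cross the boundary (assumed list).
APPROVED_OUTBOUND = {"user_message", "conversation_id", "locale"}

def check_outbound(payload: dict) -> set[str]:
    """Return fields that would leave the boundary without approval,
    so the caller can block or redact them before the API call."""
    return set(payload) - APPROVED_OUTBOUND

payload = {"user_message": "hi", "email": "a@b.com"}
unapproved = check_outbound(payload)  # {"email"}: redact before sending
```

Because the check runs where traces are emitted, legal review matches runtime behavior rather than an architecture diagram that drifted two quarters ago.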

Policy at the speed of meetings loses to policy at the speed of tests—if the tests are legible to non-engineers.
