
Postel's Law for the human-AI interface

When the producer at one end of a software interface is a large language model, the engineering instinct that has worked for forty years — define a schema, enforce it on both sides, fail loud on drift — does not just stop helping. It actively hides live data from the user. The principle that replaces it is Postel’s Law, RFC 793, 1981: be conservative in what you do, be liberal in what you accept from others. The half of it that mattered least in 1981 is the half that is now load-bearing.

I will use a tool I built as the worked example. The tool is a dashboard that renders my daily plan, a markdown file written each morning by an agentic LLM-driven assistant. The plan has typed sections: decisions waiting on me, priority tasks, message drafts ready to send. The dashboard parses the markdown, recognizes each section, and turns it into the affordance you would actually want for that content type — a button you click for a decision, a checkbox for a task, a Copy/Send button for a draft. The assistant writes the file. The dashboard reads it. Two pieces of software I wrote, two ends of a wire, both at my desk.

I shipped the cockpit view of the daily plan as a new feature, started using it on my actual plan, and within hours I had filed three same-day bugs against my own work. The decisions section of the dashboard was empty even though my plan opened with a heading called “Open ask” and a paragraph asking for help with something. A morning ritual had created a “Drafts ready to send” section with two drafts in it, and mid-day updates had inserted three more drafts inline next to the situations that prompted them, under a different heading. The dashboard showed the first two and missed the others entirely. About 15% of the section headings in my historical daily plans fell through to a generic “Other” bucket, including section names I actually use day-to-day.

I had thought the parser was tolerant because it used substring matching. It was not tolerant. It was strict in a costume.

The fix was not more vocabulary terms, though there were those too. It was a recognition that one end of this contract is an LLM and the parser had to behave accordingly. This is Postel’s Law applied to the human-AI interface, and the way you operationalize it is what I will call the document-as-contract pattern.

Why strict parsing seemed reasonable

When you write both sides of a producer-consumer interface, the standard engineering instinct is to define the schema, enforce it on both ends, and fail loud when the producer drifts. That is how you ship correct software. JSON Schema, Protobuf, OpenAPI; pick your favorite. Drift produces a build error or a runtime exception that someone pages on, and the contract stays honest because the type system makes drift visible.

That instinct works when both ends are deterministic. The producer renders a struct to bytes; the consumer parses bytes back to a struct. If the schemas match, every field round-trips. If they don’t, the test suite tells you on the next CI run.

The instinct breaks when one end is an LLM.

The Postel reveal

Jon Postel, defining TCP in 1981, wrote what would later be quoted everywhere: be conservative in what you do, be liberal in what you accept from others. The phrasing is so familiar it almost sounds like a network-engineering proverb. It governed TCP, then generalized to anywhere two pieces of software talk over a wire.

In its original setting, “be liberal in what you accept” was a robustness courtesy. The expectation was that both ends were also being conservative. The spec said how to encode a TCP segment, both sides shipped code that followed the spec, and the liberal-acceptance rule was a margin of safety against the small drifts that real implementations exhibit. The producer was strict-by-construction; the consumer added a layer of forgiveness on top.

When the producer is an LLM, the producer is no longer strict-by-construction. LLMs do not render a struct. They generate text that satisfies a goal in context. Given a template that says “write a section called Decisions needed,” an LLM-driven assistant will sometimes produce that exact heading, sometimes “Decisions to make,” sometimes “Open ask” because the day actually has just one ask and that phrasing fits better. The model is not conspiring to break your parser. It is solving the user’s problem, with the template as guidance.

The “be liberal in what you accept” half of Postel’s Law was always there, but it was a courtesy. With an LLM on the other end of the wire, it gets promoted to load-bearing. The consumer’s tolerance is doing most of the work the strict-producer half used to do.

That changes the engineering. You cannot enforce strictness by failing on unknown input: the unknown input is the live data the user wrote. You cannot even insist on the canonical heading names. The LLM might invent a better one for today’s situation, and rejecting it would mean either rejecting the LLM’s judgment or hiding what the user is actually asking for. Both options ship a worse tool.

What you can do is define the contract twice, in a shape that lets the consumer accept whatever a reasonable LLM would write.

The document-as-contract pattern

Both repositories now carry the same vocabulary table in documentation rather than in a code-enforced schema. The producer (the assistant) has documentation that lists every recognized section type with its canonical name and its known synonyms: what to write, when in doubt. The consumer (the dashboard) has documentation that lists the same vocabulary, viewed from the other side: what the parser knows how to type. Both files cross-link to each other, and both carry a rule at the top: change both files in the same commit.

The producer treats the document as guidance. The assistant writes plans using the canonical names where it can, picks a recognized synonym where the canonical does not fit the day, and is told it can deviate when the situation calls for it. Nothing in the assistant enforces the contract. The contract is the document, not the code.

The consumer treats the same document as the canonical vocabulary it knows how to type. The parser recognizes every name in the vocabulary table, plus the synonyms, plus a few stripping rules for noise the LLM might add. Anything not in the table falls through to a graceful default: render the section as plain markdown inside a collapsible at the bottom of the page. Never lose the data; never reject the input.

The two-doc rule is the most important part. Schema drift is invisible until it produces a same-day bug. Documentation drift is the same hazard with different syntax. Treating the producer-side doc and the consumer-side doc as a single specification split across two repositories, and enforcing the change-together discipline at the commit level, is what keeps the contract honest over time.
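The shared table that both docs carry might look like this hypothetical excerpt, built from the section names and affordances described above (the real files list every kind and live in two repositories):

```markdown
| Canonical name       | Known synonyms                          | Dashboard rendering          |
| -------------------- | --------------------------------------- | ---------------------------- |
| Decisions needed     | Decisions to make, Open ask, Open items | One button per decision      |
| Priority tasks       | Today's tasks, Still to do              | Checkbox per task            |
| Drafts ready to send | Drafts, Unsent ready, Pending emails    | Copy/Send action bar         |
| (anything else)      |                                         | Plain markdown, collapsible  |
```

The last row is part of the contract too: it tells the producer that deviation is safe, and tells the consumer what the floor is.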

Tolerant parsing in practice

The full classifier in the dashboard is about thirty lines of code:

function classifyH2(text: string): SectionKind {
  // Strip noise the LLM might add.
  const t = text
    .replace(/~~/g, "")
    .replace(/[🚨🔥🚀🟡🟢💡📞🤖🧪🆕✅⚠️⛔🎯📋🔧📌]/gu, "") // u flag: match emoji as whole code points, not surrogate halves
    .trim()
    .toLowerCase();

  // Substring + case-insensitive. First match wins.
  if (/decisions?\s+(?:needed|to\s+make)|open\s+asks?\b|open\s+items?\b/i.test(t)) return "decisions";
  if (/^priority\s+tasks?\b|^today'?s?\s+(?:tasks?|priorit)|^still\s+to\s+do/i.test(t)) return "priority-tasks";
  if (/^drafts?\b|unsent\s*(?:[—-]\s*)?(?:ready|drafts?)|^pending\s+emails?/i.test(t)) return "drafts";
  // ... eight more section kinds, same shape ...

  // Anything else renders as plain markdown.
  return "other";
}

Four techniques are doing the work.

Strip noise. Strikethrough markers and emoji prefixes get removed before matching. So ## ~~CRITICAL: Old launch issue~~ and ## 🚨 Decisions needed both classify the same way as the bare names.

Substring, case-insensitive. The LLM might write “Decisions needed today” or “Decisions to make this week.” Anchoring on a substring keeps the parser from caring about the prefix or suffix the writer added.

Synonyms in the regex. Each section kind’s pattern lists the recognized variants, about twenty across all kinds, learned from auditing the real corpus. New synonyms get added to the pattern and listed in the documentation.

Graceful fallback. The default branch returns "other", which renders the section as plain markdown inside a collapsible at the bottom of the page. Unrecognized sections are not hidden. They are not specially treated, but they are visible. The user always sees the data they wrote.
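That fallback can be small. A sketch of what the "other" branch's rendering might look like, assuming the section body has already been rendered to HTML (`renderOtherSection` and `escapeHtml` are hypothetical names, not the dashboard's actual code):

```typescript
// Hypothetical sketch: render an unrecognized section as plain markdown
// inside a collapsible, so the data stays visible even when untyped.
function renderOtherSection(heading: string, bodyHtml: string): string {
  // <details> collapses by default; the heading stays scannable.
  return [
    "<details>",
    `<summary>${escapeHtml(heading)}</summary>`,
    bodyHtml, // the section body, already rendered from markdown
    "</details>",
  ].join("\n");
}

function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}
```

The point is the asymmetry: typed sections get bespoke affordances, untyped sections get a guaranteed floor instead of silence.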

Same shape, different axis

Hours after the vocabulary fix shipped, a different bug surfaced, one that had nothing to do with section names. A draft message body in my plan was wrapped in a triple-backtick fenced code block, which is the convention for “this is the message to send verbatim.” Inside the message body, the LLM had included four code-block examples — curl, npx, git clone, rm -rf install commands — each one in its own triple-backtick fence.

CommonMark closes a fenced code block at the first matching fence of the same length it finds. So instead of one outer fence wrapping four inner code-block examples, the parser saw alternating open/close pairs. The rendered HTML had the prose displayed as monospace and the install commands displayed as plain hyperlinked text, exactly inverted. Worse, the dashboard’s draft-decoration logic only attached the action bar (Copy, Edit, Send) to the first parsed fenced block. The visible Copy button was attached to a fragment of the draft, with the rest of the body rendering below the action bar disconnected from the message it belonged to.

This is not a vocabulary problem. The producer’s heading was perfectly canonical. The bug lives on a structural axis: how the body of the draft is delimited.

The fix has the same two halves as before.

Producer side. When wrapping a body that contains internal fenced code blocks, the outer fence has to be longer than any inner fence. Four backticks outer, three inner. This is canonical CommonMark; the assistant’s draft-format documentation now recommends it explicitly.
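Concretely, the recommended form looks like this (hypothetical draft content): the outer fence is four backticks, so the three-backtick fences inside nest instead of closing it.

`````markdown
**Send to:** #ops channel

````
To reproduce, run:

```
npx create-vite demo
```

then clean up with:

```
rm -rf demo
```
````
`````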

Consumer side. Stop assuming the body is exactly one fenced block. Each draft begins with a **Send to:** <recipient> line and ends at the next heading or at a **Sent:** ... / **Grounding:** ... paragraph. Define the draft body as that whole region. Wrap the entire region in a single container element in the rendered HTML. Attach one action bar at the end. Single-fence drafts and blockquote drafts keep their existing edit-and-save semantics; multi-segment drafts (intro prose, then a fenced code sample, then more prose, then another fenced code sample) get one wrapper, one action bar, one Copy that emits the whole body region verbatim. Slack interprets the internal triple-backticks as code blocks on paste.
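A sketch of that region rule under the conventions just described (`findDraftRegions` and `DraftRegion` are hypothetical names; for brevity the scan ignores the corner case of a heading-like line appearing inside a fenced block):

```typescript
// Hypothetical sketch: find each draft's body as a region of lines,
// rather than assuming the body is exactly one fenced code block.
interface DraftRegion {
  start: number; // index of the "**Send to:**" line
  end: number;   // exclusive index of the terminating line
}

function findDraftRegions(lines: string[]): DraftRegion[] {
  const regions: DraftRegion[] = [];
  for (let i = 0; i < lines.length; i++) {
    if (!lines[i].startsWith("**Send to:**")) continue;
    let j = i + 1;
    // The region runs until the next heading or a **Sent:** / **Grounding:** paragraph.
    while (
      j < lines.length &&
      !/^#{1,6}\s/.test(lines[j]) &&
      !lines[j].startsWith("**Sent:**") &&
      !lines[j].startsWith("**Grounding:**")
    ) {
      j++;
    }
    regions.push({ start: i, end: j });
    i = j - 1; // resume scanning after this region
  }
  return regions;
}
```

Each region then gets one wrapper element, one action bar, and one Copy that emits the whole region verbatim, regardless of how many fences or prose segments it contains.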

Same shape as the vocabulary fix: the consumer’s tolerance got extended, the producer’s canonical form got documented, both halves changed in one arc.

“But the parser is correct”

When the structural bug surfaced, there was a reasonable case for fixing only the producer. The parser is following CommonMark exactly: the first matching fence of the same length closes the block, which is the spec. So have the LLM-driven assistant always emit a longer outer fence, and the parser will correctly nest the inner fences. Done.

The argument is technically correct. The CommonMark parser is doing the right thing. If the producer always emits canonical CommonMark, the parser produces correct HTML.

The argument also misses where the bug lives.

The dashboard’s draft-decoration logic is not the markdown parser. It is an application-level layer that finds the first fenced block under each draft section and attaches the action bar to that block. That layer was strict-with-a-veneer in the same way the heading classifier was before the vocabulary fix. It assumed one shape (a single fenced block) and silently shipped a degraded output for any other shape (decorating only the first fragment, leaving the rest disconnected).

Fixing only the producer leaves that consumer layer untouched. The next variation that breaks the draft-decoration logic — a draft body that’s a blockquote instead of a fence, a body that has two fenced blocks for legitimate reasons, a body that uses indented code blocks — surfaces the same kind of same-day bug. The parser is still correct; the consumer is still fragile.

The two-half rule is not about who’s right per spec. It is about which layer is doing the application logic that turns parsed input into user-visible affordances. That layer has to be tolerant when its input comes from an LLM. The parser being correct is necessary but not sufficient.

Audit as the regression check

The thing that keeps a tolerant parser honest is a regression check that runs over the whole historical corpus, not a unit test on a synthetic example.

I have an audit script that walks every plan in the historical archive, classifies every heading, and reports the breakdown. The shape of the output looks like this:

Plans:   <a year-plus of accumulated files>
Headings: <thousands of total H2 headings>
Classified to typed kinds: 94.4%
Fell through to "other":   5.6%

Top "other" sections:
  Everything Else        (intentional catch-all)
  Background Agent Demo  (intentional catch-all)
  Reading Queue          (consider adding)
  Onboarding Notes       (consider adding)
  ...

That breakdown is the regression check. The “intentional catch-all” rows are sections I want in Other, generic backstops the dashboard does not need to render specially. The “consider adding” rows are sections the LLM has been writing for a while but the contract does not know about. Each one is a decision: extend the contract, or leave it as catch-all because it really is a long-tail case.

What this catches: if the LLM’s vocabulary drifts significantly between sessions or model versions, the fallback rate creeps up over time. The audit script makes that drift visible as a number rather than as a same-day bug.

What it does not catch: a section the LLM writes today that does not appear in the historical corpus. For that, the only check is using the dashboard on real plans and noticing when something is missing. There is no substitute for actually using the thing.

When the pattern doesn’t apply

The pattern is specific. It works when:

- the producer is an LLM writing a document for a human reader, and you control the consumer;
- the payload degrades gracefully, so rendering an unrecognized section as plain markdown is always an acceptable floor;
- the contract can live in a producer-side doc and a consumer-side doc kept in sync by a change-together rule.

It does not work when:

- the consumer must act on the input programmatically, with no human in the loop to notice a degraded rendering;
- accepting malformed input is riskier than rejecting it (payments, permissions, anything where failing loud is the safety property);
- both ends are deterministic code, where a schema enforced in the type system catches drift at CI time and is strictly better.

The half that’s now load-bearing

A consumer that hides live data is worse than a consumer that fails. A strict parser that refuses to render an unknown heading, or attaches its action bar to a fragment of a multi-segment body, is not a defensive measure. It is a quiet way to lose the user’s data. The LLM did not conspire to break the contract. It solved a real problem the template did not anticipate. The right response is to render the unexpected content gracefully and update the contract for next time.

The producer-consumer contract has multiple axes. Vocabulary surfaced first; structure surfaced hours later. Markdown inside table cells, blockquotes inside fenced blocks, HTML embedded in markdown: those are plausibly the next ones in line. Each will surface the same way, as a same-day bug, and the response will be the same: extend the consumer’s tolerance, document the canonical form on the producer side, ship both halves in one commit.

Postel’s Law was always there. With an LLM on the other end of the wire, the “liberal in what you accept” half stops being a courtesy and starts being load-bearing. The shape that lets you ship is a tolerant default, an explicit document-contract that both sides reference, a same-commit rule that keeps the two halves of the document in sync, and an audit script that surfaces drift before it produces a bug.

That last bit is the discipline. The first three are how you stop hiding live data from the user.

Originally published on rajiv.com
synthesis coding · design patterns · llm integration · software design · parsing · human-ai interface · agentic systems