Three tiers of trust: how we account for what our system relies on
Asked how much of our system we trust, we used to give a single number: total lines of code the correctness argument depends on. The number was honest. It was also nearly useless.
The useful question is not how much we trust. The useful question is what kind of trust each piece carries. A line of code that produces a verdict is one kind of trust. A line of code that rejects a malformed input at a boundary is a different kind. A line of code that records a metric is a third. If a verdict line changes, the system can return a wrong answer that nothing else catches. If a metric line changes, the metric is wrong and correctness is unaffected. A single number cannot say which is which.
What follows is the tier discipline we use to keep that question answerable, and the log that keeps the answer durable.
Why one number hides risk
The trusted computing base (TCB) of a system is the code its correctness argument leans on. The natural way to report it is a single number. The natural number is total lines.
The natural number is wrong. A few hundred lines of code that produce verdicts carry a different kind of risk than a few hundred lines of code that decorate a response with metrics. Both might be inside the trusted computing base under a permissive definition. Only one of them can return a wrong answer to a user if it changes. Lumping the two together produces a TCB count that is honest about its arithmetic and silent about its risk.
The number invites the wrong management conversation. A budget meeting asks "can we get TCB under five hundred lines?" The answer that minimizes the number cuts the easiest pieces. The easiest pieces are usually the ones that did not matter for correctness. The number drops. The risk does not.
A single number tells you how much code you trust. It does not tell you what would break if that code changed.
A useful accounting answers a different question. It asks what would break, and how badly, if each piece of trusted code changed. Instead of a number, the accounting produces a partition.
The three tiers
The cleanest place to see the tiers is at the language boundary the system crosses. The handful of lines that hand a verdict from one runtime into another sit next to a pointer-validity check, which sits next to a counter that records how many verdicts were produced. Three pieces of code, one location, three risk levels.
The first piece, the marshaller that hands the verdict across, is correctness-trusted. If the marshaller transposes two fields, the verdict that arrives downstream is wrong and downstream code treats it as proof of correctness. Nothing else in the system rechecks the verdict. The marshaller is part of the trusted computing base in its strongest sense. Changes to it are the changes that need the strongest review.
The second piece is defensive. The pointer-validity check at the same boundary catches malformed inputs that the runtime above it should never have produced. If the check fails open, a malformed input crosses and may crash a downstream consumer. It does not silently corrupt a verdict. Correctness, in the soundness sense, is preserved if the check rejects malformed inputs and returns an error instead of letting them through.
The third piece is instrumentation. The counters and diagnostic fields attached to each response record what the boundary saw: how many verdicts were produced, how many were rejected at the defensive layer, how long each took. If a counter wraps or an off-by-one slips in, an operator dashboard is wrong. The verdict that produced the count is still correct.
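A minimal sketch of the three pieces sitting at one boundary, in Python for brevity. Every name here (Verdict, BoundaryStats, marshal, cross_boundary) is a hypothetical stand-in, not our real code, and a shape check on a tuple stands in for the pointer-validity check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    accepted: bool
    proof_id: int


@dataclass
class BoundaryStats:
    produced: int = 0
    rejected: int = 0


def marshal(raw: tuple) -> Verdict:
    # CORRECTNESS-TRUSTED: transposing these two fields would hand
    # downstream a wrong verdict that nothing else rechecks.
    accepted_flag, proof_id = raw
    return Verdict(accepted=bool(accepted_flag), proof_id=proof_id)


def cross_boundary(raw: object, stats: BoundaryStats) -> Verdict:
    # DEFENSIVE: reject malformed input with an error instead of
    # letting it cross. (Assumed stand-in for a pointer-validity check.)
    well_formed = (
        isinstance(raw, tuple)
        and len(raw) == 2
        and all(isinstance(field, int) for field in raw)
    )
    if not well_formed:
        stats.rejected += 1  # INSTRUMENTATION
        raise ValueError("malformed verdict record")

    verdict = marshal(raw)
    # INSTRUMENTATION: if this count is off, a dashboard is wrong;
    # the verdict returned below is unaffected.
    stats.produced += 1
    return verdict
```

Deleting the two counter updates changes nothing a caller can observe in the returned verdict. Swapping the two names in the unpacking inside marshal changes every verdict. That asymmetry is the whole case for the tiers.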
Three tiers, one boundary, three risk levels. A useful accounting reports them as three line items: a few hundred correctness-trusted lines, tens of defensive lines, around a hundred instrumentation lines. A budget conversation against that accounting can ask the right question. The right question is whether a piece can be moved down a tier and still hold the correctness argument. The wrong question, the one a single TCB number invites, is whether the total can be cut.
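The difference between the single number and the partition can be sketched in a few lines. The inventory below is hypothetical: the component names are invented and the line counts are illustrative figures chosen only to match the rough proportions described above.

```python
from collections import Counter
from enum import Enum


class Tier(Enum):
    CORRECTNESS = "correctness-trusted"
    DEFENSIVE = "defensive"
    INSTRUMENTATION = "instrumentation"


# Hypothetical inventory: (component, tier, lines of code).
INVENTORY = [
    ("verdict_marshaller", Tier.CORRECTNESS, 300),
    ("pointer_check", Tier.DEFENSIVE, 40),
    ("response_counters", Tier.INSTRUMENTATION, 100),
]


def single_number(inventory):
    """The old report: one total, no indication of risk."""
    return sum(lines for _, _, lines in inventory)


def by_tier(inventory):
    """The tier report: the same lines, partitioned by kind of trust."""
    totals = Counter()
    for _, tier, lines in inventory:
        totals[tier] += lines
    return {tier: totals[tier] for tier in Tier}


# Cutting the easiest pieces lowers the total and leaves the risk alone.
trimmed = [row for row in INVENTORY if row[1] is not Tier.INSTRUMENTATION]
print(single_number(INVENTORY), single_number(trimmed))
print(by_tier(trimmed)[Tier.CORRECTNESS])
```

The total falls when the instrumentation is cut, and the correctness-trusted line item does not move. Only the partitioned report makes that visible.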
The decisions log as partner discipline
Tier accounting captures the surface. The decisions log captures the choices behind it.
Every entry in our decisions log carries two lines that look optional and are not. One line names what was chosen. The other names what was rejected and why. A year later, a maintainer reading only the first line sees a deliberate trade-off and finds the trade-off space empty; the choice looks arbitrary. The same maintainer reading both lines sees the same trade-off and the alternatives that were weighed against it; the choice looks like the choice it was.
A worked instance: a piece of code that lived in the correctness-trusted tier was a candidate for relegation to the defensive tier by adding a check that downstream code already implied. The argument for relegation was that it cut trusted lines. The argument against pointed out that the implied check, on close reading, did not exhaust the input space the upstream code actually produced. The log records the choice (keep in correctness-trusted) and the alternative (relegate plus add the defensive check). A maintainer next year reads both lines and sees that the relegation was considered and rejected for a reason. The reason is in the log.
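One way to keep the second line from being treated as optional is to make the record refuse to exist without it. The sketch below assumes a simple in-memory record; the class names, field names, and the component label are hypothetical, and the entry restates the worked instance above without adding detail to it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rejected:
    option: str
    reason: str


@dataclass(frozen=True)
class Decision:
    component: str
    chosen: str
    rejected: Tuple[Rejected, ...]

    def __post_init__(self):
        # The "rejected and why" line is mandatory: no alternatives,
        # or an alternative with a blank reason, is an invalid entry.
        if not self.rejected or any(not r.reason.strip() for r in self.rejected):
            raise ValueError("a decision must record what was rejected and why")


entry = Decision(
    component="component_x",  # placeholder; the source does not name it
    chosen="keep in correctness-trusted",
    rejected=(
        Rejected(
            option="relegate to defensive and add the defensive check",
            reason="the implied downstream check does not exhaust "
                   "the inputs the upstream code actually produces",
        ),
    ),
)
```

Constructing a Decision with an empty rejected tuple raises ValueError, so an entry that records only the choice cannot be written by accident.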
The two disciplines work as a pair. Tier accounting tells a reader what is trusted. The decisions log tells the reader why each piece is in the tier it is in. Either one alone leaves the next maintainer guessing. Together they make the trust surface a thing a reviewer can read and a manager can budget against.
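The pairing can be checked mechanically. As a rough sketch, assuming the tier assignments and the log are both available as plain data (the component names, the dictionary layout, and the placeholder reasons are all hypothetical), a reviewer could list every tiered component that has no explanation behind it:

```python
# Hypothetical tier assignments: component -> tier.
TIERS = {
    "verdict_marshaller": "correctness-trusted",
    "pointer_check": "defensive",
    "response_counters": "instrumentation",
}

# Hypothetical log entries; the reasons are placeholders, not real records.
LOG = [
    {
        "component": "verdict_marshaller",
        "chosen": "correctness-trusted",
        "rejected": "defensive (placeholder reason)",
    },
    {
        "component": "pointer_check",
        "chosen": "defensive",
        "rejected": "correctness-trusted (placeholder reason)",
    },
]


def unexplained(tiers, log):
    """Components with a tier but no log entry naming a rejected alternative."""
    explained = {e["component"] for e in log if e.get("rejected")}
    return sorted(name for name in tiers if name not in explained)


print(unexplained(TIERS, LOG))
```

Here the counters have a tier and no recorded reason, which is exactly the guessing game the pair of disciplines is meant to rule out.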
Why we publish this
For technical buyers. Ask any vendor about their trusted code in tier terms: how much is correctness-trusted, how much is defensive, how much is instrumentation, and what was rejected to put each piece in its tier. A vendor that cannot answer in tier terms is shipping an uninspected trust surface. The vendor's risk is yours, and you will inherit it without a map.
For people thinking about defensibility. The discipline compounds. Each decisions-log entry rolls forward as a load-bearing record; each tier reassignment is a recorded choice. The trust surface stays interpretable over the life of the system. A vendor that ships this discipline ships a trust map; a vendor that ships a single TCB number ships an aggregate.
What's next
Part 5 closes the series with a retrospective on how the disciplines compose, including measurements of what the compounded practice produces over a quarter of work.