Process

How we ship verified software

A five-part series on the multi-expert review methodology behind Shellfinity. Each post leads with a specific finding the panel caught that automated tests missed.

Part 1 · Available now

Two reviewers caught what no test could

A convergent panel finding: two of four independent reviewers flagged the same defect that automated tests had passed. The bug lived between two pieces of correct code. Independence is load-bearing.

Part 2 · Available now

The five-minute check that prevents months of no-op work

The surface audit check. Every phase opens with a short script that verifies the design's assumptions against the codebase before any new work starts.
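
A minimal sketch of what such a check could look like, assuming each design assumption can be written as "this file exists and contains this text". The `Assumption` type and the `audit_surface` function are illustrative names invented here, not Shellfinity's actual script.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Assumption:
    """One claim the design makes about the codebase (hypothetical shape)."""
    path: str    # file the design expects to exist, relative to the repo root
    symbol: str  # text the design expects to find inside that file


def audit_surface(root: Path, assumptions: list[Assumption]) -> list[str]:
    """Return one message per assumption the codebase does not satisfy."""
    failures = []
    for a in assumptions:
        target = root / a.path
        if not target.is_file():
            failures.append(f"{a.path}: file not found")
        elif a.symbol not in target.read_text(encoding="utf-8"):
            failures.append(f"{a.path}: '{a.symbol}' not found")
    return failures
```

An empty result means every listed assumption holds. Any message is a reason to revisit the design before writing new code against it.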

Part 3 · Available now

What unit tests can't see and how to find it anyway

Stress-as-discovery. Some classes of defect are invisible at unit scale and fatal in production. The discipline that catches them.

Part 4 · Available now

Three tiers of trust

The accounting we use to make our trusted assumptions explicit. A single number for "trusted code" hides the work that matters.

Part 5 · Available now

How a documented process compresses a quarter of design 10x

Three arcs, one quarter, three compression ratios. The audit discipline removes work that would otherwise have shipped as pure overhead. The compression ratio is what falls out.

Vertical Use-Cases series

Each post shows the methodology applied end-to-end in a specific vertical. What an LLM surfaces, what the engine verifies, and where the boundary falls.

Part 1 · Available now · Medical DDx

Where the LLM stops and the engine starts: a clinical case

An anonymized differential-diagnosis case. The LLM ranks plausible diagnoses by surface pattern; the engine verifies which ones the evidence actually licenses, and which ones it rules out.

Case studies

Specific failure classes in production AI, and the architectural properties that address them. Adjacent to the methodology series; not part of the numbered sequence.

Case study · Verified medical AI

When the transcript invents what wasn't said

The hallucination class clinical AI has to shut out, drawn from the AP investigation into clinical use of OpenAI Whisper. The architectural property that closes the gap.