Process

How we ship verified software

A five-part series on the multi-expert review methodology behind Shellfinity. Each post leads with a specific finding the panel caught that automated tests missed.

Part 1 · Available now
Two reviewers caught what no test could
A convergent panel finding: two of four independent reviewers flagged the same defect that automated tests had passed. The bug lived between two pieces of correct code. Independence is load-bearing.
Part 2 · Available now
The five-minute check that prevents months of no-op work
The surface audit check. Every phase begins with a short script that verifies the design's assumptions against the codebase before any new work begins.
Part 3 · Available now
What unit tests can't see and how to find it anyway
Stress-as-discovery. Some classes of defect are invisible at unit scale and fatal in production. The discipline that catches them.
Part 4 · Available now
Three tiers of trust
The accounting we use to make our trusted assumptions explicit. A single number for "trusted code" hides the work that matters.
Part 5 · Available now
How a documented process compresses a quarter of design 10x
Three arcs, one quarter, three compression ratios. The audit discipline removes work that would otherwise have shipped while adding no value. The ratio is what falls out.

Vertical Use-Cases series

Each post shows the methodology applied end-to-end in a specific vertical. What an LLM surfaces, what the engine verifies, and where the boundary falls.

Part 1 · Available now · Medical DDx
Where the LLM stops and the engine starts: a clinical case
An anonymized differential-diagnosis case. The LLM ranks plausible diagnoses by surface pattern; the engine verifies which ones the evidence actually licenses, and which it does not.

Case studies

Specific failure classes in production AI, and the architectural properties that address them. Adjacent to the methodology series; not part of the numbered sequence.

Case study · Verified medical AI
When the transcript invents what wasn't said
The hallucination class clinical AI cannot tolerate, drawn from the AP investigation into clinical use of OpenAI Whisper. The architectural property that closes the gap.