FormalRV
Formally-verified resource estimation for fault-tolerant Shor's algorithm — from the logical algorithm down to the physical surface-code device.
📖 API docs: https://yezhuoyang.github.io/FormalRV/
▶ New here? Start with FormalRV.StandardShor — the standard,
textbook Shor + surface-code lattice surgery, curated as a four-step learning path
(guide). Read it before the advanced low-overhead papers.
Goal
Put Shor's algorithm, its arithmetic circuits, the logical (Pauli-product) layer, and the quantum-error-correction stack into Lean 4; prove what can be proven, emit the verified circuits as runnable code, and benchmark seven state-of-the-art resource estimates against machine-checked bounds — making the residue that cannot be proven explicit.
Achievements
- Axiom-free success bound.
Shor_correct_var/Shor_correct_verified_no_modmult_axioms(start atShor/Main.lean): order-finding succeeds with probability≥ κ/(log₂N)⁴,κ = 4·e⁻²/π². Its#print axiomsis only Lean's three standard axioms — no project axioms, nosorry— instantiated with a constructive modular multiplier (no oracle). - Proof → runnable code. The same circuits FormalRV proves correct are emitted as OpenQASM
2/3 and Stim, and an independent tool (Qiskit, Stim) re-verifies the artifact without trusting
Lean — emitted gate counts match the proved counts; Stim
has_flowre-checks each surgery. The verified schedule also compiles totqec-validated 3D surface-code lattice-surgery layouts (.glb/ ray-traced) viaPyCircuits/ls_compile.py. - Device scheduling + hard resource bounds. A unified FT-scheduling framework
(
System/FTFramework): the full ~10⁹-op RSA-2048 schedule defined recursively and proven valid for all sizes; a kernel-clean lower boundQ·T ≥ K·fq·prod(≈ 22.4M qubit-hours for RSA-2048) that no schedule can beat; a hardware sensitivity analysis monotone in every device parameter; and a device-program emitter unifying physical operations + system calls, checked against four system invariants. - Seven-paper corpus. Each estimate is bound to one typed tuple, with a machine-checked bound where the framework allows (table below).
The four-layer stack
L1 Algorithm Shor + QPE + modular exponentiation + Ekerå–Håstad ┐
L2 Logical gadgets adder · controlled adder · unary lookup · QFT │ error
L3 PPM / logical Pauli-product measurement · magic-state cultivation │ bounds
L4 QEC code parity-check matrices · stabilizer schedule · surgery ┘ propagate up
Each layer is a Lean structure with an explicit inter-layer contract; three error mechanisms (logical/random, approximation, algorithmic-uncertainty) propagate bounds up to the success theorem.
Worked example — the 2-bit adder, end to end
The smallest non-trivial slice of the whole stack: the verified 2-bit Cuccaro adder
cuccaro_n_bit_adder_full 2 0 (proven by cuccaro_n_bit_adder_full_correct: target := a+b, read
restored), pushed through every layer onto a zoned hardware architecture, ending in a
re-runnable, machine-checked resource verdict. Full detail — the complete OpenQASM, the full
PPM program, the 192-SysCall schedule, and a self-verifying parameterized Lean file — lives in
Example/. Run it (and re-run it for your own hardware) with:
lake env lean --run Example/Adder2EndToEnd.lean # type-checks `schedule_fits` + prints the table below
Architecture — 4 zones × 100 logical-patch sites: Data[0,100) · Ancilla[100,200) ·
Factory[200,300) (|C̄CZ̄⟩ magic) · Routing[300,400); Gate2q‖=1, decoder ‖=4, t_react=10µs
— all editable in the EDIT HERE block (change them, re-run, get a new verified verdict).
Verified resource on that architecture — every row machine-checked, nothing taken on trust:
| Layer | Resource | Value | Verified by |
|---|---|---|---|
| L2 logical | qubits / Toffoli / T-count | 5 / 4 / 28 | cuccaro_n_bit_adder_full_correct (+ Qiskit count re-check) |
| L3 PPM | |C̄CZ̄⟩ magic / joint measurements | 4 / 12 | compileArithmeticGateToPPM |
| System | SysCalls / wall-clock | 192 / 192 µs | scheduleWallclockUs; fits the architecture via schedule_fits |
| System | wall-clock lower bound | ≥ 72 µs | gate2q_capacity_lower_bound_us (⌈72 Gate2q / 1‖⌉·1µs) |
| L4 surgery | conflict-free layout / volume | ✓ / 60 | ls_compile certificate |
If you tighten the hardware past feasibility, schedule_fits is rejected — that is the verdict.

…and the FULL adder, end to end, on real neutral-atom hardware
The same 2-bit adder runs as a complete d=3 surface-code lattice-surgery computation on a
neutral-atom machine (Example/neutral_atom/) — every layer present, no
placeholders:
- It really computes
a+b— the measurement-based realization (real|C̄CZ̄⟩magic by gate teleportation, measurement-driven Pauli feed-forward) is simulated and matches the adder on all 32 inputs × 30 random measurement branches (logical_adder/verify_mb_adder.py). - d=3 lattice surgery — each of the 5 logical qubits is a
[[13,1,3]]patch; every gate is the full merged-code syndrome (verifiedsurface3_zz_merge88 CX /surface3_zzz_merge131 CX), each Toffoli a real|C̄CZ̄⟩injection. - Detailed system schedule — 192 SysCalls, machine-checked to fit the architecture
(
schedule_fits), wall-clock 192 µs. - Neutral-atom compile (ZAC, HPCA 2025) — 107 atoms (Memory / Ancilla / Factory / Reservoir
zones), 1240
CZ, 95 Rydberg stages, ZAC-verified, invariants re-proven under neutral-atom capacities. The GIF shows the atoms physically moving to do it:

Repository layout
Each concern is a folder with its own README.md (purpose, key definitions, key theorems,
honest status):
| Folder | What it holds |
|---|---|
Core/ | Gate IR + classical/quantum (matrix) semantics; the 7-T Toffoli = CCX proof |
Arithmetic/ | adders, modular multiplier, unary lookup — with correctness proofs |
Shor/ | ★ the main theorem (MainAlgorithm/), QPE, phase kickback, IQFT; the reusable Shor→PPM/emit + windowed pipeline |
QEC/ | qLDPC parity-check matrices, code instances, and derivedK (k = n − rank Hₓ − rank H_z) |
PPM/ | Pauli-product measurement, Pauli algebra, magic factories |
LatticeSurgery/ | surgery merge/split + system-call contracts; the reusable surgery gadgets + surface-code Shor pipeline |
System/ | scheduling / device / resource-bound framework (FTFramework); the reusable cost / decoder / zone / latency models |
Framework/ | the four inter-layer contract interfaces (L1–L4) |
Audit/ | one folder per paper (uniform Hardware/Zones/L1–L4/Verifier) — paper-specific only; all general/reusable code lives in the framework folders above |
Qualtran/ | Qualtran PhysicalParameters data bridge |
Codegen/ | the verified QASM / device-program emitters |
Files are named for their content and kept small (topical modules behind a <Name>.lean umbrella).
Per-paper audit — claim vs. verified
Each paper has its own folder under FormalRV/Audit/ with a uniform
structure — Hardware · SystemZones · L1_Algorithm · L2_Arithmetic · L3_PPM · L4_Code ·
Verifier · README. All general/reusable code lives in the framework folders (LatticeSurgery,
Shor, System, QEC, PPM, Framework, …); a paper folder holds only that paper's specific
implementation + scheduling and imports only general code — never another paper. Rigor is
enforced on build: each folder's Verifier.lean runs
#verify_clean, the gate that ACCEPTS a theorem only if its transitive axioms ⊆
{propext, Classical.choice, Quot.sound} — so a sorry or native-tainted axiom makes the build
fail. Each layer is exactly one of ✅ verify-clean semantic · ➗ arithmetic-only (decide) ·
⬜ documented GAP (never a counted number masquerading as a proof). Verify one paper with, e.g.,
lake build FormalRV.Audit.Gidney2025. The one cross-cutting ✅ result — order-finding success
≥ κ/(log₂N)⁴, N-parametric — lives in Audit/Peng2022 and is reused by every paper's L1.
| Paper folder | Headline claim | What is machine-checked (✅ semantic · ➗ arithmetic · ⬜ gap) |
|---|---|---|
CainXu2026 (focus) — 2603.28627 | RSA-2048 in ~10⁴ qubits, ~1 week | ✅ modexp PRESERVES the LP code (induction, scale-free) + LP-surgery gadget + lower≤upper soundness + verified resource upper bound + 10⁹-PPM schedule; ➗ k DERIVED from matrices, Eqs E3/E4/E9; ⬜ factory-sharing/parallelism gaps sized |
GidneyEkera2021 — 1905.09749 | 20M qubits, ~8 h | ✅ CAPSTONE axiom-free: 19.44M ≤ 20M, 8 h sits 2–3× under the verified time ceiling; ✅ finite-zone invariants (over-budget rejected) |
Gidney2025 — 2505.15917 | <1M qubits, <1 week | ✅ CFS residue-arithmetic engine axiom-clean (faithful RNS, exact CRT, bounded truncation, Ekerå–Håstad); ✅ tally 897,864 < 10⁶; ⬜ Assumption 1 stated-never-asserted; ⬜ quantum half |
Pinnacle (webster-2026) — 2602.11457 | RSA-2048 in <100k qubits | ➗ GB code k=12 DERIVED from matrices ([[72,12,6]]); ✅ RSA instance recorded; ⬜ measurement gadget / magic engine / <100k bound (roadmap, OPEN) |
Babbush2026 — 2603.28846 | ECC-256 in <500k qubits, 18–23 min | ✅ shared bound (confirms modulus-agnostic L1); ➗ verified magic-state spacetime floor; ⬜ first non-RSA, end-to-end OPEN |
Xu2024 — 2308.08648 | constant-overhead FTQC, 24 ms cycle | ➗ the 24,000× cycle-time outlier cross-check; ⬜ tuple (the arch the neutral-atom demo realizes) |
Peng2022 (SQIR/Coq) — 2204.07112 | machine-checked Shor | ✅ the cross-cutting bound lives here — order-finding success ≥ κ/(log₂N)⁴, N-parametric (axiom-clean) |
Legend: ✅ Verified semantic theorem · ➗ Arithmetic-only (decide) · ⬜ Recorded/Assumed.
What is proven vs. assumed
Strict honesty taxonomy — only semantic-correctness theorems count as Verified (a gate count on an unverified circuit is just counting symbols):
Machine-checked, no custom axioms: the Shor success-probability chain, the Cuccaro/Gidney
adders, the constant modular multiplier, the 7-T Toffoli identity, the QPE peak bound, the
Pauli/stabilizer algebra, and the schedule resource lower bound. Out of scope (assumed by
citation): decoder correctness & runtime, hardware physics, magic-state distillation internals,
and merged-code distance. See each folder's README.md for per-area status.
Build
git clone https://github.com/yezhuoyang/FormalRV && cd FormalRV
lake exe cache get # prebuilt mathlib (≈ minutes)
lake build # builds the whole library
Check a theorem's axioms with #print axioms FormalRV.Shor_correct_var
(expected: propext, Classical.choice, Quot.sound). Emit + independently re-verify the circuits:
lake env lean --run scripts/EmitQASM.lean (→ Qiskit re-counts gates) and
lake env lean --run emit_shor_demo.lean (→ Stim has_flow).
License
MIT © 2026 John ye. Built on mathlib;
the Shor layer ports SQIR; Qualtran/ bridges
Qualtran.