AMO-Lean: Verified Optimizing Compiler for Cryptographic Primitives

Lean 4 License: MIT Tests Build

AMO-Lean is a verified optimizing compiler that transforms mathematical specifications written in Lean 4 into optimized C code with formal correctness guarantees. It targets cryptographic primitives used in STARK provers and zkVMs.

Release: v1.0.1

Production-Ready Components

  • NTT (Number Theoretic Transform) -- Verified, optimized, Plonky3-compatible
  • Radix-4 NTT -- Verified 4-point butterfly with formal proofs
  • Goldilocks Field -- Scalar + AVX2 arithmetic (p = 2^64 - 2^32 + 1)
  • FRI Protocol -- Folding, Merkle commitments, Fiat-Shamir
  • E-Graph Optimization -- 19/20 rewrite rules formally verified
  • Algebraic Semantics -- C-Lite++ lowering with verified scatter/gather
  • Poseidon2 Hash -- BN254 t=3 with HorizenLabs-compatible test vectors

How It Works

Mathematical Spec (Lean 4)
        |
        v
  E-Graph Optimization Engine
  (equality saturation optimization strategy)
        |
        v
  Algebraic Semantics (Sigma-SPL IR)
  (verified lowering for matrix operations)
        |
        v
  Optimized C / SIMD Code
  (correct by construction, Plonky3-compatible)

Formal Optimization Strategy

AMO-Lean uses equality saturation via e-graphs to find optimal formal equivalent forms of mathematical expressions. Unlike hand-tuned optimizers, every rewrite rule in our e-graph is a formally proven theorem in Lean 4. The process:

  1. Encode the mathematical specification as an e-graph
  2. Saturate by applying all verified rewrite rules until a fixed point
  3. Extract the optimal form using a cost model
  4. Lower through AlgebraicSemantics (Sigma-SPL IR) to C code

This architecture is portable and modular: adding a new primitive means writing a Lean specification and lowering rules -- the optimization engine and code generator are reusable across all primitives.

Two Operational Modes

ModePurposeStatus
VerifierCertify external code (e.g., Plonky3) is mathematically correctProduction Ready
GeneratorGenerate verified C code from Lean specsProduction Ready

Performance

NTT vs Plonky3 (v1.0.1 Benchmark -- Feb 2026, ARM64)

SizeAMO-LeanPlonky3Ratio
N=2564.8 us3.4 us1.39x
N=102423.3 us14.6 us1.59x
N=4096109.5 us65.9 us1.66x
N=655362.50 ms1.48 ms1.69x

Average: 1.65x slower than Plonky3 (60% throughput) with full formal verification. Aiming at 85% throughput in upcoming release, with full formal verification.

Full benchmark results (all sizes)
SizeAMO-Lean (us)Plonky3 (us)RatioAMO Throughput
N=2564.8 +/- 0.43.4 +/- 0.51.39x53.7 Melem/s
N=51210.6 +/- 0.57.1 +/- 1.01.49x48.5 Melem/s
N=102423.3 +/- 2.514.6 +/- 0.51.59x44.0 Melem/s
N=204850.5 +/- 1.731.0 +/- 1.01.63x40.6 Melem/s
N=4096109.5 +/- 0.965.9 +/- 2.21.66x37.4 Melem/s
N=8192255.1 +/- 13.7138.4 +/- 1.91.84x32.1 Melem/s
N=16384535.8 +/- 12.2294.3 +/- 9.41.82x30.6 Melem/s
N=327681176.1 +/- 92.0665.8 +/- 9.71.77x27.9 Melem/s
N=655362500.1 +/- 75.51477.4 +/- 29.21.69x26.2 Melem/s

Verified Compatibility

  • 64/64 oracle tests pass vs Plonky3 (bit-exact match)
  • 120/120 pathological vectors verified (sparse, geometric, boundary, max entropy)
  • 37/37 Goldilocks field tests pass with UBSan
  • FFI overhead: 0.03% (negligible)
  • Panic safety: Triple protection (validation + catch_unwind + panic=abort)

See BENCHMARKS.md for the complete benchmark report.

Quick Start

# Clone and build
git clone https://github.com/manuelpuebla/amo-lean.git
cd amo-lean
lake build

# Run NTT oracle tests (C, no dependencies)
clang -DFIELD_NATIVE -O2 -o test_ntt_oracle generated/test_ntt_oracle.c && ./test_ntt_oracle

# Run Plonky3 equivalence tests (requires Rust)
cd verification/plonky3/plonky3_shim && cargo build --release && cd ..
./oracle_test

Using libamolean (Header-Only C Library)

#include "amolean/amolean.h"

// Goldilocks field arithmetic
uint64_t a = goldilocks_mul(x, y);
uint64_t b = goldilocks_add(a, z);

// NTT with pre-computed context
NttContext* ctx = ntt_context_create(10);  // N = 2^10 = 1024
ntt_forward_ctx(ctx, data);
ntt_inverse_ctx(ctx, data);
ntt_context_destroy(ctx);

Project Structure

amo-lean/
├── AmoLean/                    # Lean source (32,650 LOC, 81 files)
│   ├── NTT/                    # NTT specification + proofs
│   │   ├── Spec.lean           # O(N^2) reference specification
│   │   ├── CooleyTukey.lean    # O(N log N) verified algorithm
│   │   ├── Butterfly.lean      # Butterfly operation proofs
│   │   ├── LazyButterfly.lean  # Harvey optimization (lazy reduction)
│   │   ├── Correctness.lean    # Main equivalence theorems
│   │   └── Radix4/             # Verified radix-4 implementation
│   ├── Field/                  # Field implementations
│   │   └── Goldilocks.lean     # Goldilocks (p = 2^64 - 2^32 + 1)
│   ├── EGraph/                 # E-Graph optimization engine
│   │   ├── Optimize.lean       # Equality saturation
│   │   └── VerifiedRules.lean  # 19/20 rules with formal proofs
│   ├── FRI/                    # FRI protocol components
│   ├── Sigma/                  # Sigma-SPL IR definitions
│   ├── Matrix/                 # Matrix algebra + permutations
│   ├── Verification/           # Correctness proofs
│   │   ├── AlgebraicSemantics.lean  # Lowering correctness (3,693 LOC)
│   │   ├── FRI_Properties.lean      # FRI folding proofs (0 sorry)
│   │   └── Poseidon_Semantics.lean  # Poseidon2 verification
│   └── Backends/               # Code generation
│
├── generated/                  # Generated C code
│   ├── field_goldilocks.h      # Goldilocks arithmetic (scalar)
│   ├── field_goldilocks_avx2.h # Goldilocks arithmetic (AVX2)
│   ├── ntt_kernel.h            # Lazy butterfly kernel
│   ├── ntt_context.h           # NTT context with caching
│   └── poseidon2_bn254_t3.h    # Poseidon2 hash
│
├── libamolean/                 # Distributable header-only C library
├── verification/plonky3/       # Plonky3 verification suite (Rust FFI)
├── Tests/                      # Test suites (2850+ tests)
└── docs/                       # Documentation
    ├── BENCHMARKS.md            # Full benchmark report
    └── project/                 # Design decisions, progress logs

Verification Status

Formal Proofs (Lean 4)

ComponentStatusSorryDetail
NTT Core (Butterfly, CooleyTukey, Correctness)100%0Fully proven
Radix-4 NTT100%0Verified 4-point butterfly
FRI Folding100%0Proven in FRI_Properties.lean
Matrix/Perm100%0All lemmas proven
E-Graph Rewrite Rules95%019/20 rules verified
Goldilocks Field93%05 axioms (well-known properties)
AlgebraicSemantics95%318/19 cases proven; kron pending
Poseidon2Computational12All backed by test vectors

Codebase: 32,650 LOC | 17 axioms | 30 sorry (5 active, 12 computational, 7 deprecated, 6 commented-out)

Testing (2850+ tests, 0 failures)

Test SuiteTestsStatus
Goldilocks Field (+ UBSan)37Pass
NTT Correctness (oracle + bit-reversal + kernel + sanitizer)141Pass
Plonky3 Equivalence64Pass
FRI Protocol (fold + validation)10Pass
Poseidon2 (S-box + vectors + differential)10Pass
E-Graph Optimizer4Pass
Hardening (fuzz + FFI stress + ABI)120Pass
Lean Build (modules)2641Pass
Total~28500 failures

Key Design Decisions

  1. Equality Saturation (E-Graphs): Optimization via verified rewrite rules rather than ad-hoc transformations. Every optimization is a theorem.
  2. Sigma-SPL Algebraic IR: Matrix expressions lowered through scatter/gather semantics. 18/19 operations formally verified.
  3. Lazy Reduction (Harvey 2014): Defer modular reduction in butterfly operations for reduced modular arithmetic overhead.
  4. Skeleton + Kernel Architecture: Manual C loop structure (skeleton) with Lean-verified butterfly (kernel). Combines performance control with formal guarantees.
  5. Twiddle Factor Caching: Pre-computed twiddle factors for all NTT layers, stored in NttContext.
  6. Nat in Lean Proofs: Use arbitrary-precision naturals to avoid overflow reasoning during proofs. C code uses fixed-width integers with verified bounds.
  7. Goldilocks Specialization: Exploit p = 2^64 - 2^32 + 1 structure for efficient reduction without Barrett/Montgomery.

Future Work

TaskRelevanceDifficultyStatus
Kron bounds (3 sorry in AlgebraicSemantics)Medium -- internal to verification, no external impactHigh -- requires scatter/gather in-bounds proofsDocumented, non-blocking
Poseidon formal proofs (12 sorry)Medium -- currently validated computationallyMedium -- Lean match splitter limitationTests pass; formal proofs optional
BabyBear / Mersenne31 fieldsHigh -- enables Risc0/SP1 verificationMedium -- architecture supports itDesigned, not started
Rust code generationHigh -- direct integration with Rust zkVMsMedium -- requires new backendFuture
NTT Radix-4 codegenMedium -- potential 20-30% speedupLow -- butterfly already verifiedDesigned
Merkle tree invariants (2 sorry in FRI)Low -- structural, no correctness riskLowDocumented
Deprecated sorry cleanup (7 in Theorems.lean)Low -- superseded by AlgebraicSemanticsTrivial -- delete fileHousekeeping

None of these items block the current release. The 5 active sorry statements are isolated to internal verification modules and do not affect the correctness of generated code or test results.

References

  1. egg: Willsey et al. "egg: Fast and Extensible Equality Saturation" (POPL 2021)
  2. Fiat-Crypto: Erbsen et al. "Simple High-Level Code For Cryptographic Arithmetic"
  3. FRI: Ben-Sasson et al. "Fast Reed-Solomon Interactive Oracle Proofs of Proximity"
  4. Harvey NTT: Harvey "Faster arithmetic for number-theoretic transforms" (2014)
  5. Plonky3: Polygon Zero's high-performance STARK library
  6. Sigma-SPL: Franchetti et al. "SPIRAL: Code Generation for DSP Transforms"

License

MIT License -- see LICENSE for details.


AMO-Lean v1.0.1 -- Formal verification meets practical performance.