SuperTensor | Reservoir

SuperTensor-lean

Verified tensor graph optimization in Lean 4: constructive soundness proofs + equality saturation + verified extraction via e-graph↔circuit bijection + multi-target code generation.

SuperTensor-lean is a formally verified tensor graph optimizer built in Lean 4. It applies equality saturation — a technique for exploring exponentially many equivalent program forms simultaneously — to tensor computation graphs, with every rewrite rule carrying a machine-checked proof of semantic preservation.

Unlike existing systems that verify rewrites with SMT solvers or Datalog, SuperTensor-lean uses constructive proofs in a dependently typed proof assistant. An unsound rewrite rule literally cannot be constructed: the SoundTensorRule type requires sound : ∀ env, lhs.eval env = rhs.eval env as a field, making the optimizer correct by construction rather than by testing.

The Problem

Tensor compilers (XLA, TVM, TensorRT, ONNX Runtime, PyTorch Inductor) apply hundreds of graph rewrite rules to optimize deep learning workloads. These rewrites are unverified: PolyJuice (OOPSLA 2024) found 84 bugs in production tensor compilers by fuzzing their rewrite rules, with 49 confirmed by developers — each corresponding to an invalid transformation that silently produces wrong numerical results.

Current approaches to this problem are partial:

TensorRight (POPL 2025) verifies rules via SMT but does not generate code
TENSAT optimizes with ILP extraction but does not verify rules
Constable (OOPSLA 2025) optimizes with fusion awareness but without formal verification
Scalify verifies graph properties via Datalog but without a proof assistant
None produce constructive proofs that can be independently checked

How It Works

Input: ONNX Computation Graph
  |
  |  Parse (12 ops: MatMul, Add, Mul, Reshape, Transpose, ReduceSum, ...)
  v
TensorExpr  (rank-k DSL, 17 constructors, shapes indexed by List Nat)
  |
  |  Flatten to e-graph (each subexpr → e-class)
  v
E-Graph  (verified union-find + hashcons, 230+ theorems)
  |
  |  Equality saturation with 48 verified rewrite rules
  |  Strategy: exhaustive | MCTS-guided | hybrid
  |  Compiled decision-tree matcher (Gross 2022)
  |  (each rule: lhs.eval env = rhs.eval env, machine-checked)
  v
E-Graph with Equivalences
  |
  |  E-graph ↔ monotone circuit bijection (Sun et al. 2024)
  |  Circuit simplification (verified semantics-preserving)
  |  Extraction: greedy | ILP | treewidth DP | smart routing
  |  Cost model: flat, shape-aware, GPU/CPU-aware, lexicographic
  v
Optimized TensorExpr
  |
  |  Lower to TensorSigma IR (loop/tile/fuse/par)
  |  SemanticPreservation: lower_kernelCount_eq_opCount (CompCert-style)
  v
TensorSigma
  |
  |  Code generation
  v
C / Rust / CUDA code

Soundness argument: Rules carry proofs. Translation validation independently checks the optimization via 21 congruence theorems. The cost model only affects performance, not correctness. The e-graph engine itself is NOT in the trusted computing base — even if the engine has a bug, the final result is validated. Semantic preservation is verified at each IR boundary via CompCert-style forward simulation.

Where It Fits in the ML Pipeline

Framework          Export           Graph Optimizer          Hardware Compiler
---------          ------           ---------------          -----------------
PyTorch    ---->   ONNX    ---->   SuperTensor-lean  ---->   CUDA kernels
JAX        ---->   StableHLO       (verified rewrites)       CPU SIMD
TensorFlow ---->                                             Custom accelerators

SuperTensor-lean replaces the unverified graph rewrite pass in systems like XLA, TVM, and TensorRT with formally verified equivalence-preserving transformations. It accepts standard ONNX input and produces standard C/Rust/CUDA output, making it compatible with existing ML compiler ecosystems without requiring changes upstream or downstream.

Competitive Landscape

System	Verifies Rules	Constructive Proofs	Extraction	Circuit Bijection	Generates Code	LOC
TensorRight (POPL 2025)	SMT	No	—	No	No	—
TENSAT (Yang 2021)	No	No	ILP	No	Yes	—
Constable (OOPSLA 2025)	No	No	Greedy	No	Yes	—
extraction-gym (2024)	No	No	8 methods	No	No	—
faster-ilp-cbc (2024)	No	No	ILP (CBC)	No	No	—
SuperTensor-lean	Constructive	Lean 4	6 verified	First mechanization	C/Rust/CUDA	~19,100

Main Contribution

The core insight is embedding the soundness proof into the rule's type:

structure SoundTensorRule (α : Type) [CommRing α] (s : List Nat) where
  lhs   : TensorExpr α s
  rhs   : TensorExpr α s
  sound : ∀ env : TensorEnv α, lhs.eval env = rhs.eval env
  name  : String

This makes the optimizer verified by construction: you cannot add a rule to the system without providing a machine-checked proof that it preserves semantics for all possible inputs. The 48 rules in the current system span four categories:

15 algebraic rules: commutativity, associativity, distributivity, negation, scalar multiplication, matmul associativity
15 fusion rules: matmul-neg fusion, scalar factoring, reshape composition, triple negation
12 operation rules: negation/addition/multiplication through broadcast, slice, concat, pad, transpose, reshape, reduce, contract
6 tiling rules: verified tile split/merge with shape preservation proofs

Each rule is proved once and trusted forever.

Quick Start

# Requires Lean 4 v4.26.0 (lean-toolchain specifies exact version)
lake build

# Run the 12-section compilable demo (0 errors, 0 sorry)
lake env lean demo_walkthrough_supertensor.lean

Expected output: 0 sorry. All 984 build jobs pass with 0 errors.

Project Statistics

Metric	Value
Lines of code	~19,100
Verified rewrite rules	48 (15 algebraic + 15 fusion + 12 operation + 6 tiling)
Theorems/lemmas	~310
Examples/benchmarks	862+ examples, 73 stress tests
Sorry remaining	0
TensorExpr constructors	17
Congruence theorems	21
Codegen backends	3 (C, Rust, CUDA)
ONNX ops supported	12
Extraction methods	6 (greedy, ILP, warm-start ILP, circuit-pruned ILP, treewidth DP, smart routing)
Saturation strategies	3 (exhaustive, MCTS, hybrid) + incremental with versioning
Cost models	flat, GPU/CPU-aware, shape-aware, hierarchical 4-component, lexicographic
Circuit bijection	First mechanization of Sun et al. 2024 (e-graph ↔ monotone circuit)
Semantic preservation	CompCert-style forward simulation chain
Build jobs	984
Lean version	v4.26.0
Dependencies	Mathlib4 (CommRing, Finset.sum, BigOperators)

Project Structure

SuperTensor-lean/
  SuperTensor/
    Tensor/         TensorExpr DSL, concrete Tensor type, denotational semantics,
                    shape theory (16 lemmas), index arithmetic
    EGraph/         Verified e-graph engine:
      Core.lean       Union-find (44 theorems), hashcons, merge, rebuild
      CoreSpec.lean   78 theorems: semantic soundness, hashcons invariant
      EMatch.lean     Pattern matching over e-graphs
      Extract.lean    Greedy extraction with cost models
      ILPExtract.lean ILP branch-and-bound + warm-start extraction
      Circuit.lean          MonotoneCircuit + monotone_eval theorem
      CircuitTranslate.lean E-graph → circuit translation
      CircuitBijection.lean Forward/backward/roundtrip bijection proofs
      CircuitSimplify.lean  Verified circuit simplification rules
      CircuitPrune.lean     Circuit-based ILP variable pruning
      TreeDecomp.lean       Tree decomposition + verified checker
      TreewidthExtract.lean Treewidth-bounded DP extraction + smart routing
      CompiledMatcher.lean  Decision-tree pattern matcher (Gross 2022)
      TranslationValidation.lean  21 congruence theorems
      MCTS.lean       Monte Carlo Tree Search saturation
    Rules/          SoundTensorRule framework, 48 verified rules, LLM synthesis
    Sigma/          TensorSigma IR (loop/tile/fuse/par), lowering,
                    CompCert-style SemanticPreservation chain
    Cost/           Cost models: flat, shape-aware, GPU/CPU, hierarchical, lexicographic
    CodeGen/        C, Rust, CUDA backends
    Parse/          ONNX parser (12 operations)
    Pipeline.lean   End-to-end: parse → saturate → extract → lower → codegen
                    VerifiedPipelineConfig + pipeline_sound theorem
    Benchmarks.lean 862+ examples, 73 stress tests
  demo_walkthrough_supertensor.lean   12-section compilable demo (586 LOC, 0 sorry)
  demo_walkthrough_supertensor.md     Companion walkthrough with comparison tables

Verification Status

0 sorry. 0 axioms. All proofs are complete and machine-checked.

All 48 rewrite rules carry constructive soundness proofs. The verified extraction pipeline (Fase 11) adds the e-graph↔circuit bijection, circuit simplification, treewidth-bounded DP, and CompCert-style semantic preservation — all with 0 sorry. Three witness structures (SimplifyCorrectness, WellFormedTranslation, DPOptimalityWitness) are validated computationally via 862+ examples and await full constructive instantiation.

Demo

See demo_walkthrough_supertensor.lean for a compilable 12-section tour and demo_walkthrough_supertensor.md for the companion walkthrough with comparison tables vs egg, TENSAT, TVM/XLA, extraction-gym, and faster-ilp-cbc.

Future Work

Instantiate witness structures with full constructive proofs (SimplifyCorrectness, WellFormedTranslation, DPOptimalityWitness)
Add convolution rewrite rules (currently only e-graph representation)
End-to-end benchmarks on real ONNX models
Integration with MLIR/StableHLO frontend

References

E-Graphs as Circuits (Sun et al., 2024) — Bijection e-graph↔monotone circuit, treewidth extraction
Accelerating Verified Compiler (Gross et al., 2022) — Pattern-matching compilation, decision trees
CompCert (Leroy et al., 2016) — Verified multi-IR pipeline, simulation diagrams
TENSAT (Yang et al., MLSys 2021) — ILP extraction for tensor graphs
TensorRight (Arora et al., POPL 2025) — SMT-verified rewrite rules for XLA
Constable (Vohra et al., OOPSLA 2025) — Fusion-aware tensor optimization
egg (Willsey et al., POPL 2021) — E-graph equality saturation framework
PolyJuice (Liu et al., OOPSLA 2024) — Fuzzing tensor compiler rewrites
AlphaTensor (Fawzi et al., Nature 2022) — MCTS for matrix multiplication

License

MIT

SuperTensor0.1.0