[{"data":1,"prerenderedAt":4},["ShallowReactive",2],{"25YAlLBpdy":3},"# evm.asm: A Verified Macro Assembler for building zkEVM in Lean 4 (early experiment)\n\nA prototype implementation of a verified macro assembler targeting the zkEVM,\nbuilt on a RISC-V RV64IM backend, inspired by:\n\n> Andrew Kennedy, Nick Benton, Jonas B. Jensen, Pierre-Evariste Dagand.\n> **\"Coq: The world's best macro assembler?\"**\n> *Proceedings of the 15th International Symposium on Principles and Practice\n> of Declarative Programming (PPDP 2013)*, September 2013, ACM.\n> https://www.microsoft.com/en-us/research/publication/coq-worlds-best-macro-assembler/\n\n## Warning: Experimental Prototype Only\n\n**DO NOT USE THIS PROJECT FOR ANYTHING OF VALUE.**\n\nThis is an experimental research prototype with significant limitations:\n\n- **No RISC-V spec compliance**: The instruction semantics are vibe-generated and\n  have NOT been validated against the official RISC-V specification. There may\n  be subtle (or not-so-subtle) deviations from actual RISC-V behavior.\n- **No EVM spec compliance**: The specs for examples are also vibe-generated and\n  have NOT been validated against the EVM specification.\n- **No conformance testing**: No systematic testing has been performed to verify\n  that this implementation matches real RISC-V processors or simulators. No testing has been performed against EVM either.\n- **Prototype quality**: This code is for educational and research purposes to\n  explore verified macro assembly techniques, not for production use.\n\n## Motivation: Eliminating Compiler Trust in zkEVM\n\nThe usual way to use zkVMs is to compile high-level programs to RISC-V\nassembly, then prove correctness of the execution trace using a zero-knowledge\nproof system. The proof covers the *execution trace*, but it cannot cover the\n*compiler*. 
If the compiler is buggy or malicious, the proven execution might not\nmatch the developer's (or the receiver's) intent, even though the ZK proof is valid and the\nsource code is correct.\n\n**evm.asm** explores an alternative: write programs directly as RISC-V code,\nand *prove* their correctness in Lean 4 before the ZK proof is ever\ngenerated. The goal is that a developer (or a receiver of a ZK proof) never has to trust a compiler\nfor the guest program.\n\nMore specifically, evm.asm aims to build the guest part of the **zkEVM**. Reducing the trusted computing base matters for this use case.\n\n## Key Idea\n\nLean 4 serves simultaneously as:\n\n1. **An assembler**: Instructions are an inductive type; programs are lists of\n   instructions with sequential composition (`;;`).\n2. **A macro language**: Lean functions that produce programs act as macros,\n   using all of Lean's facilities (recursion, pattern matching, conditionals).\n3. **A specification language**: Hoare triples with separation logic assertions\n   express correctness properties of EVM opcodes and macro compositions.\n4. **A proof assistant**: Lean's kernel verifies that macros meet their\n   specifications, with no external oracle required.\n\n## Example: What a Verified EVM Opcode Looks Like\n\nEach EVM opcode is implemented as a sequence of RISC-V instructions operating on\n4×64-bit limbs. A **stack-level spec** ties the low-level implementation back to\nthe 256-bit EVM semantics using `evmWordIs`, an assertion that four consecutive\nmemory words encode a single `EvmWord` (a `BitVec 256`):\n\n```lean\n-- An EvmWord is stored as 4 limbs of 64 bits at consecutive addresses\ndef evmWordIs (addr : Addr) (v : EvmWord) : Assertion :=\n  (addr ↦ₘ v.getLimb 0) ** ((addr + 8) ↦ₘ v.getLimb 1) **\n  ((addr + 16) ↦ₘ v.getLimb 2) ** ((addr + 24) ↦ₘ v.getLimb 3)\n```\n\nHere is the stack-level spec for the 256-bit AND opcode\n(`EvmAsm/Evm64/And/Spec.lean`). 
It says: starting from two `EvmWord`s `a` and\n`b` on the stack, the 17-instruction RISC-V program `evm_and_code` produces\n`a &&& b` — with a machine-checked proof:\n\n```lean\n/-- Stack-level 256-bit EVM AND: operates on two EvmWords via evmWordIs. -/\ntheorem evm_and_stack_spec (sp base : Addr)\n    (a b : EvmWord) (v7 v6 : Word)\n    (hvalid : ValidMemRange sp 8) :\n    let code := evm_and_code base\n    cpsTriple base (base + 68) code\n      (-- precondition: stack pointer, scratch registers, two 256-bit words\n       (.x12 ↦ᵣ sp) ** (.x7 ↦ᵣ v7) ** (.x6 ↦ᵣ v6) **\n       evmWordIs sp a ** evmWordIs (sp + 32) b)\n      (-- postcondition: sp advanced, result is a &&& b\n       (.x12 ↦ᵣ (sp + 32)) ** (.x7 ↦ᵣ (a.getLimb 3 &&& b.getLimb 3)) **\n       (.x6 ↦ᵣ b.getLimb 3) **\n       evmWordIs sp a ** evmWordIs (sp + 32) (a &&& b))\n```\n\nThe statement is a Hoare triple (`cpsTriple`) with separation logic assertions.\nThe precondition describes the machine state before: register `x12` holds the\nstack pointer, and two 256-bit words `a`, `b` sit at `sp` and `sp+32`. The\npostcondition says that after running 68 bytes of RISC-V code, the word at\n`sp+32` now holds `a &&& b` — the bitwise AND defined by Lean's `BitVec 256`.\n\nThe proof composes four per-limb specs (one AND per 64-bit limb) using the\n`runBlock` tactic, then lifts to the `evmWordIs` abstraction via\n`cpsTriple_consequence`:\n\n```lean\n  -- 1. Compose 4 per-limb ANDs + stack pointer adjustment (limb-level proof)\n  have L0 := and_limb_spec 0 32 sp a0 b0 v7 v6 base ...\n  have L1 := and_limb_spec 8 40 sp a1 b1 ...\n  have L2 := and_limb_spec 16 48 sp a2 b2 ...\n  have L3 := and_limb_spec 24 56 sp a3 b3 ...\n  have LADDI := addi_spec_gen_same .x12 sp 32 ...\n  runBlock L0 L1 L2 L3 LADDI\n\n  -- 2. Lift to evmWordIs using EvmWord.getLimb_and semantic lemma\n  exact cpsTriple_consequence ...\n    (fun h hp => by simp only [evmWordIs] at hp; ... 
; xperm_hyp hp)\n    (fun h hq => by simp only [evmWordIs, EvmWord.getLimb_and]; ... ; xperm_hyp hq)\n    h_main\n```\n\nLean's kernel checks every step — from individual instruction semantics to the\nfinal `a &&& b` result. No external solver or SMT oracle required.\n\n## Project Structure\n\n```\nEvmAsm/\n  Rv64/                       -- RV64IM backend\n    Basic.lean                --   Machine state: registers (64-bit), memory, PC\n    Instructions.lean         --   RV64IM instruction set and semantics\n    Program.lean              --   Programs as instruction lists, sequential composition\n    Execution.lean            --   Branch-aware execution, code memory, step/stepN\n    SepLogic.lean             --   Separation logic assertions and combinators\n    CPSSpec.lean              --   CPS-style Hoare triples, branch specs, structural rules\n    ControlFlow.lean          --   if_eq macro, symbolic proofs, pcIndep\n    GenericSpecs.lean         --   Generic specs parameterized over instructions\n    InstructionSpecs.lean     --   Per-instruction CPS specs\n    SyscallSpecs.lean         --   Syscall specs: HALT, WRITE, HINT_READ\n    Tactics/\n      PerfTrace.lean          --   Performance tracing infrastructure\n      XPerm.lean              --   xperm tactic: AC-permutation of sepConj chains\n      XSimp.lean              --   xperm_hyp/xsimp tactics: assertion implication\n      XCancel.lean            --   xcancel tactic: cancellation with frame extraction\n      SeqFrame.lean           --   seqFrame tactic: auto frame+compose cpsTriple specs\n      LiftSpec.lean           --   liftSpec tactic: lift instruction specs\n      RunBlock.lean           --   runBlock tactic: block execution automation\n      SpecDb.lean             --   @[spec_gen] attribute and spec database\n  Evm64/                      -- EVM opcodes on RV64IM (4x64-bit limbs)\n    Basic.lean                --   EvmWord (BitVec 256), getLimb64, fromLimbs64\n    Stack.lean                --   
evmWordIs, evmStackIs, pcFree lemmas\n    EvmWordArith.lean         --   Math correctness lemmas (carry chains, etc.)\n    Compare/\n      LimbSpec.lean           --   Shared comparison per-limb specs (lt, beq, slt_msb)\n    Add/                      --   256-bit ADD\n      Program.lean            --     RV64 program definition\n      LimbSpec.lean           --     Per-limb specs (add_limb0, add_limb_carry)\n      Spec.lean               --     Full composition + stack-level spec\n    Sub/                      --   256-bit SUB (same layout as Add/)\n    And/                      --   256-bit AND (Program + LimbSpec + Spec)\n    Or/                       --   256-bit OR\n    Xor/                      --   256-bit XOR\n    Not/                      --   256-bit NOT\n    Lt/                       --   256-bit LT (Program + Spec, uses Compare/LimbSpec)\n    Gt/                       --   256-bit GT\n    Eq/                       --   256-bit EQ (Program + LimbSpec + Spec)\n    IsZero/                   --   256-bit ISZERO (Program + LimbSpec + Spec)\n    Slt/                      --   256-bit SLT signed (Program + Spec, uses Compare/LimbSpec)\n    Sgt/                      --   256-bit SGT signed\n    Pop/                      --   POP (Program + Spec)\n    Push0/                    --   PUSH0 (Program + Spec)\n    Dup/                      --   DUP1-16 (Program + Spec)\n    Swap/                     --   SWAP1-16 (Program + Spec)\n    Multiply/                 --   MUL (Program + LimbSpec, schoolbook 4x4 limb)\n    DivMod/                   --   DIV/MOD (Program + LimbSpec + Compose, Knuth Algorithm D)\n    SignExtend/               --   SIGNEXTEND (Program + LimbSpec + Compose + Spec)\n    Shift/                    --   SHR/SHL/SAR (Program + LimbSpec + ShlSpec + SarSpec + Compose + ShlCompose + SarCompose + Semantic + ShlSemantic + SarSemantic)\n    Byte/                     --   BYTE (Program + LimbSpec + Spec)\n    zkvm-standards/           --   Submodule: zkVM 
RISC-V target standards\nEvmAsm.lean                  -- Top-level module hub\nEvmAsm/Rv64.lean             -- Rv64 module hub\nEvmAsm/Evm64.lean            -- Evm64 module hub\nexecution-specs/              -- Submodule: Ethereum execution specs\n```\n\n## Building\n\n```bash\n# Install elan (Lean version manager) if not already installed\ncurl -sSf https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh | sh\n\n# Download the Mathlib cache (optional, recommended)\nlake exec cache get\n\n# Build the project\nlake build\n```\n\n## Status\n\nThis is a **prototype** demonstrating the approach. Current state:\n\n- **Infrastructure**: RV64IM backend with separation logic, CPS-style Hoare\n  triples, and automated tactics (`xperm`, `xcancel`, `seqFrame`, `liftSpec`,\n  `runBlock` with `@[spec_gen]` auto-resolution).\n- **Evm64 (0 sorry)**: targets `riscv64im_zicclsm-unknown-none-elf`,\n  4x64-bit limbs, 24 fully-proved opcodes:\n  AND, OR, XOR, NOT, ADD, SUB, MUL, DIV, MOD, SIGNEXTEND,\n  SHR, SHL, SAR, BYTE,\n  LT, GT, EQ, ISZERO, SLT, SGT,\n  POP, PUSH0, DUP1-16, SWAP1-16\n- **0 sorry across the entire codebase** (`lake build` clean).\n- **TODO**: EXP, ADDMOD, MULMOD, SDIV, SMOD,\n  MLOAD, MSTORE, interpreter loop, state transition function, connect to\n  sail-riscv-lean for RISC-V spec compliance, connect to EVM specs in Lean,\n  testing.\n\n## References\n\n- Kennedy, A., Benton, N., Jensen, J.B., Dagand, P.-E. (2013).\n  \"Coq: The world's best macro assembler?\" PPDP 2013.\n  https://www.microsoft.com/en-us/research/publication/coq-worlds-best-macro-assembler/\n- **SPlean** (Separation Logic Proofs in Lean), Verse Lab.\n  https://github.com/verse-lab/splean\n  The `xperm` / `xperm_hyp` / `xsimp` tactics in `Tactics/` are inspired by\n  SPlean's `xsimpl` tactic.\n- Charguéraud, A. (2020). \"Separation Logic for Sequential Programs\n  (Functional Pearl).\" *Proc. ACM Program. 
Lang.* 4, ICFP, Article 116.\n  https://doi.org/10.1145/3408998\n- **bedrock2**: https://github.com/mit-plv/bedrock2\n  The frame automation tactics (`xcancel`, `seqFrame`) in `Tactics/XCancel.lean`\n  and `Tactics/SeqFrame.lean` are inspired by bedrock2's separation logic\n  automation. Specifically:\n  - The `wcancel` tactic in `bedrock2/src/bedrock2/SepLogAddrArith.v` (lines 127-134)\n    inspired the cancellation approach: matching atoms by tag+address, computing\n    the frame as the residual of unmatched hypothesis atoms.\n  - The frame rule infrastructure in `bedrock2/src/bedrock2/FrameRule.v` (lines 75-175)\n    inspired the automatic frame extraction pattern where specs include a universal\n    frame parameter and tactics instantiate it during composition.\n  - The instruction specs with explicit frame in `compiler/src/compiler/GoFlatToRiscv.v`\n    (lines 439-546) informed the design of composing instruction specs with\n    `cpsTriple_frame_left` + `cpsTriple_seq_with_perm`.\n- Knuth, D.E. (1997). *The Art of Computer Programming, Volume 2:\n  Seminumerical Algorithms* (3rd ed.), §4.3.1 \"The Classical Algorithms.\"\n  Addison-Wesley. Algorithm D is used for the DIV/MOD opcodes in `Evm64/DivMod/`.\n- SP1 zkVM: https://github.com/succinctlabs/sp1\n  The `ECALL`-based syscall mechanism follows SP1's conventions.\n- zkvm-standards: https://github.com/eth-act/zkvm-standards\n  Tentative standards for the zkVM RISC-V target, I/O interface, and C-interface accelerators.\n- sail-riscv-lean: https://github.com/opencompl/sail-riscv-lean\n- RISC-V ISA specification: https://riscv.org/technical/specifications/\n",1776560099704]