Proofs written in Lean4 for the core katydid validation algorithm

Check Proofs


The goal is to formalize the core katydid validation algorithm. This algorithm allows us to validate millions of serialized data structures per second on a single core. The algorithm is based on derivatives for regular expressions and extends this to Visibly Pushdown Automata (VPA), by splitting the derivative function into two functions. It also includes several basic optimizations, such as memoization, simplification, laziness, zipping of multiple states, short circuiting, evaluation at compilation, symbolic derivatives and a pull based parser for serialized data structures that allows us to skip over some of the parsing. You can play around with the validation language on its playground.


This is required reading for understanding the underlying algorithm, the subject of the proofs in this repo:

This repo also requires the following background:

  • Knowledge of dependent types, induction and understanding the difference between a property True and a boolean true. We recommend reading The Little Typer to gain an understanding of the basics.
  • Experience with an Interactive Theorem Prover, like Coq or Lean, including using tactics and Inductive Predicates. If you are unfamiliar with interactive theorem provers you can watch our talk for a taste. We recommend reading Coq in a Hurry as a quick overview and Coq Art up to Chapter 8: Inductive Predicates for a proper understanding.

Optionally the following will also be helpful, but this is not required:

Questions about Lean4 can be asked on proofassistants.stackexchange by tagging questions with lean and lean4 or in the Zulip Chat.


Pair Programming Proofs

We have pair programming sessions between 18:00 - 20:00 UK time, on dates that are announced on

  • If you would like to watch, this will be streamed on Twitch.
  • If you would like to do more than watch and join us inside the zoom session, please make sure you have met the prerequisites and email Walter.


This is just a quick overview of the steps towards our goal.

  • Create Symbolic Predicates
    • Create expression language as described in the post: Derivatives of Symbolic Automata explained
    • Prove simplification rules for or, and, false, etc.
    • Prove that non-reader functions can be pre-computed before evaluating time
    • Prove that the optimized comparison method using a hash is comparable (transitive, associative, etc.)
  • Create the symbolic regular expressions for trees
    • Steal as much as we can from our previous work in Coq
    • Prove that the induction predicate is decidable.
    • Create derivative functions for the symbolic tree expressions.
    • Prove that the simple tree function and the VPA functions are equivalent and equivalent to the inductive predicate.
    • Prove simplification rules
    • Prove all optimizations of the katydid algorithm
    • ...


Please check the prerequisites and read the contributing guidelines. The contributing guidelines are short and shouldn't be surprising.


  • Lean4 has exceptional instructions for installing Lean4 in VS Code.
  • Remember to also add lake (the build system for lean) to your PATH. You can do this on mac by adding export PATH=~/.elan/bin/:${PATH} to your ~/.zshrc file
  • Use mathlib's cache to speed up building time by running: $lake exe cache get