proofs

Proofs written in Lean4 for the core katydid validation algorithm

Check Proofs

Goal

The goal is to formalize the core katydid validation algorithm. This algorithm allows us to validate millions of serialized data structures per second on a single core. The algorithm is based on derivatives for regular expressions and extends this to Visibly Pushdown Automata (VPA), by splitting the derivative function into two functions. It also includes several basic optimizations, such as memoization, simplification, laziness, zipping of multiple states, short circuiting, evaluation at compilation, symbolic derivatives and a pull based parser for serialized data structures that allows us to skip over some of the parsing. You can play around with the validation language on its playground.

Plan

This is just a quick overview of the steps towards our goal.

Symbolic regular expressions

Prove theorems about Symbolic regular expressions as a foundation to build upon.

  • Prove correctness of derivative algorithm via a commuting diagram.
  • Prove correctness of derivative algorithm via a Regex type indexed with Language.
  • Prove decidability of derivative algorithm.
  • Prove correctness of simplification rules.
  • Prove correctness of smart constructors.

Reuse as much as we can from our previous work in Coq and our attempt at Reproving Agda in Lean

Symbolic predicates

  • Create expression language as described in the post: Derivatives of Symbolic Automata explained
  • Prove correctness of simplification rules for or, and, false, etc.
  • Prove that non-reader functions can be pre-computed before evaluating time
  • Prove that the optimized comparison method using a hash is comparable (transitive, associative, etc.)

Katydid Algorithm

  • Create Language definition for the symbolic tree expressions.
  • Code Pull-based Parser class in Lean and implement JSON as an example.
  • Code Katydid algorithm in Lean.
  • Prove correctness of derivative tree algorithm.
  • Prove decidablity of derivative tree algorithm.
  • Prove that the simple tree function and the VPA functions are equivalent and equivalent to the inductive predicate.
  • Prove correctness of new simplification rules
  • Prove all optimizations of the katydid algorithm

Contributing

Please check the prerequisites and read the contributing guidelines. The contributing guidelines are short and shouldn't be surprising.

Understanding Brzozowski derivatives

Contributing to this repo requires an understanding the underlying algorithm that is the subject of the proofs in this repo:

Understanding LeanProver

This repo also requires an understanding of proof assistants, since all the proofs in this repo are done using LeanProver:

  • Knowledge of dependent types, induction and understanding the difference between a property True and a boolean true. We recommend reading The Little Typer to gain an understanding of the basics.
  • Experience with an Interactive Theorem Prover, like Coq or Lean, including using tactics and Inductive Predicates. If you are unfamiliar with interactive theorem provers you can watch our talk for a taste. We recommend reading Coq in a Hurry as a quick overview and Coq Art up to Chapter 8: Inductive Predicates for a proper understanding.

Optionally the following will also be helpful, but this is not required:

Questions about Lean4 can be asked on proofassistants.stackexchange by tagging questions with lean and lean4 or in the Zulip Chat.

Setup

  • Lean4 has exceptional instructions for installing Lean4 in VS Code.
  • Remember to also add lake (the build system for lean) to your PATH. You can do this on mac by adding export PATH=~/.elan/bin/:${PATH} to your ~/.zshrc file
  • Use mathlib's cache to speed up building time by running: $ lake exe cache get