RL-Theory | Reservoir

RL Theory in Lean

Installation

See https://lean-lang.org/install/ for a detailed tutorial on setting up Lean.
Confirm that Lean is set up by lake --version.

git clone git@github.com:ShangtongZhang/rl-theory-in-lean.git
cd rl-theory-in-lean
lake exe cache get
lake build

It is recommended to use this project with the Lean 4 extension of VSCode.

Paper

Towards Formalizing Reinforcement Learning Theory

@article{zhang2025formalizing,
    title={Towards Formalizing Reinforcement Learning Theory},
    author={Shangtong Zhang},
    year={2025},
    journal={arXiv preprint arXiv:2511.03618}
}

This project is organized by mirroring the structure of Mathlib so the counterparts can be upstreamed to Mathlib easily. Below are some main results when I first completed the project at this commit. Items struck out are results that are no longer in the latest repo for various reasons (e.g., newer Mathlib now contains those results).

Algorithm
- LinearTD.lean
  - [LinearTDIterates] - linear TD algorithm.
  - [DecreaseAlong] - norm qualifies a Lyapunov function for linear TD.
  - ae_tendsto_of_linearTD_iid — almost sure convergence of linear TD with i.i.d. samples.
  - ae_tendsto_of_linearTD_markov — almost sure convergence of linear TD with Markovian samples.
- QLearning.lean
  - [QLearningIterates] — Q-learning algorithm.
  - [DecreaseAlong] - norm with a sufficiently large qualifies a Lyapunov function for Q-learning.
  - ae_tendsto_of_QLearning_iid — almost sure convergence of Q-learning with i.i.d. samples.
  - ae_tendsto_of_QLearning_markov — almost sure convergence of Q-learning with Markovian samples.
StochasticApproximation
- Iterates.lean - stochastic approximation (SA) iterates and non-uniform bounds.
  - [Iterates] — a general form of SA iterates with Martingale difference noise and additional noise.
  - [IteratesOfResidual] — a specific form of SA iterates driven by residual of an operator.
  - [AdaptedOnSamplePath] — SA iterates are completely determined by available information until current time step on every sample path.
- CondExp.lean
  - condExp_iterates_update — computes the conditional expectation of SA iterates in the Ionescu-Tulcea probability space.
- Lyapunov.lean
  - [LyapunovFunction] — desired properties for a function to be used as a Lyapunov function to analyze SA iterates.
- LpSpace.lean - squared norm qualifies a Lyapunov function
  - smooth_half_sq_Lp — squared norm is smooth.
- Pathwise.lean
  - fundamental_inequality — recursive bounds of errors based on a Lyapunov function on an individual sample path.
- DiscreteGronwall.lean — discrete Gronwall inequalities.
- MartingaleDifference.lean
  - ae_tendsto_of_iterates_mds_noise — convergence of SA iterates with Martingale difference noise.
- IIDSamples.lean
  - ae_tendsto_of_iterates_iid_samples — convergence of SA iterates with i.i.d. samples.
- StepSize.lean
  - anchors_of_inv_poly — inverse polynomial step size qualifies the skeleton iterates technique that can convert Markov noise to Martingale difference noise.
- MarkovSamples.lean
  - ae_tendsto_of_iterates_markov_samples — convergence of SA iterates with Markovian samples.
  - ae_tendsto_of_iterates_markov_samples_of_inv_poly — specializes to inverse polynomial step size schedules.
Data
- ~~Int/GCD.lean~~
  - all_but_finite_of_closed_under_add_and_gcd_one — if a set has gcd 1 and is close under addition, it then contains all large enough integers.
- Matrix/PosDef.lean — positive definiteness (p.d.) for asymmetric real matrices.
  - posDefAsymm_iff' — a lower bound for the quadratic form of a p.d. matrix.
- Matrix/Stochastic.lean — stochastic vectors and row stochastic matrices.
  - smat_minorizable_with_large_pow — an irreducible and aperiodic stochastic matrix is Doeblin minorizable after sufficient powers.
  - smat_contraction_in_simplex - Doeblin minorization implies contraction in simplex
  - stationary_distribution_exists - produces one stationary distribution via Cesaro limit.
  - stationary_distribution_uniquely_exists — stationary distribution uniquely exists under irreducibility and aperiodicity.
  - [GeometricMixing] - geometric mixing under irreducibility and aperiodicity.
Probability
- MarkovChain/Defs.lean — basics of time homogeneous Markov chains.
- MarkovChain/Finite/Defs.lean — matrix representation of finite Markov chains
- MarkovChain/Trajectory.lean
  - traj_prob — constructs the sample path space probability measure for Markov chains by the Ionescu-Tulcea theorem.
- Martingale/Convergence.lean
  - ae_tendsto_zero_of_almost_supermartingale — Robbins-Siegmund theorem for the convergence of almost supermartingales.
MarkovDecisionProcess
- MarkovRewardProcess.lean — basics of finite MRPs.
  - [NegDefAsymm MRP.K] — negative definiteness of the matrix
- MarkovDecisionProcess.lean — basics of finite MDPs.

Remarks

When I first completed the project at this commit, the Lean toolchain has the version 4.22. At that time, I hated to have calc, have, and comments in the code so I tried my best to avoid them as much as possible, which I know is not good for readability. Later on, the Lean toolchain is bumped to 4.28 in this PR. According to the author of the PR, the code there is 100% generated via LLMs so that PR introduces lots of good coding style that increases readability (e.g., long calc, many have). An unavoidable consequence is that the coding style is now inconsistent across the project.

RL-Theory0.1.0

RL Theory in Lean

Installation

Paper

Main Results

Remarks