Bandit algorithms in Lean

This repository contains a Lean formalization of regret bounds for several stochastic bandit algorithms.

Authors: Rémy Degenne, Paulo Rauber.

Main results:

  • Framework for working on bandit algorithms in Lean.
  • Regret bound for the Explore-Then-Commit algorithm.
  • Regret bound for the UCB algorithm.

Contents:

  • Definitions of an iterative, stochastic algorithm and a stochastic environment
  • Proofs of the existence of probability spaces on which those algorithm-environment interactions are defined, and uniqueness of the resulting laws
  • Notations and tools to analyze bandits: number of times an arm was pulled, time of the nth pull, regret, gap. Relations between those.
  • Concentration inequalities
  • Definitions of ETC and UCB
  • Proofs of regret bounds for those two algorithms.