LeanBandits | Reservoir

Bandit algorithms in Lean

This repository contains a Lean formalization of regret bounds for several stochastic bandit algorithms.

Authors: Rémy Degenne, Paulo Rauber.

Main results:

Contents:

Definitions of an iterative, stochastic algorithm and a stochastic environment
Proofs of the existence of probability spaces on which those algorithm-environment interactions are defined, and uniqueness of the resulting laws
Notations and tools to analyze bandits: number of times an arm was pulled, time of the nth pull, regret, gap. Relations between those.
Concentration inequalities
Definitions of ETC and UCB
Proofs of regret bounds for those two algorithms.