Formalization of Algorithmic Information Theory in Lean 4
This repository contains a Lean 4 formalization of parts of algorithmic
information theory, centered on Kolmogorov complexity. The development uses
Mathlib's computability infrastructure and represents decompressors as partial
functions (Part).
The project is currently pinned to Lean v4.31.0 and Mathlib v4.31.0.
Current Status
The main branch builds with the pinned Lean/Mathlib toolchain and the library
sources are intended to stay free of sorry, admit, custom axiom, unsafe,
implemented_by, set_option profiler true, and project-specific
set_option maxHeartbeats overrides.
The recent Lean 4.31 update also reorganized the prefix-complexity and algorithmic-probability layers into smaller modules. Large Kraft-Chaitin, lower-semicomputable semimeasure, prefix-counting, and universal-semimeasure files have compatibility aggregators at the old import paths, while their definitions, API lemmas, and computability proofs now live in more focused submodules.
Build
Install Lean through elan, then run:
lake exe cache get
lake build
For a smaller check of the exported library target, run:
lake build KolmogorovMathlib
Scope
The original core development formalizes plain conditional complexity for bitstrings, a universal decompressor, invariance up to an additive constant, basic complexity inequalities, incompressibility, uncomputability of natural number complexity, and Chaitin-style incompleteness results. The second-incompleteness files use abstract formal-system interfaces in a Kritchman-Raz style rather than formalizing a concrete arithmetic system.
Recent modules add a prefix-complexity and algorithmic-probability layer. This includes prefix-free codes, prefix machines, conditional prefix complexity, optimal prefix decompressors, Kraft inequalities and converse constructions, two-stage and pair-coding infrastructure, a priori machine semimeasures, lower-semicomputable semimeasure interfaces, mixture and domination lemmas, and Kraft-Chaitin style coding infrastructure.
The prefix and semimeasure material contains substantial theorem statements and
checked infrastructure, but it should be read as an actively developed
formalization layer. In particular, the repository avoids claiming a finished
textbook equivalence theory with exact logarithmic coding-theorem equalities.
Many results are stated in additive-complexity or multiplicative semimeasure
forms over ENat and ENNReal.
Project Layout
KolmogorovMathlib/
├── Foundation/ # Search operators, recursively enumerable relations, Nat.bits encoding
├── Core/ # Maps, conditional/plain complexity, universal decompressor, invariance
├── Complexity/ # Bounds, incompressibility, uncomputability, incompleteness interfaces
├── Prefix/ # Prefix-free codes, prefix machines, prefix complexity, Kraft results
└── AlgorithmicProbability/ # A priori semimeasures, LSC semimeasures, mixtures, coding infrastructure
The top-level module KolmogorovMathlib.lean imports the library development.
Lake Metadata
The Lake package is named kolmogorov_complexity; the Lean library target is
KolmogorovMathlib. The Mathlib dependency is pinned in lakefile.toml and
lake-manifest.json to the Lean v4.31.0 ecosystem.
Development Notes
The repository should remain buildable with lake build after every public
change. Prefer small named helper lemmas and compatibility-preserving module
splits over large monolithic computability proofs.
When adding new material, keep definitions and lightweight API lemmas separate from heavier computability or encoding proofs once a file starts to grow. This keeps the project usable as a base for a much larger formalization of algorithmic information theory.