LeanBench

Hyperfine-inspired benchmarking library + CLI for Lean projects (no compiler changes).

Quick start

Create a bench module and import it from your Main.lean:

import LeanBench

bench "sum" do
  let mut acc := 0
  for i in [0:100000] do
    acc := acc + i
  if acc == 0 then
    IO.println ""

-- Hand the CLI arguments to LeanBench, which runs the registered benches.
def main (args : List String) : IO UInt32 :=
  LeanBench.runMain args

Build and run:

lake build
lake exe leanbench --samples 20 --warmup 2

Install

Add LeanBench as a Lake dependency:

[[require]]
name = "LeanBench"
git = "https://github.com/alok/LeanBench"
rev = "main"

For local development, you can use a path dependency instead:

[[require]]
name = "LeanBench"
path = "../LeanBench"

Then set the bench driver in your lakefile (optional; this enables lake bench):

benchDriver = "LeanBench/leanbench"

CLI

  • --config <path> read config from the given file (instead of the default leanbench.toml/observatory.toml)
  • --profile <name> select config profile (default: default)
  • --where <expr> filter benches with a small DSL (suite(x), tag(y), name(/re/), and/or/not)
  • --print-config print resolved config before running (to stderr)
  • --list list benches
  • --list --format json machine-readable bench list
  • --list-tags list tags (after filters)
  • --list-suites list suites (after filters)
  • --plan emit deterministic plan JSON and exit
  • --match <substr> filter by name substring
  • --tags <t1,t2,...> filter by any tag
  • --all-tags require all tags in --tags
  • --suite <name> filter by suite
  • --samples <n> override sample count
  • --warmup <n> override warmup count
  • --min-time-ms <n> run until total time exceeds N ms
  • --threads <n> run each sample with N parallel action tasks
  • --timeout-ms <n> set timeout metadata (ms)
  • --retries <n> retry a failing benchmark up to N times
  • --priority <n> set priority metadata
  • --group <name> set group metadata
  • --threads-required <n> set threads_required metadata
  • --seed <n> seed for deterministic shuffling
  • --shuffle shuffle bench order deterministically
  • --partition count:m/n|hash:m/n shard benches across workers
  • --plan-out <path> write plan JSON to file
  • --manifest-out <path> write run manifest JSON to file
  • --group-by suite group pretty output by suite
  • --format pretty|full|json|jsonl|radar (full adds median/p95/p99)
  • --json alias for --format json
  • --jsonl alias for --format jsonl
  • --radar alias for --format radar
  • --radar-suite <name> prefix radar names with <name>// (or set LEANBENCH_RADAR_SUITE)
  • --radar-out <path> or RADAR_JSONL=/path for JSONL compatible with https://github.com/leanprover/radar
  • --json-out <path> write JSON to file
  • --jsonl-out <path> write JSONL to file
  • --artifacts <dir> write outputs under this directory
  • --save <path> write JSON to file and set format to json
  • --compare <path> compare against a JSON baseline
  • --fail-on-regressions exit non-zero when regressions exceed thresholds
  • --regress-abs-ns <n> fail if mean_ns regression exceeds N nanoseconds
  • --regress-ratio <x> fail if mean_ns ratio exceeds this value (e.g., 1.05)
  • --regress-pct <p> fail if mean_ns regression exceeds this percent (e.g., 5)
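
The filters and output options compose. Here is a sketch of one combined invocation (the --where quoting and the shard notation follow the summaries above and may need adjusting for your shell):

# Run array-tagged benches, skip "slow" names, take shard 1 of 2, write JSON.
lake exe leanbench --where 'tag(array) and not name(/slow/)' \
  --samples 30 --partition hash:1/2 --json-out results.json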

Config file

LeanBench can read BenchConfig overrides from a TOML file in the current directory:

  • leanbench.toml (preferred)
  • observatory.toml (fallback, so LeanObserve + LeanBench can share profiles)

Use:

  • --config <path> to point at a specific file
  • --profile <name> to select a profile (default: default)
  • --where <expr> to filter benches using a DSL
  • --print-config (or LEANBENCH_PRINT_CONFIG=1) to show the resolved config
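
For example (the profile name ci is just a placeholder):

lake exe leanbench --config leanbench.toml --profile ci --print-config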

Config documentation: docs/leanbench-config.md.

Example config: leanbench.toml.example.

JSON output

--format json emits a schema-versioned object with schema_version, run, and results. The schema lives at docs/leanbench-schema.json. --format jsonl emits one JSON object per benchmark result (including run metadata).

Each bench result may include:

  • trace and/or trace_path (structured trace JSON)
  • sample_extras_path (per-sample JSON written under --artifacts)

By default (no --artifacts), traces are inlined as trace. When --artifacts <dir> is set, trace_path is set and trace is null.

When --artifacts <dir> is set, trace and per-sample outputs are written under:

  • <dir>/trace/*.json
  • <dir>/sample_extras/*.json

Artifact file formats are documented in docs/leanbench-artifacts.md.
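
For example, to keep traces and per-sample output on disk (the directory name is arbitrary):

lake exe leanbench --format json --artifacts bench-artifacts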

Baselines used by --compare can be either the legacy JSON array or the new schema object.

Radar integration

If you use github.com/leanprover/radar, you can point a bench script at scripts/radar/leanbench.sh. It follows the radar bench script contract ($1 repo path, $2 JSONL output path) and forwards any extra args to lake exe leanbench.

# Example: run a suite with 5 samples and emit radar JSONL.
LEANBENCH_ARGS="--suite core --samples 5" \
  ./scripts/radar/leanbench.sh /path/to/LeanBench /tmp/radar.jsonl

Example radar configs live in radar/ (copy .example.yaml files and edit tokens/URLs).

Observe benches

The observe suite runs small end-to-end LeanObserve passes over benchdata/. These are intended for perf regression tracking of infotree and command-node collection.

lake exe leanbench --suite observe

The observe benches import LeanObserve directly (no subprocess), so they exercise the in-process API as well as the CLI.

LeanObserve CLI:

  • --command-nodes includes all commands (including non-decls).
  • --decl-nodes includes only named declarations (implies --command-nodes).
  • leanobserve reads observatory.toml or leanbench.toml from the current directory.
  • Select a profile with --profile or LEANOBSERVATORY_PROFILE (default: default).
  • Apply tool overrides with --tool or LEANOBSERVATORY_TOOL.
  • Use --config to point at a specific config file and --print-config to show the resolved config.
  • Precedence: CLI > env vars > tool overrides > profile > defaults.
  • Example config: observatory.toml.example.
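
A minimal invocation, assuming the CLI is exposed as a Lake executable named leanobserve:

lake exe leanobserve --decl-nodes --print-config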

Suites

Attach a suite directly in BenchConfig, or define a suite driver that collects benches by tag or name:

import LeanBench

bench_suite "text" ({
  description := "string + json benches"
  tags := ["string", "json"]
})
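
To place a single bench in a suite directly, set the suite field of its BenchConfig (a sketch; the bench and suite names are illustrative):

import LeanBench

def textCfg : BenchConfig := {
  suite := some "text" -- group this bench under the "text" suite
}

bench "json_parse" (textCfg) do
  -- run one parse
  pure ()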

GPU metrics

Provide per-iteration byte and flop counts in BenchConfig to emit bandwidth (GB/s) and throughput (GFLOP/s) in pretty/JSON output:

import LeanBench

def cfg : BenchConfig := {
  threads := 8                -- run in parallel
  bytes := some (3 * 1_000_000 * 4)  -- bytes per iteration
  flops := some 1_000_000           -- flops per iteration
}

bench "vector_add" (cfg) do
  -- run kernel
  pure ()

Item throughput

Provide per-iteration item counts to emit items/s:

import LeanBench

def cfg : BenchConfig := {
  items := some 1024 -- items per iteration
}

bench "batch_next" (cfg) do
  -- run one batch
  pure ()

Extras

Attach additional JSON metrics (e.g., stage timing) with bench_report:

import LeanBench
import Lean.Data.Json

def reportStats : IO Lean.Json := do
  pure <| Lean.Json.mkObj [("stage_wait_ns", Lean.Json.num 12345)]

bench_report "pipeline" (reportStats) do
  -- run one pipeline step
  pure ()

Lifecycle hooks + per-sample reporting

For finer control, Bench supports lifecycle hooks that run outside the timed region:

  • setup? / teardown? run once per bench run
  • beforeEach? / afterEach? run around each sample

afterEach? and teardown? run in finally blocks (best-effort cleanup), so they will be executed even if the benchmark action throws.

You can also emit per-sample JSON via reportSample?. This hook is only executed when --artifacts <dir> is set; the collected output is written to an artifacts file and referenced via sample_extras_path in JSON output.

import LeanBench
import Lean.Data.Json

open LeanBench

initialize counterRef : IO.Ref Nat ← IO.mkRef 0

def sampleReport (ctx : SampleCtx) : IO Lean.Json := do
  let n ← counterRef.get
  pure <| Lean.Json.mkObj [
    ("sample_index", Lean.Json.num ctx.sampleIndex),
    ("elapsed_ns", Lean.Json.num ctx.elapsedNs),
    ("counter", Lean.Json.num n)
  ]

initialize LeanBench.register {
  name := "hooked"
  config := { suite := some "core", samples := 5, warmup := 1 }
  setup? := some (counterRef.set 0)
  beforeEach? := some (counterRef.modify (· + 1))
  action := pure ()
  reportSample? := some sampleReport
}

Lake integration

Lake already supports bench drivers (like test). Configure your package to use LeanBench as the driver:

benchDriver = "LeanBench/leanbench"

Then run:

lake bench -- --tags array

External commands

Use benchCmd to benchmark external commands:

bench "lean --version" do
  LeanBench.benchCmd "lean" #["--version"]

Comparison example

# create baseline
lake exe leanbench --save baseline.json

# compare current run to baseline
lake exe leanbench --compare baseline.json
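
To gate CI on the comparison, add the regression flags (the 5% threshold is only an example):

# fail the run when any mean regresses by more than 5%
lake exe leanbench --compare baseline.json --fail-on-regressions --regress-pct 5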

License

Apache-2.0