LeanBench

Hyperfine-inspired benchmarking library + CLI for Lean projects (no compiler changes).

Quick start

Create a bench module and import it from your Main.lean:

import LeanBench

bench "sum" do
  let mut acc := 0
  for i in [0:100000] do
    acc := acc + i
  if acc == 0 then
    IO.println ""

-- Hand the CLI arguments to LeanBench, which runs the registered benches.
def main (args : List String) : IO UInt32 :=
  LeanBench.runMain args

Build and run:

lake build
lake exe leanbench --samples 20 --warmup 2

Install

Add LeanBench as a Lake dependency:

[[require]]
name = "LeanBench"
git = "https://github.com/alok/LeanBench"
rev = "main"

For local development, you can use a path dependency instead:

[[require]]
name = "LeanBench"
path = "../LeanBench"

Then set the bench driver in your lakefile (optional; this enables lake bench):

benchDriver = "LeanBench/leanbench"

CLI

  • --config <path> read config from the given file (instead of the default leanbench.toml/observatory.toml)
  • --profile <name> select config profile (default: default)
  • --where <expr> filter benches with a small DSL (suite(x), tag(y), name(/re/), and/or/not)
  • --print-config print resolved config before running (to stderr)
  • --list list benches
  • --list --format json machine-readable bench list
  • --list-tags list tags (after filters)
  • --list-suites list suites (after filters)
  • --plan emit deterministic plan JSON and exit
  • --match <substr> filter by name substring
  • --tags <t1,t2,...> filter by any tag
  • --all-tags require all tags in --tags
  • --suite <name> filter by suite
  • --samples <n> override sample count
  • --warmup <n> override warmup count
  • --min-time-ms <n> run until total time exceeds N ms
  • --threads <n> run each sample with N parallel action tasks
  • --timeout-ms <n> set timeout metadata (ms)
  • --retries <n> retry a failing benchmark up to N times
  • --priority <n> set priority metadata
  • --group <name> set group metadata
  • --threads-required <n> set threads_required metadata
  • --seed <n> seed for deterministic shuffling
  • --shuffle shuffle bench order deterministically
  • --partition count:m/n|hash:m/n shard benches across workers
  • --plan-out <path> write plan JSON to file
  • --manifest-out <path> write run manifest JSON to file
  • --group-by suite group pretty output by suite
  • --format pretty|full|json|jsonl|radar (full adds median/p95/p99)
  • --json alias for --format json
  • --jsonl alias for --format jsonl
  • --radar alias for --format radar
  • --radar-suite <name> prefix radar names with <name>// (or set LEANBENCH_RADAR_SUITE)
  • --radar-out <path> or RADAR_JSONL=/path for JSONL compatible with https://github.com/leanprover/radar
  • --json-out <path> write JSON to file
  • --jsonl-out <path> write JSONL to file
  • --artifacts <dir> write outputs under this directory
  • --save <path> write JSON to file and set format to json
  • --compare <path> compare against a JSON baseline
  • --fail-on-regressions exit non-zero when regressions exceed thresholds
  • --regress-abs-ns <n> fail if mean_ns regression exceeds N nanoseconds
  • --regress-ratio <x> fail if mean_ns ratio exceeds this value (e.g., 1.05)
  • --regress-pct <p> fail if mean_ns regression exceeds this percent (e.g., 5)
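
The filters and output options compose. Here is a sketch of one combined invocation (the --where quoting and the shard notation follow the summaries above and may need adjusting for your shell):

# Run array-tagged benches, skip "slow" names, take shard 1 of 2, write JSON.
lake exe leanbench --where 'tag(array) and not name(/slow/)' \
  --samples 30 --partition hash:1/2 --json-out results.json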

Config file

LeanBench can read BenchConfig overrides from a TOML file in the current directory:

  • leanbench.toml (preferred)
  • observatory.toml (fallback, so LeanObserve + LeanBench can share profiles)

Use:

  • --config <path> to point at a specific file
  • --profile <name> to select a profile (default: default)
  • --where <expr> to filter benches using a DSL
  • --print-config (or LEANBENCH_PRINT_CONFIG=1) to show the resolved config
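
For example (the profile name ci is just a placeholder):

lake exe leanbench --config leanbench.toml --profile ci --print-config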

Config documentation: docs/leanbench-config.md.

Example config: leanbench.toml.example.

JSON output

--format json emits a schema-versioned object with schema_version, run, and results. The schema lives at docs/leanbench-schema.json. --format jsonl emits one JSON object per benchmark result (including run metadata).

Each bench result may include:

  • trace and/or trace_path (structured trace JSON)
  • sample_extras_path (per-sample JSON written under --artifacts)

By default (no --artifacts), traces are inlined as trace. When --artifacts <dir> is set, trace_path is set and trace is null.

When --artifacts <dir> is set, trace and per-sample outputs are written under:

  • <dir>/trace/*.json
  • <dir>/sample_extras/*.json

Artifact file formats are documented in docs/leanbench-artifacts.md.
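
For example, to keep traces and per-sample output on disk (the directory name is arbitrary):

lake exe leanbench --format json --artifacts bench-artifacts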

Baselines used by --compare can be either the legacy JSON array or the new schema object.

Radar integration

If you use github.com/leanprover/radar, you can point a bench script at scripts/radar/leanbench.sh. It follows the radar bench script contract ($1 repo path, $2 JSONL output path) and forwards any extra args to lake exe leanbench.

# Example: run a suite with 5 samples and emit radar JSONL.
LEANBENCH_ARGS="--suite core --samples 5" \
  ./scripts/radar/leanbench.sh /path/to/LeanBench /tmp/radar.jsonl

Example radar configs live in radar/ (copy .example.yaml files and edit tokens/URLs).

Observe benches

The observe suite runs small end-to-end LeanObserve passes over benchdata/. These are intended for perf regression tracking of infotree and command-node collection.

lake exe leanbench --suite observe

The observe benches import LeanObserve directly (no subprocess), so they exercise the in-process API as well as the CLI.

LeanObserve CLI:

  • --command-nodes includes all commands (including non-decls).
  • --decl-nodes includes only named declarations (implies --command-nodes).
  • leanobserve reads observatory.toml or leanbench.toml from the current directory.
  • Select a profile with --profile or LEANOBSERVATORY_PROFILE (default: default).
  • Apply tool overrides with --tool or LEANOBSERVATORY_TOOL.
  • Use --config to point at a specific config file and --print-config to show the resolved config.
  • Precedence: CLI > env vars > tool overrides > profile > defaults.
  • Example config: observatory.toml.example.
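
A minimal invocation, assuming the CLI is exposed as a Lake executable named leanobserve:

lake exe leanobserve --decl-nodes --print-config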

Suites

Attach a suite directly in BenchConfig, or define a suite driver that collects benches by tag or name:

import LeanBench

bench_suite "text" ({
  description := "string + json benches"
  tags := ["string", "json"]
})
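
To place a single bench in a suite directly, set the suite field of its BenchConfig (a sketch; the bench and suite names are illustrative):

import LeanBench

def textCfg : BenchConfig := {
  suite := some "text" -- group this bench under the "text" suite
}

bench "json_parse" (textCfg) do
  -- run one parse
  pure ()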

GPU metrics

Provide per-iteration byte and flop counts in BenchConfig to emit bandwidth (GB/s) and throughput (GFLOP/s) in pretty/JSON output:

import LeanBench

def cfg : BenchConfig := {
  threads := 8                -- run in parallel
  bytes := some (3 * 1_000_000 * 4)  -- bytes per iteration
  flops := some 1_000_000           -- flops per iteration
}

bench "vector_add" (cfg) do
  -- run kernel
  pure ()

Item throughput

Provide per-iteration item counts to emit items/s:

import LeanBench

def cfg : BenchConfig := {
  items := some 1024 -- items per iteration
}

bench "batch_next" (cfg) do
  -- run one batch
  pure ()

Extras

Attach additional JSON metrics (e.g., stage timing) with bench_report:

import LeanBench
import Lean.Data.Json

def reportStats : IO Lean.Json := do
  pure <| Lean.Json.mkObj [("stage_wait_ns", Lean.Json.num 12345)]

bench_report "pipeline" (reportStats) do
  -- run one pipeline step
  pure ()

Lifecycle hooks + per-sample reporting

For finer control, Bench supports lifecycle hooks that run outside the timed region:

  • setup? / teardown? run once per bench run
  • beforeEach? / afterEach? run around each sample

afterEach? and teardown? run in finally blocks (best-effort cleanup), so they will be executed even if the benchmark action throws.

You can also emit per-sample JSON via reportSample?. This hook is only executed when --artifacts <dir> is set; the collected output is written to an artifacts file and referenced via sample_extras_path in JSON output.

import LeanBench
import Lean.Data.Json

open LeanBench

initialize counterRef : IO.Ref Nat ← IO.mkRef 0

def sampleReport (ctx : SampleCtx) : IO Lean.Json := do
  let n ← counterRef.get
  pure <| Lean.Json.mkObj [
    ("sample_index", Lean.Json.num ctx.sampleIndex),
    ("elapsed_ns", Lean.Json.num ctx.elapsedNs),
    ("counter", Lean.Json.num n)
  ]

initialize LeanBench.register {
  name := "hooked"
  config := { suite := some "core", samples := 5, warmup := 1 }
  setup? := some (counterRef.set 0)
  beforeEach? := some (counterRef.modify (· + 1))
  action := pure ()
  reportSample? := some sampleReport
}

Lake integration

Lake already supports bench drivers (like test). Configure your package to use LeanBench as the driver:

benchDriver = "LeanBench/leanbench"

Then run:

lake bench -- --tags array

External commands

Use benchCmd to benchmark external commands:

bench "lean --version" do
  LeanBench.benchCmd "lean" #["--version"]

Comparison example

# create baseline
lake exe leanbench --save baseline.json

# compare current run to baseline
lake exe leanbench --compare baseline.json
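
To gate CI on the comparison, add the regression flags (the 5% threshold is only an example):

# fail the run when any mean regresses by more than 5%
lake exe leanbench --compare baseline.json --fail-on-regressions --regress-pct 5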

License

Apache-2.0