# LeanBench

Hyperfine-inspired benchmarking library + CLI for Lean projects (no compiler changes).
## Quick start

Create a bench module and import it from your `Main.lean`:

```lean
import LeanBench

bench "sum" do
  let mut acc := 0
  for i in [0:100000] do
    acc := acc + i
  if acc == 0 then
    IO.println ""

def main (args : List String) : IO UInt32 :=
  LeanBench.runMain args
```

Build and run:

```sh
lake build
lake exe leanbench --samples 20 --warmup 2
```
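To check what got registered, you can also list benches (the full flag set is covered in the CLI section below):

```sh
lake exe leanbench --list
```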
## Install

Add LeanBench as a Lake dependency:

```toml
[[require]]
name = "LeanBench"
git = "https://github.com/alok/LeanBench"
rev = "main"
```

For local development, you can use a path dependency instead:

```toml
[[require]]
name = "LeanBench"
path = "../LeanBench"
```

Then set the bench driver (optional, enables `lake bench`):

```toml
benchDriver = "LeanBench/leanbench"
```
## CLI

| Flag | Description |
| --- | --- |
| `--config <path>` | read config from `leanbench.toml`/`observatory.toml` |
| `--profile <name>` | select config profile (default: `default`) |
| `--where <expr>` | filter benches with a small DSL (`suite(x)`, `tag(y)`, `name(/re/)`, `and`/`or`/`not`) |
| `--print-config` | print resolved config before running (to stderr) |
| `--list` | list benches |
| `--list --format json` | machine-readable bench list |
| `--list-tags` | list tags (after filters) |
| `--list-suites` | list suites (after filters) |
| `--plan` | emit deterministic plan JSON and exit |
| `--match <substr>` | filter by name substring |
| `--tags <t1,t2,...>` | filter by any tag |
| `--all-tags` | require all tags in `--tags` |
| `--suite <name>` | filter by suite |
| `--samples <n>` | override sample count |
| `--warmup <n>` | override warmup count |
| `--min-time-ms <n>` | run until total time exceeds N ms |
| `--threads <n>` | run each sample with N parallel action tasks |
| `--timeout-ms <n>` | set timeout metadata (ms) |
| `--retries <n>` | retry a failing benchmark up to N times |
| `--priority <n>` | set priority metadata |
| `--group <name>` | set group metadata |
| `--threads-required <n>` | set threads_required metadata |
| `--seed <n>` | seed for deterministic shuffling |
| `--shuffle` | shuffle bench order deterministically |
| `--partition count:m/n\|hash:m/n` | shard benches across workers |
| `--plan-out <path>` | write plan JSON to file |
| `--manifest-out <path>` | write run manifest JSON to file |
| `--group-by suite` | group pretty output by suite |
| `--format pretty\|full\|json\|jsonl\|radar` | output format (`full` adds median/p95/p99) |
| `--json` | alias for `--format json` |
| `--jsonl` | alias for `--format jsonl` |
| `--radar` | alias for `--format radar` |
| `--radar-suite <name>` | prefix radar names with `<name>//` (or set `LEANBENCH_RADAR_SUITE`) |
| `--radar-out <path>` | write JSONL compatible with https://github.com/leanprover/radar (or set `RADAR_JSONL=/path`) |
| `--json-out <path>` | write JSON to file |
| `--jsonl-out <path>` | write JSONL to file |
| `--artifacts <dir>` | write outputs under this directory |
| `--save <path>` | write JSON to file and set format to `json` |
| `--compare <path>` | compare against a JSON baseline |
| `--fail-on-regressions` | exit non-zero when regressions exceed thresholds |
| `--regress-abs-ns <n>` | fail if `mean_ns` regression exceeds N nanoseconds |
| `--regress-ratio <x>` | fail if `mean_ns` ratio exceeds this value (e.g., 1.05) |
| `--regress-pct <p>` | fail if `mean_ns` regression exceeds this percent (e.g., 5) |
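For example, a typical filtered run that saves a baseline for later comparison (the file name is just a placeholder):

```sh
# Run only array-tagged benches with 30 samples and save JSON for --compare.
lake exe leanbench --tags array --samples 30 --save baseline.json
```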
## Config file

LeanBench can read `BenchConfig` overrides from a TOML file in the current
directory:

- `leanbench.toml` (preferred)
- `observatory.toml` (fallback, so LeanObserve + LeanBench can share profiles)

Use:

- `--config <path>` to point at a specific file
- `--profile <name>` to select a profile (default: `default`)
- `--where <expr>` to filter benches using a DSL
- `--print-config` (or `LEANBENCH_PRINT_CONFIG=1`) to show the resolved config

Config documentation: `docs/leanbench-config.md`.
Example config: `leanbench.toml.example`.
## JSON output

`--format json` emits a schema-versioned object with `schema_version`, `run`, and
`results`. The schema lives at `docs/leanbench-schema.json`. `--format jsonl`
emits one JSON object per benchmark result (including run metadata).

Each bench result may include:

- `trace` and/or `trace_path` (structured trace JSON)
- `sample_extras_path` (per-sample JSON written under `--artifacts`)

By default (no `--artifacts`), traces are inlined as `trace`. When
`--artifacts <dir>` is set, `trace_path` is set and `trace` is null.

When `--artifacts <dir>` is set, trace and per-sample outputs are written under:

- `<dir>/trace/*.json`
- `<dir>/sample_extras/*.json`

Artifact file formats are documented in `docs/leanbench-artifacts.md`.
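For example, to keep both machine-readable results and per-bench artifacts from a single run (output paths here are placeholders):

```sh
# Write JSONL results plus trace/ and sample_extras/ artifacts under ./bench-out/.
lake exe leanbench --format jsonl --jsonl-out results.jsonl --artifacts bench-out
```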
Baselines used by `--compare` can be either the legacy JSON array or the new
schema object.
## Radar integration

If you use github.com/leanprover/radar, you can point a bench script at
`scripts/radar/leanbench.sh`. It follows the radar bench script contract
(`$1` repo path, `$2` JSONL output path) and forwards any extra args to
`lake exe leanbench`.

```sh
# Example: run a suite with 5 samples and emit radar JSONL.
LEANBENCH_ARGS="--suite core --samples 5" \
  ./scripts/radar/leanbench.sh /path/to/LeanBench /tmp/radar.jsonl
```

Example radar configs live in `radar/` (copy `.example.yaml` files and edit tokens/URLs).
## Observe benches

The `observe` suite runs small end-to-end LeanObserve passes over `benchdata/`.
These are intended for perf regression tracking of infotree and command-node
collection.

```sh
lake exe leanbench --suite observe
```

The observe benches import LeanObserve directly (no subprocess), so they
exercise the in-process API as well as the CLI.

LeanObserve CLI:

- `--command-nodes` includes all commands (including non-decls).
- `--decl-nodes` includes only named declarations (implies `--command-nodes`).
- `leanobserve` reads `observatory.toml` or `leanbench.toml` from the current directory.
- Select a profile with `--profile` or `LEANOBSERVATORY_PROFILE` (default: `default`).
- Apply tool overrides with `--tool` or `LEANOBSERVATORY_TOOL`.
- Use `--config` to point at a specific config file and `--print-config` to show the resolved config.
- Precedence: CLI > env vars > tool overrides > profile > defaults.
- Example config: `observatory.toml.example`.
## Suites

Attach a suite directly in `BenchConfig`, or define a suite driver that
collects benches by tag or name:

```lean
import LeanBench

bench_suite "text" ({
  description := "string + json benches"
  tags := ["string", "json"]
})
```
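Benches collected under a suite can then be selected with the `--suite` filter:

```sh
lake exe leanbench --suite text
```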
## GPU metrics

Provide per-iteration byte and flop counts in `BenchConfig` to emit
bandwidth (GB/s) and throughput (GFLOP/s) in pretty/JSON output:

```lean
import LeanBench

def cfg : BenchConfig := {
  threads := 8                      -- run in parallel
  bytes := some (3 * 1_000_000 * 4) -- bytes per iteration
  flops := some 1_000_000           -- flops per iteration
}

bench "vector_add" (cfg) do
  -- run kernel
  pure ()
```
## Item throughput

Provide per-iteration item counts to emit items/s:

```lean
import LeanBench

def cfg : BenchConfig := {
  items := some 1024 -- items per iteration
}

bench "batch_next" (cfg) do
  -- run one batch
  pure ()
```
## Extras

Attach additional JSON metrics (e.g., stage timing) with `bench_report`:

```lean
import LeanBench
import Lean.Data.Json

def reportStats : IO Lean.Json := do
  pure <| Lean.Json.mkObj [("stage_wait_ns", Lean.Json.num 12345)]

bench_report "pipeline" (reportStats) do
  -- run one pipeline step
  pure ()
```
## Lifecycle hooks + per-sample reporting

For finer control, `Bench` supports lifecycle hooks that run outside the timed
region:

- `setup?` / `teardown?` run once per bench run
- `beforeEach?` / `afterEach?` run around each sample

`afterEach?` and `teardown?` run in `finally` blocks (best-effort cleanup), so
they will be executed even if the benchmark action throws.

You can also emit per-sample JSON via `reportSample?`. This hook is only
executed when `--artifacts <dir>` is set; the collected output is written to an
artifacts file and referenced via `sample_extras_path` in JSON output.

```lean
import LeanBench
import Lean.Data.Json

open LeanBench

initialize counterRef : IO.Ref Nat ← IO.mkRef 0

def sampleReport (ctx : SampleCtx) : IO Lean.Json := do
  let n ← counterRef.get
  pure <| Lean.Json.mkObj [
    ("sample_index", Lean.Json.num ctx.sampleIndex),
    ("elapsed_ns", Lean.Json.num ctx.elapsedNs),
    ("counter", Lean.Json.num n)
  ]

initialize LeanBench.register {
  name := "hooked"
  config := { suite := some "core", samples := 5, warmup := 1 }
  setup? := some (counterRef.set 0)
  beforeEach? := some (counterRef.modify (· + 1))
  action := pure ()
  reportSample? := some sampleReport
}
```
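Since `reportSample?` only fires when artifacts are enabled, run the bench above with `--artifacts` to collect the per-sample JSON (the directory name is a placeholder):

```sh
# Per-sample reports are only collected when --artifacts is set.
lake exe leanbench --match hooked --artifacts bench-out --format json
```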
## Lake integration

Lake already supports bench drivers (like `test`). Configure your package to
use LeanBench as the driver:

```toml
benchDriver = "LeanBench/leanbench"
```

Then run:

```sh
lake bench -- --tags array
```
## External commands

Use `benchCmd` to benchmark external commands:

```lean
bench "lean --version" do
  LeanBench.benchCmd "lean" #["--version"]
```
## Comparison example

```sh
# create baseline
lake exe leanbench --save baseline.json

# compare current run to baseline
lake exe leanbench --compare baseline.json
```
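To gate CI on performance, combine `--compare` with the regression flags from the CLI table, for example:

```sh
# Exit non-zero if mean time regresses by more than 5% against the baseline.
lake exe leanbench --compare baseline.json --fail-on-regressions --regress-pct 5
```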
## License

Apache-2.0