Hesper

Write GPU programs in Lean 4. Prove them correct. Run on WebGPU or CUDA.

CI

[!IMPORTANT] Alpha software. APIs, verification features, and the compiler are under active development and subject to breaking changes. Current focus: BitNet, Gemma 4, and the verified-AD layer.

Hesper is a verified GPU programming framework. You write type-safe shaders and high-level tensor code in Lean 4, prove correctness with the same language, and run the result on WebGPU (Metal / Vulkan / D3D12) or CUDA PTX.

import Hesper.WGSL.DSL

-- Type-safe shader expressions, checked at Lean elaboration time:
let x : Exp (.scalar .f32) := var "x"
let y : Exp (.scalar .f32) := var "y"
let r := sqrt (x * x + y * y)              -- generates: sqrt(x*x + y*y)

-- let wrong := x + (var "i" : Exp (.scalar .i32))  -- ✗ type error

The Hesper Way

LayerWhat it gives you
DSLType-safe WGSL/ShaderM expressions — wrong-type mixes are Lean errors
Verified ADReverse-mode autodiff with Differentiable typeclass; gradients are proven correct
Multi-backendSame DSL lowers to WGSL (Dawn) or PTX (CUDA driver) — pick at compile time
Kernel fusionCircuit DSL fuses pointwise + reduce + matmul into one dispatch
Verified opsVerifiedOpFusion proves fused kernels equal the unfused spec

IP Catalog

Model / IPBackendStatusEntry
BitNet b1.58 (2 B)WebGPU✅ 125 TPS on M4 Maxlake exe bitnet-complete
Gemma 4 (E4B)CUDA PTX✅ ~100 TPS on RTX 4070 Tilake exe gemma4-cuda
LoRA fine-tuningWebGPU✅ Alpaca-style instruction tuninglake exe lora-train
Verified ADWebGPU✅ 11 executable parity testslake exe ad-demo
Tetris (graphics)WebGPU✅ Interactivelake exe tetris

See docs/CHANGELOG.md for the release timeline.

Quick Start

A pre-built image with Lean 4, xeus-lean, Jupyter Lab, and all 12 tutorial chapters as runnable notebooks:

docker run --rm -p 8888:8888 ghcr.io/verilean/hesper-tutorial:latest
# → open http://localhost:8888 in your browser

Option 2 — Local build

git clone https://github.com/Verilean/hesper.git
cd hesper
lake build Hesper             # native deps (Dawn, Highway) build on demand
lake exe bitnet-complete      # first inference run

Prerequisites: Lean 4 (via elan), a C++17 toolchain, CMake ≥ 3.16. For Gemma 4 on CUDA: NVIDIA driver + CUDA Toolkit. Full setup instructions: docs/tutorial/md/Ch00_Setup.md.

Documentation

The tutorial is the canonical entry point. Each chapter is a Markdown master that's also rendered as a runnable Jupyter notebook via xeus-lean.

#ChapterSource
00SetupCh00_Setup.md
01Lean 4 for ML EngineersCh01_LeanForMl.md
01bYour First Hesper ProjectCh01b_YourFirstProject.md
02The Shader DSL — WGSL + ShaderMCh02_DSL.md
03Automatic Differentiation & Verified OpsCh03_AD.md
04High-Level API & TensorsCh04_HighLevelApi.md
05Switching Backends — WebGPU / CUDACh05_Backends.md
06Proofs — Equivalence & InvariantsCh06_Proofs.md
07BitNet End-to-EndCh07_BitNet.md
08Gemma 4 End-to-EndCh08_Gemma4.md
09Embedding Hesper in Other ProjectsCh09_Embedding.md
10Hesper ArchitectureCh10_Architecture.md

Other user-facing docs:

How It Works

┌──────────────────────────────────────────────────────────────┐
│  Lean 4 code                                                  │
│  ─ Type-safe shader DSL  ─ Tensor ops  ─ Formal proofs        │
└────────────────────────────┬──────────────────────────────────┘
                             ▼
┌──────────────────────────────────────────────────────────────┐
│  ShaderM IR  (a small imperative shader language in Lean)     │
└──────────────┬────────────────────────────┬───────────────────┘
               ▼                            ▼
┌─────────────────────────┐    ┌──────────────────────────────┐
│  WGSL backend           │    │  CUDA PTX backend             │
│  (Dawn native)          │    │  (libcuda driver API)         │
└──────────┬──────────────┘    └─────────────┬─────────────────┘
           ▼                                 ▼
   Metal / Vulkan / D3D12               NVIDIA GPU

The same DSL emits two surface languages. Pick the backend at compile time (-Kgpu=cuda for CUDA exes, default WebGPU otherwise).

Examples

lake exe bitnet-complete             # BitNet b1.58 inference end-to-end
lake exe gemma4-cuda data/...gguf "Hello"   # Gemma 4 on CUDA
lake exe tetris                      # Interactive WebGPU game
lake exe dsl-basics                  # Walk through the type-safe DSL
lake exe matmul-simple               # Smallest GPU matmul

The Examples/ directory has ~90 runnable demos organized by topic (DSL/, Compute/, Graphics/, MachineLearning/, …).

Project Structure

Hesper/
├── Hesper/             # Core library (DSL, WebGPU, CUDA, Layers, Models)
├── Examples/           # ~90 runnable demos
├── Tests/              # LSpec-based test suites
├── native/             # C++ FFI bridge (Dawn + CUDA driver shim)
├── c_src/              # Portable SIMD (Google Highway)
├── docs/
│   ├── tutorial/      # User-facing tutorial (Markdown master)
│   └── research/      # Internal R&D archive
└── lakefile.lean       # Lake build configuration

Testing

lake build test-all && ./.lake/build/bin/test-all

The same suite runs in CI across Linux + Vulkan, macOS + Metal, Windows + D3D12, and a CPU-only fallback path.

Contributing

See CONTRIBUTING.md. In short: fork, branch, run the test suite, open a PR. Bug reports and design discussions go to GitHub Issues / Discussions.

License

Apache-2.0. See LICENSE.

Acknowledgments