LeanAide
LeanAide, or LeanAIde (an accidental pun), is work in progress to build AI-based tools to help development with the Lean Theorem Prover. The core technique we use is autoformalization: the process of translating informal mathematical statements into formal statements in a proof assistant. We use this both directly to provide tools and indirectly to generate proofs by translating natural-language proofs produced by Large Language Models.
We now (as of early February 2025) have a convenient way to use LeanAide in your own projects, using a server-client setup which we outline below.
Example usage
The most convenient way to use LeanAide is through the syntax we provide, which gives code actions. We have syntax for translating theorems and definitions from natural language to Lean, and for adding documentation strings to theorems and definitions. For example, the following code in a Lean file (with the correct dependencies) will give code actions:
import LeanAide
import Mathlib
#theorem "There are infinitely many odd numbers"
#def "A number is defined to be cube-free if it is not divisible by the cube of a prime number"
#doc
theorem InfiniteOddNumbers : {n | Odd n}.Infinite := by sorry
We also provide syntax for generating/completing proofs. For now, this is slow and the results are not of high quality; experiments and feedback are welcome. The first of the examples below uses a command and the second uses a tactic.
#prove "The product of two successive natural numbers is even"
theorem eg₁ (n: Nat) : 2 ∣ n * (n + 1) := by
#prove
Setting up LeanAide
To either experiment directly with LeanAide, or use it via the server-client setup as a dependency in your own project, you first need to install LeanAide.
First clone the repository. Next, from the root of the repository, run the following commands to build and fetch pre-loaded embeddings:
lake exe cache get # download prebuilt mathlib binaries
lake build mathlib # build anything missing from the cache
lake build # build LeanAide itself
lake exe fetch_embeddings # download pre-loaded embeddings
Our translation is based on GPT-4o from OpenAI, but you can use alternative models (including local ones). Note that we also use embeddings from OpenAI, so you will need an OpenAI API key unless you set up an examples server as described below. Here, we assume you have an OpenAI API key.
To get started, please configure environment variables using the following bash commands (or the equivalent on your system) at the base of the repository (the angle brackets are not to be included in the command), and then launch VS Code.
export OPENAI_API_KEY=<your-open-ai-key>
code .
Alternatively, you can create a file at the path private/OPENAI_API_KEY (relative to the root of LeanAide) containing your key.
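For example, the following shell commands, run from the root of the repository, create such a file (as before, the angle brackets are not to be included):
mkdir -p private
echo "<your-open-ai-key>" > private/OPENAI_API_KEY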
After this, open the folder in VS Code (or equivalent) with Lean 4 and go to the file LeanCodePrompts/TranslateDemo.lean. Any statement written using syntax similar to #theorem "There are infinitely many primes" will be translated into Lean code. You will see examples of this in the demo files. Once the translation is done, a Try this hyperlink and code action will appear. Clicking on this will add the translated code to the file.
Alternatively, you can translate a statement using the following command from a terminal:
lake exe translate "There are infinitely many primes"
Using as a dependency: server-client setup
Adding LeanAide directly as a dependency to your project is brittle and slow. In particular, toolchains need to match exactly, and there may be collisions between transitive dependencies.
We provide a server-client setup that is more robust and faster, based on the zero-dependency project LeanAideTools. This acts as a client to the server in this repository.
The simplest (and we expect the most common) way to set up LeanAide to use in your project is to run a server locally. To do this, first set up LeanAide as described above. Then run the following command from the root of the repository:
python3 leanaide_server.py
We assume you have Python 3 installed; only built-in packages are used. If you wish to use a different model supporting the OpenAI API, you can pass parameters as in the following example, where we use the Mistral API. To use a local model, you can serve the OpenAI Chat API using, for example, vLLM, and give a local URL.
python3 leanaide_server.py --url https://api.mistral.ai/v1/chat/completions --auth_key <Mistral API key> --model "mistral-small-latest"
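For a local model, the command would look something like the following sketch, where we assume a vLLM server is running on its default port 8000 and local-model is a placeholder for the model name:
python3 leanaide_server.py --url http://localhost:8000/v1/chat/completions --model "local-model"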
Next, add LeanAideTools as a dependency to your project. This can be done by adding the following lines to your lakefile.toml file:
[[require]]
name = "LeanAideTools"
git = "https://github.com/siddhartha-gadgil/LeanAideTools.git"
rev = "main"
If you use a lakefile.lean file instead, you can add the following line:
require LeanAideTools from git "https://github.com/siddhartha-gadgil/LeanAideTools"@"main"
Now you can use the LeanAideTools package in your project. The following is an example of how to use it:
import LeanAideTools
#theorem "There are infinitely many primes"
Hosting a server
The server is by default launched at http://localhost:7654. This can be customized by setting the variables HOST and LEANAIDE_PORT. To use a shared server, simply set HOST to the IP address of the hosting machine. In the client, i.e., the project with LeanAideTools as a dependency, add the configuration:
set_option leanaide.url "<host's IP address>:7654"
to use the host. Further, unless you want all clients to use the host's authentication for OpenAI (or whichever model is used), the host should not specify an authentication key (better still, start the server with --auth_key "A key that will not work"). The client should then supply the authentication key with:
set_option leanaide.authkey? "<authentication-key>"
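On the host side, the server might then be launched along the following lines (a sketch: we assume here that HOST and LEANAIDE_PORT are read from the environment, and the dummy key prevents clients from silently using the host's credentials):
HOST=<host-IP-address> LEANAIDE_PORT=7654 python3 leanaide_server.py --auth_key "A key that will not work"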
There are many other configuration options on both the server and the client side.
Avoiding OpenAI embeddings
For those who want a fully open-source solution, you can provide an "examples server" and set the parameter examples_url
to point to it. We will soon provide an implementation of this.
The examples server should behave as follows:
- A server should be running at a port to which we can POST JSON.
- The POST request will have three (relevant) fields:
  - input: the statement of a theorem or definition.
  - prompt_type: one of docString, description and concise-description; this is the field to match against.
  - n: the number of nearest embeddings to return.
- The response gives selected "nearest" examples, based on docStrings from resources/mathlib4-prompts.jsonl, or on descriptions and concise-descriptions from resources/mathlib4-descs.jsonl.
- The response should be a JSON array with the data from the resource files, with one modification: the matched text should be returned in one of the fields "docString", "doc" or "doc_string" (so for the nearest "description", add a "doc" field containing the description).
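To make the protocol concrete, the following is a minimal sketch of such a server in Python, using only built-in packages. It is not the promised implementation: the port (8964) is arbitrary, the word-overlap scoring is only a stand-in for genuine embedding similarity, and for brevity it serves only the docString resource file.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# For brevity we serve only the docString resource; a full server would also
# load descriptions and concise-descriptions from resources/mathlib4-descs.jsonl.
with open("resources/mathlib4-prompts.jsonl") as f:
    EXAMPLES = [json.loads(line) for line in f if line.strip()]

def score(query: str, text: str) -> int:
    # Placeholder similarity: shared-word count. A real server would rank
    # by similarity of embeddings of `query` and `text`.
    return len(set(query.lower().split()) & set(text.lower().split()))

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        req = json.loads(self.rfile.read(length))
        query = req["input"]                          # theorem/definition statement
        field = req.get("prompt_type", "docString")   # which field to match against
        n = int(req.get("n", 10))                     # number of nearest examples
        ranked = sorted(EXAMPLES,
                        key=lambda e: score(query, e.get(field, "")),
                        reverse=True)[:n]
        # Return the matched text under a "doc" field, as the protocol requires.
        results = [{**e, "doc": e.get(field, "")} for e in ranked]
        body = json.dumps(results).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8964), Handler).serve_forever()   # port 8964 is arbitrary
A client would then set the examples_url parameter to http://localhost:8964 (with the port matching whatever the server actually uses).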
Contributions and details
The principal author of this repository is Siddhartha Gadgil.
The first phase of this work (roughly June 2022-October 2023) was done in collaboration with:
- Anand Rao Tadipatri
- Ayush Agrawal
- Ashvni Narayanan
- Navin Goyal
We had a lot of help from the Lean community and from collaborators at Microsoft Research. Our server is hosted with support from research credits from Google.
More recently (since about October 2024) the work has been done in collaboration with:
- Anirudh Gupta
- Vaishnavi Shirsath
- Ajay Kumar Nair
- Malhar Patel
- Sushrut Jog
This is supported by ARCNet, ART-PARK, IISc.
Our work is described in a note at the 2nd Math AI workshop and in more detail (along with related work) in a preprint.