[{"data":1,"prerenderedAt":4},["ShallowReactive",2],{"SwMpYnIO0n":3},"# premise-selection\n\nThis repository provides a cloud-based Lean premise selector, `Lean.LibrarySuggestions.Cloud.premiseSelector`.\nIt sends the current goal state and any new user-defined premises (in the current file or imported)\nto a cloud server, and returns the top `k` premises recommended by the server.\n\nTo use the selector:\n\n```lean\nimport PremiseSelection\n\nset_library_suggestions open Lean.LibrarySuggestions in Cloud.premiseSelector \u003C|> sineQuaNonSelector.intersperse currentFile\n\ntheorem add_comm_nat (a b : Nat) : a + b = b + a := Nat.add_comm ..\n\nexample (a b : Nat) : a + b = b + a := by\n  premises  -- prints premises including `add_comm_nat` and `Nat.add_comm`\n```\n\nThe premise selector extends the `Lean.LibrarySuggestions` API introduced in Lean 4 core.\n\nIt is developed as part of [LeanHammer](https://github.com/JOSHCLUNE/LeanHammer), which uses the cloud-based premise selector.\n\n## Overview\n\nThe premise selection server backend runs a selector model on premises from Mathlib, Batteries, and Lean core.\nIt uses an encoder-only transformer to embed premises and the goal state, and retrieves\nthe top-`k` premises by cosine similarity.\n\nFor performance reasons (see below), the number of new premises that can be uploaded\nhas an upper limit set by the server (e.g. 8192).\nIf this limit is exceeded, a warning is issued and the excess new premises are truncated.\nThis truncation prioritizes the new premises in the current module, and then premises in\nmore recently imported modules.\n\nBy default, the cloud premise selector `Lean.LibrarySuggestions.Cloud.premiseSelector` uses\nthe backend API hosted by us at `http://leanpremise.net`. To use a custom backend (e.g. 
in\nheavy use cases, machine learning training, or for private premises that you do not wish to\nupload to the cloud service), you may [set up your own server](https://github.com/hanwenzhu/lean-premise-server)\nand then specify a different URL:\n\n```lean\nset_option premiseSelection.apiBaseUrl \"http://my_api_url\"\n```\n\n## Run time\n\nThe first call to premise selection (by `hammer` or by `premises`) may be much slower\nthan subsequent calls, because the caches are not yet populated.\nOne should expect the first call in any file session (especially in a downstream project of Mathlib,\nsuch as FLT) to take up to 10–20 seconds.\nSeparately, the first call in a downstream project after a server restart may also be much slower\n(e.g. 2 minutes) due to the time it takes to fill the server-side cache,\nbut subsequent calls in the downstream repository (by any user) will be much faster.\n\nTo optimize for run time, the cloud premise selector has three distinct layers of cache:\n\n* The first layer is on the user side: since the server embedder only takes pretty-printed representations\n  of new premises, the user side needs to pretty-print new premises, which is done once per file session,\n  and can take up to 10–20 seconds. To keep this time reasonably short, there is also an upper bound\n  on the number of new premises allowed to be uploaded to the server. 
Since this is on the user side,\n  this extra time is needed for every new file session.\n* The second layer is on the server: the server maintains an LRU cache\n  of the embeddings of recent new premises uploaded by users.\n  (This LRU cache is used only for performance purposes, and is not tied to user identity.)\n  Since this is on the server side, the cache serves all users.\n* The third layer is built during server initialization: the server pre-computes the embeddings of\n  all premises in a tagged version of Lean core, Batteries, and Mathlib.\n\n## Combinators\n\nThis repository also provides *premise selector combinators*:\n\n```lean\nopen Lean LibrarySuggestions\n\n/-! `orElse` combinator -/\n\n-- Tries the cloud premise selector. If it fails (e.g. network error), falls back to MePo.\nset_library_suggestions\n  Cloud.premiseSelector\n  \u003C|> mepoSelector (useRarity := false)\n\n/-! `interleave` combinator -/\n\n-- Retrieves `k` premises from the cloud, `k` from MePo, interleaves them by rank,\n-- and takes the top-`k` deduplicated premises.\n-- This is inspired by Isabelle Sledgehammer's MeSh.\nset_library_suggestions interleave #[\n  Cloud.premiseSelector,\n  mepoSelector (useRarity := false) (p := mepoP) (c := mepoC)\n]\n```\n\n## Testing\n\nTo test the selectors:\n\n```shell\nlake test\n```\n\n## Citation and Resources\n\nThe premise selector is developed by Thomas Zhu, Joshua Clune, Jeremy Avigad, Albert Jiang, and Sean Welleck, and described in our paper [*Premise Selection for a Lean Hammer*](https://arxiv.org/abs/2506.07477).\n\nLinks to open-source components of this project:\n\n* [Server](https://github.com/hanwenzhu/lean-premise-server)\n* [Model training](https://github.com/hanwenzhu/LeanHammer-training)\n  * [Model weights](https://huggingface.co/hanwenzhu/all-distilroberta-v1-lr2e-4-bs256-nneg3-ml-ne2) (subject to change)\n* [Data extraction](https://github.com/cmu-l3/ntp-toolkit/tree/hammer)\n  * [Extracted 
dataset](https://huggingface.co/datasets/l3lab/lean-premises)\n* [LeanHammer repo](https://github.com/JOSHCLUNE/LeanHammer)\n\nCitation:\n\n```bibtex\n@article{zhu2025premise,\n  title={Premise Selection for a Lean Hammer},\n  author={Zhu, Thomas and Clune, Joshua and Avigad, Jeremy and Jiang, Albert Qiaochu and Welleck, Sean},\n  journal={arXiv preprint arXiv:2506.07477},\n  year={2025}\n}\n```\n",1780241991952]