[{"data":1,"prerenderedAt":4},["ShallowReactive",2],{"vMTDRmpKyc":3},"# FLARE\n\n[![CI](https://github.com/henryrobbins/flare/actions/workflows/ci-python.yml/badge.svg)](https://github.com/henryrobbins/flare/actions/workflows/ci-python.yml)\n[![codecov](https://codecov.io/gh/henryrobbins/flare/branch/main/graph/badge.svg)](https://codecov.io/gh/henryrobbins/flare)\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n[![Checked with mypy](https://www.mypy-lang.org/static/mypy_badge.svg)](https://mypy-lang.org/)\n[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)\n\n> [!NOTE]\n> This monorepo hosts the dataset, packages, and experiment code accompanying\n> *[FLARE: Verifying MILP Reformulations with LLM-Based Formal Proof\n> Synthesis](https://flare.henryrobbins.com/)*.\n\n`FLARE` (Formulation-Level Automated Reformulation Evaluation) uses an\nLLM-based agent and the Lean 4 proof assistant to verify mixed-integer linear\nprogram (MILP) reformulations. `FLARE` is implemented in the `milp-flare` Python package and evaluated on the **FormulationBench** dataset using the `formulation-bench` Python package. This repository is a monorepo hosting the FormulationBench dataset, both Python packages, and all of the experimental code used to produce the paper's results.\n\n## Sub-Projects\n\n| Project | Path | Description | Links |\n| --- | --- | --- | --- |\n| **FormulationBench** | [`dataset/`](dataset/) | 20 problems, 116 MILP formulations, 96 labelled reformulation pairs. | [![Docs](https://readthedocs.org/projects/formulation-bench/badge/?version=latest)](https://formulation-bench.henryrobbins.com) |\n| **`formulation-bench`** | [`packages/formulation_bench/`](packages/formulation_bench/) | Utilities for loading and working with the FormulationBench dataset. | [![PyPI](https://img.shields.io/pypi/v/formulation-bench)](https://pypi.org/project/formulation-bench/) [![codecov](https://codecov.io/gh/henryrobbins/flare/branch/main/graph/badge.svg?flag=formulation_bench)](https://codecov.io/gh/henryrobbins/flare?flags%5B0%5D=formulation_bench) [![Docs](https://readthedocs.org/projects/formulation-bench/badge/?version=latest)](https://formulation-bench.henryrobbins.com) |\n| **`milp-flare`** | [`packages/milp_flare/`](packages/milp_flare/) | Official implementation of FLARE and FLARE-NL. | [![PyPI](https://img.shields.io/pypi/v/milp-flare)](https://pypi.org/project/milp-flare/) [![codecov](https://codecov.io/gh/henryrobbins/flare/branch/main/graph/badge.svg?flag=milp_flare)](https://codecov.io/gh/henryrobbins/flare?flags%5B0%5D=milp_flare) [![Docs](https://readthedocs.org/projects/milp-flare/badge/?version=latest)](https://milp-flare.henryrobbins.com/en/latest) |\n| **Experiments** | [`src/`](src/), [`experiments/`](experiments/), [`scripts/`](scripts/) | Paper experiment code: alternative verifiers, prompt templates, and experiment/analysis scripts. | [![codecov](https://codecov.io/gh/henryrobbins/flare/branch/main/graph/badge.svg?flag=src)](https://codecov.io/gh/henryrobbins/flare?flags%5B0%5D=src) |\n| **Paper site** | [`site/`](site/) | Astro landing page for the paper, deployed to GitHub Pages on pushes to the `site` branch. | [Live site](https://flare.henryrobbins.com/) |\n\n## Reproducing Experimental Results\n\nThe two scripts in [`experiments/`](experiments/) reproduce every quantitative result.\n\n### Setup\n\n1. Install [uv](https://docs.astral.sh/uv/), then sync the workspace:\n   ```bash\n   make install\n   ```\n2. Build the `flare-agent` Docker image (`FLARE` runs each agent in a Docker container):\n   ```bash\n   make -C packages/milp_flare build-image\n   ```\n3. Populate all necessary API keys for the LLM-based verifiers (Anthropic, OpenAI, DeepSeek). The relevant secrets go in a top-level `.env` file (see `.env.example`).\n4. Install a [Gurobi](https://www.gurobi.com/) license (required by the `execution` baseline and the dataset's `solve.py` scripts). A free [academic license](https://www.gurobi.com/academia/academic-program-and-licenses/) works.\n\nSee the `milp-flare` [installation guide](https://milp-flare.henryrobbins.com/en/latest/installation.html) for more details.\n\n### Baseline (Table 1, Table 2)\n\nRuns `execution`, `equivamap`, and `FLARE` on every reformulation pair, 3\nruns each, with results written under `runs/\u003Ctimestamp>/`:\n\n```bash\nuv run python -m experiments.baseline -c experiments/configs/baseline.yaml\n```\n\nSubsets and worker counts are overridable on the CLI:\n\n```bash\nuv run python -m experiments.baseline -c experiments/configs/baseline.yaml \\\n    --problems 1,2,3 --workers 5 --runs 3\n```\n\n### FLARE-NL Ablation Study (Table 3, Table 5)\n\nSweeps prompt variants and LLM models for `FLARE-NL`:\n\n```bash\nuv run python -m experiments.ablation -c experiments/configs/ablation.yaml\n```\n\nFor Table 5 in the Appendix, use the `ablation_p12.yaml` configuration.\n\n```bash\nuv run python -m experiments.ablation -c experiments/configs/ablation_p12.yaml\n```\n\n### FLARE Harness Evaluation (Table 6)\n\nSweeps different agent harnesses for `FLARE`:\n\n```bash\nuv run python -m experiments.baseline -c experiments/configs/baseline_flare.yaml\n```\n\n### Aggregating results\n\nPer-instance and aggregated classification metrics for any run directory:\n\n```bash\nuv run python scripts/report.py runs/\u003Ctimestamp>           # summary\nuv run python scripts/report.py runs/\u003Ctimestamp> -i        # per-instance\n```\n\nAdditional analysis scripts (cost/time plots, context analysis) live under\n[`scripts/analysis/`](scripts/analysis/).\n\n## Development\n\nSee `AGENTS.md` for development information.\n\n## Cite\n\nTODO: arXiv paper.\n\n## License\n\n[MIT](LICENSE.md)\n",1780846780287]