[{"data":1,"prerenderedAt":4},["ShallowReactive",2],{"89pj8fyHqc":3},"# Automated Theory Construction\n\nAutomated Theory Construction (ATC) is a Lean 4 workflow for building verified theory from a small axiom base.\nInstead of aiming at one hand-picked theorem, the system generates candidate statements, formalizes them, verifies them in Lean, and accumulates successful results into a growing derived theory.\n\n![ATC screenshot](assets/readme-screenshot.png)\n\n## Core Idea\n\n> Do not aim directly at the final theorem. Generate the surrounding structure until the theorem becomes inevitable.\n\nThe main loop works like this:\n\n1. Start from a base theory in `AutomatedTheoryConstruction/Theory.lean`.\n2. Generate local candidate statements from the current theory state.\n3. Attempt formalization and proof in Lean.\n4. Append verified results to `AutomatedTheoryConstruction/Derived.lean`.\n5. Recycle failed attempts into refined follow-up problems.\n\nThis is theory construction rather than ordinary proof search: the system expands the space of statements as it works.\n\n## Example Artifact\n\nA concrete generated Lean artifact is available here:\n\n- [`Lambek_generated_example.lean`](https://gist.github.com/tukamilano/8a02143aae1be8b986ae73ab84d4b8ac)\n\nThis gist shows the kind of output ATC accumulates: a large Lean file of generated and verified statements over the Lambek-calculus-based theory used in this repository.\n\n## Quick Start\n\nThe recommended end-to-end path is:\n\n1. Put your deep-research report under `materials/`. Gemini Deep Research is the recommended default for this step.\n2. Build the materials cache.\n3. Regenerate `AutomatedTheoryConstruction/research_agenda.md` from that report.\n4. Run the main seed -> loop -> refactor pipeline.\n\nPrerequisites:\n\n- Lean toolchain from `lean-toolchain`\n- Lake + Mathlib dependencies\n- Python\n- `uv`\n\nRun:\n\n```bash\nmake build\nmake materials-cache\nmake research-agenda REPORT_FILE=materials/your_report.md\nmake seed-loop-refactor-derived\n```\n\nThis builds the project, refreshes `data/materials_cache`, writes `AutomatedTheoryConstruction/research_agenda.md`, and runs the main loop plus whole-file refactor path on `Derived.lean`.\n\nFor subsequent iterations after the first `make seed-loop-refactor-derived`, prefer `make loop-continue-refactor-derived` so the loop/refactor cycle continues from the current runtime state instead of resetting it.\nIf you want to continue only the loop without the refactor stages, use `make loop-continue`.\nIf you only want to refresh derived `materials/` artifacts without fetch/extract, use `make materials-derive`.\nIf you want the fastest smoke path without Codex CLI, you can still run:\n\n```bash\nuv run python scripts/atc_cli.py loop \\\n  --worker-command \"uv run scripts/mock_worker.py\" \\\n  --max-iterations 1\n```\n\n## Documentation\n\nStart with the doc hub: [`docs/README.md`](docs/README.md)\n\n| If you want to... | Read |\n| --- | --- |\n| Set up the repo and do a first run | [`docs/GETTING_STARTED.md`](docs/GETTING_STARTED.md) |\n| Run the loop day to day | [`docs/USER_GUIDE.md`](docs/USER_GUIDE.md) |\n| Know what files are safe to edit | [`docs/REPO_MAP.md`](docs/REPO_MAP.md) |\n| Swap the Lean verification backend | [`docs/PROOF_EXECUTOR.md`](docs/PROOF_EXECUTOR.md) |\n| See implementation-oriented runtime notes | [`design/RUNTIME.md`](design/RUNTIME.md) |\n\n## Repository Shape\n\n- `AutomatedTheoryConstruction/Theory.lean`: entry point for the active base theory\n- `AutomatedTheoryConstruction/Theory/*.lean`: optional local theory modules\n- `AutomatedTheoryConstruction/Derived.lean`: accumulated verified theorems\n- `AutomatedTheoryConstruction/Scratch.lean`: temporary verification target\n- `AutomatedTheoryConstruction/research_agenda.md`: persistent guidance for problem selection\n- `materials/`: recommended place to keep organized deep-research outputs, literature summaries, source-link lists, and problem-seed notes used as optional external context\n- `prompts/research_agenda/`: templates for turning deep-research reports into strict `research_agenda.md` drafts\n- `scripts/atc_cli.py`: unified operational CLI\n\n`materials/` is the recommended home for deep research that you want the system to reuse later.\nTreat it as external research context, not as part of the core runtime state: the loop may consult it for seed generation, prioritization, and expansion, but it should not be folded into `theory_state.json`.\nAlso treat summary reports in `materials/` as potentially time-sensitive: they are useful for context, but source-link lists or primary papers should win when novelty or closest-known-result judgment matters.\n\nTo regenerate `AutomatedTheoryConstruction/research_agenda.md` from a deep-research report, use:\n\n```bash\nmake research-agenda REPORT_FILE=materials/your_report.md\n```\n\n## Refactor Pipeline\n\nThe post-loop refactor path is intentionally staged:\n\n1. Alpha-equivalent theorem dedupe on the preview copy\n2. Whole-file rewrite cleanup (`rewrite`, pass 1.5)\n3. Whole-file review-focused cleanup (`review`, pass 2.0)\n4. Copy the reviewed result back into `Derived.lean`\n5. Recheck the main Lean targets against the updated derived theory\n\nThis keeps global cleanup separate from proof search while still ending on a verified `Derived.lean`.\n\n## License\n\nThis repository is licensed under the MIT License. See `LICENSE`.\n\n## Acknowledgements\n\nThe prompting strategy for solving Lean problems was partially inspired by a private repository, `kmd710/lean4-codex-skills`.\n\nThis repository also includes material adapted from:\n\n- \u003Chttps://github.com/SnO2WMaN/provability-toy>\n- \u003Chttps://github.com/tani/mathling/tree/main>\n",1780846772068]