[{"data":1,"prerenderedAt":4},["ShallowReactive",2],{"7M3krVuazL":3},"# ordvec-formalization\n\n[![Lean CI](https://github.com/Fieldnote-Echo/ordvec-formalization/actions/workflows/lean_action_ci.yml/badge.svg)](https://github.com/Fieldnote-Echo/ordvec-formalization/actions/workflows/lean_action_ci.yml)\n[![No sorry](https://img.shields.io/badge/no%20sorry-audited-brightgreen)](https://github.com/Fieldnote-Echo/ordvec-formalization/actions/workflows/lean_action_ci.yml)\n\nLean 4 formalization of a finite constant-weight bitmap overlap model: under an\nexplicit monotone overlap signal contract, an overlap-tail admission rule is\nBayes-optimal, and its idealized uniform-null probability is exactly\nhypergeometric.\n\nThe development now also packages supplied ordered-tail calibration with the\nBayes-optimal cutoff. This matters because the finite bitmap hypergeometric\ncalibration remains explicit, not a hidden claim about real deployment corpora.\n\nThis is a theory of **decision sufficiency through a quotient**, not\n**representation completeness**. Full observations may still be essential for forming,\ntransforming, training, calibrating, and composing semantic representations.\nThey can carry margins, near-ties, residual features, confidence, and other\nsignals that matter for tasks beyond candidate admission. The formal result\nsays only that, for a binary admission decision satisfying the stated\nstatistical contract, the decision surface can factor through an order-like\nquotient.\n\n## Why This Matters\n\nOrdVec-style candidate generation uses cheap overlap/popcount filters. This repo\nchecks the mathematical shape of that filter in a finite model:\n\n```text\nsymmetry picks overlap\nMLR / Bayes decision theory makes a threshold optimal\nthe constant-weight bitmap null calibrates that threshold event\n```\n\nThe result is deliberately task-relative. It says that if relevance evidence for\na binary admission decision factors through overlap, and is monotone in that\noverlap, then the optimal deterministic rule can be an overlap cutoff. It does\nnot say that ordinal signatures contain all semantic information, or that real\nencoders automatically satisfy the model.\n\nFor implementations, the practical takeaway is narrow: under an empirically\nvalidated monotone-overlap decision contract, candidate admission can be a\ncalibrated popcount threshold rather than an arbitrary accept/reject rule.\n\nThe quotient-search layers also make the empirical burden explicit. A single\nlossy quotient need not be injective; what matters is whether its fibers, or\nthe joint fibers of product quotients and finite families of lossy probes,\npreserve the target behavior being claimed. The product layer handles pairwise\nquery/document decisions, multi-target signatures, and score-induced ranking\ncomparisons; the windowed-observation layer formalizes target separation by\njoint probe codes, with same-joint-code label disagreements as finite\nfalsifiers.\n\n## Main Checked Result\n\n```lean\nOrdvecFormalization.exists_uniformBitmapOverlapTail_finiteBayesRisk_le_and_hypergeomTail\n```\n\nIn words: for `K`-active bitmaps, when the null is uniform over all `K`-active\ndocuments and the signal law is a finite exponential tilt by literal query\noverlap, some literal overlap-tail rule has Bayes risk no larger than any\ndeterministic admission rule on the full constant-weight bitmap space. The same\ntail event has the checked hypergeometric upper-tail probability under the\nuniform bitmap null.\n\nFor the theorem-name surface, see [`docs/theorem-map.md`](docs/theorem-map.md).\nFor the module-by-module proof path, see [`docs/proof-spine.md`](docs/proof-spine.md).\nFor a reviewer-oriented summary, see [`docs/reviewer-brief.md`](docs/reviewer-brief.md).\nFor a developer-facing worked example, see\n[`docs/rag-pipeline-guide.md`](docs/rag-pipeline-guide.md).\n\n## Scope\n\nThis repository proves:\n\n- finite deterministic Bayes-threshold optimality under explicit factorization\n  and monotonicity assumptions;\n- finite quotient-search constraints, reachable-image/kernel containment, and\n  sample-level same-bucket falsifiers;\n- finite product-quotient contracts for pairwise targets, componentwise\n  multi-target signatures, and score-induced ranking comparisons;\n- finite observation-window theorems showing that a family of lossy probes is\n  target-sufficient exactly when joint-code agreement preserves the target;\n- a group-theoretic maximal-invariant theorem explaining why bitmap overlap is\n  the natural quotient under query-preserving coordinate relabelings;\n- exact hypergeometric calibration for the idealized uniform constant-weight\n  bitmap null.\n\nIt does not prove:\n\n- real encoders satisfy the quotient, symmetry, or monotone-overlap contracts;\n- any concrete production probe family is globally target-sufficient without\n  the stated finite target-invariance evidence;\n- the textbook hypergeometric null is the deployment null for real corpora;\n- ordinal quotients are representation-complete for semantic tasks;\n- Neyman-Pearson, UMP, Karlin-Rubin, randomized-test, asymptotic, or empirical\n  calibration results.\n\n## Build\n\nPinned to Lean `v4.28.0` and Mathlib `v4.28.0`.\n\n```sh\nlake update     # first run only; fetches Mathlib\nmake build      # runs lake build --wfail\nmake verify\nmake check-doc-names\nmake audit\nmake lint\n```\n\nGitHub Actions runs the same build, verification, documentation-name guard,\naudit, and linter checks in\n[`.github/workflows/lean_action_ci.yml`](.github/workflows/lean_action_ci.yml).\nThe `--wfail` build treats Lean warnings, including `sorry`, as failures; the\nseparate audit checks Lean sources for proof-placeholder contamination.\n\n## License\n\nApache-2.0; see [`LICENSE`](LICENSE).\n",1780846774388]