STRUCTURE IN PRIME GAPS  FORMALIZED
PROJECT PROPOSAL
prepared for: https://leanprover.zulipchat.com/
prepared by: KAJANI KAUNDA
Let us lean together!
import Mathlib.Data.Nat.Prime.Basic
import Mathlib.Data.List.Basic
import Mathlib.Tactic.Linarith
import Mathlib.Data.Int.Defs
import Mathlib.Tactic.Basic
open List Nat
 Define the set of primes and their additive inverses
def primes_and_inverses : List ℤ :=
(List.range 100).filter Nat.Prime >.map (λ p => [↑p, ↑p]) > List.join
 Construct the Cayley table T for a finite portion
def cayley_table (n : ℕ) : List (List ℤ) :=
let primes := (List.range n).filter Nat.Prime >.map (λ p => ↑p)
let inverses := primes.map (λ p => p)
let first_row := primes
let first_column := inverses
let table := first_column.map (λ inv => first_row.map (λ p => p + inv))
first_row :: table
 Example of using the cayley_table with a finite portion
def T : List (List ℤ) := cayley_table 30
#eval T  To visualize a portion of the table
Structure in Prime Gaps  Formalized
By Kajani Kaunda, Jane Doe, John Doe, and others.
Abstract
In this open community project, we formalize using the LEAN programming language, the definitions and results presented in the paper Structure in Prime Gaps.
Introduction
Remark: In this paper, we will dispense with a lot of contextual explanations and restrict ourselves only to those which are pertinent to the formalization process.
Formalization of Mathematics.
The formalization of mathematics refers to the process of encoding mathematical definitions, theorems, and proofs in a formal language that can be processed and checked for correctness by a computer. This approach ensures that every logical step is explicitly verified, leaving no room for ambiguity or human error. This meticulous approach ensures that every logical step is explicitly defined and verified, minimizing the risk of human error in mathematical proofs. The goal is to achieve a level of rigor and precision that surpasses traditional mathematical methods, enabling the verification of complex proofs and the discovery of new mathematical insights. Formalization not only helps in validating existing results but also provides a robust framework for discovering new ones.
Proof Assistants.
Proof assistants are software tools designed to aid in the creation and verification of formal proofs. These tools provide a platform for writing proofs in a formal language and automatically verifying their correctness. They offer a rigorous environment where users can construct proofs interactively or automatically, depending on the complexity of the problem. These tools are essential in modern mathematics and computer science, allowing for the development of highly reliable software and the verification of intricate mathematical theorems. Popular proof assistants include Coq, Isabelle, and Lean. There are many others. They are used not only in pure mathematics but also in fields like computer science, where formal verification of software and hardware is crucial. They have been instrumental in formalizing significant results across various domains.
LEAN.
LEAN is a modern opensource proof assistant developed at Microsoft Research. It stands out for its userfriendly interface and powerful capabilities in both theorem proving and automated reasoning. It is designed to support the formalization of mathematics and the verification of software. LEAN supports a rich type theory known as dependent type theory, which allows for expressive and concise representations of mathematical objects and proofs. Its growing ecosystem includes a standard library of formalized mathematics and tools like the LEAN community web editor. LEAN has been adopted by many mathematicians and computer scientists for both educational purposes and advanced research projects. LEAN combines powerful automation with interactive proof development, making it accessible to both beginners and experts. LEAN 4, the latest version, offers significant improvements in performance and usability, encouraging more widespread adoption within the mathematical community.
Structure in Prime Gaps.
In this project, we aim to formalize the results presented in the article Structure in Prime Gaps using LEAN 4, the latest version of the LEAN proof assistant. By leveraging the capabilities of LEAN 4, we seek to ensure the correctness and robustness of these findings relating to the existence of infinitely many structured gaps between prime numbers. This formalization will provide a rigorous foundation for the results and contribute to the broader effort of formalizing mathematics.
The paper Structure in Prime Gaps presents two main results the first of which is the claim that there exist structured gaps between primes and the second result is basically a corollary or special case of the first. These are stated as follows:
Theorem 1: For every prime p_{α}, there exists infinitely many pairs of primes, (p_{n}, p_{n+m}), such that (p_{n+m} − p_{n}) = p_{α} − 3, where n, α ≥ 3, m ≥ 1, and p_{n} is the n^{th} prime.
Theorem 2: There exist infinitely many pairs of primes with a gap of 2.
A brief visual overview of the results presented in the article Structure in Prime Gaps
The following partial Cayley table T represents gaps between primes in which the results we are formalizing are selfevident.
Consider Table 1, it is not immediately apparent if any useful pattern can be discerned from it. However, with the highlights in Table 2, a compelling pattern emerges, one that leads directly to Theorem 1 from which Theorem 2 is implied as seen in Table 3.

Each pattern is defined and identified by the 4tuple β = (A, B, L, E) formed from the elements in the vertices of a subarray T_{i} of T. In Table 2, the first 4tuple β = (A, B, L, E) for prime 23 is β = (20, 28, 12, 20).

Every subarray TT_{i}, defines two pairs of primes. In Table 2, the First pair is (3, p_{α}) or (3, 23) and the Second pair is (((B + 3) + 0  E), (B + 3)) or (11, 31). We can denote the integers 11 and 31 in the Second pair using the variables Q_{i} and R_{i} respectively.

The First pair remains constant for all patterns related to any prime p_{α} ≥ 5. Subsequent Second pairs are unique for each subarray TT_{i} of T.

L is always congruent to 0 mod 6.
Legend for Tables:

Table 1: Represents the Cayley Table T of the Commutative Partial Groupoid structure (J, +) without immediately discernible patterns highlighted.

Table 2: Highlights patterns where the 4tuple β = (A, B, L, E) defines each pattern, with specific examples provided. The "Pattern" here is more elaborately defined in the sense that it consists of integers other than just Q_{i} and R_{i}.

Table 3: Demonstrates the implication of results derived from the patterns observed in Table 2.

Remark: We note that the Cayley Table T is a partial representation of an otherwise infinite structure. Therefore no generalization of these results to some "complete/larger" set is required.
Table 1
Table Legend
Table 2
Table 3
Definitions and results to be Formalized
The final LEAN 4 code will most likely be different from the following code snippets but at the very least it is good to know and verify that formalization is possible with the current technology stack; LEAN 4 and the Mathlib library.
We begin as follows:
Item name
Definition 1.
Formal statement
Definition 1: Define a model of prime gaps as a Cayley Table T constructed from a Cummutative Partial Groupoid (J, +) based on a subset J of Z containing the infinite set of prime numbers and their additive inverses, such that the elements of the first row of T are the primes and the elements of the first column of T are their additive inverses.
Notes
The set J is defined as follows J = (...,−p_{n+2}, −p_{n+1}, −p_{n}, p_{n}, p_{n+1}, p_{n+2}, ...). Notice that since primes are infinite then by definition the structure T is also infinite. The structures used by LEAN 4 in defining T must reflect this property.
LEAN 4 code
import Mathlib.Data.Nat.Prime.Basic
import Mathlib.Data.List.Basic
import Mathlib.Tactic.Linarith
import Mathlib.Data.Int.Defs
import Mathlib.Tactic.Basic
open List Nat
 Define the set of primes and their additive inverses
def primes_and_inverses : List ℤ :=
(List.range 100).filter Nat.Prime >.map (λ p => [↑p, ↑p]) > List.join
 Construct the Cayley table T for a finite portion
def cayley_table (n : ℕ) : List (List ℤ) :=
let primes := (List.range n).filter Nat.Prime >.map (λ p => ↑p)
let inverses := primes.map (λ p => p)
let first_row := primes
let first_column := inverses
let table := first_column.map (λ inv => first_row.map (λ p => p + inv))
first_row :: table
 Example of using the cayley_table with a finite portion
def T : List (List ℤ) := cayley_table 30
#eval T  To visualize a portion of the table
LEAN 4 code annotation
Item name
Definition 2.
Formal statement
Definition 2: Define a v x w subarray TT_{i} of T such that v, w ≥ 2.
Notes
Definition 2 defines a subarray which is later used to algebraically construct a pattern.
Example: TT_{1} = ((2, 4, 8, 10), (0, 2, 6, 8), (2, 0, 4, 6), (6, 4, 0, 2)).
LEAN 4 code
 Define a subarray TTi of T, where v x w are its dimensions
def sub_array (T : List (List ℤ)) (r c v w : ℕ) : List (List ℤ) :=
(T.drop r).take v >.map (λ row => (row.drop c).take w)
 Example: Extract a 3x3 subarray from T
def TT1 : List (List ℤ) := sub_array T 1 1 3 3
#eval TT1  Visualize the subarray
LEAN 4 code annotation
Item name
Definition 3.
Formal statement
Definition 3: Define a 4tuple β = (A, B, L, E) such that the values of its elements are the vertices of TT_{i}.
Notes
This is one of the structures used in the analysis of T.
Example: for TT_{1}, we have TT_{1}.β = (2, 10, 6, 2).
LEAN 4 code
 Define the structure for the 4tuple β
structure Beta where
A : ℤ
B : ℤ
L : ℤ
E : ℤ
 Create a 4tuple β from the vertices of a subarray TTi
def create_beta (TTi : List (List ℤ)) : Beta :=
⟨TTi.head!.head!, TTi.head!.getLast!, TTi.getLast!.head!, TTi.getLast!.getLast!⟩
  Provide a Repr instance for Beta to define how it should be displayed
 instance : Repr Beta :=
 ⟨fun β => s!"⟨{β.A}, {β.B}, {β.L}, {β.E}⟩"⟩
 Example usage: Create β for TT1
def β1 : Beta := create_beta TT1
 #eval β1  Visualize β1
#eval β1.A  Visualize β1
#eval β1.B  Visualize β1
#eval β1.L  Visualize β1
#eval β1.E  Visualize β1
LEAN 4 code annotation
Item name
Lemma 4.1. This is the first Lemma.
Formal statement
Lemma 4.1 : Prove that: Given a 4tuple β then A + E = B + L.
Notes
This is one of the results used in the subsequent proofs.
The proof is derived from the construction of T.
LEAN 4 code
lemma lemma_4_1 (m n c k : ℤ) :
let TTi_A := m + n
let TTi_B := m + (n + k)
let TTi_L := (m + c) + n
let TTi_E := (m + c) + (n + k)
TTi_B + TTi_L = TTi_A + TTi_E :=
by
 Expand and simplify the expressions
linarith
LEAN 4 code annotation
Item name
Lemma 4.2. This is the second Lemma.
Formal statement
Lema 4.2 : Prove that: All prime numbers greater than 3 can be expressed in the form 6n + 1 or 6n − 1.
(A known result).
Notes
This result is used in the subsequent proofs.
LEAN 4 code
LEAN 4 code annotation
Item name
Lemma 4.3. this is the third Lemma.
Formal statement
Lemma 4.3: Let t_{m,n} be a term in T where the indexes m and n are zero based and refer to the rows and columns in T respectively.
Prove that:
For every prime p_{α} ≥ 5, there exists a subarray TT_{i} ∈ T such that the following properties are simultaneously true;
Property 1 : TT_{i}.A + 3 ∈ {6n ± 1n ∈ N_{1}}
Property 2 : (TT_{i}.B + 3) − TT_{i}.E ∈ {6n ± 1n ∈ N_{1}}
Property 3 : TT_{i}.L ≡ 0 (mod 6)
Property 4 : TT_{i}.A = TT_{i}.E
Property 5 : TT_{i}.B + 3 ∈ {6n ± 1n ∈ N_{1}}
If and only if
for TT_{i}.A + 3 ∈ {6n − 1n ∈ N_{1}};
TT_{i}.A = 6x + 6y − 4
TT_{i}.B = 6x + 12y − 8
TT_{i}.L = 6x
TT_{i}.E = 6x + 6y − 4
for TT_{i}.A + 3 ∈ {6n + 1n ∈ N_{1}};
TT_{i}.A = 6x + 6y − 2
TT_{i}.B = 6x + 12y − 4
TT_{i}.L = 6x
TT_{i}.E = 6x + 6y − 2
where n ∈ N_{1}, x < n, y > 0, n = x + y, (TT_{i}.A + 3) = p_{α} and TT_{i}.A = t_{2,k}.
Notes
This result demonstrates the existence of a pattern. It algebraically shows that for every prime p_{α} ≥ 5, there is a pattern TT_{i} that defines a pair of integers Q_{i} and R_{i} such that R_{i} − Q_{i} = p_{α} − 3.
The key in the proof of this result is the following analysis:

We are given p_{α} as a constant quantity.

Now p_{α} can be expressed in the form 6n ± 1.

This implies that n in the expression 6n ± 1 is constant.

We then let n = x + y.

We then use combinatory analysis to heuristically determine which expressions to assign to A, B, L and E from the sets M = (6x, 6y, 4, 6x, 6y, 4) and N = (6x, 6y, 2, 6x, 6y, 2) such that the five (5) properties of Lemma 4.3 are satisfied.
But how are the sets M and N derived?
Notice that by Lemma 4.3, TT_{i}.A = TT_{i}.E and (TT_{i}.A + 3) = p_{α}. Since p_{α} is prime then it can be expressed in the forms 6n ± 1 or 6(x + y) ± 1. And by Lemma 4.1, A + E = B + L, which implies that TT_{i}.B + TT_{i}.L = TT_{i}.A + TT_{i}.E = ((6x + 6y + 4) + (6x + 6y + 4)) or ((6x + 6y + 2) + (6x + 6y + 2)) depending on which form p_{α} can be expressed.
LEAN 4 code
LEAN 4 code annotation
Item name
Lemma 4.4. This is the fourth Lemma.
Formal statement
Lemma 4.4: Let any subarray TT_{i} that satisfies Lemma 4.3 be referred to as a Prime Array.
Prove that: For every prime p_{α} ≥ 5, there are infinitely many Prime Arrays such that TT_{i}.A = p_{α} − 3.
Notes
This will show that for any prime p_{α} ≥ 5 the pattern defined by the Prime Array TT_{i} occurs infinitely often and that consequently, the integer pairs (Q_{i}, R_{i}) also occur infinitely often.
The key in the proof is to show that for any prime p_{α} ≥ 5,

p_{α} is constant.

Which implies n is constant.

But n = x + y

Now changing the value of y changes the value of x in order to maintain the equality n = x + y.

Changing the value of x changes the value of L since L = 6x.

Changing the value of L changes the value of B since B + L = A + E and A + E is a constant expression.

This implies that the value of y can be changed infinitely often and each change represents a different subarray or Prime Array TT_{i}.
LEAN 4 code
LEAN 4 code annotation
Item name
Lemma 4.5. This is the fifth Lemma.
Formal statement
Lemma 4.5: Prove that: For every prime p_{α} ≥ 5, there exists infinitely many Prime Arrays, TT_{i}, such that TT_{i}.A = p_{α} − 3 and (T_{i}.B + 3) and ((T_{i}. B + 3) −T_{i}.E) are prime.
Notes
This will show that for any prime p_{α} ≥ 5, the Prime Arrays TT_{i} occur infinitely often and Q_{i} and R_{i} are prime. Notice that here, (TT_{i}.B + 3) = R_{i} and Q_{i} = ((TT_{i}.B + 3) − TT_{i}.E).
The proof here relies on the construction of T where all the elements on the first row of T are prime
We can then algebraically show that (TT_{i}.B + 3) and ((TT_{i}.B + 3) − TT_{i}.E) are prime.
LEAN 4 code
LEAN 4 code annotation
Item name
Theorem 1. The first Theorem, the first of the two main results.
Formal statement
Theorem 1: Prove that: For every prime p_{α}, there exists infinitely many pairs of primes, (p_{n}, p_{n+m}), such that (p_{n+m} − p_{n}) = p_{α} − 3, where n, α ≥ 3, m ≥ 1, and p_{n} is the n^{th} prime.
Notes
The claim is that this result is implied from the previous results as demonstrated in the following steps:
Step 1: By Lemma 4.5, and the construction of T, TT_{i}.A and TT_{i}.E are prime gaps.
Step 2: By Lemma 4.3, TT_{i}.A = TT_{i}.E.
This is equivalent to: (TT_{i}.A + 3) − 3 = ((TT_{i}.B + 3) − ((TT_{i}.B + 3) − TT_{i}.E)).
Step 3: By Lemma 4.5, the following are prime; (TT_{i}.A + 3), ((TT_{i}.B + 3) − TT_{i}.E), (TT_{i}.B + 3).
Step 4: By Lemma 4.5, for every prime (TT_{i}.A + 3) ≥ 5, there are infinitely many of the following pairs of primes defined as: (((TT_{i}.B + 3) − TT_{i}.E), (TT_{i}.B + 3)).
We can then see that the following statement is implied:
For every prime (TT_{i}.A + 3) ≥ 5, there exists infinitely many pairs of primes, (((TT_{i}.B + 3) − TT_{i}.E), (TT_{i}.B + 3)), such that ((TT_{i}.B + 3) − ((TT_{i}.B + 3) − TT_{i}.E)) = (TT_{i}.A + 3) − 3.
This statement is equivalent to the formal statement of Theorem 1 which can be is restated using the following equivalent assignments:
p_{α} − 3 = TT_{i}.A.
p_{n+m} − 3 = TT_{i}.B.
p_{α} − p_{n} = TT_{i}.L.
p_{n+m} − p_{n} = TT_{i}.E.
LEAN 4 code
LEAN 4 code annotation
Item name
Theorem 2. This is the second Theorem, the second of the two main results.
Formal statement
Theorem 2: Prove that there exist infinitely many pairs of primes with a gap of 2.
Notes
This result is just a special case of Theorem 1 when p_{α} is set to 5. I am sure this would be resolved by LEAN 4 using the "refl" similar tactic.
LEAN 4 code
LEAN 4 code annotation
Outcome and Conclusion
By formalizing the results presented in the paper Structure in Prime Gaps, we hope to contribute to the body of knowledge in mathematics as well as help establish the use of proof assistants like LEAN in academia, research and industry in general.
Source References
Useful Links
 Zulip chat for Lean for coordination
 Blueprint
 Blueprint as pdf
 Dependency graph
 Doc pages for this repository
 Kaunda, K: Structure in Prime Gaps, (2024). (Preprint).
 Pietro Monticone: Lean Project Template for blueprintdriven formalization projects.
 Patrick Massot: LeanBlueprint, A plasTeX plugin allowing to write blueprints for Lean 4 projects.