Lean 4 Regex

A regular expression engine written in Lean 4.

This library is based on the Rust regex crate and extended for compatibility with PCRE2 (see Syntax).

Main restrictions:

  • focus on unicode data,
  • simple api and few configurations,
  • BoundedBacktracker as the single regex engine,
  • no optimizations.

Installation

Add the following dependency to lakefile.lean:

require Regex from git
  "https://github.com/bergmannjg/regex" @ "main"

Documentation

The main documentation is in Regex.lean

Example

Get captures of "∀ (n : Nat), 0 ≤ n" :

def Main : IO Unit := do
  let re := regex% r"^\p{Math}\s*.(?<=\()([a-z])[^,]+,\s*(\p{Nd})\s*(\p{Math})\s*\1$"
  let captures := Regex.captures "∀ (n : Nat), 0 ≤ n" re
  IO.println s!"{captures}"

Output is

fullMatch: '∀ (n : Nat), 0 ≤ n', 0, 22
groups: #[(some ('n', 5, 6)), (some ('0', 15, 16)), (some ('≤', 17, 20))]

Api

  • regex% : notation to build the regular expression at compile time
  • captures : searches for the first match of the regular expression

Components of regular expression:

  • \p{Math} : match all characters with the Math property
  • (?<=\() : lookbehind of char '('
  • (\p{Nd}) : capturing group of numeric characters
  • \1 : backreference to first capturing group

Test

The library is tested with the testdata of the PCRE2 library and the testdata of the Rust Regex crate.

The tests are run with

lake test

License

This project is licensed under the MIT license.

The data in Regex/Unicode/data/ is licensed under the Unicode License Agreement (LICENSE-UNICODE).

The .toml files in testdata/rust are licensed under the (LICENSE-RUST-REGEX).