README.md
September 17, 2025 · View on GitHub
Regular-expression engine for C3
Efficient (non‑backtracking) Thompson NFA regular‑expression engine for C3, with submatch tracking and leftmost‑longest semantics. Ported from the classic Plan 9 implementation by Rob Pike.
Features
- Linear‑time matching (no catastrophic backtracking)
- Submatches (capture groups) with indices
- Alternation
|, concatenation, quantifiers* + ?, grouping(...) - Character classes
[abc], ranges[a-z], and negation[^...] - Anchors
^and$, dot. - UTF‑8 aware: operates on Unicode code points (runes)
- Leftmost‑longest match semantics (egrep‑style)
- Small, dependency‑light C3 library (
.c3l)
Not in scope: backreferences, look‑around, and other Perl/PCRE extensions.
Install (via c3l)
Use c3l to fetch and wire the dependency automatically.
From your project root (where project.json lives):
# Fetch a specific version tag (recommended)
c3l fetch https://github.com/konimarti/regexp9.c3l vX.Y.Z
# Or fetch the default branch
c3l fetch https://github.com/konimarti/regexp9.c3l
This updates project.json and stores the compressed library under your
dependency search path.
Tip: To update later, run
c3l update regexp9. To remove,c3l remove regexp9.
Alternative: manual installation
If you prefer not to use c3l:
- Add the library path to your project’s
project.json→dependency-search-paths. - Add the dependency name to
dependencies.
Example project.json snippet:
{
"dependency-search-paths": [
"lib",
"../third_party"
],
"dependencies": [
"regexp9"
]
}
Place the regexp9.c3l folder anywhere listed in dependency-search-paths (e.g., third_party/regexp9.c3l).
Quick start
import regexp9;
fn void main() {
// Compiled program
Reprog *prog = regexp9::compile("(ab|a)b+")!!;
// Execute with captures (up to 10 capture pairs)
Resub[10] subs;
if (regexp9::match(prog, "aabbb", subs[..])!!) {
// subs[0] = whole match, subs[1] = group 1, etc.
// Each Resub can return its match with subs[i].match().
}
}
String replacement (if you need it):
// Given a successful regexec, produce a substituted string using \1, \2, ...
String s = regexp9::substitute(mem, "X-\\1-Y", subs[..]);
defer s.free(mem);
Type/func names mirror the Plan 9 API. See
regexp9.c3andregexp9_test.c3for the exact signatures in this port.
API overview
C3 API:
Reprog*? compile(Allocator allocator, ZString pattern)— Compile pattern into a programReprog*? tcompile(ZString pattern)— Compile pattern into a program that is allocated on tmembool? match(Reprog* prog, ZString text, Resub[] subs = {})— Run program on text; fillsubswith capture rangesString substitute(String src, Resub[] subs)— Expand\1,\2, … with capturesbool match_string(ZString pattern, String s)— Returns true when s matches regular expression pattern
Plan9 API:
-
Reprog*? regcomp(ZString s)— Compile pattern into a program -
int regexec(Reprog* prog, char* text, Resub* subs, int slen)— Run program on text; fillsubswith capture ranges -
void regsub(char* src, char* dst, int dlen, Resub* subs, int slen)— Expand\1,\2, … using captures -
Types:
Reprog— compiled Thompson NFA programResub— capture groups
Supported syntax (summary)
Literals: a b c … (escape metacharacters with \)
Classes: [abc] [a-z] [^...] (negation)
Dot: . (any char)
Anchors: ^ $
Grouping: ( ... )
Alternation: A|B
Quantifiers: A* A+ A? (greedy)
Tests & Benchmarks
# Run the benchmarks
c3c bench
# Run the tests
c3c test
Performance notes
- Thompson NFA executed in lock‑step (the “Pike VM”) → linear in input length.
- Two worklists per input character; ε‑closure is expanded eagerly.
- Submatch registers propagate per thread; no backtracking required to recover captures.
Roadmap / ideas
- Pre‑filter for literal prefixes to skip obvious non‑matches
- Rewrite parts into more idiomatic C3
- Additional character‑class helpers (
\d,\s,\w) as short‑hands
PRs are welcome.
Acknowledgments
Plan 9 regexp by Rob Pike; background write‑ups by Russ Cox on Thompson NFAs and submatch tracking.