Creating a Regex Compiler/Parser - Research
January 18, 2025 ยท View on GitHub
Making a regex parser/compiler is not simple as it sounds, here is the overview of the steps:
- Convert the expression to Postfix notation.
While converting to postfix, you also need to handle Character Classes & Range Quantifiers. Tutorials on internet haven't done this. Read this for an insight on how to do this.
- Convert the postfix in above step to AST.
- Convert the AST to a state machine, preferably a NFA (non-deterministic finite automata)
Resources
- Regular Expression Matching Can Be Simple And Fast by Russ Cox
- Converting Regular Expressions to Postfix Notation with the Shunting-Yard Algorithm
- Postfix to NFA using Thompson algorithm
- Regex Compiler: Part 2
- Regular Expressions Based on Nondeterministic Finite Automaton
- Using NFA to evaluate regular expressions
- How to implement regular expression NFA with character ranges?
- Regex under the hood: Implementing a simple regex compiler in Go