Introduction
May 30, 2026
The Parsing Expression Grammar Template Library (PEGTL) is a zero-dependency C++ header-only parser combinator library for creating parsers according to a Parsing Expression Grammar (PEG).
Hello, World!
Since the PEGTL is a parser library, our "Hello, world!" example first parses, rather than prints, the string `Hello, foo!`, allowing any sequence of alphabetic ASCII characters in place of `foo`.
```cpp
#include <exception>
#include <iostream>
#include <string>

// Include the commonly used parts of the core
// library with a single include directive:
#include <tao/pegtl.hpp>

// The library resides in the namespace TAO_PEGTL_NAMESPACE,
// a macro that by default expands to tao::pegtl. This
// can be changed in include/tao/pegtl/config.hpp.
namespace pegtl = TAO_PEGTL_NAMESPACE;

namespace hello
{
   // Parsing rule that matches a literal "Hello, ".
   struct prefix
      : pegtl::string< 'H', 'e', 'l', 'l', 'o', ',', ' ' >
   {};

   // Parsing rule that matches a non-empty sequence of
   // alphabetic ASCII characters (with greedy matching).
   // As PEG or POSIX regex this would be '[a-zA-Z]+'.
   struct name
      : pegtl::plus< pegtl::alpha >
   {};

   // Parsing rule that matches a sequence of the 'prefix'
   // rule, the 'name' rule, a literal "!", and 'eof'
   // (end-of-file/input). The 'must' makes the parser
   // throw an exception when a sub-rule doesn't match
   // (instead of returning 'false' and possibly doing
   // some backtracking depending on the grammar).
   struct grammar
      : pegtl::must< prefix, name, pegtl::one< '!' >, pegtl::eof >
   {};

   // Class template for user-defined semantic actions;
   // the non-specialized default case does nothing.
   template< typename Rule >
   struct action
      : pegtl::nothing< Rule >
   {};

   // Specialisation of the user-defined action to do
   // something when the 'name' rule succeeds; it is called
   // with the portion of the input that matched the rule,
   // packaged in an instance of pegtl::action_input<>.
   template<>
   struct action< name >
   {
      template< typename ActionInput >
      static void apply( const ActionInput& in, std::string& v )
      {
         v = in.string();
      }
   };

}  // namespace hello

int main( int argc, char** argv )
{
   if( argc < 2 ) {
      return 1;
   }

   // Start a parsing run of argv[1] with the string
   // variable 'name' as additional argument that will
   // be passed to all called actions, including the
   // one we attached to the 'name' rule above.
   std::string name;

   pegtl::argv_input in( argv, 1 );
   try {
      if( !pegtl::parse< hello::grammar, hello::action >( in, name ) ) {
         std::cout << "I can't parse you!" << std::endl;
         return 1;
      }
   }
   catch( const std::exception& ) {
      // Because the grammar uses 'must', a mismatch is reported
      // by throwing an exception rather than by returning false.
      std::cout << "I can't parse you!" << std::endl;
      return 1;
   }

   std::cout << "Good bye, " << name << "!" << std::endl;
   return 0;
}
```
Assuming the current directory is the main directory of the PEGTL, this source can be found in `src/example/hello_world.cpp`.
On Linux and Unix, including macOS, it can be compiled with a command like:

```
$ g++ --std=c++17 -Iinclude src/example/hello_world.cpp -o hello_world
```

or with a call to make(1) to build all examples and tests via the included Makefile.
Once the example is built, it can be invoked as follows (taking care to use single quotes as shown, and noting that the executable will reside in `build/bin/example/hello_world` when using make):

```
$ ./hello_world 'Hello, world!'
Good bye, world!
$ ./hello_world 'Hello, Colin!'
Good bye, Colin!
$ ./hello_world 'Howdy, Paula!'
I can't parse you!
```
Structure
The PEGTL contains both header and implementation files. The actual library is header-only and requires no compilation.
Header Files
The header files can be classified and grouped as follows.
- The core library headers that are included with `<tao/pegtl.hpp>`.
- The additional library headers in their respective sub-directories.
- The unofficial library headers in the `example` and `extra` sub-directories.
- The deprecated library headers in the `deprecated` sub-directory.
The core and additional headers form the official public API of this library and are subject to semantic versioning.
The example and extra headers are considered too niche or experimental for inclusion in the official public API. They are not subject to semantic versioning and can change at any time.
The deprecated headers are ones we expect to be rarely, if ever, used. If you are using something deprecated, please copy it to your project and/or let us know before we remove it.
| Directory | Contents |
|---|---|
| include/tao/pegtl.hpp | Core header "include all" |
| include/tao/pegtl/action/ | Additional actions |
| include/tao/pegtl/binary/ | Binary rules |
| include/tao/pegtl/control/ | Additional controls |
| include/tao/pegtl/debug/ | Debug facilities |
| include/tao/pegtl/deprecated/ | Deprecated headers |
| include/tao/pegtl/example/ | Example grammars |
| include/tao/pegtl/extra/ | Extra headers |
| include/tao/pegtl/stream/ | Stream parsing |
| include/tao/pegtl/unicode/ | Unicode rules |
The header files in any internal/ sub-directory, and all C++ definitions and declarations in any internal sub-namespace, are private to the library.
Implementation Files
There are two kinds of implementation files, tests and examples, found in src/test/ and src/example/, respectively.
Neither is considered part of the public API, so neither is subject to semantic versioning.
The examples are listed in the Example Reference.
Namespaces
By default, the entire PEGTL resides in namespace tao::pegtl.
This can be changed in include/tao/pegtl/config.hpp as explained in Embedding in Libraries.
The entire PEGTL documentation assumes the default namespace.
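Changing the namespace in a configuration header works because the namespace name itself is a macro, as the comment in the Hello, World! example notes. The following minimal standalone sketch shows that technique; it assumes C++17 (for qualified names in namespace definitions), and the macro `MYLIB_NAMESPACE`, the namespace `mylib` and its function are hypothetical stand-ins, not the actual contents of `include/tao/pegtl/config.hpp`.

```cpp
#include <string>

// Hypothetical configuration header: the namespace is a macro that users
// can pre-define, e.g. with -DMYLIB_NAMESPACE=myproject::mylib.
#if !defined( MYLIB_NAMESPACE )
#define MYLIB_NAMESPACE mylib
#endif

// Hypothetical library code, declared inside the configurable namespace
// (C++17 also accepts a qualified name such as myproject::mylib here).
namespace MYLIB_NAMESPACE
{
   inline std::string library_name()
   {
      return "mylib";
   }

}  // namespace MYLIB_NAMESPACE

// User code refers to the library only through an alias, so it keeps
// compiling when the macro is changed.
namespace lib = MYLIB_NAMESPACE;

inline std::string greeting()
{
   return "Hello from " + lib::library_name();
}
```

This mirrors the `namespace pegtl = TAO_PEGTL_NAMESPACE;` alias used in the Hello, World! example: code that only uses the alias does not need to change when the library namespace does.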
Some parts of the library use sub-namespaces; for example, the parsing rules specific to UTF-8 encoded text are in namespace tao::pegtl::utf8.
Similarly, the parsing rules for ASCII text are in namespace tao::pegtl::ascii, which is an inline namespace, making them (also) accessible "as if" they were in namespace tao::pegtl.
For this reason, we frequently put the ascii in parentheses and write tao::pegtl::(ascii::) to designate the ASCII namespace throughout this documentation.
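How an inline namespace makes its members reachable from the enclosing namespace can be shown without the PEGTL itself. In this sketch, `mylib`, `mylib::ascii`, `mylib::utf8` and their functions are hypothetical and only mirror the layout of tao::pegtl, tao::pegtl::ascii and tao::pegtl::utf8 described above; the `static_assert`s without a message require C++17.

```cpp
// Hypothetical library layout mirroring the one described above.
namespace mylib
{
   // Members of an inline namespace are also visible as if they were
   // declared directly in the enclosing namespace.
   inline namespace ascii
   {
      constexpr bool is_digit( const char c ) noexcept
      {
         return ( c >= '0' ) && ( c <= '9' );
      }

   }  // namespace ascii

   // A regular sub-namespace: its members must always be qualified.
   namespace utf8
   {
      constexpr bool is_ascii_byte( const unsigned char c ) noexcept
      {
         return c < 0x80;
      }

   }  // namespace utf8

}  // namespace mylib

// Both spellings name the same function.
static_assert( mylib::ascii::is_digit( '7' ) );
static_assert( mylib::is_digit( '7' ) );

// The utf8 member is only reachable through its sub-namespace;
// mylib::is_ascii_byte would not compile.
static_assert( mylib::utf8::is_ascii_byte( 'a' ) );
```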
Everything that is considered private to the library resides in an internal namespace.
Private parts of the library are not considered part of the library interface and are therefore not subject to semantic versioning.
Macros do not respect namespaces and have a TAO_PEGTL_ prefix instead.
Parsing Expression Grammars
The PEGTL creates parsers according to a Parsing Expression Grammar (PEG). The table below shows how the classical PEG combinators, or composite parsing expressions, map to PEGTL rule class templates.
| PEG | tao::pegtl:: |
|---|---|
| &e | at< R... > (combinators) |
| !e | not_at< R... > (combinators) |
| e? | opt< R... > (combinators) |
| e+ | plus< R... > (combinators) |
| e1 e2 | seq< R... > (combinators) |
| e1 / e2 | sor< R... > (combinators) |
| e* | star< R... > (combinators) |
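To illustrate what these combinators do, here is a deliberately simplified toy model, not the PEGTL implementation. Rules are types with a static `match()` over a `std::string_view` and a position, and every rule leaves the position unchanged when it fails, so that enclosing rules can backtrack. The `toy` namespace, the `consumed()` helper and the example rules `binary` and `integer` are hypothetical; the real library additionally passes inputs, actions, controls and states through every call.

```cpp
#include <cstddef>
#include <string_view>

// Toy model (hypothetical, not the PEGTL implementation): a rule is a type with
// a static match() that advances 'pos' on success and leaves it unchanged on
// failure, so that enclosing rules can backtrack.
namespace toy
{
   template< char C >
   struct one  // "c"
   {
      static bool match( std::string_view in, std::size_t& pos )
      {
         const bool ok = pos < in.size() && in[ pos ] == C;
         pos += ok ? 1 : 0;
         return ok;
      }
   };

   template< typename... Rs >
   struct seq  // e1 e2: all in order; restore the position on failure
   {
      static bool match( std::string_view in, std::size_t& pos )
      {
         const std::size_t saved = pos;
         if( ( Rs::match( in, pos ) && ... ) ) {
            return true;
         }
         pos = saved;
         return false;
      }
   };

   template< typename... Rs >
   struct sor  // e1 / e2: ordered choice, the first success wins
   {
      static bool match( std::string_view in, std::size_t& pos ) { return ( Rs::match( in, pos ) || ... ); }
   };

   template< typename... Rs >
   struct opt  // e?: try once, succeed either way
   {
      static bool match( std::string_view in, std::size_t& pos ) { seq< Rs... >::match( in, pos ); return true; }
   };

   template< typename... Rs >
   struct star  // e*: greedy repetition, always succeeds
   {
      static bool match( std::string_view in, std::size_t& pos )
      {
         while( true ) {
            const std::size_t before = pos;
            // Also stop if the body succeeds without consuming input, to avoid looping forever.
            if( !seq< Rs... >::match( in, pos ) || pos == before ) {
               return true;
            }
         }
      }
   };

   template< typename... Rs >
   struct plus  // e+: one match, then e*
   {
      static bool match( std::string_view in, std::size_t& pos ) { return seq< Rs... >::match( in, pos ) && star< Rs... >::match( in, pos ); }
   };

   // &e and !e: test seq< Rs... > on a copy of the position, so nothing is consumed.
   template< bool Expected, typename... Rs >
   struct predicate
   {
      static bool match( std::string_view in, std::size_t& pos )
      {
         std::size_t probe = pos;
         return seq< Rs... >::match( in, probe ) == Expected;
      }
   };

   template< typename... Rs >
   using at = predicate< true, Rs... >;
   template< typename... Rs >
   using not_at = predicate< false, Rs... >;

}  // namespace toy

// Characters consumed by a successful match at the start of 'in', or -1 on failure.
template< typename Rule >
long consumed( std::string_view in )
{
   std::size_t pos = 0;
   return Rule::match( in, pos ) ? static_cast< long >( pos ) : -1;
}

// Hypothetical example rules: a binary number, optionally negative, not followed by '.'.
using binary = toy::plus< toy::sor< toy::one< '0' >, toy::one< '1' > > >;
using integer = toy::seq< toy::opt< toy::one< '-' > >, binary, toy::not_at< toy::one< '.' > > >;
```

For example, `consumed< binary >( "1101x" )` stops before the `x` and reports 4, while `integer` rejects `10.5` because the `not_at` predicate sees the `.`; the enclosing `seq` then restores the position, which is the backtracking behaviour of ordered choice in miniature.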
The next table shows how some of the more common atomic PEG expressions map to PEGTL rule classes.
| PEG | tao::pegtl:: |
|---|---|
| E | eof (atomic) |
| ε | success (atomic) |
| ⊥ | failure (atomic) |
| . | any (ascii) |
| "a" | one< 'a' > (ascii) |
| "[a-h]" | range< 'a', 'h' > (ascii) |
| "[a-zA-Z]" | alpha (ascii) |
Beyond these, the PEGTL comes with dozens of rules for convenience and advanced features, and custom rules can be implemented as well.
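The atomic rules, and the idea of writing a custom rule, can be sketched in the same simplified style. The following toy model is again not the PEGTL implementation: each atomic rule is a type with a static `match()` over a `std::string_view`, and the `toy` namespace, the `consumed()` helper and the custom rule `hex_digit` are hypothetical.

```cpp
#include <cstddef>
#include <string_view>

// Toy model (hypothetical, not the PEGTL implementation): each atomic rule is a
// type with a static match() that advances 'pos' on success and leaves it
// unchanged on failure.
namespace toy
{
   struct any  // .
   {
      static bool match( std::string_view in, std::size_t& pos )
      {
         const bool ok = pos < in.size();
         pos += ok ? 1 : 0;
         return ok;
      }
   };

   template< char Lo, char Hi >
   struct range  // [Lo-Hi]
   {
      static bool match( std::string_view in, std::size_t& pos )
      {
         const bool ok = pos < in.size() && Lo <= in[ pos ] && in[ pos ] <= Hi;
         pos += ok ? 1 : 0;
         return ok;
      }
   };

   struct eof  // end of input; never consumes
   {
      static bool match( std::string_view in, std::size_t& pos ) { return pos == in.size(); }
   };

   struct success  // epsilon: always succeeds, never consumes
   {
      static bool match( std::string_view /*in*/, std::size_t& /*pos*/ ) { return true; }
   };

   struct failure  // bottom: always fails, never consumes
   {
      static bool match( std::string_view /*in*/, std::size_t& /*pos*/ ) { return false; }
   };

}  // namespace toy

// Hypothetical custom rule: an ASCII hex digit, written by hand in terms of range.
struct hex_digit
{
   static bool match( std::string_view in, std::size_t& pos )
   {
      return toy::range< '0', '9' >::match( in, pos ) || toy::range< 'a', 'f' >::match( in, pos ) ||
             toy::range< 'A', 'F' >::match( in, pos );
   }
};

// Characters consumed by a successful match at the start of 'in', or -1 on failure.
template< typename Rule >
long consumed( std::string_view in )
{
   std::size_t pos = 0;
   return Rule::match( in, pos ) ? static_cast< long >( pos ) : -1;
}
```

In this model a custom rule only has to honour the same contract as the built-in ones: advance the position on success and leave it untouched on failure.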
Definitions
- A (parsing) rule is a class that models a (production) rule of a formal grammar, or a parser combinator.
- A grammar is a set of one or more related (parsing) rules, with one (or more) designated top-level rules as entry-point(s).
- Input data is a sequence of objects, often of type `char`, that is intended to be parsed.
- An input is a class that adheres to an informal input interface and represents some input data.
- A (semantic) action is a class with a static `apply()` or `apply0()` function, and/or, for advanced use cases, a static `match()` function.
- A control is a class that adheres to an informal control interface and is in control of important behind-the-scenes details of a parsing run.
- The states are user-defined objects that are passed to all rules, actions and control functions.
- A parsing run is everything that happens during a call to `tao::pegtl::parse()` with a grammar and, optionally, an action, a control and states.
- A nested parsing run similarly refers to a call to `tao::pegtl::parse_nested()` during a parsing run (usually from an action).
- A position is an instance of a class that indicates an object in the input data, possibly with auxiliary information like filename and line number.
- Input is consumed when the reference to what is considered the current object in the input data is advanced.
- Stream parsing refers to parsing with an input where only a small portion of input data is kept in a contiguous memory buffer.
- An action `A` is attached to a rule `R` when the specialization `A< R >` has an `apply()`, `apply0()` or `match()` function.
- An action is applied when its `apply()` or `apply0()` function is called after the rule it is attached to succeeded.
- Success is when a `match()` function returns `true`.
- Local failure is when a `match()` function returns `false` (which can lead to backtracking).
- Global failure is when a `match()` function throws an exception (which usually aborts the parsing run).
- The matched input is the portion of the input data consumed by a parsing rule during a successful call to its `match()` function.
- The grammar analysis is an algorithm that checks a grammar for the occurrence of potential infinite loops stemming e.g. from left recursion.
This page is part of the PEGTL and its documentation.
Copyright (c) 2014-2026 Dr. Colin Hirsch and Daniel Frey
Distributed under the Boost Software License, Version 1.0
See accompanying file LICENSE_1_0.txt or copy at https://www.boost.org/LICENSE_1_0.txt