Actions and States

May 30, 2026 ยท View on GitHub

A (semantic) action is a class with a static apply() or apply0() function -- and/or, for advanced use cases, a static match() function. The states are user-defined objects that are passed to all rules, actions and control functions.

Contents

Introduction

In its most simple form, a parsing run only returns whether (a portion of) the input matches the grammar. To do something useful with the information gathered during a parsing run it is necessary to attach (user-defined) actions to one or more grammar rules.

This page explains semantic actions, how they are attached to a grammar, and how they can be passed data to operate on.

Actions are functions that are called during the parsing run whenever the rule they are attached to successfully matched. When an action is thus applied, the corresponding function receives the states, an arbitrary list of (user-defined) objects, as arguments.

This is the primary role of actions and the most prevalent of their use cases. In a distinct role an action can be used to change the matching behavior of the rule they are attached to. These two roles should not be mixed. Most of this document focuses on the primary role.

Actions are implemented as static member functions called apply() or apply0() of specializations of custom class templates.

States are additional function arguments to tao::pegtl::parse() that are forwarded to all actions.

To use actions during a parsing run they first need to be implemented.

  • Define a custom action class template.
  • Specialize the action class template for every rule for which a function is to be called and
    • either implement an apply() or apply0() static member function,
    • or derive from a class that implements the desired function.

The very first step, defining a custom action class template, usually looks like this.

template< typename Rule >
struct my_action
   : tao::pegtl::nothing< Rule > {};

Instantiations of the primary template for my_action< Rule > inherit from tao::pegtl::nothing< Rule > to indicate that, by default, neither my_action< Rule >::apply() nor my_action< Rule >::apply0() are to be called when Rule is successfully matched during a parsing run, i.e. that no action is to be applied to Rule.

The action class template is then specialized for the rules for which an actual action is supposed to be called. An example for a simple action for a specific state might look like this.

template<>
struct my_action< my_rule >
{
   template< typename ActionInput >
   static void apply( const ActionInput& in, my_state& s )
   {
      // ... implement
   }
};

The parsing run needs to be set up with the actions and any required states. For this, the initial action can be passed as the second template parameter, and the initial states can be passed as additional arguments to tao::pegtl::parse().

In order to manage the complexity in larger parsers and/or compose multiple grammars that each bring their own actions which in turn expect certain states, it can be useful to change the actions and/or change the states within a parsing run.

Example

Here is a very short example that shows the basic way to put together a parsing run with actions and states.

// Define a simple grammar consisting of a single rule.
struct my_grammar
   : tao::pegtl::star< tao::pegtl::any > {};

// Primary action class template.
template< typename Rule >
struct my_action
   : tao::pegtl::nothing< Rule > {};

// Specialize the action class template.
template<>
struct my_action< tao::pegtl::any >
{
   // Implement an apply() function that will be called by
   // the PEGTL every time tao::pegtl::any matches during
   // the parsing run.
   template< typename ActionInput >
   static void apply( const ActionInput& in, std::string& out )
   {
      // Get the portion of the original input that the
      // rule matched, as string, and append it to out.
      out += in.string();
   }
};

template< typename ParseInput >
[[nodiscard]] std::string as_string( ParseInput& in )
{
   // Set up the states, here a single std::string as that is
   // what our action requires as additional function argument.
   std::string out;

   // Start the parsing run with our grammar, action and state.
   tao::pegtl::parse< my_grammar, my_action >( in, out );

   // Do something with the result.
   return out;
}

All together the as_string() function is a -- convoluted and slow -- way of turning any input into a std::string.

In the following we will take a more in-depth look at states and apply() and apply0() before diving into more advanced subjects.

States

There is not much more to states, they are a list (colloquial list, not std::list) of objects that are

  • passed by the user as additional arguments to tao::pegtl::parse(), and then

  • passed by the PEGTL as additional arguments to all actions' apply() or apply0() static member functions.

The additional arguments to apply() and apply0() can be chosen freely, however all actions must accept the same list of states since they are all called with the same arguments. The states can also be manually changed during a parsing run, see below.

States are not forwarded with "perfect forwarding" since r-value references don't make much sense when they will be used as action arguments many times. The parse() function still uses universal references to bind to the state arguments in order to allow temporary objects.

Apply

As seen above, the actual functions that are called when an action is applied are static member functions named apply() of the specializations of the action class template.

template<>
struct my_action< my_rule >
{
   template< typename ActionInput >
   static void apply( const ActionInput& in, /* all the states */ )
   {
      // Called whenever matching my_rule during a parsing run
      // succeeds (and actions are not disabled). The argument
      // named 'in' represents the matched part of the input.
      // Can also return bool instead of void.
   }
}

The first argument is not the input used in the parsing run, but rather a separate object of distinct type that represents the portion of the input that the rule to which the action is attached just matched. The remaining arguments to apply() are the current state arguments.

The exact type of the input passed to apply() is tao::pegtl::action_input< ParseInput > where ParseInput is the usual type of the input of the parsing run. It is considered best practice to "template over" the type of the input as shown above.

The action input behaves like a view input in that it does not own the data it points to. Therefore the validity of the pointed-to data might not extend (much) beyond the call to apply() and an action might need to make a deep copy of the action input's data!

When the return type of an action's apply() function is bool and the return value is false then the rule match is considered a local failure even though the rule itself succeeded and returned true. For the overall parsing run, there is no difference between a rule returning false and an attached action returning false, however the action is only called after the rule returned true. When an action returns false, the library rewinds the input to where it was when the rule to which the action was attached started its successful match. This is unlike the match() static member functions that have to rewind the input themselves.

Action Input

The action input interface is similar to the basic interface of parse inputs with some important differences.

template< typename ParseInput >
class action_input
{
public:
   using data_t = typename ParseInput::data_t;
   using input_t = ParseInput;
   using error_position_t = typename ParseInput::error_position_t;
   using rewind_position_t = typename ParseInput::rewind_position_t;
#if defined( __cpp_exceptions )
   using parse_error_t = parse_error< error_position_t >;
#endif

   [[nodiscard]] const data_t* begin() const noexcept;
   [[nodiscard]] const data_t* current( const std::size_t offset = 0 ) const noexcept;
   [[nodiscard]] const data_t* end() const noexcept;

   [[nodiscard]] bool empty() const noexcept;
   [[nodiscard]] std::size_t size() const noexcept;

   [[nodiscard]] const ParseInput& input() const noexcept;

   [[nodiscard]] decltype( auto ) current_position() const;  // Can be expensive!
   [[nodiscard]] const rewind_position_t& rewind_position() const noexcept;

   [[nodiscard]] decltype( auto ) direct_source() const noexcept;
   [[nodiscard]] decltype( auto ) direct_position() const noexcept;

The most obvious difference is that an action input has no consume() or other function to modify it in any way.

The begin(), end(), size() and empty() functions are essentially like on a std::array or std::vector.

The input() function returns a reference to the parse input of the parsing run.

The current_position() function returns the error position of the parse input corresponding to the begin() of the action input, where the rule match started (the rule for whose action invocation the action input in question was constructed). If the parse input uses text positions with lazy tracking then each call to this function will scan the entire input data from start to where the rule match commenced!

The rewind_position() returns the parse input's rewind position for the same point as current_position() does, however the action input always has a copy of that object and simply returns a reference to it.

The direct_source() and direct_position() functions can only be called when ParseInput has the corresponding function.

In addition to the ones listed the action input implements all input convenience functions.

Apply0

In cases where the matched part of the input is not required, an action can implement a static member function called apply0() instead of apply(). What changes is that apply0() will be called without an input as first argument, i.e. only with all the states.

template<>
struct my_action< my_rule >
{
   static void apply0( /* all the states */ )
   {
      // Called whenever matching my_rule during a parsing run
      // succeeds (and actions are not disabled). Can also return
      // bool instead of void.
   }
};

Using apply0() is never necessary, it is "only" an optimisation with minor benefits at compile time, and potentially more noteworthy benefits at run time. We recommend implementing apply0() over apply() whenever both are viable.

Though an infrequently used feature, apply0() can also return bool instead of void, just like apply() and with the same implications.

Inheriting

We will use an example to show how to use existing actions via inheritance. The grammar for this example consists of a couple of simple rules.

struct plain
   : tao::pegtl::utf8::range< 0x20, 0x10FFFF > {};

struct escaped
   : tao::pegtl::one< '\'', '"', '?', '\\', 'a', 'b', 'f', 'n', 'r', 't', 'v' > {};

struct character
   : tao::pegtl::if_must_else< tao::pegtl::one< '\\' >, escaped, plain > {};

struct text
   : tao::pegtl::must< tao::pegtl::star< character >, tao::pegtl::eof > {};

Our goal is for a parsing run with the text rule to produce a copy of the input where the backslash escape sequences are replaced by the character they represent. When the plain rule matches, the bytes of the matched UTF-8-encoded code-point can be appended to the result. When the escaped rule matches, the bytes corresponding to the character represented by the escape sequence must be appended to the result. This can be achieved with appropriate specializations of my_action using some extra classes from tao/pegtl/extra/unescape.hpp.

template<>
struct my_action< plain >
   : tao::pegtl::append_all {};

template<>
struct my_action< escaped >
   : tao::pegtl::unescape_c< escaped, '\'', '"', '?', '\\', '\a', '\b', '\f', '\n', '\r', '\t', '\v' > {};

For step three the input for the parsing run is set up as usual. In addition, the actions are passed as second template parameter, and a std::string as second argument to parse(). Here unescaped is the state that is required by the append_all and unescape_c actions; all additional arguments passed to parse() are forwarded to all actions.

std::string unescape( const std::string& escaped )
{
   std::string unescaped;
   tao::pegtl::memory_input in( result, __FUNCTION__ );
   tao::pegtl::parse< text, my_action >( in, unescaped );
   return unescaped;
}

At the end of the parsing run, the complete unescaped string can be found in the aptly named variable unescaped.

A more complete example of how to unescape strings can be found in src/pegtl/unescape.cpp.

Specializing

The rule class for which an action class template is specialized must exactly match the definition of the rule in the grammar. For example consider the following rule.

struct foo
   : tao::pegtl::plus< tao::pegtl::alpha > {};

Now an action class template can be specialized for foo, or for tao::pegtl::alpha, but not for tao::pegtl::plus< tao::pegtl::alpha >.

This is because base classes are not taken into consideration by the C++ language when choosing a specialization, which might be surprising when being used to pointer arguments to functions where conversions from pointer-to-derived to pointer-to-base are performed implicitly and silently.

So although the function called by the library to match foo is the inherited tao::pegtl::plus< tao::pegtl::alpha >::match(), the rule class is foo and the function known as foo::match(), wherefore an action needs to be specialized for foo instead of tao::pegtl::plus< tao::pegtl::alpha >.

While it is possible to specialize the action class template for tao::pegtl::alpha, it might not be a good idea since the action would be applied for all occurrences of tao::pegtl::alpha in the grammar. To circumvent this issue a new name can be given to the tao::pegtl::alpha, a name that will not be "randomly" used in other places of the grammar.

struct bar
   : tao::pegtl::alpha {};

struct foo
   : tao::pegtl::plus< bar > {};

Now an action class template can be specialized for foo and bar, but again not for tao::pegtl::plus< bar > or tao::pegtl::alpha.

More precisely, it could be specialized for the latter two rules, but wouldn't ever be called unless these rules were used elsewhere in the grammar, a different kettle of fish.

Note that this is also the reason why you should not use type aliases instead of inheritance when defining your grammars.

Changing Actions

The action class template is usually supplied to parse() or parse_nested() at the beginning of a parsing run. The action can also be changed, enabled or disabled during a parsing run.

Via Rules

The tao::pegtl::enable<> and tao::pegtl::disable<> rules behave just like seq<> but, without touching the current action, enable or disable calling of actions within their sub-rules, respectively.

The tao::pegtl::action<> rule also behaves similarly to seq<> but takes an action class template as first template parameter and, without enabling or disabling actions, uses its first template parameter as action for the sub-rules.

The following two lines effectively do the same thing, namely parse with my_grammar as top-level parsing rule without invoking actions (unless actions are enabled again somewhere else).

tao::pegtl::parse< my_grammar >( ... );
tao::pegtl::parse< tao::pegtl::disable< my_grammar >, my_action >( ... );

Similarly the following two lines both start parsing my_grammar with my_action (unless something changes somewhere else). Like any other rule, action<> can be used anywhere within a grammar.

tao::pegtl::parse< my_grammar, my_action >( ... );
tao::pegtl::parse< tao::pegtl::action< my_action, my_grammar > >( ... );

User-defined parsing rules can use action<>, enable<> and disable<> just like any other combinator rules. For example to disable actions in LISP-style comments the following rule could be used as per src/pegtl/s_expression.cpp.

struct comment
   : tao::pegtl::seq< tao::pegtl::one< '#' >, tao::pegtl::disable< cons_list > > {};

The ability to change the actions during a parsing run also allows using the same rules multiple times with different action class templates within a grammar.

Via Actions

The action classes tao::pegtl::disable_action and tao::pegtl::enable_action can be used to disable and enable actions, respectively, for any rule (and its sub-rules). For example actions can be disabled for my_rule in a parsing run using my_action as follows.

template<>
struct my_action< my_rule >
   : tao::pegtl::disable_action {};

tao::pegtl::parse< my_grammar, my_action >( ... );

Conversely tao::pegtl::change_action<> takes a new action class template as its only template parameter and changes the current action in a parsing run to it.

Note that parsing proceeds with the rule to which the action changing action is attached to "as if" the new action had been the current action all along. The new action can even perform an action change on the same rule, however care should be taken to not introduce infinite cycles of changes.

Changing States

The states, too, can be changed in ways beyond supplying them, or not, to tao::pegtl::parse() at the start of a parsing run.

Via Rules

The state rule behaves similarly to seq but uses the first template parameter as type of a new object. This new object is used replaces the current state(s) for the remainder of the implicit seq.

The new object is constructed with a const-reference to the current input of the parsing run, and all previous states, if any, as arguments. If no such constructor exists, the new object is default constructed. If the implicit seq of the sub-rules succeeds, then, by default, a member function named success() is called on this "new" object, receiving the same arguments as the constructor. At this point the input will be advanced by whatever the sub-rules have consumed in the meantime.

Please consult include/tao/pegtl/internal/state.hpp to see how the default behavior on success can be changed by overriding tao::pegtl::state<>::success() in a derived class when using that class instead.

Embedding a state change into the grammar with state<> is only recommended when some state is used by custom parsing rules.

Via Actions

The actions tao::pegtl::change_state<> and tao::pegtl::change_states<> can be used to change from the current to a new set of states while parsing the rules they are attached to.

The differences are summarised in this table; note that change_state is more similar to the legacy change_state control class that was included with the 2.z versions of the PEGTL.

Featurechange_statechange_states
Number of new statesoneany
Construction of new statesoptionally with input and old statesdefault
Success function on actionif not on new staterequired

With change_state only a single new state type can be given as template parameter, and only a single new state will be created. The constructor of the new state receives the same arguments as per tao::pegtl::state<>, the current input from the parsing run and all previous states.

A success() static member function is supplied that calls the success() member function on the new state, again with the current input from the parsing run and all previous states. The supplied success() can of course be overridden in a derived class.

With change_states, being a variadic template, any number of new state types can be given and an appropriate set of new states will be created (nearly) simultaneously. All new states are default-constructed, if something else is required the reader is encouraged to copy and modify the implementation of change_states in their project.

The user must implement a custom success() static member function that takes the current input from the parsing run, the new states, and the old states as arguments.

Note that, unlike the tao::pegtl::state<> combinator, the success functions are only called when actions are currently enabled!

Using the changing actions is again done via inheritance as shown in the following example for change_states.

template<>
struct my_action< my_rule >
   : tao::pegtl::change_states< new_state_1, new_state_2 >
{
   template< typename ParseInput >
   static void success( const ParseInput&, new_state_1&, new_state_2&, /* the previous states*/ )
   {
      // Do whatever with both the new and the old states...
   }
};

For a more complete example of how to build a generic JSON data structure with change_state and friends see src/example/json_build.cpp.

Changing Actions and States

The actions change_action_and_state<> and change_action_and_states<> combine change_action with one of the change_state<> or change_states<> actions, respectively. For change_action_and_state<> and change_action_and_states<> the new action class template is passed as first template parameter as for change_action, followed by the new state(s) as given to change_state<> and change_states<>.

Note that change_action_and_state<> and change_action_and_states<> behave like change_action<> in that they proceed to match the rule to which the changing action is attached to "as if" the new action had been the current action all along.

Match

Besides apply() and apply0(), an action class specialization can also have a match() static member function. The default control class template normal will detect the presence of a suitable match() function and call this function instead of tao::pegtl::match().

template<>
struct my_action< my_rule >
{
   template< typename Rule,
             apply_mode A,
             rewind_mode M,
             template< typename... > class Action,
             template< typename... > class Control,
             typename ParseInput,
             typename... States >
   [[nodiscard]] static bool match( ParseInput& in, States&&... st )
   {
      // This example just makes the same function call the normal control
      // would have made in the absence of this match function.

      return tao::pegtl::match< Rule, A, M, Action, Control >( in, st... );
   }
}

Implementing a custom match() for an action is considered a relatively advanced feature. All "changing" action classes mentioned in this document are implemented as actions with match(), which also explains why the PEGTL even offers this additional customization point; changing states and changing actions are frequently performed when adding actions to a non-trivial grammar, wherefore it makes sense to make these operations possible through actions, too.

The implementations of the included actions can be found in include/tao/pegtl/action/. They should be studied and understood before implementing a custom action with match().

Nothing

Letting the primary template of an action class template derive from tao::pegtl::nothing is recommended, though not strictly necessary.

When using nothing, some assertions are enabled that are usually very helpful while developing a parser.

When not using nothing, simply by never mentioning it (as base class), these assertions are disabled and it is possible for an action's apply() or apply0() implementation to be silently ignored.

In the following let a be an action template class, i.e. the instantiation of an action class template action for some rule r, or using a = action< r > for short.

We say that apply() is callable when it is the name of a static member function of a that returns either void or bool and can be called with an input and the current states.

We say that apply0() is callable when it is the name of a static member function of a that returns either void or bool and can be called with the current states.

The following assertions are always enabled.

  • There must be at most one callable apply or apply0() in a.
  • If nothing< r > is an accessible base class of a then a must not have a callable apply().
  • If nothing< r > is an accessible base class of a then a must not have a callable apply0().
  • If require_apply is an accessible base class of a then it must have a callable apply().
  • If require_apply0 is an accessible base class of a then it must have a callable apply0().

The classes require_apply and require_apply0 are also explained in the State Mismatch section.

The following assertion is only enabled when std::is_base_of_v< tao::pegtl::nothing< void >, action< void > > is true.

  • Either nothing< r > must be an accessible base class of a, or
  • maybe_nothing must be an accessible base class of a, or
  • a must have a callable apply() or apply0().

The class tao::pegtl::maybe_nothing is an accessible base class of all the changing actions explained above. This make is possible, but not necessary, to implement apply() or apply0() for actions derived from them.

Note that maybe_nothing can be combined, through multiple inheritance, with one of nothing<>, require_apply or require_apply0.

For example when a class b is derived from change_state, it also gains that class' maybe_nothing as accessible base class. At this point b is allowed to either have or not have an apply() or apply0(). By letting b also derive from one of the three mentioned classes, the maybe_nothing will be ignored and b will be checked to have or not have the functions as dictated by the respective additional base class.

Backtracking

Sometimes there can be backtracking during a parsing run which can lead to Actions being called in places where their effects are undesired. While it might be intuitively clear what backtracking is, for the purpose of the following discussion we give a slightly more formal definition.

We speak of backtracking across a rule S when there is a rule R of which S is a (direct or indirect) sub-rule and during a parsing run

  1. R returns local failure after
  2. S succeeded and its success is a requirement for the success of R and
  3. it is "still possible" for the top-level grammar rule of the parsing run to succeed.

In this case the input will have been rewound to the point at which R was attempted to match and all effects of S on the Input will have been undone, however, and this is the subject of this section, any action attached to S will have been already performed without there being an automatic "undo".

The AAC-Problem

In some cases it is easy to rewrite the grammar in a way that prevents backtracking. This simultaneously removes the issue of having to undo actions and improves parsing performance.

The prototypical case for which such a rewrite can be done is R = sor< seq< A, B >, seq< A, C > > where A, B and C are arbitrary rules.

If during a parsing run there are actions attached to A and C, and the input matches seq< A, C > but not seq< A, B >, then the action for A will be called twice before the action for C, which gives this problem its "AAC" name, given that what happens is:

  • Begin sor< seq< A, B >, seq< A, C > >
  • Begin seq< A, B >
  • Begin A
  • Success A with action called
  • Begin B
  • Failure B
  • Failure seq< A, B >
  • Begin seq< A, C >
  • Begin A at the same position as the begin A above
  • Success A with action called again on the same input
  • Begin C
  • Success C
  • Success seq< A, C >
  • Success sor< seq< A, B >, seq< A, C > >

Rewriting

In practice the structure of the rule might be more complicated than the pure AAC-problem which will make it harder to recognise the pattern. One solution is to rewrite R as R' = seq< A, sor< B, C > > where of course any action for A will be called at most once for every successful match of R'.

Manual Undo

Another solution is to undo the effects of the Action attached to A in case the encompassing seq< A, B > (or seq< A, C >) fail.

The advantage of this approach is that the implementation of the Action for A can pretend that is only called when really needed. The disadvantage is that there is no function on the Action that is called in the case of failure which requires the user to either write a custom match() function in the Action for seq< A, B > or to implement the failure() function in a custom Control class.

Manual Commit

A further solution is to let the Action for A perform its job "to the side", and only "commit" the effects to the target data structure in the Action for seq< A, B >.

For example if the Action attached to A takes the matched portion of the Input as std::string and appends it to std::vector< std::string > one could change said Action for A to only fill some temporary string in one of the States, and create an Action for seq< A, B > that, after it is called on success of that rule, appends the aforementioned temporary string to the target vector.

Looking Ahead

When everything else fails and a quick-and-dirty solution to Actions being called too often in the presence of backtracking is required and/or performance is not of prime importance it is relatively easy to solve the problem by employing the infinite look-ahead capability of PEGs.

When backtracking across S is a problem because an Action attached to S can be called when S succeeds even though there is a higher-up rule R that can still fail then simply replace R with seq< at< R >, R > in the grammar.

Remembering that at disables all Actions explains how this solves the problem; we first verify without Actions that R will indeed match at this point and only then match R again with Actions enabled.

Troubleshooting

The following lists a couple of frequently encountered Action-related problems.

Boolean Return

Actions returning bool are an advanced use case that should be used with caution. They prevent some internal optimisations, in particular when used with apply0(). They can also have hard to understand effects on the semantics of a parsing run. For example at< rule > can succeed for the same input for which rule fails when there is a bool-action attached to rule that returns false due to actions being disabled within at<>.

State Mismatch

When an action's apply() or apply0() expects different states than those present in the parsing run there will either be a possibly not very helpful compiler error, or it will compile without a call to the action, depending on whether tao::pegtl::nothing<> is used as base class of the primary action class template.

By deriving an action specialization from either tao::pegtl::require_apply or tao::pegtl::require_apply0, as appropriate, a -- potentially more helpful -- compiler error can be provoked, so when the grammar contains my_rule and the action is my_action then silently compiling without a call to apply0() is no longer possible.

template<>
struct my_action< my_rule >
  : require_apply0
{
   static void apply0( double )
   {
      // ...
   }
}

Note that deriving from require_apply or require_apply0 is optional and usually only used for troubleshooting.

Legacy Actions

There are compatibility rules that replicate the intrusive way action invocations were part of the grammar in the PEGTL 0.x versions.


This page is part of the PEGTL and its documentation.

Copyright (c) 2014-2026 Dr. Colin Hirsch and Daniel Frey
Distributed under the Boost Software License, Version 1.0
See accompanying file LICENSE_1_0.txt or copy at https://www.boost.org/LICENSE_1_0.txt