Extra Reference

May 30, 2026 ยท View on GitHub

The reference documentation for extra and deprecated headers.

Contents

Preamble

The extra headers contain classes that are experimental or otherwise not ready to be made part of the "official" public interface.

The deprecated headers contains functionality from previous versions that was not deemed useful or replaced by something better.

Extras

The extras can be found in include/tao/pegtl/extra/.

charconv.hpp

Provides string-to-integer conversion based on std::from_chars() from the standard <charconv> header. The classes defined in this header implement both the match() function for use as rule and the apply() function for use as action.

Conversion classes for decimal and hexadecimal are provided, other bases can be selected via the appropriate template parameter.

template< typename Integral, std::uint8_t Base = 10 > struct from_chars_throws;
template< typename Integral, std::uint8_t Base = 10 > struct from_chars_nothrow;

template< typename Integral > struct from_xchars_throws : from_chars_throws< Integral, 16 > {};
template< typename Integral > struct from_xchars_nothrow : from_chars_nothrow< Integral, 16 > {};

The Base parameter is passed to std::from_chars() as fourth function argument. It can be anything between 2 and 36, inclusive.

Note

Remember that std::from_chars() does not allow a leading + and does not recognize the 0x prefix for hexadecimal numbers.

In both rules the functions accept arbitrary state arguments, and both return bool (as is mandatory for match() but optional for apply()).

  • If the first state argument is a mutable reference to Integral then it will be assigned the result of the string-to-integer conversion.
  • If the first state argument is of any other type it will be ignored -- together with any additional states.

The _throws vs. _nothrow_ suffix chooses between failing locally, i.e. returning false, when std::from_chars() returns std::errc::result_out_of_range, or failing globally, i.e. throwing an exception derived from tao::pegtl::parse_error.

The template parameter Integral can also be void, which changes the behaviour slightly. In this case

  • the first state must be a mutable reference to any integer type supported by std::from_chars(),
  • and when used as rule the apply_mode is ignored and the conversion result always assigned to the first state.
dispatch.hpp

An experimental approach to attaching actions to multiple rules with less code.

nested_exceptions.hpp

The visit_nested functions call the visitor once for every exception, including the first/outer one. The visitor is called first with the inner-most exception, the one that was thrown first from furthest down/inside the call stack, and a nesting count corresponding to the total number of exceptions minus 1. The visitor is called last with the outer-most exception, the one that was caught in the try-catch that is calling visit_nested(), and a nesting count of 0.

For example, when an exception E is caught, and E has exception F as nested exception, then visit_nested( E, V ) for some visitor V will first call V( F, 1 ) and then V( E, 0 ).

Exceptions in an inheritance hierarchy must be listed from more general to more specific in the template parameter list. For example visit_nested< std::exception, std::runtime_error >() and not vice versa.

template< typename... Exceptions, typename Visitor >
void visit_nested( Visitor&& );

template< typename... Exceptions, typename Visitor >
void visit_nested( const std::exception_ptr&, Visitor&& );

template< typename... Exceptions, typename Exception, typename Visitor >
void visit_nested( const Exception&, Visitor&& );

The following convenience functions call visit_nested() produce a flattened vector derived from the nested exceptions. The flatten_type() and flatten_base() functions return type sliced copies of the exceptions; flatten_what() calls what() on the std::exceptions and returns a vector of strings.

template< typename Exception >
[[nodiscard]] std::vector< Exception > flatten_type( const std::exception_ptr& = std::current_exception() );
template< typename Exception >
[[nodiscard]] std::vector< Exception > flatten_type( const Exception& );

[[nodiscard]] std::vector< parse_error_base > flatten_base( const std::exception_ptr& = std::current_exception() )
[[nodiscard]] std::vector< parse_error_base > flatten_base( const parse_error_base& );

[[nodiscard]] std::vector< std::string > flatten_what( const std::exception_ptr& = std::current_exception() )
[[nodiscard]] std::vector< std::string > flatten_what( const std::exception& );
parse_tree.hpp
parse_tree_to_dot.hpp

The parse tree has its own dedicated page.

raw_string.hpp

Rules for parsing Lua-style long string literals, also called raw literals because they do not support any escape sequences.

The following descriptive of these kinds of string literals is copied from the Lua documentation.

  • An "opening long bracket of level n" is defined as an opening square bracket followed by n equal signs followed by another opening square bracket. So, an opening long bracket of level 0 is written as [[, an opening long bracket of level 1 is written as [=[, and so on.
  • A "closing long bracket" is defined similarly; for instance, a closing long bracket of level 4 is written as ]====].
  • A "long literal" starts with an opening long bracket of any level and ends at the first closing long bracket of the same level. It can contain any text except a closing bracket of the same level.
  • Literals in this bracketed form can run for several lines, do not interpret any escape sequences, and ignore long brackets of any other level.
  • For convenience, when the opening long bracket is eagerly followed by a newline, the newline is not included in the string.

Unlike Lua's long literal the raw_string rule is customizable to use other characters than [, = and ] for matching the opening and closing long brackets. Also note that Lua introduced newline-specific replacements in Lua 5.2, which are not supported at the grammar level by the raw_string rule.

The raw_string rule has a sub-type called content that actions should usally be bound to instead of binding to raw_string itself. Binding to content will call the action with the matched string excluding the opening and closing long brackets.

template< char Open, char Marker, char Close, typename... Contents >
struct raw_string
   // base is implementation detail
{
   // struct content; -- inherited
};

template< char Open, char Marker, char Close >
struct raw_string< Open, Marker, Close >
   // base is implementation detail
{
   // struct content; -- inherited
};
record.hpp

Builds on dispatch.hpp to build a linear record of a parsing run. For each successful match of a selected rule the vector contains an entry with a kind of view_input to represent the matched input, and the name of the rule that matched. Successful sub-rules of a failed rule will not show up in the record.

template< typename ParseInput >
using record_input = input_with_offset< internal::text_view_alias< typename ParseInput::eol_rule, typename ParseInput::data_t, void, void > >;

template< typename ParseInput >
struct record_value
{
   using data_t = typename ParseInput::data_t;
   using position_t = typename ParseInput::error_position_t;

   record_value( const position_t&, const data_t*, const std::size_t, const std::string_view );

   record_input< ParseInput > input;
   std::string_view rule;
};

template< typename ParseInput >
using record_vector = std::vector< record_value< ParseInput > >;

template< typename... Rules >
struct record
{
   using clause = // clauses to produce the record for Rules...

   template< typename Rule >
   using action = // clauses converted to an action

   template< typename Rule,
             template< typename... > class Control = normal,
             typename ParseInput >
   [[nodiscard]] static auto parse( ParseInput&& )
   {
      // parse with action and return a record_vector< ParseInput >
   }
};
type_to_string.hpp

Functions to convert compile-time sequences of char into a std::string or std::string_view.

template< typename T >
[[nodiscard]] std::string type_to_string();

template< char... Cs >
[[nodiscard]] std::string type_to_string();

template< typename T >
[[nodiscard]] constexpr std::string_view type_to_string_view() noexcept;

template< char... Cs >
[[nodiscard]] constexpr std::string_view type_to_string_view() noexcept;

The overloads that take a single type can be used with tao::pegtl::string<> or any other type that takes a sequence of char as template parameters.

unescape.hpp

A slightly refactored version of the previous unescape header which is now moved to deprecated/unescape.hpp.

It defines individual action classes for common unescape operations as well as an action class template that attaches these actions to corresponding grammar rules.

All actions output to their first state (second argument) which must be a mutable std::string.

struct all_of_these_actions
{
   template< typename ActionInput >
   static void apply( const ActionInput& in, std::string& out );
};

unescape_append is for non-escaped characters that need to just be copied from the input unchanged by calling out.append( in.string_view() );

unescape_char<> is for single character escape sequences like \n. It takes any number of template parameters of type char and has to be attached to a rule of type (ascii::)one<> with the same number of template parameters.

During unescape the n-th template parameter of the one is mapped to the n-th template parameter of unescape_char.

unescape_byte is for bytes represented by two hex digits and must be attached to a rule that matches exactly that.

unescape_unicode<> is for hexadecimal Unicode code points, excluding surrogates. The template parameter of type std::size_t must match the fixed (!) number of hex digits the rule it is attached to matches, usually 4 or 8.

json_unescape_unicode is for sequences of hexadecimal UTF-16 code points with 4 hex digits each. It unescapes each individual code point while checking for and merging consecutive surrogate pairs as required by RFC 8259.

Warning

The actions do not double-check that the matched portion of the input they are called with has the expected size and data!

The action template unescape<> is set up with the usual default

template< typename Rule > unescape< Rule > : nothing< Rule > {};

and some useful specializations for some generic rules, and some specific rules defined in include/tao/pegtl/example/escaped.hpp, that might be part of a rule for strings with escape sequences. Please check out include/tao/pegtl/extra/unescape.hpp to see which individual unescape action is attached to which rule by unescape<>.

Deprecated

Deprecated headers can still be found in include/tao/pegtl/deprecated/.

alphabet.hpp

Constants and rules for alphabetic ASCII characters.

if_then.hpp

Experimental combinators for arbitrary if-then-else rule cascades.

integer.hpp

Rules and actions for integer parsing, replaced by charconv.hpp.

rep_one_min_max.hpp

An optimized version of tao::pegtl::rep_min_max< tao::pegtl::ascii::one< ... > >.

rep_string.hpp

An optimized version of tao::pegtl::rep< tao::pegtl::ascii::string< ... > >.

unescape.hpp

An older version of include/tao/pegtl/extra/unescape.hpp.

Index


This page is part of the PEGTL and its documentation.

Copyright (c) 2014-2026 Dr. Colin Hirsch and Daniel Frey
Distributed under the Boost Software License, Version 1.0
See accompanying file LICENSE_1_0.txt or copy at https://www.boost.org/LICENSE_1_0.txt