January 30, 2026

=============
Grammarinator
=============

ANTLRv4 grammar-based test generator

.. image:: https://img.shields.io/pypi/v/grammarinator?logo=python&logoColor=white
   :target: https://pypi.org/project/grammarinator/
.. image:: https://img.shields.io/pypi/l/grammarinator?logo=open-source-initiative&logoColor=white
   :target: https://pypi.org/project/grammarinator/
.. image:: https://img.shields.io/github/actions/workflow/status/renatahodovan/grammarinator/main.yml?branch=master&logo=github&logoColor=white
   :target: https://github.com/renatahodovan/grammarinator/actions
.. image:: https://img.shields.io/coveralls/github/renatahodovan/grammarinator/master?logo=coveralls&logoColor=white
   :target: https://coveralls.io/github/renatahodovan/grammarinator
.. image:: https://img.shields.io/readthedocs/grammarinator?logo=read-the-docs&logoColor=white
   :target: http://grammarinator.readthedocs.io/en/latest/

.. start included documentation

Grammarinator is a random test generator / fuzzer that creates test cases according to an input ANTLR_ v4 grammar. The motivation behind this grammar-based approach is to leverage the large variety of publicly available `ANTLR v4 grammars`_. It includes both a Python-based and a high-performance C++ backend for generation.

.. _ANTLR: http://www.antlr.org
.. _ANTLR v4 grammars: https://github.com/antlr/grammars-v4
.. _trophy page: https://github.com/renatahodovan/grammarinator/wiki

+--------------------------------------------------------------------------+
|                           TL;DR - KEY FEATURES                           |
+--------------------------------------------------------------------------+
|            Quick overview of the most important capabilities             |
+==========================================================================+
|                                                                          |
| * Generate test cases from scratch based on `ANTLR v4 grammars`_ or      |
|   mutate/recombine existing test cases after they have been parsed.      |
|                                                                          |
| * Besides blackbox test generation, supports guided fuzzing through      |
|   native integration with libFuzzer_ and `AFL++`_.                       |
|                                                                          |
| * The AFL++ integration also enables grammar-aware test case             |
|   minimization via the ``afl-tmin`` utility.                             |
|                                                                          |
| * Grammar-aware mutation and recombination without slowing down the      |
|   fuzzing with parsing (using pre-parsed input seeds).                   |
|                                                                          |
| * Fine-grained probabilistic generation control via inline grammar       |
|   weights or external JSON-based weight configurations (for alternatives |
|   and quantifiers).                                                      |
|                                                                          |
| * Support for inline semantic predicates in grammars to dynamically      |
|   enable or disable grammar alternatives during generation.              |
|                                                                          |
| * Multiple size-control strategies, including maximum recursion depth    |
|   and maximum token count limits.                                        |
|                                                                          |
| * Built-in caching to filter out duplicate generated inputs.             |
|                                                                          |
| * Both grammar-aware and grammar-unaware mutators, with selective        |
|   enablement and disabling support.                                      |
|                                                                          |
| * Extensible serialization pipeline with custom serializers for          |
|   formatting tree-based outputs into concrete test inputs.               |
|                                                                          |
| * Advanced customization hooks:                                          |
|                                                                          |
|   * custom models for programmatic decision guidance                     |
|   * custom listeners for information collection during generation        |
|   * custom transformers for post-generation tree transformations         |
+--------------------------------------------------------------------------+

.. _libFuzzer: https://llvm.org/docs/LibFuzzer.html
.. _AFL++: https://aflplus.plus

Requirements
============

* Python_ >= 3.10
* Java_ SE >= 11 JRE or JDK (the latter is optional)

Additionally, for the C++ backend:

* C++20 compiler (e.g., GCC >= 11.0, Clang >= 13.0, MSVC >= 2019)
* CMake_ >= 3.10

.. _Python: https://www.python.org
.. _Java: https://www.oracle.com/java/
.. _CMake: https://cmake.org

Install
=======

To use Grammarinator in another project, it can be added to ``setup.cfg`` as an install requirement (if using setuptools_ with declarative config):

.. code-block:: ini

    [options]
    install_requires =
        grammarinator

To install Grammarinator manually, e.g., into a virtual environment, use pip_::

    pip install grammarinator

The above approaches install the latest release of Grammarinator from PyPI_. Alternatively, for the development version, clone the project and perform a local install::

    pip install .

.. _setuptools: https://github.com/pypa/setuptools
.. _pip: https://pip.pypa.io
.. _PyPI: https://pypi.org/

Usage
=====

As a first step, Grammarinator takes an `ANTLR v4 grammar`_ and creates a test generator script in Python 3 or in C++. Grammarinator supports a subset of the ANTLR grammar features; this subset is described in the Grammar overview section of the documentation. The produced generator can be subclassed later to customize it further if needed.

Basic command-line syntax of test generator creation (Python or C++)::

    grammarinator-process <grammar-file(s)> -o <output-directory> --no-actions [--language hpp]

**Notes**

*Grammarinator* uses the `ANTLR v4 grammar`_ format as its input, which
makes existing grammars (lexer and parser rules) easily reusable. However,
because a fuzzer and a parser have inherently different goals, inlined
code (actions and conditions, header and members blocks) is most likely
not reusable and may even prevent proper execution. For first experiments
with existing grammar files, ``grammarinator-process`` supports the
command-line option ``--no-actions``, which skips all such code blocks
during fuzzer generation. Once the inlined code is tuned for fuzzing, that
option may be omitted.

.. _ANTLR v4 grammar: https://github.com/antlr/grammars-v4

Python-based Test Generation
----------------------------

Once a fuzzer has been generated and optionally customized, it can be executed with the ``grammarinator-generate`` script (or by manually instantiating it in a custom-written driver, of course).

Basic command-line syntax of grammarinator-generate::

    grammarinator-generate <generator> \
      -r <start-rule> -d <max-depth> \
      -o <output-pattern> -n <number-of-tests> \
      -t <transformer1> -t <transformer2>
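
When the generator is launched from a Python test harness rather than from a shell, the same command line can be assembled programmatically. The following is a minimal sketch, not part of Grammarinator: the helper name ``build_generate_command`` is hypothetical, and it only uses the options shown in the syntax above (``-r``, ``-d``, ``-o``, ``-n``, ``-t``).

```python
import shlex


def build_generate_command(generator, rule, max_depth, out_pattern, count,
                           transformers=()):
    """Assemble an argument list for grammarinator-generate (hypothetical helper)."""
    cmd = [
        "grammarinator-generate", generator,
        "-r", rule, "-d", str(max_depth),
        "-o", out_pattern, "-n", str(count),
    ]
    # Every transformer gets its own -t option, in the given order.
    for transformer in transformers:
        cmd += ["-t", transformer]
    return cmd


cmd = build_generate_command(
    "HTMLCustomGenerator.HTMLCustomGenerator", "htmlDocument", 20,
    "examples/tests/test_%d.html", 100,
)
# Print the shell-quoted form for inspection; nothing is executed here.
print(shlex.join(cmd))
```

Assuming Grammarinator is installed, the resulting list could be passed to ``subprocess.run(cmd, check=True)``; keeping the arguments as a list avoids shell quoting issues.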

C++-based Test Generation
-------------------------

After the C++-based fuzzer has been generated using ``grammarinator-process`` with the ``--language hpp`` flag, it needs to be built::

    python3 grammarinator-cxx/dev/build.py --clean \
        --generator <generator> \
        --includedir <include-dir> \
        --tools

Once built, the standalone generator can be run as follows::

    grammarinator-cxx/build/Release/bin/grammarinator-generate-<name> \
        -r <start-rule> -d <max-depth> \
        -o <output-pattern> -n <number-of-tests>

Note: The C++ backend can also be used as a custom mutator with libFuzzer. Details about this are provided in the LibFuzzer Integration section of the documentation.

Evolutionary Generation
-----------------------

Besides generating test cases from scratch based on the ANTLR grammar, Grammarinator can also recombine existing inputs or mutate only a small portion of them. To use these additional generation approaches, a population of selected test cases has to be prepared. The preparation is done with the ``grammarinator-parse`` tool, which processes the input files with an ANTLR grammar (possibly the same one as the generator grammar) and builds Grammarinator tree representations from them (with the ``.grt*`` extension). These files encode the full derivation tree of the input and can be reused across different fuzzing strategies.

Basic command-line syntax of grammarinator-parse::

    grammarinator-parse -g <grammar-file(s)> -r
    -o <input_file(s)>

Given a population of such ``.grt*`` files, ``grammarinator-generate`` or ``grammarinator-generate-<name>`` can make use of them with the ``--population`` CLI option. If the ``--population`` option is set (for the Python or C++ generator), Grammarinator chooses a strategy (generation, mutation, or recombination) randomly for each new test case. Unwanted strategies can be disabled with the ``--no-generate``, ``--no-mutate``, or ``--no-recombine`` options.
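
The per-test-case strategy choice can be pictured with the following sketch. It is only an illustration and not Grammarinator's implementation: the function ``pick_strategy`` and its parameters are hypothetical, and choosing uniformly among the enabled strategies is an assumption, since the actual probabilities are not specified here.

```python
import random


def pick_strategy(rng, has_population, generate=True, mutate=True, recombine=True):
    """Pick one enabled strategy at random (uniform choice is an assumption)."""
    enabled = []
    if generate:
        enabled.append("generate")
    # Mutation and recombination need existing trees to work on.
    if has_population and mutate:
        enabled.append("mutate")
    if has_population and recombine:
        enabled.append("recombine")
    if not enabled:
        raise ValueError("no generation strategy is enabled")
    return rng.choice(enabled)


rng = random.Random(42)
# With a population and mutation switched off, only two strategies remain.
print([pick_strategy(rng, has_population=True, mutate=False) for _ in range(5)])
```

The three boolean parameters mirror the effect of the ``--no-generate``, ``--no-mutate``, and ``--no-recombine`` options described above.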

**Notes**

Real-life grammars often use recursive rules to express certain patterns.
However, when such rules are used for generation, we can easily end up with
an unexpectedly deep call stack. With the ``--max-depth`` (or ``-d``)
option, this depth - and also the size of the generated test cases - can be
controlled.
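
To see why a depth limit bounds both the recursion and the output size, consider the following self-contained sketch. It does not use Grammarinator's API: the toy rule ``expr : NUMBER | '(' expr '+' expr ')'`` and the names ``gen_expr`` and ``nesting`` are hypothetical, and the 0.3 probability of stopping early is arbitrary.

```python
import random


def gen_expr(rng, depth, max_depth):
    """Generate text for the toy rule: expr : NUMBER | '(' expr '+' expr ')'."""
    # Once the depth budget is used up, only the non-recursive alternative is allowed.
    if depth >= max_depth or rng.random() < 0.3:
        return str(rng.randint(0, 9))
    left = gen_expr(rng, depth + 1, max_depth)
    right = gen_expr(rng, depth + 1, max_depth)
    return f"({left}+{right})"


def nesting(text):
    """Return the maximum parenthesis nesting level of text."""
    level = deepest = 0
    for ch in text:
        if ch == "(":
            level += 1
            deepest = max(deepest, level)
        elif ch == ")":
            level -= 1
    return deepest


sample = gen_expr(random.Random(0), 0, 3)
print(sample, nesting(sample))
```

In this toy grammar, a limit of ``d`` allows at most ``2**d`` numbers in the output, so the same limit that caps the call stack also caps the test case size.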

Another specialty of ANTLR grammars is that they support so-called hidden
tokens. These rules typically describe elements of the target language that
can be placed basically anywhere without breaking the syntax. The most
common examples are comments and whitespace. However, such grammars don't
define explicitly where whitespace may or may not appear in rules, so when
using them to generate test cases, we have to insert the missing spaces
manually. This can be done by applying a serializer (with the ``-s``
option) to the tree representation of the output tests. A simple serializer
- that inserts a space after every unparser rule - is provided by
*Grammarinator* (``grammarinator.runtime.simple_space_serializer``).
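
The idea of a serializer (a function from a tree to a string) can be sketched as follows. This is a simplified illustration under stated assumptions: the ``Token`` and ``Node`` classes are hypothetical stand-ins rather than Grammarinator's tree classes, and ``space_serializer`` simply joins all tokens with single spaces, which may differ in detail from the bundled serializer.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """Hypothetical leaf node holding the text of one token."""
    src: str


@dataclass
class Node:
    """Hypothetical inner node holding its children in order."""
    name: str
    children: list = field(default_factory=list)


def tokens(node):
    """Yield the token texts of the tree in left-to-right order."""
    if isinstance(node, Token):
        yield node.src
    else:
        for child in node.children:
            yield from tokens(child)


def space_serializer(root):
    """Turn a tree into a string, separating tokens with single spaces."""
    return " ".join(tokens(root))


tree = Node("assignment", [
    Token("let"), Token("x"), Token("="),
    Node("expr", [Token("1"), Token("+"), Token("2")]),
])
print(space_serializer(tree))  # let x = 1 + 2
```

Without the inserted spaces, the same tree would serialize to ``letx=1+2``, which no longer tokenizes as intended.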

In some cases, we may want to postprocess the output tree itself (without
serializing it), for example, to enforce some logic that cannot be
expressed by a context-free grammar. For this purpose, the transformer
mechanism can be used (with the ``-t`` option). Similarly to a serializer,
a transformer takes a tree as input, but instead of creating a string
representation, it is expected to return the modified (transformed) tree
object.
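
A transformer in this sense can be sketched as a function that receives a tree and returns it after modification. The example below is an assumption-laden illustration: the ``Node`` class, the node names (``element``, ``open_name``, ``close_name``), and the function ``match_closing_tags`` are all hypothetical and do not reflect Grammarinator's tree API. It enforces that a closing tag repeats the name of its opening tag, a constraint that a context-free grammar cannot express for arbitrary names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: leaves carry text in src, inner nodes carry children."""
    name: str
    src: str = ""
    children: list = field(default_factory=list)


def match_closing_tags(root):
    """Copy each opening tag name into its closing tag, then return the same tree."""
    if root.name == "element":
        by_name = {child.name: child for child in root.children}
        if "open_name" in by_name and "close_name" in by_name:
            by_name["close_name"].src = by_name["open_name"].src
    for child in root.children:
        match_closing_tags(child)
    return root


def text(node):
    """Concatenate the text of all nodes in left-to-right order."""
    return node.src + "".join(text(child) for child in node.children)


tree = Node("element", children=[
    Node("lt", "<"), Node("open_name", "b"), Node("gt", ">"),
    Node("content", "hi"),
    Node("lt_slash", "</"), Node("close_name", "i"), Node("gt", ">"),
])
print(text(match_closing_tags(tree)))  # <b>hi</b>
```

Because the function returns a tree rather than a string, several such transformers can be chained before the final serialization step.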

As a final thought, one must not forget that the original purpose of
grammars is the syntax-wise validation of various inputs. As a consequence,
these grammars encode syntactic expectations only and not semantic rules.
If we still want to add semantic knowledge into the generated test, then we
can inherit custom fuzzers from the generated ones and redefine methods
corresponding to lexer or parser rules in ways that encode the required
knowledge (e.g., HTMLCustomGenerator_).
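
The subclassing approach can be illustrated with a self-contained sketch. The classes below are hypothetical stand-ins and not code produced by ``grammarinator-process``: real generated methods build tree nodes, while this simplified version returns strings directly to keep the example short.

```python
import random
import string


class ToyGenerator:
    """Hypothetical stand-in for a generated fuzzer: one method per grammar rule."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def tag_name(self):
        # Syntax only: any run of lowercase letters is an acceptable tag name.
        length = self.rng.randint(1, 8)
        return "".join(self.rng.choice(string.ascii_lowercase) for _ in range(length))

    def element(self):
        name = self.tag_name()
        return f"<{name}></{name}>"


class ToyCustomGenerator(ToyGenerator):
    """Redefines one rule method to encode semantic knowledge about valid tags."""

    KNOWN_TAGS = ("a", "b", "div", "p", "span")

    def tag_name(self):
        return self.rng.choice(self.KNOWN_TAGS)


print(ToyGenerator(seed=1).element())
print(ToyCustomGenerator(seed=1).element())
```

Only the overridden rule changes; every other rule method of the base class keeps working and picks up the semantically meaningful tag names automatically.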

.. _HTMLCustomGenerator: examples/fuzzer/HTMLCustomGenerator.py

Working Example
===============

The repository contains a minimal example_ to generate HTML files. To give it a try, run the processor first, then use the generator to produce test cases.

With the Python backend::

    grammarinator-process examples/grammars/HTMLLexer.g4 examples/grammars/HTMLParser.g4 \
      -o examples/fuzzer/

    grammarinator-generate HTMLCustomGenerator.HTMLCustomGenerator \
      -r htmlDocument -d 20 \
      -o examples/tests/test_%d.html -n 100 \
      -s HTMLGenerator.html_space_serializer \
      --sys-path examples/fuzzer/

With the C++ backend::

    grammarinator-process examples/grammars/HTMLLexer.g4 examples/grammars/HTMLParser.g4 \
      -o examples/fuzzer/ --no-actions --language hpp

    python3 grammarinator-cxx/dev/build.py --clean \
        --generator HTMLGenerator \
        --serializer HTMLSpaceSerializer \
        --include HTMLConfig.hpp \
        --includedir examples/fuzzer/ \
        --tools

    grammarinator-cxx/build/Release/bin/grammarinator-generate-html \
        -r htmlDocument -d 20 \
        -o examples/tests/test_%d.html -n 100

.. _example: examples/

Compatibility
=============

Grammarinator was tested on:

* Linux (Ubuntu 16.04 ... 24.04)
* OS X / macOS (10.12 ... 15.5)
* Windows (Server 2012 R2 / Server version 1809 / Windows 10 / Windows Server 2022)

Citations
=========

Background on Grammarinator is published in:

* Renata Hodovan, Akos Kiss, and Tibor Gyimothy. Grammarinator: A Grammar-Based Open Source Fuzzer. In Proceedings of the 9th ACM SIGSOFT International Workshop on Automating Test Case Design, Selection, and Evaluation (A-TEST 2018), pages 45-48, Lake Buena Vista, Florida, USA, November 2018. ACM. https://doi.org/10.1145/3278186.3278193
* Renata Hodovan, Akos Kiss. Grammarinator Meets LibFuzzer: A Structure-Aware In-Process Approach. In Proceedings of the 20th International Conference on Software Technologies (ICSOFT 2025), pages 178-189, Bilbao, Spain, June 2025. SciTePress. Best paper award. https://doi.org/10.5220/0013571500003964

.. end included documentation

Copyright and Licensing
=======================

Licensed under the BSD 3-Clause License_.

.. _License: LICENSE.rst