=============
Grammarinator
=============

January 30, 2026

ANTLRv4 grammar-based test generator

.. image:: https://img.shields.io/pypi/v/grammarinator?logo=python&logoColor=white
   :target: https://pypi.org/project/grammarinator/
.. image:: https://img.shields.io/pypi/l/grammarinator?logo=open-source-initiative&logoColor=white
   :target: https://pypi.org/project/grammarinator/
.. image:: https://img.shields.io/github/actions/workflow/status/renatahodovan/grammarinator/main.yml?branch=master&logo=github&logoColor=white
   :target: https://github.com/renatahodovan/grammarinator/actions
.. image:: https://img.shields.io/coveralls/github/renatahodovan/grammarinator/master?logo=coveralls&logoColor=white
   :target: https://coveralls.io/github/renatahodovan/grammarinator
.. image:: https://img.shields.io/readthedocs/grammarinator?logo=read-the-docs&logoColor=white
   :target: http://grammarinator.readthedocs.io/en/latest/

.. start included documentation

Grammarinator is a random test generator / fuzzer that creates test cases
according to an input ANTLR_ v4 grammar. The motivation behind this
grammar-based approach is to leverage the large variety of publicly
available `ANTLR v4 grammars`_. It includes both a Python-based and a
high-performance C++ backend for generation.

.. _ANTLR: http://www.antlr.org
.. _ANTLR v4 grammars: https://github.com/antlr/grammars-v4
.. _trophy page: https://github.com/renatahodovan/grammarinator/wiki

+--------------------------------------------------------------------------+
| TL;DR - KEY FEATURES                                                     |
+--------------------------------------------------------------------------+
| Quick overview of the most important capabilities                        |
+==========================================================================+
|                                                                          |
| * Generate test cases from scratch based on `ANTLR v4 grammars`_ or      |
|   mutate/recombine existing test cases after they have been parsed.      |
|                                                                          |
| * Besides blackbox test generation, supports guided fuzzing through      |
|   native integration with libFuzzer_ and `AFL++`_.                       |
|                                                                          |
| * The AFL++ integration also enables grammar-aware test case             |
|   minimization via the ``afl-tmin`` utility.                             |
|                                                                          |
| * Grammar-aware mutation and recombination without slowing down the      |
|   fuzzing with parsing (using pre-parsed input seeds).                   |
|                                                                          |
| * Fine-grained probabilistic generation control via inline grammar       |
|   weights or external JSON-based weight configurations (for alternatives |
|   and quantifiers).                                                      |
|                                                                          |
| * Support for inline semantic predicates in grammars to dynamically      |
|   enable or disable grammar alternatives during generation.              |
|                                                                          |
| * Multiple size-control strategies, including maximum recursion depth    |
|   and maximum token count limits.                                        |
|                                                                          |
| * Built-in caching to filter out duplicate generated inputs.             |
|                                                                          |
| * Both grammar-aware and grammar-unaware mutators, which can be          |
|   enabled or disabled selectively.                                       |
|                                                                          |
| * Extensible serialization pipeline with custom serializers for          |
|   formatting tree-based outputs into concrete test inputs.               |
|                                                                          |
| * Advanced customization hooks:                                          |
|                                                                          |
|   * custom models for programmatic decision guidance                     |
|   * custom listeners for information collection during generation        |
|   * custom transformers for post-generation tree transformations         |
+--------------------------------------------------------------------------+


.. _libFuzzer: https://llvm.org/docs/LibFuzzer.html
.. _AFL++: https://aflplus.plus

Requirements
============

- Python_ >= 3.10
- Java_ SE >= 11 JRE or JDK (the latter is optional)

Additionally, for the C++ backend:

- C++20 compiler (e.g., GCC >= 11.0, Clang >= 13.0, MSVC >= 2019)
- CMake_ >= 3.10

.. _Python: https://www.python.org
.. _Java: https://www.oracle.com/java/
.. _CMake: https://cmake.org

Install
=======

To use Grammarinator in another project, it can be added to ``setup.cfg`` as
an install requirement (if using setuptools_ with declarative config):

.. code-block:: ini

    [options]
    install_requires =
        grammarinator

To install Grammarinator manually, e.g., into a virtual environment, use pip_::

    pip install grammarinator

The above approaches install the latest release of Grammarinator from PyPI_.
Alternatively, for the development version, clone the project and perform a
local install::

    pip install .

.. _setuptools: https://github.com/pypa/setuptools
.. _pip: https://pip.pypa.io
.. _PyPI: https://pypi.org/

Usage
=====

As a first step, Grammarinator takes an `ANTLR v4 grammar`_ and creates a
test generator script in Python 3 or in C++. Grammarinator supports only a
subset of the ANTLR grammar features; the supported subset is described in the
Grammar overview section of the documentation. The produced generator can be
subclassed later to customize it further if needed.

Basic command-line syntax of test generator creation (Python or C++)::

    grammarinator-process <grammar-file(s)> -o <output-directory> --no-actions [--language hpp]

..

    **Notes**

    *Grammarinator* uses the `ANTLR v4 grammar`_ format as its input, which
    makes existing grammars (lexer and parser rules) easily reusable. However,
    because of the inherently different goals of a fuzzer and a parser, inlined
    code (actions and conditions, header and members blocks) is most probably
    not reusable and may even prevent proper execution. For first experiments
    with existing grammar files, ``grammarinator-process`` supports the
    command-line option ``--no-actions``, which skips all such code blocks
    during fuzzer generation. Once inlined code is tuned for fuzzing, that
    option may be omitted.

.. _ANTLR v4 grammar: https://github.com/antlr/grammars-v4

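
When the processing step is driven from a script rather than typed by hand, the
command line shown above can be assembled programmatically. The following is a
minimal sketch and not part of Grammarinator: ``build_process_command`` is a
hypothetical helper, and it relies only on the options that appear in the
syntax above (``-o``, ``--no-actions``, ``--language``).

.. code-block:: python

    import shlex


    def build_process_command(grammars, out_dir, no_actions=True, language=None):
        """Assemble the argument list of a ``grammarinator-process`` call.

        Hypothetical helper: only the options documented above are used.
        """
        if not grammars:
            raise ValueError("at least one grammar file is required")
        cmd = ["grammarinator-process", *grammars, "-o", out_dir]
        if no_actions:
            # Skip inlined code blocks while experimenting with a grammar.
            cmd.append("--no-actions")
        if language is not None:
            # For example "hpp" to request the C++ backend.
            cmd += ["--language", language]
        return cmd


    cmd = build_process_command(["HTMLLexer.g4", "HTMLParser.g4"], "out/", language="hpp")
    print(shlex.join(cmd))

Once Grammarinator is installed, the resulting list can be handed to
``subprocess.run(cmd, check=True)``.
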
Python-based Test Generation
----------------------------

After a fuzzer has been generated and optionally customized, it can be executed
by the ``grammarinator-generate`` script (or by manually instantiating it in a
custom-written driver).

Basic command-line syntax of ``grammarinator-generate``::

    grammarinator-generate <generator> \
      -r <start-rule> -d <max-depth> \
      -o <output-pattern> -n <number-of-tests> \
      -t <transformer1> -t <transformer2>

C++-based Test Generation
-------------------------

After the C++-based fuzzer has been generated by ``grammarinator-process`` with
the ``--language hpp`` flag, it needs to be built::

    python3 grammarinator-cxx/dev/build.py --clean \
      --generator <generator> \
      --includedir <include-dir> \
      --tools

Once built, the standalone generator can be run as follows::

    grammarinator-cxx/build/Release/bin/grammarinator-generate-<name> \
      -r <start-rule> -d <max-depth> \
      -o <output-pattern> -n <number-of-tests>

Note: The C++ backend can also be used as a custom mutator with libFuzzer.
Details about this are provided in the LibFuzzer Integration section of the
documentation.

Evolutionary Generation
-----------------------

Besides generating test cases from scratch based on the ANTLR grammar,
Grammarinator is also able to recombine existing inputs or mutate only a small
portion of them. To use these additional generation approaches, a population of
selected test cases has to be prepared. The preparation happens with the
``grammarinator-parse`` tool, which processes the input files with an ANTLR
grammar (possibly the same one as the generator grammar) and builds
Grammarinator tree representations from them (with ``.grt*`` extension). These
files encode the full derivation tree of the input and can be reused across
different fuzzing strategies.

Basic command-line syntax of ``grammarinator-parse``::

    grammarinator-parse -g <grammar-file(s)> -r
      -o

Given a population of such ``.grt*`` files, ``grammarinator-generate`` or
``grammarinator-generate-<name>`` can make use of them with the
``--population`` CLI option. If the ``--population`` option is set (for the
Python or C++ generator), then Grammarinator will choose a strategy
(generation, mutation, or recombination) randomly for each new test case.
Any unwanted strategy can be disabled with the ``--no-generate``,
``--no-mutate``, or ``--no-recombine`` option.

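
The per-test choice between the strategies can be pictured with the following
sketch. It is illustrative only: the function name, its parameters, and the
uniform random choice are assumptions made for this example, not
Grammarinator's actual implementation.

.. code-block:: python

    import random


    def choose_strategy(rng, has_population, generate=True, mutate=True, recombine=True):
        """Pick one strategy for the next test case (hypothetical sketch)."""
        strategies = []
        if generate:
            strategies.append("generate")
        # Mutation and recombination need existing trees to work on.
        if has_population and mutate:
            strategies.append("mutate")
        if has_population and recombine:
            strategies.append("recombine")
        if not strategies:
            raise ValueError("no generation strategy is enabled")
        # Assumption: every enabled strategy is equally likely.
        return rng.choice(strategies)


    rng = random.Random(42)
    picks = [choose_strategy(rng, has_population=True, recombine=False) for _ in range(5)]
    print(picks)

Here disabling recombination leaves only generation and mutation as
candidates, mirroring what ``--no-recombine`` is described to do.
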
..

    **Notes**

    Real-life grammars often use recursive rules to express certain patterns.
    However, when using such rules for generation, we can easily end up in an
    unexpectedly deep call stack. With the ``--max-depth`` or ``-d`` option,
    this depth, and with it the size of the generated test cases, can be
    controlled.

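
As a toy illustration of why a depth limit bounds recursive rules, the sketch
below generates strings by hand for the hypothetical rule
``expr : NUMBER | '(' expr '+' expr ')' ;`` and permits the recursive
alternative only while the depth budget lasts. It does not use Grammarinator;
the function names and the 0.7 probability are arbitrary choices for this
example.

.. code-block:: python

    import random


    def gen_expr(rng, max_depth, depth=1):
        """Generate text for: expr : NUMBER | '(' expr '+' expr ')' ;"""
        # The recursive alternative is allowed only while there is depth left;
        # otherwise the generator falls back to the terminal alternative.
        if depth < max_depth and rng.random() < 0.7:
            left = gen_expr(rng, max_depth, depth + 1)
            right = gen_expr(rng, max_depth, depth + 1)
            return f"({left}+{right})"
        return str(rng.randint(0, 9))


    def nesting(text):
        """Return the deepest parenthesis nesting found in ``text``."""
        current = deepest = 0
        for char in text:
            if char == "(":
                current += 1
                deepest = max(deepest, current)
            elif char == ")":
                current -= 1
        return deepest


    rng = random.Random(0)
    for limit in (1, 3, 6):
        sample = gen_expr(rng, limit)
        print(limit, nesting(sample), len(sample))

In this sketch the nesting never exceeds ``max_depth - 1``, so lowering the
limit also caps the length of the output.
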

    Another specialty of ANTLR grammars is that they support so-called
    hidden tokens. These rules typically describe elements of the target
    language that can be placed basically anywhere without breaking the syntax.
    The most common examples are comments and whitespace. However, because
    these grammars do not define explicitly where whitespace may or may
    not appear in rules, we have to insert the missing spaces ourselves when
    generating test cases. This can be done by applying a serializer (with the
    ``-s`` option) to the tree representation of the output tests. A simple
    serializer that inserts a space after every unparser rule is provided by
    *Grammarinator* (``grammarinator.runtime.simple_space_serializer``).


    In some cases, we may want to postprocess the output tree itself (without
    serializing it), for example, to enforce some logic that cannot be
    expressed by a context-free grammar. For this purpose, the transformer
    mechanism can be used (with the ``-t`` option). Similarly to a
    serializer, a transformer takes a tree as input, but instead of creating a
    string representation, it is expected to return the modified (transformed)
    tree object.

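
The two hooks differ only in what they return. The sketch below uses a
hypothetical ``Node`` class as a stand-in for Grammarinator's tree nodes (the
real classes and attribute names may differ), so it shows the contracts rather
than the actual API. The serializer is also simplified: it joins tokens with
single spaces.

.. code-block:: python

    from dataclasses import dataclass, field


    @dataclass
    class Node:
        """Hypothetical tree node; not Grammarinator's real class."""
        name: str
        text: str = ""                      # only meaningful on leaves (tokens)
        children: list = field(default_factory=list)


    def tokens(node):
        """Yield the token texts of the tree in left-to-right order."""
        if not node.children:
            yield node.text
        for child in node.children:
            yield from tokens(child)


    def space_serializer(root):
        """Serializer contract: tree in, string out."""
        return " ".join(text for text in tokens(root) if text)


    def uppercase_keywords(root):
        """Transformer contract: tree in, (modified) tree out."""
        if root.name == "KEYWORD":
            root.text = root.text.upper()
        for child in root.children:
            uppercase_keywords(child)
        return root


    tree = Node("stmt", children=[
        Node("KEYWORD", "select"), Node("ID", "x"),
        Node("KEYWORD", "from"), Node("ID", "t"),
    ])
    print(space_serializer(uppercase_keywords(tree)))  # prints: SELECT x FROM t

Because a transformer returns a tree, several of them can be chained before
the serializer produces the final string.
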

    As a final thought, one must not forget that the original purpose of
    grammars is the syntax-wise validation of various inputs. As a consequence,
    these grammars encode syntactic expectations only and not semantic rules.
    If we still want to add semantic knowledge into the generated test, then we
    can inherit custom fuzzers from the generated ones and redefine methods
    corresponding to lexer or parser rules in ways that encode the required
    knowledge (e.g., HTMLCustomGenerator_).

.. _HTMLCustomGenerator: examples/fuzzer/HTMLCustomGenerator.py

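
The subclassing idea can be sketched without Grammarinator as follows.
``ToyGenerator`` is a hypothetical stand-in for a generated fuzzer (real
generated methods build tree nodes rather than strings), and
``ToyCustomGenerator`` overrides one rule method to inject knowledge that the
grammar does not express, here a list of tag names that actually exist.

.. code-block:: python

    import random
    import string


    class ToyGenerator:
        """Hypothetical stand-in for a generated fuzzer class."""

        def __init__(self, rng):
            self.rng = rng

        def TAG_NAME(self):
            # Purely syntactic rule: any short lowercase identifier is valid.
            length = self.rng.randint(1, 8)
            return "".join(self.rng.choice(string.ascii_lowercase) for _ in range(length))

        def element(self):
            # The same name is reused so that opening and closing tags match.
            name = self.TAG_NAME()
            return f"<{name}></{name}>"


    class ToyCustomGenerator(ToyGenerator):
        """Adds semantic knowledge by redefining a single rule method."""

        KNOWN_TAGS = ("a", "div", "p", "span", "table")

        def TAG_NAME(self):
            return self.rng.choice(self.KNOWN_TAGS)


    generator = ToyCustomGenerator(random.Random(7))
    print(generator.element())

Only the overridden rule changes; every other rule method is inherited from
the base class unchanged.
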
Working Example
===============

The repository contains a minimal example_ to generate HTML files. To give it
a try, run the processor first, then use the generator to produce test cases.

With the Python backend::

    grammarinator-process examples/grammars/HTMLLexer.g4 examples/grammars/HTMLParser.g4 \
      -o examples/fuzzer/

    grammarinator-generate HTMLCustomGenerator.HTMLCustomGenerator \
      -r htmlDocument -d 20 \
      -o examples/tests/test_%d.html -n 100 \
      -s HTMLGenerator.html_space_serializer \
      --sys-path examples/fuzzer/

With the C++ backend::

    grammarinator-process examples/grammars/HTMLLexer.g4 examples/grammars/HTMLParser.g4 \
      -o examples/fuzzer/ --no-actions --language hpp

    python3 grammarinator-cxx/dev/build.py --clean \
      --generator HTMLGenerator \
      --serializer HTMLSpaceSerializer \
      --include HTMLConfig.hpp \
      --includedir examples/fuzzer/ \
      --tools

    grammarinator-cxx/build/Release/bin/grammarinator-generate-html \
      -r htmlDocument -d 20 \
      -o examples/tests/test_%d.html -n 100

.. _example: examples/

Compatibility
=============

Grammarinator was tested on:

- Linux (Ubuntu 16.04 ... 24.04)
- OS X / macOS (10.12 ... 15.5)
- Windows (Server 2012 R2 / Server version 1809 / Windows 10 / Windows Server 2022)

Citations
=========

Background on Grammarinator is published in:

- Renata Hodovan, Akos Kiss, and Tibor Gyimothy. Grammarinator: A Grammar-Based
  Open Source Fuzzer. In Proceedings of the 9th ACM SIGSOFT International
  Workshop on Automating Test Case Design, Selection, and Evaluation
  (A-TEST 2018), pages 45-48, Lake Buena Vista, Florida, USA, November 2018.
  ACM. https://doi.org/10.1145/3278186.3278193
- Renata Hodovan, Akos Kiss. Grammarinator Meets LibFuzzer: A Structure-Aware
  In-Process Approach. In Proceedings of the 20th International Conference on
  Software Technologies (ICSOFT 2025), pages 178-189, Bilbao, Spain, June 2025.
  SciTePress. Best paper award. https://doi.org/10.5220/0013571500003964

.. end included documentation

Copyright and Licensing
=======================

Licensed under the BSD 3-Clause License_.

.. _License: LICENSE.rst