PBJ Code Generation Architecture

April 7, 2026

This document describes how the PBJ compiler Gradle plugin transforms .proto schema files into Java source code.

Overview

The PBJ compiler is a Gradle plugin (com.hedera.pbj.pbj-compiler) that parses Protocol Buffers version 3 (proto3) schema files using an ANTLR4 grammar and generates Java source code. For each protobuf message, it produces up to five Java files: a model record, a schema class, a protobuf codec, a JSON codec, and a unit test. For each enum and each service, it generates a single file.

The pipeline has three phases:

  1. Global analysis — scan all proto files (sources + classpath) to build lookup tables for packages, types, and imports
  2. Parse — lex and parse each source proto file into an ANTLR parse tree
  3. Generate — walk each top-level definition and emit Java source code via generators

Gradle Plugin Integration

Entry point: PbjCompilerPlugin

The plugin (PbjCompilerPlugin implements Plugin<Project>) performs these setup steps during the configuration phase:

  1. Registers a PbjExtension exposing the pbj { } DSL block with two options:
    • javaPackageSuffix — optional suffix appended to derived package names (e.g., ".pbj")
    • generateTestClasses — boolean (default true) controlling test generation
  2. Registers a PbjProtobufExtractTransform artifact transform that extracts .proto files from JAR dependencies on the compile classpath
  3. For each source set, creates a virtual PbjSourceDirectorySet pointing to src/<sourceSet>/proto and registers a PbjCompilerTask

For the main source set, model, codec, and schema classes are generated into build/generated/source/pbj-proto/main/java, and test classes into build/generated/source/pbj-proto/test/java. The generated source directories are wired as inputs to the Java compile task, so generation happens automatically before compilation.

Task: PbjCompilerTask

The task (extends SourceTask) defines:

  • @InputFiles — proto source files + extracted classpath protos
  • @OutputDirectory — main and test output directories
  • @TaskAction perform() — clears output dirs, then delegates to PbjCompiler.compileFilesIn()

Parsing Pipeline

ANTLR Grammar

The grammar file Protobuf3.g4 (package com.hedera.hashgraph.protoparser.grammar) defines the full proto3 syntax. Notable additions beyond the standard spec:

  • DOC_COMMENT tokens preserve /** ... */ documentation comments through to generated Javadoc
  • OPTION_LINE_COMMENT tokens capture PBJ-specific option comments: // <<<pbj.java_package = "...">>>
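
To make the option comment format concrete, here is a minimal sketch that extracts the value with a regular expression. This is an illustration only: the compiler recognizes these comments as ANTLR tokens, and the class name PbjOptionComment and its method are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; the real compiler recognizes these comments as ANTLR tokens. */
final class PbjOptionComment {
    // Matches a whole line of the form: // <<<pbj.some_name = "some value">>>
    private static final Pattern OPTION = Pattern.compile(
            "^\\s*//\\s*<<<\\s*(pbj\\.[A-Za-z_]+)\\s*=\\s*\"([^\"]*)\"\\s*>>>\\s*$");

    /** Returns the option value if the line carries the named option, otherwise empty. */
    static Optional<String> valueOf(String line, String optionName) {
        Matcher m = OPTION.matcher(line);
        if (m.matches() && m.group(1).equals(optionName)) {
            return Optional.of(m.group(2));
        }
        return Optional.empty();
    }
}
```

Under these assumptions, a line such as `// <<<pbj.java_package = "com.example.model">>>` yields the package name, while an ordinary comment yields an empty Optional.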

Two-Phase Processing

Phase 1 — LookupHelper construction:

Before any code generation, PbjCompiler builds a LookupHelper by parsing every proto file (both source files and classpath dependencies). This pre-scan builds several lookup maps:

| Map | Key | Value |
|---|---|---|
| pbjPackageMap | Fully qualified proto name | Java package for PBJ model classes |
| pbjCompleteClassMap | Fully qualified proto name | Complete Java class name (including outer class for nested types) |
| protocPackageMap | Fully qualified proto name | Java package for protoc-generated classes |
| enumNames | (none) | Set of all fully qualified enum names |
| comparableFieldsByMsg | Message name | List of comparable field names |

The LookupHelper resolves Java packages using a priority chain (PBJ comment option → per-definition options → standard java_package + suffix → proto package + suffix). For the full resolution rules, see protobuf-and-schemas.md.
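
The priority chain can be pictured as a first-match search over optional sources. The sketch below is an assumption-laden illustration, not the LookupHelper code: every parameter name is hypothetical, and it assumes the suffix is applied only to the last two sources, as the chain above is written.

```java
import java.util.Optional;
import java.util.stream.Stream;

/** Hypothetical sketch of a first-match priority chain for Java package resolution. */
final class PackageResolution {
    /**
     * Each Optional is empty when the proto file does not supply that source.
     * The proto package is the final fallback.
     */
    static String resolve(
            Optional<String> pbjCommentOption,
            Optional<String> perDefinitionOption,
            Optional<String> javaPackageOption,
            String protoPackage,
            String suffix) {
        return Stream.of(
                        pbjCommentOption,
                        perDefinitionOption,
                        javaPackageOption.map(p -> p + suffix))
                .flatMap(Optional::stream) // drop the sources that are absent
                .findFirst()               // the highest-priority present source wins
                .orElse(protoPackage + suffix);
    }
}
```

The authoritative rules, including exactly where the suffix applies, are in protobuf-and-schemas.md.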

Phase 2 — Per-file generation:

For each source proto file, a ContextualLookupHelper wraps the global LookupHelper with the current file context. This phase covers both the Parse and Generate steps listed in the Overview. The file is first lexed and parsed:

FileInputStream → Protobuf3Lexer → CommonTokenStream → Protobuf3Parser → ProtoContext

Then each topLevelDef is dispatched:

  • messageDef → create FileSetWriter (5 JavaFileWriter instances), run all Generator implementations, write files
  • enumDef → EnumGenerator.generateEnum() with a single JavaFileWriter
  • serviceDef → ServiceGenerator.generateService() with a single JavaFileWriter

Field Model (Intermediate Representation)

Rather than building a full AST, the compiler uses lightweight field records extracted directly from ANTLR parse tree contexts. The Field interface defines the contract; three record implementations cover all protobuf field kinds:

SingleField

Represents a regular field or a sub-field within a oneof. Constructed directly from Protobuf3Parser.FieldContext. Stores:

  • type — a FieldType enum value (see below)
  • fieldNumber, name, repeated, deprecated
  • messageType / completeClassName — for message and enum references
  • parent — the OneOfField this belongs to, if any
  • Package references for model, codec, and test imports

Key methods: parseCode() (Java code to parse this field from protobuf input), javaFieldType(), schemaFieldsDef(), parserFieldsSetMethodCase().

OneOfField

Represents a protobuf oneof block. Contains a list of child Field objects (the variants). Generates an inner enum type (e.g., DataOneOfType with values like ACCOUNT_ID, UNSET) for runtime type discrimination.

MapField

Represents a map<K, V> field. Internally decomposed into synthetic keyField and valueField SingleField instances. On the wire, maps are repeated length-delimited entries sorted by key for deterministic encoding.
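
A minimal sketch of the key-sorting idea follows. It assumes keys are ordered by their natural Comparable ordering, which is an assumption rather than something this document specifies, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: produce map entries in a stable order before writing them. */
final class DeterministicMapOrder {
    /** Returns the entries in ascending key order, regardless of the input map's iteration order. */
    static <K extends Comparable<K>, V> List<Map.Entry<K, V>> sortedEntries(Map<K, V> map) {
        // TreeMap re-orders the entries by key; the list is a snapshot of that order.
        return new ArrayList<>(new TreeMap<>(map).entrySet());
    }
}
```

Writing entries in this order means two equal maps always serialize to the same bytes, even if one was built as a HashMap and the other as a LinkedHashMap.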

FieldType Enum

Maps every protobuf type to its Java representation and wire format:

| FieldType | Java type | Boxed type | Wire type |
|---|---|---|---|
| INT32, UINT32, SINT32 | int | Integer | VARINT (0) |
| INT64, UINT64, SINT64 | long | Long | VARINT (0) |
| FLOAT, FIXED32, SFIXED32 | float/int | Float/Integer | FIXED32 (5) |
| DOUBLE, FIXED64, SFIXED64 | double/long | Double/Long | FIXED64 (1) |
| BOOL | boolean | Boolean | VARINT (0) |
| STRING | String | String | LENGTH_DELIMITED (2) |
| BYTES | Bytes | Bytes | LENGTH_DELIMITED (2) |
| MESSAGE | Object | Object | LENGTH_DELIMITED (2) |
| ENUM | int | Integer | VARINT (0) |
| MAP | Map | Map | LENGTH_DELIMITED (2) |
| ONE_OF | OneOf | OneOf | |

For repeated fields, FieldType.javaType(true) returns the boxed List<> variant (e.g., List<Integer>).
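
The repeated-field rule can be sketched with a trimmed-down enum. This is a hypothetical stand-in with three constants, not the actual FieldType source.

```java
/** Hypothetical, trimmed-down stand-in for the FieldType enum. */
final class FieldTypeSketch {
    enum FieldType {
        INT32("int", "Integer"),
        INT64("long", "Long"),
        STRING("String", "String");

        private final String javaType;
        private final String boxedType;

        FieldType(String javaType, String boxedType) {
            this.javaType = javaType;
            this.boxedType = boxedType;
        }

        /** Repeated fields use the boxed type, because a List cannot hold primitives. */
        String javaType(boolean repeated) {
            return repeated ? "List<" + boxedType + ">" : javaType;
        }
    }
}
```

In this sketch, `FieldType.INT32.javaType(false)` gives the primitive type name and `FieldType.INT32.javaType(true)` gives the List form with the boxed element type.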

Code Generators

All message generators implement the Generator interface and are registered in Generator.GENERATORS — a map from generator class to the JavaFileWriter accessor on FileSetWriter:

Map.of(
    ModelGenerator.class,     FileSetWriter::modelWriter,
    SchemaGenerator.class,    FileSetWriter::schemaWriter,
    CodecGenerator.class,     FileSetWriter::codecWriter,
    JsonCodecGenerator.class, FileSetWriter::jsonCodecWriter,
    TestGenerator.class,      FileSetWriter::testWriter
);

Each generator is instantiated via reflection and called with the MessageDefContext, a JavaFileWriter, and the ContextualLookupHelper. Generators build Java code as strings and append to the writer.
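
The registry-plus-reflection pattern might look roughly like the following. All types here are simplified, hypothetical stand-ins (a StringBuilder replaces JavaFileWriter, and the generator signature is reduced to a message name), so treat it as a sketch of the dispatch mechanism only.

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a generator registry driven by reflection. */
final class GeneratorRegistrySketch {
    interface Generator {
        void generate(String messageName, StringBuilder writer);
    }

    /** Stand-in for FileSetWriter, holding one writer per output file. */
    record FileSet(StringBuilder modelWriter, StringBuilder schemaWriter) {}

    static final class ModelGen implements Generator {
        public void generate(String name, StringBuilder w) {
            w.append("record ").append(name);
        }
    }

    static final class SchemaGen implements Generator {
        public void generate(String name, StringBuilder w) {
            w.append("class ").append(name).append("Schema");
        }
    }

    // Maps each generator class to the accessor for the writer it should fill.
    static final Map<Class<? extends Generator>, Function<FileSet, StringBuilder>> GENERATORS = Map.of(
            ModelGen.class, FileSet::modelWriter,
            SchemaGen.class, FileSet::schemaWriter);

    /** Instantiates every registered generator reflectively and runs it against its writer. */
    static void runAll(String messageName, FileSet files) {
        for (var entry : GENERATORS.entrySet()) {
            try {
                Generator generator = entry.getKey().getDeclaredConstructor().newInstance();
                generator.generate(messageName, entry.getValue().apply(files));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot instantiate " + entry.getKey(), e);
            }
        }
    }
}
```

One consequence of this design is that adding a new output file type only requires a new generator class and one more map entry.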

ModelGenerator

Output: <MessageName>.java in the base package

Generates a Java record for each protobuf message containing:

  • Record fields for each proto field, plus two precomputed fields: $hashCode and $protobufEncodedSize
  • Multiple constructor overloads (with/without unknownFields, with enum types or raw Object storage)
  • Getter methods — foo() returns the value (null if absent), fooOrElse(default) returns a default for absent fields
  • hashCode() / equals() — fields with default values are excluded so adding new default-valued fields doesn't break existing hash maps
  • toString(), compareTo() (when fields are marked pbj.comparable)
  • Builder inner class with fluent API (newBuilder(), toBuilder())
  • OneOf inner enums and typed accessor methods
  • Static PROTOBUF and JSON codec constants
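
To illustrate why excluding default values keeps hash codes stable, here is a hand-written analogue with two fields. The record, the hashing formula, and the treatment of an empty string as a default are all assumptions made for illustration; the generated code computes its hash differently.

```java
/** Hypothetical hand-written analogue of a generated model; not actual generator output. */
final class ModelHashSketch {
    record Account(long id, String memo) {
        /** Returns memo, or the supplied default when memo is absent (null). */
        String memoOrElse(String defaultValue) {
            return memo != null ? memo : defaultValue;
        }

        @Override
        public int hashCode() {
            int result = 1;
            // Only fields holding non-default values contribute to the hash.
            if (id != 0L) {
                result = 31 * result + Long.hashCode(id);
            }
            if (memo != null && !memo.isEmpty()) {
                result = 31 * result + memo.hashCode();
            }
            return result;
        }
    }
}
```

Because an unset memo adds nothing to the hash, a version of Account written before memo existed would hash the same way as one created afterwards with memo left at its default.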

SchemaGenerator

Output: <MessageName>Schema.java in the .schema sub-package

Generates static FieldDefinition constants for each field (field number, type, repeated/optional flags) and a getField(int fieldNumber) method for O(1) lookup.
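
A schema class of this shape could be sketched as follows. The FieldDefinition record here is a hypothetical three-field stand-in, and returning null for unknown field numbers is an assumption.

```java
/** Hypothetical sketch of a generated schema class with constant-time field lookup. */
final class SchemaLookupSketch {
    /** Simplified stand-in for PBJ's FieldDefinition. */
    record FieldDefinition(String name, int number, boolean repeated) {}

    static final FieldDefinition ID = new FieldDefinition("id", 1, false);
    static final FieldDefinition MEMO = new FieldDefinition("memo", 2, false);
    static final FieldDefinition KEYS = new FieldDefinition("keys", 3, true);

    /** Looks up a field by number via a switch; returns null for unknown fields (an assumption). */
    static FieldDefinition getField(int fieldNumber) {
        return switch (fieldNumber) {
            case 1 -> ID;
            case 2 -> MEMO;
            case 3 -> KEYS;
            default -> null;
        };
    }
}
```

A switch over small integer constants avoids any map allocation or hashing at lookup time.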

CodecGenerator (Protobuf)

Output: <MessageName>ProtoCodec.java in the .codec sub-package

Implements the Codec<T> interface for protobuf binary serialization. The generator delegates to specialized sub-generators:

| Sub-generator | Method generated | Purpose |
|---|---|---|
| CodecParseMethodGenerator | parse(ReadableSequentialData, ...) | Deserialize from protobuf binary |
| CodecWriteMethodGenerator | write(T, WritableSequentialData) | Serialize to protobuf binary |
| CodecWriteByteArrayMethodGenerator | write(T) → byte[] | Serialize to byte array |
| CodecMeasureDataMethodGenerator | measure(T) | Compute serialized size |
| CodecMeasureRecordMethodGenerator | measureRecord(T) | Record-based size measurement |
| CodecFastEqualsMethodGenerator | fastEquals(T, T) | Optimized equality check |
| CodecDefaultInstanceMethodGenerator | getDefaultInstance() | Singleton default instance |
| LazyGetProtobufSizeMethodGenerator | getProtobufSize() | Lazy size computation for model |

The parse method uses a switch over protobuf tags ((fieldNumber << 3) | wireType) to dispatch to field-specific parsing logic. Maps are sorted by key on write for deterministic encoding.
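
The tag arithmetic is standard protobuf encoding: the low three bits carry the wire type and the remaining bits carry the field number. The sketch below shows the computation and a switch over precomputed tags; the class, its constants, and the returned descriptions are hypothetical.

```java
/** Hypothetical sketch of tag computation and tag-based dispatch. */
final class TagSketch {
    static final int WIRE_VARINT = 0;
    static final int WIRE_LENGTH_DELIMITED = 2;

    static int tag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    static int fieldNumber(int tag) {
        return tag >>> 3;
    }

    static int wireType(int tag) {
        return tag & 0b111;
    }

    /** Dispatches on the full tag, so a field arriving with an unexpected wire type falls through. */
    static String describe(int tag) {
        return switch (tag) {
            case (1 << 3) | WIRE_VARINT -> "field 1 as varint";
            case (2 << 3) | WIRE_LENGTH_DELIMITED -> "field 2 as length-delimited";
            default -> "unknown field " + fieldNumber(tag);
        };
    }
}
```

For example, field number 2 with wire type 2 gives a tag of 18, since 2 shifted left by three bits is 16.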

JsonCodecGenerator

Output: <MessageName>JsonCodec.java in the .codec sub-package

Implements Codec<T> for JSON serialization/deserialization. Structured similarly to CodecGenerator with:

  • JsonCodecParseMethodGenerator — JSON deserialization
  • JsonCodecWriteMethodGenerator — JSON serialization

TestGenerator

Output: <MessageName>Test.java in the .tests sub-package (test source set)

Generates JUnit 5 parameterized tests covering:

  • Round-trip serialization (model → bytes → model) for both protobuf and JSON codecs
  • Equality and hash code verification
  • Unknown fields handling
  • Compatibility with Google protoc-generated classes

EnumGenerator

Output: <EnumName>.java in the base package

Generates a Java enum with:

  • A constant for each proto enum value
  • fromProtobufOrdinal(int) — maps wire value to enum constant
  • toProtobufOrdinal() — maps enum constant to wire value
  • @Deprecated annotations where specified in the proto schema
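
As a rough picture of the output, consider a proto enum with values UNKNOWN = 0, ACTIVE = 1, and a deprecated CLOSED = 5. The sketch below is hand-written and hypothetical; in particular, throwing on an unrecognized wire value is an assumption, not documented generator behavior.

```java
/** Hypothetical sketch of a generated enum; not actual EnumGenerator output. */
final class EnumSketch {
    enum Status {
        UNKNOWN(0),
        ACTIVE(1),
        @Deprecated
        CLOSED(5);

        private final int protobufOrdinal;

        Status(int protobufOrdinal) {
            this.protobufOrdinal = protobufOrdinal;
        }

        /** Returns the number used on the wire, which may differ from Java's ordinal(). */
        int toProtobufOrdinal() {
            return protobufOrdinal;
        }

        /** Maps a wire value back to its constant. */
        static Status fromProtobufOrdinal(int ordinal) {
            return switch (ordinal) {
                case 0 -> UNKNOWN;
                case 1 -> ACTIVE;
                case 5 -> CLOSED;
                default -> throw new IllegalArgumentException("Unknown protobuf ordinal " + ordinal);
            };
        }
    }
}
```

The explicit mapping matters because proto enum numbers may have gaps: here the third constant has Java ordinal 2 but wire value 5.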

ServiceGenerator

Output: <ServiceName>ServiceInterface.java in the base package

Generates a Java interface extending ServiceInterface with:

  • SERVICE_NAME and FULL_NAME constants
  • A Method inner enum listing all RPC methods
  • Default method implementations for each RPC (throwing UnsupportedOperationException)
  • An open() routing method that dispatches by method enum to the correct handler
  • Support for all four gRPC call types: unary, client-streaming, server-streaming, and bidirectional
  • An inner Client class implementing the interface via GrpcClient

File Output and Writing

JavaFileWriter

A single .java file accumulator. Generators call addImport() to register imports and append() to build the class body as a string. When writeFile() is called, it assembles the final file:

// SPDX-License-Identifier: Apache-2.0
package <package>;

import <sorted imports>;

<accumulated class body>
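
The accumulate-then-assemble behaviour can be sketched with an in-memory analogue. This hypothetical class returns the assembled text instead of writing a file, and it assumes imports are de-duplicated and sorted alphabetically.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory analogue of JavaFileWriter. */
final class FileAssemblySketch {
    private final String packageName;
    private final Set<String> imports = new TreeSet<>(); // sorted and de-duplicated
    private final StringBuilder body = new StringBuilder();

    FileAssemblySketch(String packageName) {
        this.packageName = packageName;
    }

    void addImport(String qualifiedName) {
        imports.add(qualifiedName);
    }

    void append(String code) {
        body.append(code);
    }

    /** Assembles header, package line, imports, and body in the layout shown above. */
    String assemble() {
        StringBuilder out = new StringBuilder();
        out.append("// SPDX-License-Identifier: Apache-2.0\n");
        out.append("package ").append(packageName).append(";\n\n");
        for (String qualifiedName : imports) {
            out.append("import ").append(qualifiedName).append(";\n");
        }
        out.append('\n').append(body);
        return out.toString();
    }
}
```

Deferring assembly until the end lets generators register imports at any point while they build the body.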

FileSetWriter

A record holding five JavaFileWriter instances (model, schema, codec, jsonCodec, test) for a single message. Created by FileSetWriter.create() which resolves output paths and packages for each file type. After all generators run, writeAllFiles() writes them all to disk.

Output Package Structure

For a message with base package com.example.proto:

com/example/proto/
├── MessageName.java                    (model)
├── schema/
│   └── MessageNameSchema.java          (schema)
├── codec/
│   ├── MessageNameProtoCodec.java      (protobuf codec)
│   └── MessageNameJsonCodec.java       (JSON codec)
└── tests/                              (test source set)
    └── MessageNameTest.java            (unit tests)

File naming is controlled by constants in FileAndPackageNamesConfig:

| File type | Class suffix | Sub-package |
|---|---|---|
| Model | (none) | (base) |
| Schema | Schema | schema |
| Protobuf Codec | ProtoCodec | codec |
| JSON Codec | JsonCodec | codec |
| Test | Test | tests |
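
The naming scheme reduces to a suffix and a sub-package per file type. The enum below is a hypothetical sketch that reuses the suffix and sub-package values listed here; it does not mirror the actual constants in FileAndPackageNamesConfig.

```java
/** Hypothetical sketch of deriving generated class names from a base package and message name. */
final class NamingSketch {
    enum FileType {
        MODEL("", ""),
        SCHEMA("Schema", "schema"),
        PROTO_CODEC("ProtoCodec", "codec"),
        JSON_CODEC("JsonCodec", "codec"),
        TEST("Test", "tests");

        private final String classSuffix;
        private final String subPackage;

        FileType(String classSuffix, String subPackage) {
            this.classSuffix = classSuffix;
            this.subPackage = subPackage;
        }

        /** Builds the fully qualified class name for this file type. */
        String qualifiedName(String basePackage, String messageName) {
            String pkg = subPackage.isEmpty() ? basePackage : basePackage + "." + subPackage;
            return pkg + "." + messageName + classSuffix;
        }
    }
}
```

For the base package com.example.proto, this reproduces the layout in the tree above, for instance com.example.proto.codec.MessageNameProtoCodec.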

Code Style of the Generator

The generators use direct string construction (StringBuilder via JavaFileWriter.append()) rather than templates or AST manipulation. Java source code is built by concatenating string literals, formatted blocks (often using text blocks with .indent()), and field-specific code fragments produced by Field method calls like parseCode(), schemaFieldsDef(), and parserFieldsSetMethodCase().

This approach is simple and keeps all generation logic visible in the generator classes, but means the generators must manually manage indentation, imports, and syntax correctness.
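
A small example of the text-block-plus-indent style is shown below. The method and its output shape are hypothetical; only the technique (a text block, String.formatted, and String.indent) reflects what the section describes.

```java
/** Hypothetical sketch of building a Java method as a string with a text block. */
final class TextBlockSketch {
    /** Emits a simple accessor, indented by four spaces for placement inside a class body. */
    static String getter(String type, String name) {
        return """
                public %s %s() {
                    return %s;
                }
                """.formatted(type, name, name).indent(4);
    }
}
```

The text block keeps the template readable at its natural indentation, and indent(4) shifts every line at once, which is the manual indentation management mentioned above.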

Nested Message Handling

Nested messages (messages defined inside other messages) are detected via Generator.isInner(), which walks up the ANTLR parse tree looking for a parent MessageDefContext. Inner messages are generated as static inner classes within the outer message's model file. The JavaFileWriter abstraction allows inner type generators to append their output to the same writer as the outer type.
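
The parent walk can be sketched without ANTLR by using a tiny stand-in tree. The Node record and its kind strings are hypothetical; the real method inspects ANTLR context objects.

```java
/** Hypothetical sketch of detecting a nested message by walking up a parse tree. */
final class InnerDetectionSketch {
    /** Minimal stand-in for an ANTLR parse tree node. */
    record Node(String kind, Node parent) {}

    /** Returns true if any ancestor of the given node is itself a message definition. */
    static boolean isInner(Node node) {
        for (Node p = node.parent(); p != null; p = p.parent()) {
            if (p.kind().equals("messageDef")) {
                return true;
            }
        }
        return false;
    }
}
```

Walking all ancestors, rather than checking only the direct parent, matters because intermediate grammar nodes usually sit between an outer message and the message nested inside it.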

Dependency Resolution from JARs

The PbjProtobufExtractTransform Gradle artifact transform extracts .proto files from JAR dependencies. This allows proto files in one module to import proto definitions from another module's published JAR. The extracted protos are passed to PbjCompiler as classpath files — they are parsed for type resolution in the LookupHelper but no code is generated for them (code generation only runs for source files).