Kaltblut

June 10, 2026


If this project saved you some time or made your day a little easier, a star would mean a lot — it helps others find it too.

A Java toolkit for working with ZUGFeRD / Factur-X hybrid invoices: detect the flavor of any hybrid PDF, extract the embedded XML and supporting attachments, and validate the carrier-side specification rules — including PDF/A-3 conformance via veraPDF.

XML-side business rules (cardinalities, EN 16931 rules, code lists ...) are out of scope for this project; use phive-rules-zugferd for those.

The per-version requirements analysis used to design this library lives under docs/requirements/. See docs/requirements/comparison.md for a cross-version overview of every PDF carrier rule from ZUGFeRD 1.0 (2014) through Factur-X 1.09 / ZUGFeRD 2.5 (2026-06-10).

Supported Versions

The detection table covers every published release since 2014:

| ZUGFeRD | Factur-X | XMP namespace URI | Embedded XML name |
|---|---|---|---|
| 1.0 | n/a | urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0# | ZUGFeRD-invoice.xml |
| 2.0.1 | n/a | urn:zugferd:pdfa:CrossIndustryDocument:invoice:2p0# | zugferd-invoice.xml |
| 2.1 | 1.0.05 | urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0# | factur-x.xml |
| 2.1 | 1.0.05 | urn:zugferd:pdfa:CrossIndustryDocument:invoice:2p0# (legacy) | zugferd-invoice.xml |
| 2.2 | 1.0.06 | urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0# | factur-x.xml / xrechnung.xml |
| 2.2 | 1.0.06 | urn:zugferd:pdfa:CrossIndustryDocument:invoice:1p0# (legacy) | zugferd-invoice.xml |
| 2.3 | 1.0.07 | urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0# | factur-x.xml / xrechnung.xml |
| 2.3.2 | 1.07.2 | urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0# | factur-x.xml / xrechnung.xml |
| 2.3.3 | 1.07.3 | urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0# | factur-x.xml / xrechnung.xml |
| 2.4 | 1.08 | urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0# | factur-x.xml / xrechnung.xml |
| 2.5 | 1.09 | urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0# | factur-x.xml / xrechnung.xml |
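
The table above suggests that the XMP namespace URI is the main fingerprint for telling spec generations apart. The following is a minimal sketch of such a lookup, not Kaltblut's actual detection code. The class name NamespaceFlavorSketch and the human-readable labels are hypothetical; only the URIs are taken from the table.

```java
import java.util.Map;
import java.util.Optional;

final class NamespaceFlavorSketch
{
  // Keys are the URIs from the table above; the labels are illustrative only.
  private static final Map <String, String> FLAVOR_BY_NAMESPACE = Map.of (
    "urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#", "ZUGFeRD 1.0",
    "urn:zugferd:pdfa:CrossIndustryDocument:invoice:2p0#", "ZUGFeRD 2.0.1 (also legacy 2.1)",
    "urn:zugferd:pdfa:CrossIndustryDocument:invoice:1p0#", "ZUGFeRD 2.2 (legacy)",
    "urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#", "Factur-X 1.0.05+ / ZUGFeRD 2.1+");

  /** Returns the label for a known namespace URI, or empty if the URI is unknown. */
  static Optional <String> lookup (String sNamespaceUri)
  {
    if (sNamespaceUri == null)
      return Optional.empty ();
    return Optional.ofNullable (FLAVOR_BY_NAMESPACE.get (sNamespaceUri.trim ()));
  }

  public static void main (String [] args)
  {
    System.out.println (lookup ("urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#").orElse ("not a known hybrid namespace"));
    System.out.println (lookup ("urn:example:unknown#").orElse ("not a known hybrid namespace"));
  }
}
```

Because several releases share one namespace, the namespace alone cannot pin down the exact version; the embedded XML name and other XMP fields are needed for that.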

Why "Kaltblut"?

Kaltblut is German for the family of heavy draft horse breeds — the strongest and steadiest of the Zugpferde. The name is a nod to ZUGFeRD, which German speakers hear as Zugpferd ("draft horse"). Kaltblut picks the most workmanlike of them: built to pull heavy loads calmly and reliably, much like this toolkit aims to handle ZUGFeRD invoices.

As a bonus, kaltblütig in everyday German also means "cool-headed", a useful trait for any library dealing with tax-relevant invoice processing.

What Kaltblut Does (and Does Not Do)

| Tier | Concern | Status |
|---|---|---|
| 1 | Detection & metadata: flavor, profile, XMP fields, embedded-file name, /AFRelationship | implemented |
| 2 | Extraction: invoice XML, named attachments, full attachment list | implemented |
| 3 | Validation: BR-HYBRID-* business rules, PDF/A-3 (via veraPDF SPI) | implemented |
| 4 | Creation / embedding XML to produce hybrid PDFs | not in scope |
|  | XML business-rules validation (EN 16931, KoSIT XRechnung) | use phive-rules-zugferd |

Project Layout

This is a multi-module Maven project:

  • kaltblut-core — the library. Source abstraction, model, inspector, extractor, validator. Classes live under com.helger.kaltblut.core.*.
  • kaltblut-testfiles — shared test fixtures (sample PDFs + classpath-resource locator).
  • kaltblut-verapdf — PDF/A-3 validation adapter that wires veraPDF to the IPdfA3ValidatorSPI SPI. Optional; pull it in only if you need PDF/A-3 conformance checks.
  • kaltblut-cli — the command-line client (picocli). Builds a standalone fat JAR.

Key Library Concepts

IHybridSource — the input abstraction

All public entry points take a source. Use one of the HybridSource factories:

import java.io.File;
import java.net.URL;
import java.nio.file.Path;

import com.helger.kaltblut.core.source.HybridSource;
import com.helger.kaltblut.core.source.IHybridSource;

// aPdfBytes (byte []), aBuffer (ByteBuffer) and aIS (InputStream) are assumed to exist already
IHybridSource s1 = HybridSource.fromFile (new File ("invoice.pdf"));         // lazy + cached
IHybridSource s2 = HybridSource.fromPath (Path.of ("invoice.pdf"));          // lazy + cached
IHybridSource s3 = HybridSource.fromBytes (aPdfBytes);                       // wraps array
IHybridSource s4 = HybridSource.fromByteBuffer (aBuffer);                    // copies
IHybridSource s5 = HybridSource.fromUrl (new URL ("https://example.com/invoice.pdf"));      // http/https only, with timeouts
IHybridSource s6 = HybridSource.fromInputStream (aIS);                       // reads now, closes
IHybridSource s7 = HybridSource.fromClasspath ("samples/invoice.pdf");       // resource path

IHybridSource is byte-array-centric: the contract is byte[] getBytes() throws IOException, plus long getSize() and String getName() as diagnostic hints. PDFBox 3 needs random access and every Kaltblut operation eventually needs the complete PDF in memory, so distinguishing single-read from multi-read inputs added API surface without value. Implementations may read lazily on first call and cache the result; callers must not mutate the returned array.
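
To make the "read lazily on first call and cache the result" contract concrete, here is a minimal sketch of a file-backed source. The nested ByteSource interface is a hypothetical stand-in that only mirrors the three methods named above; it is not the real IHybridSource, and returning -1 for a not-yet-known size is an assumption of this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class LazyCachedSourceSketch
{
  /** Hypothetical stand-in for the contract described above. */
  interface ByteSource
  {
    byte [] getBytes () throws IOException;

    long getSize ();

    String getName ();
  }

  static ByteSource fromPath (Path aPath)
  {
    return new ByteSource ()
    {
      // Stays null until the first getBytes () call
      private byte [] m_aCached;

      @Override
      public synchronized byte [] getBytes () throws IOException
      {
        if (m_aCached == null)
          m_aCached = Files.readAllBytes (aPath);
        // The same array is handed out every time, so callers must not mutate it
        return m_aCached;
      }

      @Override
      public synchronized long getSize ()
      {
        // -1 signals "not read yet" (an assumption of this sketch)
        return m_aCached == null ? -1 : m_aCached.length;
      }

      @Override
      public String getName ()
      {
        return aPath.toString ();
      }
    };
  }

  public static void main (String [] args) throws IOException
  {
    Path aTmp = Files.createTempFile ("kaltblut-sketch", ".bin");
    Files.write (aTmp, new byte [] { 1, 2, 3 });
    ByteSource aSource = fromPath (aTmp);
    System.out.println (aSource.getName () + ": " + aSource.getBytes ().length + " bytes");
    Files.delete (aTmp);
  }
}
```

Returning the cached array directly avoids a copy per call, which is why the no-mutation rule falls on the caller.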

HybridSource.fromUrl only accepts http and https URLs and applies sensible connect / read timeouts (defaults 10 s / 60 s); other schemes (file:, jar:, ftp:, ...) are refused to prevent accidental SSRF / local-file-read when forwarding caller-supplied URLs.
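
The scheme check described above can be sketched with plain JDK classes. This is an illustration of the idea, not the code behind HybridSource.fromUrl: the class and method names are hypothetical, and the timeout values simply reuse the 10 s / 60 s defaults quoted above. It does not cover redirects or DNS-level SSRF tricks.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;
import java.util.Locale;
import java.util.Set;

final class UrlSchemeGuardSketch
{
  private static final Set <String> ALLOWED_SCHEMES = Set.of ("http", "https");

  /** Parses the URL and refuses everything that is not http or https. */
  static URI requireHttpOrHttps (String sUrl)
  {
    // URI.create throws IllegalArgumentException on syntax errors
    URI aUri = URI.create (sUrl);
    String sScheme = aUri.getScheme ();
    if (sScheme == null || !ALLOWED_SCHEMES.contains (sScheme.toLowerCase (Locale.ROOT)))
      throw new IllegalArgumentException ("Refusing URL scheme: " + sScheme);
    return aUri;
  }

  /** Prepares a connection with timeouts; no network traffic happens until it is used. */
  static URLConnection openWithTimeouts (URI aUri) throws IOException
  {
    URLConnection aConn = aUri.toURL ().openConnection ();
    aConn.setConnectTimeout (10_000);
    aConn.setReadTimeout (60_000);
    return aConn;
  }

  public static void main (String [] args)
  {
    System.out.println (requireHttpOrHttps ("https://example.com/invoice.pdf"));
    try
    {
      requireHttpOrHttps ("file:///etc/passwd");
    }
    catch (IllegalArgumentException ex)
    {
      System.out.println (ex.getMessage ());
    }
  }
}
```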

HybridLimits — byte / count ceilings

Every Tier-1/2/3 entry point accepts an optional HybridLimits (defaulting to HybridLimits.DEFAULTS) that caps:

  • the input PDF size (default 64 MiB),
  • per-attachment inflated size (default 32 MiB),
  • aggregate attachment size (default 128 MiB),
  • attachment count (default 100).

Use HybridLimits.UNLIMITED to disable all limits, or build a custom instance with the immutable withMaxPdfBytes(...) / withMaxAttachmentBytes(...) / ... wither methods. Reading past a limit throws IOException rather than letting the JVM run out of memory.
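
A rough sketch of the two ideas in this section: an immutable limits holder with wither-style methods, and a read loop that fails with IOException as soon as a byte ceiling is crossed. The Limits record and readAtMost are hypothetical names for illustration; the real HybridLimits carries more settings and its internals are not shown in this README.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class BoundedReadSketch
{
  /** Hypothetical two-field limits holder; each wither returns a new instance. */
  record Limits (long maxPdfBytes, int maxAttachmentCount)
  {
    Limits withMaxPdfBytes (long nMax)
    {
      return new Limits (nMax, maxAttachmentCount);
    }

    Limits withMaxAttachmentCount (int nMax)
    {
      return new Limits (maxPdfBytes, nMax);
    }
  }

  /** Reads the whole stream, but throws once more than nMaxBytes have been seen. */
  static byte [] readAtMost (InputStream aIS, long nMaxBytes) throws IOException
  {
    ByteArrayOutputStream aOut = new ByteArrayOutputStream ();
    byte [] aBuf = new byte [8192];
    long nTotal = 0;
    int nRead;
    while ((nRead = aIS.read (aBuf)) != -1)
    {
      nTotal += nRead;
      if (nTotal > nMaxBytes)
        throw new IOException ("Input exceeds limit of " + nMaxBytes + " bytes");
      aOut.write (aBuf, 0, nRead);
    }
    return aOut.toByteArray ();
  }

  public static void main (String [] args) throws IOException
  {
    Limits aLimits = new Limits (64L * 1024 * 1024, 100).withMaxPdfBytes (16);
    byte [] aData = readAtMost (new ByteArrayInputStream (new byte [10]), aLimits.maxPdfBytes ());
    System.out.println ("Read " + aData.length + " bytes within the limit");
  }
}
```

Checking the running total inside the loop means an oversized input is rejected after at most one extra buffer, instead of being fully loaded first.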

Model

The model classes in com.helger.kaltblut.core.model are immutable value objects:

  • EZugferdFlavor — namespace-URI fingerprint of the spec generation.
  • EZugferdProfile: MINIMUM, BASIC_WL, BASIC, COMFORT, EN_16931, EXTENDED, XRECHNUNG.
  • EAFRelationship: Data, Source, Alternative, Supplement, Unspecified.
  • EZugferdCountry: DE, FR, OTHER (drives country-specific BR-HYBRID rules).
  • HybridMetadata — single snapshot of XMP fields + /AF data.
  • HybridAttachment — name, MIME type, AFRelationship, ModDate, bytes, invoice-XML flag.

Usage

Tier 1: detection

import java.io.File;

import com.helger.kaltblut.core.inspect.HybridInspector;
import com.helger.kaltblut.core.model.EZugferdFlavor;
import com.helger.kaltblut.core.model.HybridMetadata;
import com.helger.kaltblut.core.source.HybridSource;
import com.helger.kaltblut.core.source.IHybridSource;

IHybridSource aSource = HybridSource.fromFile (new File ("invoice.pdf"));

if (HybridInspector.isHybridInvoice (aSource))
{
  // Flavor only, or the full metadata snapshot (which also carries the flavor)
  EZugferdFlavor eFlavor = HybridInspector.detectFlavor (aSource);
  HybridMetadata aMeta = HybridInspector.readMetadata (aSource);
  System.out.println ("Flavor:        " + aMeta.getFlavor ());
  System.out.println ("Profile:       " + aMeta.getProfile ());
  System.out.println ("Embedded file: " + aMeta.getEmbeddedFileName ());
  System.out.println ("AFRelationship: " + aMeta.getAFRelationship ());
}

Tier 2: extraction

import java.util.List;

import com.helger.kaltblut.core.extract.HybridExtractor;
import com.helger.kaltblut.core.model.HybridAttachment;

// aSource is the IHybridSource created in the Tier 1 example
byte [] aXmlBytes = HybridExtractor.extractInvoiceXml (aSource);
List <HybridAttachment> aAttachments = HybridExtractor.listAttachments (aSource);
byte [] aExcel = HybridExtractor.extractAttachment (aSource, "list_of_measurement.xlsx");

Security note: the bytes returned by extractInvoiceXml / extractAttachment come from a potentially untrusted PDF. If you parse the XML yourself, configure your XML processor to disable external entities, DTDs, and XInclude (i.e. FEATURE_SECURE_PROCESSING=true plus disallow-doctype-decl=true), or use a library — such as phive-rules-zugferd — that does so by default. Otherwise a malicious invoice can XXE-read local files or trigger SSRF.
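
One way to apply the hardening from the security note with the JDK's built-in DOM parser is sketched below. The class name SecureXmlParseSketch is hypothetical, and the feature URI http://apache.org/xml/features/disallow-doctype-decl assumes a Xerces-based parser such as the one bundled with the JDK; other parser implementations may need different settings.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;

final class SecureXmlParseSketch
{
  /** Parses untrusted XML bytes with DTDs, external entities and XInclude switched off. */
  static Document parse (byte [] aXmlBytes) throws Exception
  {
    DocumentBuilderFactory aFactory = DocumentBuilderFactory.newInstance ();
    aFactory.setNamespaceAware (true);
    aFactory.setFeature (XMLConstants.FEATURE_SECURE_PROCESSING, true);
    // Any DOCTYPE declaration makes the parse fail, which rules out XXE via DTD entities
    aFactory.setFeature ("http://apache.org/xml/features/disallow-doctype-decl", true);
    aFactory.setXIncludeAware (false);
    aFactory.setExpandEntityReferences (false);
    DocumentBuilder aBuilder = aFactory.newDocumentBuilder ();
    return aBuilder.parse (new ByteArrayInputStream (aXmlBytes));
  }

  public static void main (String [] args) throws Exception
  {
    byte [] aXml = "<Invoice xmlns=\"urn:example\"/>".getBytes (StandardCharsets.UTF_8);
    System.out.println (parse (aXml).getDocumentElement ().getLocalName ());
  }
}
```

In real use the byte array would be the result of extractInvoiceXml; a document that carries a DOCTYPE is then rejected with a parse exception instead of being resolved.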

Tier 3: validation

import com.helger.kaltblut.core.model.EZugferdCountry;
import com.helger.kaltblut.core.validate.HybridFinding;
import com.helger.kaltblut.core.validate.HybridValidator;
import com.helger.kaltblut.core.validate.HybridValidationLayer;
import com.helger.kaltblut.core.validate.HybridValidationResult;

HybridValidator aValidator = new HybridValidator ();
aValidator.getSettings ()
          .setCountry (EZugferdCountry.DE)
          .setCheckPdfA3 (true)
          .setApplyDePdfADowngrade (true);

HybridValidationResult aResult = aValidator.validate (aSource);

// Per-layer reporting: one BR_HYBRID layer + (when enabled) one PDF_A3 layer, each
// carrying its own findings and wall-clock duration.
for (HybridValidationLayer aLayer : aResult.getAllLayers ())
{
  System.out.println (aLayer.getDisplayName () + " - " + aLayer.getDuration ().toMillis () + "ms");
  for (HybridFinding aF : aLayer.getAllFindings ())
    System.out.println ("  " + aF);
}

// Aggregate predicates still work across all layers.
if (!aResult.isValid ())
  System.err.println ("Document considered invalid");

PDF/A-3 validation runs via the IPdfA3ValidatorSPI SPI. Add kaltblut-verapdf to the classpath to enable veraPDF; without it, validate() records a single INFORMATION finding noting that PDF/A-3 conformance was not checked.

Command line

Build the standalone fat JAR and run it:

mvn clean package
java -jar kaltblut-cli/target/kaltblut-cli-full.jar [subcommand] [options] <files...>

Subcommands:

| Subcommand | Description |
|---|---|
| inspect | Print flavor, profile, XMP fields, embedded-file name, and /AFRelationship. |
| extract | Write the embedded invoice XML to disk. |
| attachments | List all embedded files (invoice XML + supporting documents). |
| validate | Run BR-HYBRID-* business rules and PDF/A-3 validation. Exit code 0 if no ERROR findings. |

Common options:

| Option | Subcommand | Description | Default |
|---|---|---|---|
| -o, --output-dir | extract | Directory to write XML files to. |  |
| -s, --suffix | extract | Output filename suffix | -invoice |
| -c, --country | validate | DE, FR, or OTHER; drives country-specific rules | OTHER |
| --no-pdfa | validate | Skip PDF/A-3 validation via the SPI | off |
| --no-de-pdfa-downgrade | validate | Disable the BR-FX-DE-03 downgrade for DE↔DE invoices | off |
| -h, --help | all | Show help |  |
| -V, --version | all | Show version |  |

Examples:

# Detect the flavor of one or more PDFs
java -jar kaltblut-cli-full.jar inspect invoice.pdf another-invoice.pdf

# Extract the invoice XML to /tmp/out/
java -jar kaltblut-cli-full.jar extract -o /tmp/out invoice.pdf

# List all embedded files in a PDF
java -jar kaltblut-cli-full.jar attachments invoice.pdf

# Validate a DE↔DE invoice (PDF/A-3 errors downgraded per BR-FX-DE-03)
java -jar kaltblut-cli-full.jar validate -c DE invoice.pdf

# Validate without PDF/A-3 (fast path; only the BR-HYBRID-* rules run)
java -jar kaltblut-cli-full.jar validate --no-pdfa invoice.pdf

Building

Requires Java 17+ and Maven.

mvn clean package

The build produces (replacing x.y.z with the effective version):

  • kaltblut-core/target/kaltblut-core-x.y.z-SNAPSHOT.jar — core library JAR.
  • kaltblut-verapdf/target/kaltblut-verapdf-x.y.z-SNAPSHOT.jar — veraPDF adapter JAR.
  • kaltblut-cli/target/kaltblut-cli-x.y.z-SNAPSHOT.jar — CLI library JAR.
  • kaltblut-cli/target/kaltblut-cli-full.jar — standalone executable fat JAR (all dependencies bundled).

Maven Coordinates

<!-- Core library: detection + extraction + BR-HYBRID validation -->
<dependency>
  <groupId>com.helger.kaltblut</groupId>
  <artifactId>kaltblut-core</artifactId>
  <version>x.y.z</version>
</dependency>

<!-- Optional: veraPDF-backed PDF/A-3 validation -->
<dependency>
  <groupId>com.helger.kaltblut</groupId>
  <artifactId>kaltblut-verapdf</artifactId>
  <version>x.y.z</version>
</dependency>

Extending

To plug in a different PDF/A-3 validator (or none at all), implement com.helger.kaltblut.core.validate.IPdfA3ValidatorSPI and register the class via META-INF/services/com.helger.kaltblut.core.validate.IPdfA3ValidatorSPI. The validator is discovered via ServiceLoader; only the first implementation found is used.
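
The "first implementation found" behaviour maps directly onto ServiceLoader.findFirst(). The sketch below shows that lookup in isolation. The nested PdfA3Validator interface and its isConformant method are hypothetical, because the real IPdfA3ValidatorSPI methods are not listed in this README.

```java
import java.util.Optional;
import java.util.ServiceLoader;

final class FirstProviderSketch
{
  /** Hypothetical SPI shape, used only to demonstrate the lookup. */
  public interface PdfA3Validator
  {
    boolean isConformant (byte [] aPdfBytes);
  }

  /** Returns the first provider registered under META-INF/services, if any. */
  static Optional <PdfA3Validator> findValidator ()
  {
    return ServiceLoader.load (PdfA3Validator.class).findFirst ();
  }

  public static void main (String [] args)
  {
    Optional <PdfA3Validator> aValidator = findValidator ();
    if (aValidator.isPresent ())
      System.out.println ("Using " + aValidator.get ().getClass ().getName ());
    else
      System.out.println ("No PDF/A-3 validator on the classpath; conformance not checked");
  }
}
```

With no provider registered the Optional is empty, which corresponds to the case where validation only notes that PDF/A-3 conformance was not checked. Which provider counts as "first" depends on classpath order, so registering more than one implementation is best avoided.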

License

Apache License, Version 2.0.

News and Noteworthy

v0.9.2 - 2026-06-10

  • Added support for ZUGFeRD v2.5

v0.9.1 - 2026-05-13

  • Validation: the result of HybridValidator.validate is now structured as a list of HybridValidationLayers (BR_HYBRID + optional PDF_A3, identified by EHybridValidationLayerKind) instead of a flat finding list. Each layer carries its own findings and wall-clock Duration. Aggregate predicates on HybridValidationResult continue to work across all layers.
  • Breaking: ValidationResult renamed to HybridValidationResult, and the new per-layer container is HybridValidationLayer.
  • Breaking: EHybridSeverity.FATAL renamed to ERROR. Predicate methods follow: isFatal() → isError(), hasFatal() → hasError(), hasFatalRule() → hasErrorRule().
  • EHybridSeverity entries now carry the equivalent ph-commons EErrorLevel via getErrorLevel(), so consumers mapping findings into ph-commons error infrastructure no longer need a translation table.
  • EZugferdCountry now implements IHasID<String> with getID() and the static getFromIDOrNull(String) factory, matching the style of the other ph-commons-based enums.
  • CLI validate subcommand prints one line per layer with its kind, finding count, and duration, then the layer's findings indented underneath.

v0.9.0 - 2026-05-13

  • Detection: recognises all five XMP extension-schema namespaces seen across ZUGFeRD 1.0, 2.0.1, 2.1, 2.2, 2.3, 2.3.2, 2.3.3 and 2.4.
  • Extraction: invoice XML, named attachments, full attachment list including Modification Date and MIME type.
  • Validation: BR-HYBRID-01 through BR-HYBRID-15 (and the BR-HYBRID-DE-/-FR- country variants) plus PDF/A-3 conformance via the IPdfA3ValidatorSPI SPI implemented by kaltblut-verapdf using veraPDF (-jakarta artifact line, JAXB 4.x only).
  • Command-line client with subcommands inspect, extract, attachments, validate.