DOCX export: the semantic backend

June 10, 2026 · View on GitHub

PDF is GraphCompose's fixed-layout output — every fragment lands at exact coordinates. DOCX is different on purpose: it is a semantic export that walks the document graph and writes editable Word content, skipping the layout pass entirely (no per-page pagination, no PDF chrome). Use it when the recipient needs to edit the document; use PDF when pixels must match.

Exporting a session

import com.demcha.compose.document.backend.semantic.DocxSemanticBackend;

try (DocumentSession document = GraphCompose.document()
        .pageSize(595, 842)
        .margin(DocumentInsets.of(36))
        .create()) {
    document.pageFlow().name("Flow")
            .addParagraph(p -> p.text("Hello Word"))
            .addTable(t -> t
                    .columns(DocumentTableColumn.auto(), DocumentTableColumn.auto())
                    .row("R1C1", "R1C2"))
            .build();

    byte[] docx = document.export(new DocxSemanticBackend());
    // or write straight to disk:
    document.export(new DocxSemanticBackend(), Path.of("out/report.docx"));
}

export(backend) returns the DOCX bytes and also writes the session's default output file when one was given to GraphCompose.document(path); the two-argument overload targets an explicit path.

Dependency note: the backend requires org.apache.poi:poi-ooxml on the classpath. GraphCompose declares it optional in its POM, so consumers who export DOCX must add it explicitly — consumers who only render PDF carry no POI footprint.

What maps 1:1

Document nodeDOCX output
ParagraphsWord paragraphs with alignment, font, size, colour, bold/italic/underline; inline runs preserved
ListsMarker-prefixed paragraphs in the list's text style; nested items indent per depth and keep their own markers
TablesWord tables, one cell per cell
ImagesEmbedded pictures at the node's declared size
RowsA one-row table, so editors keep the side-by-side layout (cell content limited to atomic children)
Sections / containersChildren written in order
SpacersEmpty paragraphs carrying the vertical gap as spacing-after
Page breaksExplicit Word page breaks

Page geometry (size and margins) and session metadata (title, author, subject, keywords) carry into the Word document as well.

What falls back

  • Charts → data table. The semantic export has no layout pass, so a chart's compiled vector geometry does not exist here. Its semantic content is its data, so the backend writes a categories-by-series table (values formatted with the chart's own axis format) and logs one capability warning per export. See charts.md.
  • Shape containers → inline layers. DOCX has no portable equivalent of a graphics-state path clip, so the container's layers are written inline, in source order, without the outline frame and without clipping — again with one warning per export.

What is skipped

Lines, ellipses, standalone shapes, and barcodes are silently skipped — they are pure fixed-layout geometry with no semantic equivalent. Headers/footers, watermarks, and protection options are also ignored by the current exporter.

The rule of thumb: if the document leans on geometry — shapes, layered designs, precise placement — export PDF for the reader and DOCX only as an editable companion.

Round-trip coverage (paragraphs, tables, metadata, chart fallback) lives in DocxSemanticBackendTest.