helium

July 1, 2026


Helium is a fast XML toolkit for Go covering XML parsing, SAX2-style streaming, XPath 3.1, XSLT 3.0, XInclude, XSD, Relax NG, and Schematron.

The root helium package handles parsing, DOM building, and serialization, but the module is broader than an XML parser. It also includes xpath3 for XPath 3.1 querying and xslt3 for XSLT 3.0 transformations, alongside xpath1 for XPath 1.0 compatibility, xsd, relaxng, and schematron for validation, xinclude for inclusion processing, c14n for canonicalization, html for HTML parsing, and shim for encoding/xml-compatible APIs.

It started as an effort to port libxml2-style capabilities to Go and grew broader, native Go APIs along the way. The goal is to provide a full Go XML stack for parsing, querying, transforming, and validating documents, with each major feature area documented in its own package README.

SYNOPSIS

package examples_test

import (
  "context"
  "fmt"

  "github.com/lestrrat-go/helium"
)

func Example_helium_parse() {
  // helium.NewParser().Parse is the simplest way to parse an XML document from a byte slice.
  // It returns a *helium.Document representing the parsed DOM tree.
  doc, err := helium.NewParser().Parse(context.Background(), []byte(`<root><child>hello</child></root>`))
  if err != nil {
    fmt.Printf("failed to parse: %s\n", err)
    return
  }

  // WriteString serializes the entire document back to an XML string,
  // including the XML declaration (<?xml version="1.0"?>).
  s, err := helium.WriteString(doc)
  if err != nil {
    fmt.Printf("failed to serialize: %s\n", err)
    return
  }
  fmt.Println(s)
  // Output:
  // <?xml version="1.0"?>
  // <root><child>hello</child></root>
}

source: examples/helium_parse_example_test.go

Packages

Each public subpackage has its own README.md with package-specific details and an embedded example.

| Package | Description | Notes |
| --- | --- | --- |
| c14n | W3C Canonical XML support. | C14N 1.0, exclusive C14N 1.0, and C14N 1.1. |
| catalog | OASIS XML Catalog loading and resolution. | Useful with parsers, validators, and external resources. |
| enum | Shared typed enums for DTD declarations. | Low-level support package; no standalone example. |
| html | HTML parser and serializer on top of helium nodes. | Produces helium DOM nodes or SAX-style events. |
| relaxng | RELAX NG compilation and validation. | Schema compile step plus document validation. |
| sax | SAX2 handler interfaces and helpers. | Event-driven parsing surface used by helium and html. |
| schematron | Schematron compilation and validation. | Rule-based XML validation with XPath assertions. |
| shim | encoding/xml-compatible API backed by helium. | Import-path swap for existing stdlib-style code. |
| sink | Generic async event sink. | Also satisfies helium.ErrorHandler when T is error. |
| stream | Streaming XML writer. | Writes XML directly without building a DOM. |
| xinclude | XInclude processing for helium documents. | Supports recursive inclusion and custom resolvers. |
| xmldsig1 | W3C XML Digital Signatures 1.1 over helium documents. | Experimental; API may change. |
| xmlenc1 | W3C XML Encryption 1.1 over helium documents. | Experimental; API may change. |
| xpath1 | XPath 1.0 compilation and evaluation. | Includes convenience helpers like Find and Evaluate. |
| xpath3 | XPath 3.1 compilation and evaluation. | Includes a compiler, evaluator, maps, arrays, and HOFs. |
| xpointer | XPointer evaluation. | Supports shorthand, element(), and XPath-backed schemes. |
| xsd | XML Schema compilation and validation. | XSD 1.0 (default) and opt-in XSD 1.1 compiler plus validator APIs. |
| xslt3 | XSLT 3.0 stylesheet compilation and execution. | Targets Basic XSLT 3.0 conformance. |

helium CLI

The command-line interface is exposed as helium. Currently implemented subcommands: lint, xpath, xslt, xsd validate, relaxng validate, schematron validate. Use helium lint in place of the old heliumlint command.

| Command | Purpose |
| --- | --- |
| helium lint | Parse and lint XML documents |
| helium xpath | Evaluate XPath expressions against XML input |
| helium xslt | Transform XML with XSLT 3.0 stylesheets |
| helium relaxng validate | Validate XML documents against a RELAX NG schema |
| helium schematron validate | Validate XML documents against a Schematron schema |
| helium xsd validate | Validate XML documents against an XML Schema |

See cmd/helium/README.md for command-specific documentation.

Security

NewParser() is secure by default — it is safe to point at untrusted XML with no extra configuration. By default:

  • External entity and DTD loading is blocked (BlockXXE(true)), so XML External Entity (XXE) attacks are rejected.
  • No filesystem is exposed: the parser's FS is a deny-all filesystem, so even a document that reaches a loader cannot open host paths.
  • Network access is forbidden (AllowNetwork(false)). The core parser has no network loader, so this is belt-and-suspenders.
  • Element nesting depth is capped at 256 (MaxDepth(256); 0 = unbounded).
  • Entity substitution and external DTD loading are off (SubstituteEntities(false), LoadExternalDTD(false)). The entity-expansion amplification, name-length, and content-model-depth guards stay at their defaults (MaxEntityAmplification, MaxNameLength, MaxContentModelDepth). Any external DTD subset, once explicitly enabled, is capped at 10 MiB.

The builder is clone-on-write, so one configured parser is safe to reuse across goroutines.
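
As a rough illustration of why a clone-on-write builder is safe to share, the sketch below uses a hypothetical parserConfig type with value receivers. It is not helium's implementation; the type, its fields, and newConfig are assumptions made up for this example, and only the method names MaxDepth and BlockXXE echo the options listed above.

```go
package main

import "fmt"

// parserConfig is a hypothetical stand-in for a parser builder.
// It is NOT helium's type; it only illustrates the clone-on-write idea.
type parserConfig struct {
	maxDepth int
	blockXXE bool
}

// newConfig returns a config whose defaults mirror the ones listed above.
func newConfig() parserConfig {
	return parserConfig{maxDepth: 256, blockXXE: true}
}

// MaxDepth has a value receiver, so it changes a copy and returns it.
// The receiver the caller holds is never modified.
func (c parserConfig) MaxDepth(n int) parserConfig {
	c.maxDepth = n
	return c
}

// BlockXXE follows the same copy-then-return pattern.
func (c parserConfig) BlockXXE(b bool) parserConfig {
	c.blockXXE = b
	return c
}

func main() {
	base := newConfig()
	relaxed := base.MaxDepth(1024).BlockXXE(false)

	fmt.Println(base.maxDepth, base.blockXXE)       // 256 true
	fmt.Println(relaxed.maxDepth, relaxed.blockXXE) // 1024 false
}
```

Because every setter returns a fresh copy, goroutines that share base only ever read it, which is the property that makes reuse safe in this pattern.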

To deliberately load external resources from a trusted source, opt back in explicitly:

doc, err := helium.NewParser().
    BlockXXE(false).            // allow external entities and DTDs
    LoadExternalDTD(true).      // read the external DTD subset
    SubstituteEntities(true).   // expand entities
    FS(helium.PermissiveFS()).  // open any os.Open path (or pass a confined fs.FS)
    Parse(ctx, xmlBytes)

helium.PermissiveFS() returns an fs.FS that opens any path via os.Open, restoring the historical unsandboxed behavior; prefer a confined fs.FS rooted at a trusted directory when the document's external references are known. Passing FS(nil) restores the deny-all default.

The parser cannot know your resource budget, so even with the safe defaults the caller should also:

  • Enforce a maximum raw document size before calling Parse.
  • Pass a context.Context with a deadline to Parse / ParseReader.
  • Leave the entity-amplification, name-length, and content-model-depth limits at their defaults — passing a negative value to MaxEntityAmplification, MaxNameLength, or MaxContentModelDepth removes that guard.
  • Be cautious enabling XInclude, catalogs, DTD validation, or default-DTD-attribute processing for untrusted input; when you do, keep every external resource allowlisted and size-bounded. The xinclude processor is also secure by default — with no resolver configured it denies all filesystem access; grant access with Resolver(xinclude.NewFSResolver(fsys)) backed by a confined fs.FS (os.Root.FS), or restore historical OS-path access with xinclude.NewFSResolver(helium.PermissiveFS()). xinclude.Processor.MaxIncludeDepth bounds the nesting depth of included documents, and MaxIncludeSize caps the bytes read per included resource.

The xsd schema compiler is likewise secure by default: xsd.NewCompiler() denies all nested-schema filesystem access, so an untrusted schema cannot disclose local files or exhaust resources through a hostile xs:include/xs:import/xs:redefine schemaLocation. Each nested schema is read through a fixed byte cap regardless of the fs.FS in use. Opt into host access with Compiler.FS(helium.PermissiveFS()) or a confined fs.FS; Compiler.FS(nil) restores the deny-all default.

Caveat: a permissive or directory-rooted FS is not yet a complete sandbox. External-resource paths are joined against the document base URI and may be absolute or use OS-specific separators, so os.DirFS-style roots (which enforce fs.ValidPath) reject them. Until path normalization lands, rely on the deny-all default for confinement rather than a chroot-style fs.FS.

The xmldsig1 (signatures) and xmlenc1 (encryption) packages are experimental and should not be relied on inside a security or compliance boundary yet.

encoding/xml compatibility

The shim package is an import-path-compatible replacement for encoding/xml backed by helium's parser (Marshal, Unmarshal, Encoder, Decoder, and the usual struct tags). It is a migration aid, not a byte-for-byte behavioral clone. Known differences:

  • Decoder.Strict = false is not supported; Decoder.AutoClose is a no-op and HTMLAutoClose is omitted.
  • Undeclared namespace prefixes are rejected rather than passed through.
  • Namespace declarations are emitted before regular attributes.
  • Decoder.InputOffset is approximate rather than exact.
  • Empty elements captured via ,innerxml may re-serialize as self-closed tags.

Migrate behind your own tests rather than assuming a transparent swap.
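
One way to act on that advice is to pin the behavior you rely on in a small check before swapping import paths. The sketch below runs against the standard library only and covers the undeclared-prefix difference; rootName is a hypothetical helper, and the sample document is made up. With encoding/xml, the undeclared prefix is passed through as Name.Space, whereas the list above says the shim rejects it, so the same check should be expected to fail with an error after migration.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// rootName returns the name of the first start element the decoder reports.
func rootName(doc string) (xml.Name, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := dec.Token()
		if err != nil {
			return xml.Name{}, err
		}
		if se, ok := tok.(xml.StartElement); ok {
			return se.Name, nil
		}
	}
}

func main() {
	// The prefix "p" is never declared with xmlns:p.
	name, err := rootName(`<p:root>hi</p:root>`)
	fmt.Println(name.Space, name.Local, err) // p root <nil>
}
```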

Performance

Helium parses XML into a full DOM tree. The benchmark below compares that DOM build against two lower-level baselines: an encoding/xml token loop (Decoder.Token) and libxml2 via cgo.

That is a narrower benchmark than every real encoding/xml workload. Many Go programs use encoding/xml to decode directly into structs, and this section is not meant to dismiss that use case or the package. The point here is simply that Helium's DOM parse is already quite fast: it is materially faster than the stdlib token benchmark on all three corpora, it trails libxml2 on the smallest corpus, it edges past libxml2 on the medium corpus, and it is clearly ahead on the largest corpus.

Benchmarks parse real-world XML files of varying sizes (AMD Ryzen 9 7900X3D, Go 1.26.1, go test -run '^$' -bench 'Benchmark(HeliumParse|StdlibXMLDecode|Libxml2Parse)$' -benchmem -count=5 -tags libxml2bench ./bench, median shown):

| File | Helium | encoding/xml | libxml2 (cgo) |
| --- | --- | --- | --- |
| 109 KB | 139 MB/s | 77 MB/s | 158 MB/s |
| 196 KB | 124 MB/s | 66 MB/s | 109 MB/s |
| 3 MB | 497 MB/s | 120 MB/s | 366 MB/s |

Helium also allocates far fewer objects than encoding/xml in this benchmark. On the 3 MB corpus, the current Helium DOM parse lands around 94 allocs/op versus about 155k allocs/op for encoding/xml.

To run the benchmarks yourself:

go test -bench='BenchmarkHeliumParse|BenchmarkStdlibXMLDecode' -benchmem ./bench/
# Include libxml2 (requires cgo and libxml2-dev):
go test -tags cgo,libxml2bench -bench=. -benchmem ./bench/

Current status

  • Core functionality is implemented: XML/HTML parsing, DOM building, SAX2, XPath 1.0, XPath 3.1, Basic XSLT 3.0, XInclude, C14N, RELAX NG, Schematron, XSD, XML Catalog, streaming XML writer, and encoding/xml compatibility (shim package).
  • Experimental: W3C XML Digital Signatures 1.1 (xmldsig1) and XML Encryption 1.1 (xmlenc1). These APIs may change and may move to a separate repository.
  • W3C conformance suites: ~22,250 / 22,744 QT3 tests pass for XPath 3.1; ~11,780 / 13,129 W3C tests pass for XSLT 3.0 (skips are XSLT 1.0/2.0 backwards compatibility and other out-of-scope features); XSD 1.1 passes the XSD-1.1-tagged test groups of the W3C XML Schema Test Suite — 967 groups from the IBM, Saxon, Oracle, and W3C-WG collections, 0 failures.
  • libxml2-compat golden tests: core XML parsing 100%, XSD 99.6%, RELAX NG 100%, Schematron 100%, C14N 100%, HTML 100%.
  • XSLT support is intentionally scoped to Basic XSLT 3.0. Backwards compatibility modes for XSLT 1.0/2.0 are not part of the target feature set.
  • A helium CLI provides lint, xpath, xslt, xsd validate, relaxng validate, and schematron validate subcommands.
  • Some edge cases and parity gaps are still being iterated on; contributions and issue reports are welcome.

For coding agents

If you are an AI coding agent (Claude Code, Codex, Gemini, etc.) working in this repository, start with AGENTS.md (also available as CLAUDE.md). It points to the cached navigation and architecture docs under .claude/docs/ and lists the pre-read rules, scope boundaries, and generated-file policy you must follow before making changes.

Runnable usage examples live in the examples/ directory as *_example_test.go files — read those first to see how the public APIs are meant to be used.

Contributing

Issues

For bug reports and feature requests, please follow the issue template when possible. If you can include a minimal reproduction or failing test case, that helps a lot.

Pull Requests

Please include tests that cover your changes.

If your change touches generated files, update the generator/source first, regenerate, and commit both the source and generated outputs together.

Please keep pull requests focused and small enough to review quickly.

Discussions / Usage

For usage questions, design discussion, or "is this approach reasonable?" questions, please open a GitHub Discussion first.