vsoup

February 28, 2026

A fast, JSoup-inspired HTML5 parser and DOM manipulation library for V, powered by Lexbor v2.6.0.

Features

  • HTML5 parsing: full spec-compliant parsing via Lexbor
  • CSS selectors: @select() and select_first(), with compiled selectors cached for reuse
  • DOM traversal: children(), parent(), next_sibling(), etc.
  • DOM manipulation: set_attr(), add_class(), append(), remove(), etc.
  • Serialization: html(), outer_html(), pretty_html()
  • HTTP client: JSoup-style connect(url).get() builder

Quick Start

import vsoup

doc := vsoup.parse('<div class="main"><p>Hello</p><a href="/link">World</a></div>')!
defer { doc.free() }

// CSS selectors
links := doc.@select('a[href]')
println(links.first()!.attr('href'))  // "/link"
println(links.first()!.text())        // "World"

// DOM traversal
body := doc.body()!
for child in body.children() {
    println(child.tag_name())
}

// DOM manipulation
mut div := doc.select_first('.main')!
div.set_attr('data-processed', 'true')
div.add_class('active')
div.append('<span>New content</span>')
println(doc.html())

Installation

Via VPM

v install marcalc.vsoup

From source

git clone https://github.com/marcalc/vsoup.git
cd vsoup
make test

Lexbor v2.6.0 is vendored and compiled directly by V — no CMake or separate build step required.

API Reference

Parsing

doc := vsoup.parse(html_string)!          // parse HTML string
doc := vsoup.parse_file('page.html')!     // parse from file
doc := vsoup.connect('https://example.com').get()!  // fetch & parse
defer { doc.free() }

Document

Method | Returns | Description
--- | --- | ---
doc.body() | ?Element | The <body> element
doc.head() | ?Element | The <head> element
doc.title() | string | Document title text
doc.@select(css) | Elements | All matching elements (@ escapes the V keyword)
doc.select_first(css) | ?Element | First matching element
doc.html() | string | Serialized HTML
doc.pretty_html() | string | Pretty-printed HTML
doc.free() | | Free all resources
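A short sketch exercising the Document methods listed above. The HTML snippet is made up, and the commented outputs assume the behavior the table describes (for example, that title() returns the plain text of <title> and local_name() is lowercase).

```v
import vsoup

// Illustrative input; any HTML string works here
doc := vsoup.parse('<html><head><title>Demo</title></head><body><p>Hi</p></body></html>')!
defer { doc.free() }

println(doc.title()) // Demo

// head() and body() return options, so unwrap them or handle `none`
head := doc.head() or { panic('document has no <head>') }
println(head.local_name()) // head

// select_first() returns ?Element; @select() returns an Elements collection
p := doc.select_first('body > p') or { panic('no <p> in <body>') }
println(p.text()) // Hi
println(doc.@select('p').len()) // 1

// html() gives compact output; pretty_html() is meant for human reading
println(doc.pretty_html())
```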

Element

Method | Returns | Description
--- | --- | ---
e.tag_name() | string | Uppercase tag name (e.g. "DIV")
e.local_name() | string | Lowercase tag name (e.g. "div")
e.id() | string | The id attribute
e.class_name() | string | The class attribute
e.class_names() | []string | Individual class names
e.has_class(name) | bool | Check for a class
e.attr(key) | string | Attribute value
e.has_attr(key) | bool | Check attribute existence
e.attributes() | map[string]string | All attributes
e.text() | string | Text content (recursive)
e.html() | string | Inner HTML
e.outer_html() | string | Outer HTML
e.@select(css) | Elements | CSS select descendants (@ escapes the V keyword)
e.select_first(css) | ?Element | First matching descendant
e.parent() | ?Element | Parent element
e.children() | []Element | Child elements
e.first_child() | ?Element | First child element
e.next_sibling() | ?Element | Next sibling element
e.prev_sibling() | ?Element | Previous sibling element
e.set_attr(k, v) | | Set attribute
e.remove_attr(k) | | Remove attribute
e.add_class(name) | | Add a class
e.remove_class(name) | | Remove a class
e.append(html) | | Append child HTML
e.prepend(html) | | Prepend child HTML
e.remove() | | Remove from DOM
e.empty() | | Remove all children
e.set_text(text) | | Set text content
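The sketch below combines several Element methods: reading the id and classes, walking element children with first_child() and next_sibling(), dumping attributes(), and a small in-place edit. The markup is made up and has no whitespace between items so the sibling walk is unambiguous; declaring `last` as `mut` mirrors the Quick Start rather than a documented requirement.

```v
import vsoup

doc := vsoup.parse('<ul id="menu" class="nav main"><li class="item">One</li><li class="item" data-x="1">Two</li><li>Three</li></ul>')!
defer { doc.free() }

menu := doc.select_first('#menu') or { panic('menu not found') }
println(menu.id()) // menu
println(menu.class_names()) // ['nav', 'main']
println(menu.has_class('nav')) // true

// Walk element children: first_child(), then next_sibling() until `none`
mut labels := []string{}
mut cur := menu.first_child()
for {
	node := cur or { break }
	labels << node.text()
	cur = node.next_sibling()
}
println(labels) // ['One', 'Two', 'Three']

// attributes() returns every attribute as a map
second := menu.select_first('li[data-x]') or { panic('no data-x item') }
for key, value in second.attributes() {
	println('${key}=${value}')
}

// Edits go straight into the shared DOM; Element is only a view
mut last := menu.@select('li').last() or { panic('no items') }
last.set_text('Three (edited)')
last.add_class('final')
println(menu.outer_html())
```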

Elements

Method | Returns | Description
--- | --- | ---
es.len() | int | Number of elements
es.first() | ?Element | First element
es.last() | ?Element | Last element
es.at(i) | ?Element | Element at index
es.text() | string | Combined text of all elements
es.attr(key) | string | First matching attribute value
es.each_attr(key) | []string | Attribute from each element
es.@select(css) | Elements | Sub-select across all elements
es.iter() | []Element | For use in for loops
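A minimal sketch of working with an Elements collection. The markup is made up; the third anchor has no href, so `a[href]` should match only two elements. at() is assumed to be zero-based like V arrays, so no output is claimed for it.

```v
import vsoup

doc := vsoup.parse('<nav><a href="/a">A</a><a href="/b">B</a><a>No link</a></nav>')!
defer { doc.free() }

// Only anchors that actually carry an href attribute
links := doc.@select('a[href]')
println(links.len()) // 2
println(links.each_attr('href')) // ['/a', '/b']
println(links.text())

// at() returns ?Element, so guard it with an if-unwrap
if second := links.at(1) {
	println(second.text())
}

// iter() exposes a plain []Element for ordinary for loops
for i, link in links.iter() {
	href := link.attr('href')
	println('${i}: ${href}')
}
```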

HTTP Client

doc := vsoup.connect('https://example.com')
    .user_agent('vsoup/0.1')
    .header('Accept', 'text/html')
    .cookie('session', 'abc123')
    .get()!
defer { doc.free() }

Benchmarks

Selector benchmarks against native Lexbor C and jsoup (Java), using the same HTML fixture and methodology: 5 iterations x 10,000 repetitions, reporting mean time in seconds. Figures in parentheses are times relative to Lexbor C, so lower is better.

Lexbor v2.6.0 | jsoup 1.22.2 | macOS ARM64

Selector | Lexbor C | vsoup (V) | jsoup (Java)
--- | --- | --- | ---
div | 0.00418 | 0.00622 (1.5x) | 0.01596 (3.8x)
div span | 0.00554 | 0.00715 (1.3x) | 0.02966 (5.4x)
p ~ p | 0.00503 | 0.00652 (1.3x) | 0.02262 (4.5x)
p + p | 0.00496 | 0.00660 (1.3x) | 0.01900 (3.8x)
div > p | 0.00507 | 0.00692 (1.4x) | 0.01434 (2.8x)
div > div | 0.00512 | 0.00731 (1.4x) | 0.01414 (2.8x)
div p:not(#p-5) a | 0.00785 | 0.00953 (1.2x) | 0.03763 (4.8x)
div:has(a) a | 0.00726 | 0.00905 (1.2x) | 0.02558 (3.5x)
div p:nth-child(n+2) | 0.00643 | 0.00799 (1.2x) | 0.02950 (4.6x)
div p:nth-child(n+2 of div > p) | 0.01364 | 0.01685 (1.2x) | n/a

vsoup takes 1.2-1.5x as long as native Lexbor C (thin wrapper overhead) and is roughly 2-4x faster than jsoup.

The remaining overhead vs raw C is from the V function call layer and result collection into V arrays. The actual lxb_selectors_find is called identically — compiled selectors are cached and reused across queries.
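The summary ratios can be recomputed from the table. This sketch copies the mean times for every selector jsoup supports and prints the range of vsoup's overhead over Lexbor C and its speedup over jsoup. It is plain V with no vsoup dependency; `min_max` is just a local helper.

```v
// Returns the smallest and largest value in a non-empty array.
fn min_max(xs []f64) (f64, f64) {
	mut lo := xs[0]
	mut hi := xs[0]
	for x in xs {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	return lo, hi
}

// [Lexbor C, vsoup, jsoup] mean seconds per selector, copied from the table;
// the `nth-child(... of ...)` row is left out because jsoup has no result.
times := [
	[0.00418, 0.00622, 0.01596],
	[0.00554, 0.00715, 0.02966],
	[0.00503, 0.00652, 0.02262],
	[0.00496, 0.00660, 0.01900],
	[0.00507, 0.00692, 0.01434],
	[0.00512, 0.00731, 0.01414],
	[0.00785, 0.00953, 0.03763],
	[0.00726, 0.00905, 0.02558],
	[0.00643, 0.00799, 0.02950],
]
mut overhead := []f64{} // vsoup time / Lexbor C time (above 1 means slower than C)
mut speedup := []f64{} // jsoup time / vsoup time (above 1 means vsoup is faster)
for t in times {
	overhead << t[1] / t[0]
	speedup << t[2] / t[1]
}
ov_lo, ov_hi := min_max(overhead)
sp_lo, sp_hi := min_max(speedup)
println('vsoup / Lexbor C: ${ov_lo:.2f}x to ${ov_hi:.2f}x') // 1.21x to 1.49x
println('jsoup / vsoup: ${sp_lo:.2f}x to ${sp_hi:.2f}x') // 1.93x to 4.15x
```

On this data the speedup over jsoup ranges from about 1.9x (div > div) to about 4.1x (div span).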

Running benchmarks

make bench-selectors  # vsoup vs lexbor (raw C bindings + public API)
make bench-parse      # vsoup microbenchmarks (parse, traverse, select, serialize, manipulate)
make bench-jsoup      # jsoup comparison (downloads jar automatically)

Thread Safety

vsoup is not thread-safe. Each Document (and its associated Element values) should be used from a single thread. If you need to parse multiple documents concurrently, create a separate Document per thread.
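Following the one-Document-per-thread rule, here is a minimal sketch in which each spawned thread parses, queries, and frees its own Document, so only plain int results cross thread boundaries. `count_links` is a hypothetical helper; `spawn` and `[]thread T` with `.wait()` are standard V concurrency features (older V releases spell `spawn` as `go`).

```v
import vsoup

// Each call owns its Document from parse to free; nothing is shared.
fn count_links(html string) int {
	doc := vsoup.parse(html) or { return -1 }
	defer { doc.free() }
	return doc.@select('a').len()
}

pages := [
	'<p><a href="/1">one</a></p>',
	'<p><a href="/1">one</a><a href="/2">two</a></p>',
]
mut threads := []thread int{}
for page in pages {
	threads << spawn count_links(page)
}
counts := threads.wait() // results come back in the same order as `pages`
println(counts) // [1, 2]
```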

Memory Management

Document owns the underlying Lexbor C memory and must be freed with free():

doc := vsoup.parse(html)!
defer { doc.free() }  // always pair with defer

Element is a lightweight, non-owning view (24 bytes) into the document's DOM tree. Elements do not need to be freed individually, but they must not be used after the Document that owns them is freed; doing so is undefined behavior.
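Because Elements are views into the Document while returned V strings are copies, a function that owns a Document should hand back strings or other plain V values, never Element or Elements. A sketch of that pattern; `heading_texts` is a hypothetical helper used only for illustration.

```v
import vsoup

// Parses, extracts, and frees in one place; only copied strings escape.
fn heading_texts(html string) ![]string {
	doc := vsoup.parse(html)!
	defer { doc.free() }
	mut out := []string{}
	for h in doc.@select('h2').iter() {
		out << h.text() // a V-owned copy, still valid after doc.free()
	}
	return out
}

titles := heading_texts('<h2>Intro</h2><h2>Usage</h2>')!
println(titles) // ['Intro', 'Usage']
```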

Error Handling

Parsing and HTTP operations return V Result types (!). Use or {} blocks to handle errors:

// Parsing errors
doc := vsoup.parse(html) or {
    eprintln('Parse failed: ${err}')
    return
}
defer { doc.free() }

// select_first() returns an Option (?Element), so handle the `none` case
elem := doc.select_first('.missing') or {
    println('Element not found')
    return
}

// HTTP errors
doc2 := vsoup.connect('https://example.com').get() or {
    eprintln('Fetch failed: ${err}')
    return
}
defer { doc2.free() }

Architecture

vsoup
├── bindings.v      # C FFI declarations (lexbor)
├── lexbor_*.c      # Per-module unity builds — V compiles lexbor directly
├── helpers.v       # C↔V conversion, serialization, selector cache
├── vsoup.v         # parse(), parse_file(), connect()
├── document.v      # Document struct
├── element.v       # Element struct (non-owning DOM node view)
├── elements.v      # Elements collection
├── node_type.v     # NodeType enum
├── connection.v    # HTTP client
├── c_shims.c/h     # Compatibility shims for lexbor v2.6.0
└── lexbor/         # Vendored lexbor v2.6.0 source

Key design decisions:

  • Element is a lightweight, non-owning pointer wrapper (24 bytes) — freely copyable
  • Document owns the C memory and must be freed with free()
  • CSS selectors are compiled once and cached per-document for reuse
  • All V strings are copies from C memory (no dangling pointers)

Acknowledgements

  • Lexbor — the fast, spec-compliant HTML5 engine that powers vsoup's parsing and selector machinery. Created by Alexander Borisov.
  • jsoup — the excellent Java HTML parser whose clean API design inspired vsoup's interface. Created by Jonathan Hedley.

License

MIT