twitter-text in Rust

January 16, 2026 ยท View on GitHub

A Rust implementation of twitter-text that parses tweet text using a Pest PEG grammar. Includes bindings for Ruby, Python, Java, C++, Swift, and WebAssembly.

Features

  • Entity extraction: URLs, @mentions, #hashtags, $cashtags, and emoji
  • Tweet validation: 280 weighted character limit with configurable weights
  • Autolinking: Convert entities to HTML links
  • Hit highlighting: Highlight search terms in tweet text
  • Unicode 17.0: Full emoji support including ZWJ sequences and skin tone modifiers

Quick Start

Using Cargo

cargo build
cargo test

Using Bazel

# Build everything
bazel build //rust/...

# Run all tests
bazel test //rust/...

Language Bindings

LanguageDirectoryRequirementsTechnology
Rubyrust/ruby-bindings/Ruby 3.3+Magnus FFI
Pythonrust/python-bindings/Python 3.12PyO3
Javarust/java-bindings/JDK 23+Foreign Function & Memory API
C++rust/cpp-bindings/C++17cxx.rs
Swiftrust/swift-bindings/Swift 6.0+C FFI
WebAssemblyrust/wasm-bindings/-wasm-bindgen

Building Bindings

# Ruby
bazel build //rust/ruby-bindings:twittertext

# Python
bazel build //rust/python-bindings:twitter_text

# Java
bazel build //rust/java-bindings:twitter_text_java_ffm

# C++
bazel build //rust/cpp-bindings/...

# Swift
bazel build //rust/swift-bindings:TwitterText

# WebAssembly
bazel build //rust/wasm-bindings:twitter_text_wasm

Architecture

Core Components

  • PEG Grammar Parser (rust/parser/): Pest grammar for parsing tweet entities
  • Nom Parser (rust/twitter-text/src/nom_parser/): High performance parser using Nom, tested against the PEG grammar for correctness
  • Main Library (rust/twitter-text/): Extraction, validation, autolinking, and highlighting
  • Configuration (rust/config/): Character weights and URL length settings
  • Conformance Tests (rust/conformance/): Tests against canonical twitter-text test suites

Entity Parsing Order

The grammar processes entities in this order to resolve ambiguities:

  1. URLs (including t.co short URLs)
  2. Hashtags
  3. Mentions
  4. Cashtags

Dependencies

Most of these are downloaded by Bazel. These dependencies are listed to explain which versions they are implemented in. A notable exception are the Ruby dependencies, because they are not fully managed by Bazel.

  • Rust: 1.91.1+
  • Bazel: 8.4.2+ (for full build)
  • Ruby: 3.3+ (requires libyaml: brew install libyaml on macOS)
  • Python: 3.12
  • Java: JDK 23+
  • LLVM: 17.0.6 (hermetic toolchain via Bazel)

Conformance

This implementation passes the canonical twitter-text conformance tests in conformance/*.yml. These tests cover:

  • Autolink (URL/mention/hashtag linking)
  • Extract (entity extraction)
  • Validation (tweet validity)
  • Hit highlighting

License

Apache 2.0