OAR-OCR

May 15, 2026


An Optical Character Recognition (OCR) and Document Layout Analysis library written in Rust.

Quick Start

Installation

cargo add oar-ocr

With GPU support:

cargo add oar-ocr --features cuda

With auto-download of model files from ModelScope:

cargo add oar-ocr --features auto-download

Bare file names passed to the builders are then fetched from greatv/oar-ocr on ModelScope into $OAR_HOME (default ~/.oar) and verified against their expected SHA-256. See docs/models.md for the exact path resolution rules.
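The lookup described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's implementation: the helper names (`cache_root`, `is_bare_file_name`, `cached_model_path`) are hypothetical, "bare file name" is assumed to mean a path with a single plain component, and files are assumed to sit directly under the cache root. The download and SHA-256 verification steps are not shown; docs/models.md remains the reference for the real rules.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: choose the cache root from an explicit `OAR_HOME`
/// value, falling back to `<home>/.oar` when it is unset or empty.
fn cache_root(oar_home: Option<&str>, home_dir: Option<&str>) -> Option<PathBuf> {
    match oar_home {
        Some(dir) if !dir.is_empty() => Some(PathBuf::from(dir)),
        _ => home_dir.map(|home| Path::new(home).join(".oar")),
    }
}

/// Assumption: a "bare file name" is a path made of exactly one normal
/// component, so `det.onnx` qualifies but `models/det.onnx` and
/// `./det.onnx` do not.
fn is_bare_file_name(path: &str) -> bool {
    let mut components = Path::new(path).components();
    matches!(
        (components.next(), components.next()),
        (Some(Component::Normal(_)), None)
    )
}

/// Hypothetical resolution: bare names map into the cache root (assumed
/// flat layout), anything else is used as given.
fn cached_model_path(
    model: &str,
    oar_home: Option<&str>,
    home_dir: Option<&str>,
) -> Option<PathBuf> {
    if is_bare_file_name(model) {
        cache_root(oar_home, home_dir).map(|root| root.join(model))
    } else {
        Some(PathBuf::from(model))
    }
}
```

In a real program the two optional arguments would typically come from `std::env::var("OAR_HOME").ok()` and the user's home directory; passing them in explicitly keeps the sketch easy to test.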

Basic Usage

use oar_ocr::prelude::*;
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize the OCR pipeline
    let ocr = OAROCRBuilder::new(
        "pp-ocrv5_mobile_det.onnx",
        "pp-ocrv5_mobile_rec.onnx",
        "ppocrv5_dict.txt",
    )
    .build()?;

    // Load an image
    let image = load_image(Path::new("document.jpg"))?;
    
    // Run prediction
    let results = ocr.predict(vec![image])?;

    // Process results
    for text_region in &results[0].text_regions {
        if let Some((text, confidence)) = text_region.text_with_confidence() {
            println!("Text: {} ({:.2})", text, confidence);
        }
    }

    Ok(())
}
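A common next step is to drop low-confidence regions before using the text. The sketch below works on plain `(text, confidence)` pairs, mirroring the tuple destructured from `text_with_confidence()` above. The `Recognized` alias and the `confident_text` helper are hypothetical and not part of oar-ocr, and the `f32` confidence type is an assumption.

```rust
/// Stand-in for one recognized region: (text, confidence).
/// The f32 confidence type is an assumption made for this sketch.
type Recognized<'a> = (&'a str, f32);

/// Keep only the text of regions at or above `min_confidence`,
/// preserving the original reading order.
fn confident_text<'a>(regions: &[Recognized<'a>], min_confidence: f32) -> Vec<&'a str> {
    regions
        .iter()
        .filter(|(_, confidence)| *confidence >= min_confidence)
        .map(|(text, _)| *text)
        .collect()
}
```

With the pipeline above, the pairs could be gathered by calling `text_with_confidence()` on each entry of `results[0].text_regions` and skipping the `None` cases.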

Document Structure Analysis

use oar_ocr::prelude::*;
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize structure analysis pipeline
    let structure = OARStructureBuilder::new("pp-doclayout_plus-l.onnx")
        .with_table_classification("pp-lcnet_x1_0_table_cls.onnx")
        .with_table_structure_recognition("slanet_plus.onnx", "wireless")
        .table_structure_dict_path("table_structure_dict_ch.txt")
        .with_ocr(
            "pp-ocrv5_mobile_det.onnx",
            "pp-ocrv5_mobile_rec.onnx",
            "ppocrv5_dict.txt",
        )
        .build()?;
        
    // Analyze document
    let result = structure.predict("document.jpg")?;
    
    // Output Markdown
    println!("{}", result.to_markdown());
    
    Ok(())
}

Vision-Language Models (VLM)

For advanced document understanding with Vision-Language Models (such as PaddleOCR-VL, PaddleOCR-VL-1.5, GLM-OCR, HunyuanOCR, and MinerU2.5), see the oar-ocr-vl crate.

Hierarchical Speculative Decoding (HSD)

oar-ocr-vl ships a training-free CUDA acceleration scheme for the VLM backbones above. A cheap pipeline drafter (layout + OCR) proposes text candidates, and the target VLM verifies them in batches via tree attention, typically delivering several-fold wall-clock speedups on document-heavy pages at τ = 0.75. Build with --features hsd (implies cuda); see docs/hsd.md for the algorithm overview, config knobs, supported backbones, and AAL guidance.
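To make the draft-then-verify idea concrete, here is a toy sketch of greedy draft verification, the simplest form of speculative decoding. It is not the HSD implementation: it checks a single linear draft one token at a time rather than a candidate tree in one batched pass, it does not model τ, and `verify_draft` and `target_next` are hypothetical names standing in for the real verifier and the target VLM.

```rust
/// Verify a drafted continuation against a target model by greedy matching.
///
/// `target_next` is a hypothetical stand-in for the target VLM: given the
/// tokens so far, it returns the token the target would emit next.
/// Returns the number of accepted draft tokens and the resulting sequence.
fn verify_draft<F>(prefix: &[u32], draft: &[u32], target_next: F) -> (usize, Vec<u32>)
where
    F: Fn(&[u32]) -> u32,
{
    let mut output = prefix.to_vec();
    let mut accepted = 0;
    for &token in draft {
        let expected = target_next(&output);
        if expected == token {
            // The drafter guessed what the target would have produced.
            output.push(token);
            accepted += 1;
        } else {
            // First disagreement: keep the target's token and discard
            // the remainder of the draft.
            output.push(expected);
            return (accepted, output);
        }
    }
    (accepted, output)
}
```

The output always equals what the target alone would have generated, so the drafter affects speed but not the result. In a real system the speedup comes from scoring all draft positions in one forward pass; the sequential loop here only illustrates the acceptance rule.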

Documentation

  • Usage Guide - Detailed API usage, builder patterns, GPU configuration
  • Pre-trained Models - Model download links and recommended configurations
  • HSD - Hierarchical Speculative Decoding for VLM inference

Examples

The examples/ directory contains complete examples for various tasks:

# General OCR
cargo run --example ocr -- --help

# Document Structure Analysis
cargo run --example structure -- --help

# Layout Detection
cargo run --example layout_detection -- --help

# Table Structure Recognition
cargo run --example table_structure_recognition -- --help

Acknowledgments

This project builds upon the excellent work of several open-source projects:

  • ort: Rust bindings for ONNX Runtime by pykeio. This crate provides the Rust interface to ONNX Runtime that powers the efficient inference engine in this OCR library.

  • PaddleOCR: Baidu's multilingual OCR toolkit based on PaddlePaddle. This project uses PaddleOCR's pre-trained models, which provide excellent accuracy and performance for text detection and recognition across multiple languages.

  • Candle: A minimalist ML framework for Rust by Hugging Face. We use Candle to implement Vision-Language model inference.