February 12, 2025
# ALTO Tools

Python tools for performing various operations on ALTO XML files.

## Installation

You can install from PyPI by running:

```shell
pip install alto-tools
```

or clone the repository, enter it, and run:

```shell
pip install .
```

## Usage

```shell
alto-tools <INPUT> [OPTION]
```
`INPUT` should be the path to an ALTO XML file or to a directory containing ALTO XML files.

To pipe the output of another command into alto-tools, pass `-` as the `INPUT` argument, e.g.:
```shell
cat tests/data/PPN720183197-PHYS_0004.xml | alto-tools -t -
```
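The `INPUT` rules above (a single file, a directory, or `-` for standard input) can be sketched in Python as follows. This is a hypothetical helper, `iter_alto_sources`, not alto-tools' own code. It assumes a directory is scanned non-recursively for files ending in `.xml`, and it reads everything as bytes so the XML parser can honour each file's encoding declaration.

```python
import sys
from pathlib import Path
from typing import BinaryIO, Iterator, Tuple


def iter_alto_sources(input_arg: str) -> Iterator[Tuple[str, BinaryIO]]:
    """Yield (name, binary stream) pairs for an INPUT argument.

    Hypothetical helper, not part of alto-tools:
    - "-" means standard input (e.g. the output of `cat ... |`);
    - a directory means every *.xml file directly inside it (assumed non-recursive);
    - anything else is treated as the path to a single ALTO file.
    """
    if input_arg == "-":
        # Binary stdin lets the XML parser honour the document's encoding declaration.
        yield "<stdin>", sys.stdin.buffer
        return
    path = Path(input_arg)
    files = sorted(path.glob("*.xml")) if path.is_dir() else [path]
    for file in files:
        # Each file stays open only while the caller is processing it.
        with file.open("rb") as stream:
            yield str(file), stream
```

Each yielded stream can be handed directly to `xml.etree.ElementTree.parse`.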
The following `OPTION`s are currently supported:

| OPTION | Description |
|---|---|
| `-t`, `--text` | Extract UTF-8 encoded text content |
| `-c`, `--confidence` | Extract mean OCR word confidence score |
| `-i`, `--illustrations` | Extract bounding box coordinates of `<Illustration>` elements |
| `-g`, `--graphics` | Extract bounding box coordinates of `<GraphicalElement>` elements |
| `-s`, `--statistics` | Extract statistical info (number of text lines, words, glyphs, etc.) |
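
To make the `-t` and `-c` rows of the table concrete, here is a minimal standard-library Python sketch. It is an assumption about the approach, not alto-tools' implementation: text is taken from the `CONTENT` attribute of `String` elements, joined with spaces within each `TextLine` and with newlines between lines, and the confidence is the arithmetic mean of the `WC` word-confidence attribute (a value between 0 and 1 in the ALTO schema). Hyphenation (`HYP`) and reading-order subtleties are ignored, and namespaces are stripped so ALTO v2, v3 and v4 files are handled alike.

```python
import xml.etree.ElementTree as ET
from statistics import mean
from typing import Optional


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix so ALTO v2, v3 and v4 tags all match."""
    return tag.rsplit("}", 1)[-1]


def extract_text(root: ET.Element) -> str:
    """Return one output line per TextLine, its String/@CONTENT values joined by spaces."""
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [
                child.get("CONTENT", "")
                for child in element
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return "\n".join(lines)


def mean_word_confidence(root: ET.Element) -> Optional[float]:
    """Average the WC attribute of every String that has one; None if there are none."""
    scores = [
        float(element.get("WC"))
        for element in root.iter()
        if local_name(element.tag) == "String" and element.get("WC") is not None
    ]
    return mean(scores) if scores else None
```

For a file on disk, `ET.parse(path).getroot()` provides the `root` element these functions take.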
All output is sent to stdout.
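
The `-i`, `-g` and `-s` options in the table above can be approximated in the same way. Again, this is a sketch rather than alto-tools' code, and its output format is not the tool's. It assumes the bounding box of an `<Illustration>` or `<GraphicalElement>` is read from its `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes (in whatever `MeasurementUnit` the file declares). It treats the statistics as simple counts of elements by name, e.g. `TextLine` for lines, `String` for words and `Glyph` for glyphs.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (HPOS, VPOS, WIDTH, HEIGHT)


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix so ALTO v2, v3 and v4 tags all match."""
    return tag.rsplit("}", 1)[-1]


def bounding_boxes(root: ET.Element, element_name: str) -> List[Box]:
    """Collect (HPOS, VPOS, WIDTH, HEIGHT) for each element with this local name,
    e.g. "Illustration" or "GraphicalElement"; missing attributes default to 0."""
    boxes: List[Box] = []
    for element in root.iter():
        if local_name(element.tag) == element_name:
            hpos, vpos, width, height = (
                float(element.get(attr, "0"))
                for attr in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
            )
            boxes.append((hpos, vpos, width, height))
    return boxes


def element_counts(root: ET.Element) -> Counter:
    """Count elements by local name, e.g. counts["TextLine"], counts["String"], counts["Glyph"]."""
    return Counter(local_name(element.tag) for element in root.iter())
```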