Handwritten Text Synthesis and Recognition

June 22, 2026

The pySarah project provides a Handwritten Text Recognition (HTR) system built on TensorFlow. It includes a tutorial and a set of tools for data processing, model training, testing, and inference. The HTR model can be trained on various datasets and supports recognition at different text levels (e.g., line). The project also includes generative and language models that make up the workflow for handwriting synthesis and spelling correction.

The project supports MLflow Tracking for managing the training and testing phases. MLflow allows logging and comparing experiments, tracking metrics, and storing trained models for reproducibility. The MLflow dashboard can be opened with the mlflow ui command to explore and compare the tracked experiments.

Getting Started

The steps below describe how to get started with the project.

Requirements

Python >=3.11, <3.13
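
A quick way to confirm that the active interpreter falls inside this range is sketched below. The helper name is_supported is hypothetical and not part of the project; only the version bounds come from the requirement above.

```python
import sys

# Supported range taken from the requirement above: >=3.11, <3.13.
MIN_VERSION = (3, 11)
MAX_VERSION_EXCLUSIVE = (3, 13)


def is_supported(version_info=None):
    """Return True when (major, minor) lies in the supported range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    status = "supported" if is_supported() else "not supported"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```

Running the check before creating the virtual environment avoids installing the requirements against an interpreter outside the supported range.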

Installation

Clone the repository:

git clone https://github.com/arthurflor23/handwritten-text-recognition.git

Navigate to the project directory:

cd handwritten-text-recognition

Create and activate the virtual environment:

python3 -m venv .venv

For Linux/Mac:

source .venv/bin/activate

For Windows:

.venv\Scripts\activate

Install the requirements:

pip install -r requirements.txt

Datasets

The project supports a wide range of datasets for handwritten text recognition. The following datasets are already integrated into the project and can be easily used for training and evaluation.

Fonts

The project also supports font prototypes as input. Folders of .ttf files placed under fonts/ are loaded by the batch generator for training and evaluation. A collection of open fonts is available here.
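
To check what a fonts/ directory contains before training, the folder layout can be inspected with a few lines of Python. This is a minimal sketch using only the standard library; list_font_folders is a hypothetical helper, and the way the project's batch generator actually scans the directory may differ.

```python
from pathlib import Path


def list_font_folders(fonts_path="fonts"):
    """Map each folder under fonts_path to the sorted .ttf files it contains."""
    root = Path(fonts_path)
    folders = {}
    # rglob walks subfolders recursively; the pattern matches lowercase
    # ".ttf" only on case-sensitive file systems.
    for font_file in sorted(root.rglob("*.ttf")):
        folder = font_file.parent.relative_to(root).as_posix()
        folders.setdefault(folder, []).append(font_file.name)
    return folders


if __name__ == "__main__":
    for folder, files in list_font_folders().items():
        print(f"{folder}: {len(files)} font(s)")
```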

Parameters

The project has several command-line parameters that can be used to customize its behavior. The list of available parameters is outlined below, along with their descriptions.

Models

--synthesis: Specify synthesis model (e.g., flor).
--recognition: Specify recognition model (e.g., flor).
--segmentation: Specify segmentation model (e.g., flor).
--writer-identification: Specify writer identification model (e.g., flor).
--spelling: Specify spelling model (e.g., openai).

MLflow

--synthesis-run-id: Synthesis model run id or index.
--recognition-run-id: Recognition model run id or index.
--segmentation-run-id: Segmentation model run id or index.
--writer-identification-run-id: Writer identification model run id or index.
--experiment-name: MLflow experiment name.
--finished-runs: Only finished runs for selection.

Dataset

--source: Source data (e.g., iam).
--text-level: Text structure level (e.g., line).
--image-shape: Image dimensions (height, width, channels).
--char-width: Character width for image normalization.
--mask-by-text: Mask data by text length.
--order-by-text: Sort data by text length.
--training-ratio: Training partition ratio.
--validation-ratio: Validation partition ratio.
--test-ratio: Test partition ratio.
--illumination: Apply illumination compensation.
--binarization: Apply binarization method.
--lazy-mode: Activate lazy loading.

Augmentor

--mixup: Mixup transformation (probability, opacity, iterations).
--erode: Erode transformation (probability, kernel size, iterations).
--dilate: Dilate transformation (probability, kernel size, iterations).
--elastic: Elastic transformation (probability, kernel size, alpha).
--perspective: Perspective transformation (probability, alpha).
--shear: Shear transformation (probability, alpha).
--rotate: Rotate transformation (probability, alpha).
--scale: Scale transformation (probability, alpha).
--shift-y: Vertical translation (probability, alpha).
--shift-x: Horizontal translation (probability, alpha).
--salt-and-pepper: Salt and Pepper noise (probability, alpha).
--gaussian-noise: Gaussian noise (probability, alpha).
--gaussian-blur: Gaussian blur filter (probability, kernel size).
--skip-augmentation: Skip data augmentation.
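
Each augmentation flag above starts with a probability, so a transformation is applied to some samples and skipped for others. The sketch below illustrates that gating with salt and pepper noise on a tiny grayscale image stored as nested lists. It is not the project's implementation: the function names are hypothetical, and treating alpha as the fraction of corrupted pixels is an assumption.

```python
import random


def salt_and_pepper(image, alpha, rng):
    """Set roughly a fraction `alpha` of pixels to 0 (pepper) or 255 (salt)."""
    noisy = [row[:] for row in image]  # copy so the input is left untouched
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < alpha:
                row[x] = rng.choice((0, 255))
    return noisy


def maybe_apply(image, transform, probability, rng, **params):
    """Apply `transform` with the given probability, else return the image as is."""
    if rng.random() < probability:
        return transform(image, rng=rng, **params)
    return image


rng = random.Random(42)  # fixed seed for repeatable augmentation
image = [[128] * 8 for _ in range(4)]  # 4 rows x 8 columns, mid-gray
augmented = maybe_apply(image, salt_and_pepper, probability=0.5, rng=rng, alpha=0.1)
```

Passing a seeded random.Random instance keeps the augmentation reproducible, in the same spirit as the --seed parameter.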

Synthesis

--discriminator-steps: Repetition of steps for discriminator training in synthesis workflow.
--generator-steps: Skipping steps for generator training in synthesis workflow.
--monitor-samples: Number of sample images saved by the training monitor.
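
One plausible reading of the two step parameters is that the discriminator is updated several times per training step while the generator is updated only every few steps. The sketch below prints such a schedule. The exact semantics in the project are not stated here, so this interpretation and the update_schedule helper are assumptions.

```python
def update_schedule(total_steps, discriminator_steps=1, generator_steps=1):
    """Yield (step, discriminator_updates, update_generator) per training step.

    Assumed semantics: the discriminator runs `discriminator_steps` updates
    on every step, and the generator is updated once every `generator_steps`
    steps (skipped otherwise).
    """
    for step in range(1, total_steps + 1):
        update_generator = step % generator_steps == 0
        yield step, discriminator_steps, update_generator


for step, d_updates, g_update in update_schedule(4, discriminator_steps=2, generator_steps=2):
    print(step, d_updates, g_update)
```

With these values, the generator is updated on steps 2 and 4, and the discriminator is updated twice on every step.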

Training

--training: Perform training pipeline.
--training-step-factor: Factor for training steps.
--epochs: Maximum number of epochs.
--batch-size: Batch size.
--learning-rate: Learning rate.
--plateau-factor: Learning rate reduction factor.
--plateau-cooldown: Cooldown after plateau.
--plateau-patience: Plateau patience epochs.
--patience: Stop after no improvement.
--synthesis-probability: Training with synthetic data.
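
The plateau and patience parameters interact: the learning rate is reduced after a number of epochs without improvement, and training stops after a longer stretch without improvement. The simulation below is a simplified sketch of that interaction in plain Python, loosely modeled on common reduce-on-plateau and early-stopping callbacks. The function name and the exact counting rules are assumptions, not the project's code.

```python
def simulate_schedule(val_losses, learning_rate=1e-3, plateau_factor=0.2,
                      plateau_patience=2, plateau_cooldown=0, patience=4):
    """Return (learning rate used in each epoch, epoch at which training stopped)."""
    best = float("inf")
    stale_epochs = 0   # epochs since the best loss, drives early stopping
    plateau_wait = 0   # epochs without improvement, drives the LR reduction
    cooldown_left = 0  # epochs during which plateau counting is paused
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        history.append(learning_rate)
        if loss < best:
            best, stale_epochs, plateau_wait = loss, 0, 0
        else:
            stale_epochs += 1
            if cooldown_left == 0:
                plateau_wait += 1
        if cooldown_left > 0:
            cooldown_left -= 1
        if plateau_wait >= plateau_patience:
            learning_rate *= plateau_factor  # reduce on plateau
            plateau_wait, cooldown_left = 0, plateau_cooldown
        if stale_epochs >= patience:
            return history, epoch            # early stop
    return history, len(val_losses)


losses = [1.0, 0.9, 0.95, 0.95, 0.95, 0.95, 0.95]
rates, stopped_at = simulate_schedule(losses, learning_rate=1.0, plateau_factor=0.5)
print(rates, stopped_at)
```

In this run the loss stops improving after epoch 2, the learning rate is halved after epoch 4, and training stops at epoch 6, before the last value in the list is used.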

Test

--test: Perform test pipeline.
--top-paths: Top paths for prediction.
--beam-width: CTC decoder beam width.

Inference

--inference: Perform inference pipeline.
--image: Image path for recognition.
--bbox: Bounding box (x, y, width, height).
--text: Text for synthesis.
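
The bounding box format (x, y, width, height) can be illustrated with a crop over a row-major image. This is a sketch under the assumption that x and y give the top-left corner in pixels; crop_bbox is a hypothetical helper, and the project may handle boxes differently.

```python
def crop_bbox(image, bbox):
    """Crop a row-major image (list of rows) to bbox = (x, y, width, height)."""
    x, y, width, height = bbox
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if x < 0 or y < 0 or y + height > len(image) or x + width > len(image[0]):
        raise ValueError("bounding box falls outside the image")
    # Rows are selected by y and height, columns by x and width.
    return [row[x:x + width] for row in image[y:y + height]]


# 4 rows x 5 columns; each pixel value encodes its (row, column) position.
image = [[r * 10 + c for c in range(5)] for r in range(4)]
print(crop_bbox(image, (1, 2, 3, 2)))
```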

Others

--check: Perform check pipeline.
--input-path: Path to input data.
--fonts-path: Path to fonts data.
--output-path: Path to output data.
--gpu: GPU index or sequence of indices.
--seed: Seed value.
--verbose: Verbosity level.

Usage

The project offers a range of functionalities through command-line parameters, which can be combined to match specific needs. Below are some examples of usage.

Example 1: Train the recognition model across multiple datasets

python sarah --source iam --recognition flor --training --gpu 0;
python sarah --source rimes --recognition flor --training-ratio 0.9 --validation-ratio 0.1 --training --gpu 0;

These commands train the Flor recognition model independently on the IAM and RIMES datasets. For RIMES, the data is split into 90% training and 10% validation.
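
The effect of the ratio parameters can be pictured as a split of the sample list. The sketch below is a minimal illustration with the 0.9 and 0.1 values from the RIMES command. Shuffling with a fixed seed, the rounding rule, and the partition helper are assumptions, and the project may treat predefined dataset partitions differently.

```python
import random


def partition(samples, training_ratio, validation_ratio, test_ratio=0.0, seed=42):
    """Split samples into (train, validation, test) lists by ratio."""
    total = training_ratio + validation_ratio + test_ratio
    if abs(total - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1.0")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # deterministic for a given seed
    n_train = round(len(shuffled) * training_ratio)
    n_valid = round(len(shuffled) * validation_ratio)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]    # whatever remains
    return train, valid, test


train, valid, test = partition(range(100), training_ratio=0.9, validation_ratio=0.1)
print(len(train), len(valid), len(test))
```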

Example 2: Test the recognition model across multiple datasets

python sarah --source iam --recognition flor --test --recognition-run-id -1;
python sarah --source rimes --recognition flor --test --recognition-run-id -1;

These commands run the test pipeline on IAM and RIMES using the Flor recognition model. The --recognition-run-id -1 flag selects the most recently logged MLflow run for each dataset.
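
The run id parameters accept either an id or an index, and -1 behaves like a negative Python list index over the runs ordered by time. The sketch below illustrates that selection on plain dictionaries instead of real MLflow objects. The resolve_run helper, the dictionary keys, and the ordering by start time are assumptions; the FINISHED status filter mirrors what --finished-runs describes.

```python
def resolve_run(runs, run_id, finished_only=False):
    """Pick a run by exact id, or by index into the runs sorted oldest to newest."""
    if finished_only:
        candidates = [run for run in runs if run["status"] == "FINISHED"]
    else:
        candidates = list(runs)
    candidates.sort(key=lambda run: run["start_time"])  # newest run ends up last
    for run in candidates:
        if run["run_id"] == str(run_id):
            return run                  # exact id match takes priority
    return candidates[int(run_id)]      # otherwise treat the value as an index


# Hypothetical run records for illustration only.
runs = [
    {"run_id": "a1", "start_time": 100, "status": "FINISHED"},
    {"run_id": "b2", "start_time": 300, "status": "FAILED"},
    {"run_id": "c3", "start_time": 200, "status": "FINISHED"},
]
print(resolve_run(runs, "-1")["run_id"])
print(resolve_run(runs, "-1", finished_only=True)["run_id"])
```

In this sample the most recent run failed, so restricting the selection to finished runs changes which run -1 resolves to.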

Example 3: Train the synthesis model on the combined dataset

python sarah --source all-in-one --synthesis flor --training-ratio 1.0 --validation-ratio 0.0 --test-ratio 0.0 --training --gpu 0;

This command trains the Flor synthesis model on the combined source, using the full dataset for training with no validation or test partitions.

Tutorial Notebook

Tutorial material is provided to help with getting started. It offers a step-by-step guide to the main pipeline of the project.

The tutorial is designed to be beginner-friendly and can be easily run on Google Colab, a cloud-based Jupyter notebook environment. It provides a hands-on experience of the project's features and demonstrates the usage of various parameters and functionalities.

The tutorial covers:

The project's pipeline.
Setup of required dependencies and environment.
Exploration of different parameters.
Execution of the training and testing pipelines.
Insights applicable to specific context problems.

The material is available in the Tutorial Jupyter Notebook located in the project repository. The notebook instructions describe how to run the code and explore the features.

References

The following references provide additional background on Handwritten Text Recognition. Citations are appreciated if any of these works have contributed to related research or projects.

Neto, Arthur F. S. and Bezerra, Byron L. D. and Toselli, Alejandro H. and Lima, Estanislau B. HTR-Flor: A Deep Learning System for Offline Handwritten Text Recognition. 33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), 2020.
Neto, Arthur F. S. and Bezerra, Byron L. D. and Toselli, Alejandro H. and Lima, Estanislau B. HTR-Flor++: A Handwritten Text Recognition System Based on a Pipeline of Optical and Language Models. Proceedings of the ACM Symposium on Document Engineering, 2020.
Neto, Arthur F. S. and Bezerra, Byron L. D. and Lima, Estanislau B. and Toselli, Alejandro H. HDSR-Flor: A Robust End-to-End System to Solve the Handwritten Digit String Recognition Problem in Real Complex Scenarios. IEEE Access, vol. 8, pp. 208543-208553, 2020.
Neto, Arthur F. S. and Bezerra, Byron L. D. and Toselli, Alejandro H. Towards the Natural Language Processing as Spelling Correction for Offline Handwritten Text Recognition Systems. Applied Sciences, 2020.
Neto, Arthur F. S. and Bezerra, Byron L. D. and Toselli, Alejandro H. and Lima, Estanislau B. A Robust Handwritten Recognition System for Learning on Different Data Restriction Scenarios. Pattern Recognition Letters, 2022.
Neto, Arthur F. S. and Bezerra, Byron L. D. and Moura, Gabriel C. D. and Toselli, Alejandro H. Data Augmentation for Offline Handwritten Text Recognition: A Systematic Literature Review. SN Computer Science, 2024.
Neto, Arthur F. S. and Bezerra, Byron L. D. and Araujo, S. S. and Souza, W. M. A. S. and Alves, K. F. and Oliveira, M. F. and Lins, S. V. S. and Hazin, H. J. F. and Rocha, P. H. V. and Toselli, Alejandro H. BRESSAY: A Brazilian Portuguese Dataset for Offline Handwritten Text Recognition. 18th International Conference on Document Analysis and Recognition (ICDAR), Springer, Athens, Greece, September 2024.
Neto, Arthur F. S. and Bezerra, Byron L. D. and Araujo, S. S. and Souza, W. M. A. S. and Alves, K. F. and Oliveira, M. F. and Lins, S. V. S. and Hazin, H. J. F. and Rocha, P. H. V. and Toselli, Alejandro H. ICDAR 2024 Competition on Handwritten Text Recognition in Brazilian Essays – BRESSAY. 18th International Conference on Document Analysis and Recognition (ICDAR), Springer, Athens, Greece, September 2024.
Neto, Arthur F. S. and Bezerra, Byron L. D. and Toselli, Alejandro H. HTSR-Pollen: Handwritten Text Synthesis and Recognition System to Overcome Data Scarcity. IEEE Access, vol. 14, pp. 54395-54413, 2026.

Additional support for the project's progress is available through Ko-fi, which helps dedicate more time and resources to enhance the project and implement new features.