ColDBin: Cold Diffusion for Document Image Binarization

April 30, 2023

This repository contains the datasets and code for the paper ColDBin: Cold Diffusion for Document Image Binarization by Saifullah Saifullah, Stefan Agne, Andreas Dengel, and Sheraz Ahmed.

Requires Python 3+. For evaluation, please download the data from the links below.

Approach

Qualitative Results

Quantitative Results

Dataset     FM (%)  p-FM (%)  PSNR (dB)  DRD
DIBCO 2009  94.19   96.52     20.65      2.58
DIBCO 2010  95.29   96.67     22.06      1.36
DIBCO 2011  95.23   96.93     21.53      1.44
DIBCO 2012  96.37   97.41     23.40      1.28
DIBCO 2013  96.62   97.15     23.98      1.20
DIBCO 2014  97.89   98.10     24.38      0.66
DIBCO 2016  89.50   93.73     18.71      3.84
DIBCO 2017  93.04   95.12     19.32      2.29
DIBCO 2018  89.71   93.00     19.53      3.82

Higher FM, p-FM, and PSNR are better; lower DRD is better.
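As a minimal sketch of how the two simplest metrics in the table are computed, the function below (hypothetical name `fm_and_psnr`) derives FM and PSNR from a predicted and a ground-truth binary image. It assumes nested lists of 0/1 values with 1 marking text pixels; DIBCO ground-truth files typically store text as black, so real images would need converting first. This is not the official DIBCO evaluation tool, and p-FM and DRD are left out because they additionally require a skeletonized ground truth and a distance-weighted distortion matrix.

```python
import math


def fm_and_psnr(pred, gt):
    """Return (FM in %, PSNR in dB) for two same-shape binary images.

    Sketch assumption: images are nested lists of 0/1 ints where 1 marks
    text (foreground) pixels. p-FM and DRD are not computed here.
    """
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("pred and gt must have the same shape")

    tp = fp = fn = 0
    n_pixels = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p == 1 and g == 1:
                tp += 1  # text pixel correctly detected
            elif p == 1 and g == 0:
                fp += 1  # background pixel wrongly marked as text
            elif p == 0 and g == 1:
                fn += 1  # text pixel missed
            n_pixels += 1
    if n_pixels == 0:
        raise ValueError("images must not be empty")

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fm = 100.0 * 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # For binary images every mismatch contributes a squared error of 1, so the
    # MSE is the fraction of mismatched pixels; C = 1 is the foreground/background gap.
    mse = (fp + fn) / n_pixels
    c = 1.0
    psnr = math.inf if mse == 0 else 10.0 * math.log10(c ** 2 / mse)
    return fm, psnr


pred = [[1, 0, 0, 0],
        [0, 1, 1, 0]]
gt = [[1, 1, 0, 0],
      [0, 1, 0, 0]]
fm, psnr = fm_and_psnr(pred, gt)
print(f"FM = {fm:.2f}%, PSNR = {psnr:.2f} dB")  # FM = 66.67%, PSNR = 6.02 dB
```

For the toy images, TP = 2, FP = 1 and FN = 1, so precision and recall are both 2/3 and FM = 66.67%; 2 of the 8 pixels disagree, giving MSE = 0.25 and PSNR = 10 log10(4) ≈ 6.02 dB.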

Prepare DIBCO datasets

Download the datasets from the link:

Then prepare them with the example dataset preparation script provided for the DIBCO 2013 dataset:

./scripts/prepare_dataset.sh

Train

Train a cold diffusion model using the example training script for the DIBCO 2013 dataset:

./scripts/train.sh
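Cold diffusion replaces Gaussian noise with a deterministic degradation D(x0, t) and trains a restoration network R to recover the clean image x0 from D(x0, t). The sketch below shows how one training example could be built under a loudly stated assumption: D is a linear blend from the clean binarized image x0 (t = 0) to the degraded scan y (t = T). The operator, the names `degrade`, `make_training_pair`, and `l1_loss`, and the choice of an L1 loss are hypothetical illustrations, not a description of what `train.sh` actually does.

```python
import random


def degrade(x0, y, t, T):
    """Hypothetical cold degradation D(x0, t): a linear blend that equals the
    clean image x0 at t = 0 and the degraded scan y at t = T."""
    alpha = t / T
    return [(1 - alpha) * a + alpha * b for a, b in zip(x0, y)]


def l1_loss(pred, target):
    """Mean absolute error between two flattened images."""
    return sum(abs(p - q) for p, q in zip(pred, target)) / len(target)


def make_training_pair(x0, y, T, rng=random):
    """Sample a timestep t in [1, T] and return (D(x0, t), t, x0); the
    restoration network is trained to map the first two to the third."""
    t = rng.randint(1, T)
    return degrade(x0, y, t, T), t, x0


def restore(x, t):
    """Hypothetical stand-in for the restoration network R (identity here)."""
    return list(x)


# Toy flattened images with values in [0, 1] (assumed 1 = text).
x0 = [0.0, 1.0, 1.0, 0.0]  # clean binarized target
y = [0.4, 0.7, 0.9, 0.2]   # degraded scan
x_t, t, target = make_training_pair(x0, y, T=10, rng=random.Random(0))
loss = l1_loss(restore(x_t, t), target)
print(t, round(loss, 3))
```

In a real setup `restore` would be a neural network and `loss` would be minimized over many sampled (image, timestep) pairs.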

Test

Test the trained model using the example testing script for the DIBCO 2013 dataset:

./scripts/test.sh
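At test time a cold diffusion model starts from the degraded input and repeatedly applies the restoration network. The sketch below implements the improved sampling rule from the Cold Diffusion paper (Bansal et al., 2022), x_{s-1} = x_s - D(x0_hat, s) + D(x0_hat, s - 1), assuming the same hypothetical linear-blend degradation as above. Whether `test.sh` uses exactly this rule is not stated here, and the final 0.5 threshold in the hypothetical `binarize` helper is likewise an assumption.

```python
def degrade(x0, y, t, T):
    """Hypothetical linear-blend degradation D(x0, t): x0 at t = 0, y at t = T."""
    alpha = t / T
    return [(1 - alpha) * a + alpha * b for a, b in zip(x0, y)]


def cold_sample(y, restore, T):
    """Improved cold diffusion sampling, starting from x_T = y:
    x_{s-1} = x_s - D(x0_hat, s) + D(x0_hat, s - 1)."""
    x = list(y)
    for s in range(T, 0, -1):
        x0_hat = restore(x, s)  # network's estimate of the clean image
        d_s = degrade(x0_hat, y, s, T)
        d_prev = degrade(x0_hat, y, s - 1, T)
        x = [xi - a + b for xi, a, b in zip(x, d_s, d_prev)]
    return x


def binarize(x, level=0.5):
    """Hypothetical final threshold turning the restored image into 0/1 pixels."""
    return [1 if v >= level else 0 for v in x]


x0 = [0.0, 1.0, 1.0, 0.0]  # clean image (only known to the oracle below)
y = [0.4, 0.7, 0.9, 0.2]   # degraded scan given at test time


def oracle(x, s):
    """Perfect restorer used only as a sanity check of the update rule."""
    return x0


out = cold_sample(y, oracle, T=5)
print(binarize(out))  # [0, 1, 1, 0]
```

With a perfect restorer every step lands exactly on D(x0, s - 1), so the loop returns x0; with a trained network the subtraction of D(x0_hat, s) is what lets the sampler correct for errors in x0_hat instead of accumulating them.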

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.