
March 26, 2025

📓 RoDLA

Benchmarking the Robustness of Document Layout Analysis Models (CVPR'24)

🏑 Project Homepage

This is the official repository for our CVPR 2024 paper, RoDLA: Benchmarking the Robustness of Document Layout Analysis Models. For more results and benchmarking details, please visit our project homepage.

🔎 Introduction

We introduce RoDLA, a benchmark for evaluating the robustness of Document Layout Analysis (DLA) models. RoDLA is a large-scale benchmark containing 450,000+ documents with diverse layouts and contents, together with a set of evaluation metrics that facilitate the comparison of different DLA models. We hope that RoDLA can serve as a standard benchmark for evaluating the robustness of DLA models.

📝 Catalog

  • Perturbation Benchmark Dataset
    • PubLayNet-P
    • DocLayNet-P
    • M6Doc-P
  • Perturbation Generation and Evaluation Code
  • RoDLA Model Checkpoints
  • RoDLA Model Training Code
  • RoDLA Model Evaluation Code

📦 Installation

1. Clone the repository

git clone https://github.com/yufanchen96/RoDLA.git
cd RoDLA

2. Create a conda virtual environment

# create virtual environment
conda create -n RoDLA python=3.7 -y
conda activate RoDLA

3. Install benchmark dependencies

  • Install Basic Dependencies
pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio==0.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html
pip install -U openmim
mim install mmcv-full==1.5.0
pip install timm==0.6.11 mmdet==2.28.1
pip install Pillow==9.5.0
pip install opencv-python termcolor yacs pyyaml scipy
  • Install ocrodeg Dependencies
git clone https://github.com/NVlabs/ocrodeg.git
cd ./ocrodeg
pip install -e .
cd ..
  • Compile CUDA operators
cd ./model/ops_dcnv3
sh ./make.sh
python test.py
  • Alternatively, install the operator from the prebuilt .whl files:

    DCNv3-1.0-whl
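Before moving on, it can help to confirm that every package installed above actually imports inside the `RoDLA` environment. The helper below is a hypothetical sketch, not part of the repository: the function name `check_imports` and the `PYTHON` variable are illustrative assumptions, and the arguments are the import names of the packages installed above (`mmcv` for mmcv-full, `cv2` for opencv-python, `PIL` for Pillow, `yaml` for pyyaml).

```shell
# Hypothetical helper (not part of RoDLA): try to import each Python module and
# report which ones fail. Returns non-zero if any import fails.
# Set PYTHON to choose the interpreter (defaults to "python" from the active env).
check_imports() {
    py="${PYTHON:-python}"
    status=0
    for mod in "$@"; do
        if "$py" -c "import $mod" >/dev/null 2>&1; then
            echo "ok      $mod"
        else
            echo "MISSING $mod"
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_imports torch torchvision mmcv mmdet timm detectron2 ocrodeg cv2 PIL yacs yaml scipy termcolor` lists any package that failed to install. Separately, `python -c "import torch; print(torch.__version__, torch.cuda.is_available())"` should report a `+cu113` build and `True` on a correctly configured GPU machine.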

📂 Dataset Preparation

RoDLA Benchmark Dataset Preparation

Download the RoDLA dataset from Google Drive to the desired root directory.

Self-generated Perturbation Dataset Preparation

To generate the perturbed dataset yourself, run:

cd ./perturbation

python apply_perturbation.py \
      --dataset_dir ./publaynet/val \
      --json_dir ./publaynet/val.json \
      --dataset_name PubLayNet-P \
      --output_dir ./PubLayNet-P \
      --pert_method all \
      --background_folder ./background \
      --metric all
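Generating every perturbation for a full split can take a long time, so a wrong path is costly to discover late. The function below is a hypothetical pre-flight check, not part of the repository; its name and argument order are illustrative assumptions that simply mirror the `--dataset_dir`, `--json_dir` and `--background_folder` values passed to `apply_perturbation.py` above.

```shell
# Hypothetical pre-flight check (not part of RoDLA): verify that the inputs passed
# to apply_perturbation.py exist before starting a long generation run.
# Arguments: <dataset_dir> <annotation_json> <background_folder>
check_perturbation_inputs() {
    ok=0
    [ -d "$1" ] || { echo "dataset_dir not found: $1" >&2; ok=1; }
    [ -f "$2" ] || { echo "annotation file not found: $2" >&2; ok=1; }
    [ -d "$3" ] || { echo "background_folder not found: $3" >&2; ok=1; }
    return "$ok"
}
```

For example, from `./perturbation`, run `check_perturbation_inputs ./publaynet/val ./publaynet/val.json ./background` and start `apply_perturbation.py` only if it returns 0.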

Dataset Structure

After dataset preparation, the perturbed dataset is organized as follows:

.desired_root
└── PubLayNet-P
    ├── Background
    │   ├── Background_1
    │   │   ├── psnr.json
    │   │   ├── ms_ssim.json
    │   │   ├── cw_ssim.json
    │   │   ├── val.json
    │   │   ├── val
    │   │   │   ├── PMC538274_00004.jpg
    ...
    │   ├── Background_2
    ...
    ├── Rotation
    ...
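Because each perturbation level (for example `Background/Background_1`) is a self-contained subset with its own `val.json` and `val/` image folder, a short script can enumerate the levels and flag incomplete ones. This is a hypothetical sketch, not a script shipped with RoDLA; the function name is illustrative, and it assumes only the `<Perturbation>/<Perturbation>_<level>/` layout shown above.

```shell
# Hypothetical helper (not part of RoDLA): print every perturbation level under a
# perturbed dataset root (e.g. PubLayNet-P) as "<Perturbation>/<Level>" on stdout,
# and report levels missing val.json or the val/ image folder on stderr.
list_perturbation_levels() {
    root="$1"
    for level_dir in "$root"/*/*/; do
        [ -d "$level_dir" ] || continue          # glob matched nothing
        level_dir="${level_dir%/}"               # drop the trailing slash
        level="${level_dir#"$root"/}"            # e.g. Background/Background_1
        if [ -f "$level_dir/val.json" ] && [ -d "$level_dir/val" ]; then
            echo "$level"
        else
            echo "incomplete: $level" >&2
        fi
    done
}
```

For example, `list_perturbation_levels PubLayNet-P` prints one line per complete level, which is a convenient check after downloading or generating the data.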

🚀 Quick Start

Download the RoDLA model checkpoints

Evaluate the RoDLA model

cd ./model
python -u test.py configs/publaynet/rodla_internimage_xl_publaynet.py \
  checkpoint_dir/rodla_internimage_xl_publaynet.pth \
  --work-dir result/rodla_internimage_publaynet/Speckle_1 \
  --eval bbox \
  --cfg-options data.test.ann_file='PubLayNet-P/Speckle/Speckle_1/val.json' \
                data.test.img_prefix='PubLayNet-P/Speckle/Speckle_1/val/'
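The command above evaluates a single perturbation level (`Speckle_1`). To cover a whole perturbed dataset, the same command can be repeated for every `<Perturbation>/<Level>` folder. The loop below is a hypothetical sketch, not a script shipped with RoDLA: the function name and the `RODLA_CONFIG`, `RODLA_CHECKPOINT` and `DRY_RUN` variables are illustrative assumptions. It reuses exactly the `test.py` arguments shown above, defaults to the same config and checkpoint paths, and mirrors the `result/rodla_internimage_publaynet/<Level>` work-dir naming.

```shell
# Hypothetical batch evaluation (not part of RoDLA): run test.py once for every
# <Perturbation>/<Level> folder under a perturbed dataset root. Run it from ./model.
# Config and checkpoint default to the paths used in the single-level example.
RODLA_CONFIG="${RODLA_CONFIG:-configs/publaynet/rodla_internimage_xl_publaynet.py}"
RODLA_CHECKPOINT="${RODLA_CHECKPOINT:-checkpoint_dir/rodla_internimage_xl_publaynet.pth}"

eval_all_levels() {
    root="${1:-PubLayNet-P}"
    for level_dir in "$root"/*/*/; do
        # Skip unmatched globs and levels without an annotation file.
        [ -f "${level_dir}val.json" ] || continue
        level_dir="${level_dir%/}"
        level="${level_dir##*/}"                    # e.g. Speckle_1
        set -- python -u test.py "$RODLA_CONFIG" "$RODLA_CHECKPOINT" \
            --work-dir "result/rodla_internimage_publaynet/$level" \
            --eval bbox \
            --cfg-options "data.test.ann_file=$level_dir/val.json" \
                          "data.test.img_prefix=$level_dir/val/"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf '%s\n' "$*"                      # only print the command
        else
            "$@" || echo "evaluation failed: $level_dir" >&2
        fi
    done
}
```

From `./model`, `DRY_RUN=1 eval_all_levels PubLayNet-P` prints the commands it would run, and `eval_all_levels PubLayNet-P` runs them one after another, continuing past a failed level.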

Train the RoDLA model

  • Modify the configuration file under configs/_base_/datasets to specify the dataset path
  • Run the following command to train the model with 4 GPUs
sh dist_train.sh configs/publaynet/rodla_internimage_xl_2x_publaynet.py 4
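The trailing `4` in the command is the number of GPUs. When training on a subset of GPUs selected with the standard `CUDA_VISIBLE_DEVICES` variable, that number should match the number of listed devices. The helper below is a hypothetical sketch, not part of RoDLA; the function name `num_gpus` is illustrative, and it falls back to 4, the value used above, when `CUDA_VISIBLE_DEVICES` is unset.

```shell
# Hypothetical helper (not part of RoDLA): number of GPUs to pass to dist_train.sh,
# derived from CUDA_VISIBLE_DEVICES (e.g. "0,1" -> 2); defaults to 4 when unset.
num_gpus() {
    if [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
        # Count the comma-separated device entries.
        echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | grep -c .
    else
        echo 4
    fi
}
```

Export the variable first so that `$(num_gpus)` sees it, e.g. `export CUDA_VISIBLE_DEVICES=0,1` followed by `sh dist_train.sh configs/publaynet/rodla_internimage_xl_2x_publaynet.py "$(num_gpus)"`. Prefixing the variable to `sh dist_train.sh ...` alone would not work, because the `$(num_gpus)` substitution is expanded in the current shell before the prefix takes effect.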

🌳 Citation

If you find this code useful for your research, please consider citing:

@inproceedings{chen2024rodla,
      title={RoDLA: Benchmarking the Robustness of Document Layout Analysis Models}, 
      author={Yufan Chen and Jiaming Zhang and Kunyu Peng and Junwei Zheng and Ruiping Liu and Philip Torr and Rainer Stiefelhagen},
      booktitle={CVPR},
      year={2024}
}