TextBPN-MLOCR: Advanced Multi-Lingual Scene Text Detection

July 2, 2025


TextBPN-MLOCR is an enhanced version of TextBPN++ for scene text detection across multiple languages and artistic fonts. It is trained on large-scale synthetic and real-world text datasets so that it holds up in diverse scenarios.

News

✨ Key Features

  • Multi-Lingual Support: Detect text in Arabic, Bangla, Chinese, Japanese, Korean, Latin, Hindi
  • Artistic Text Handling: Accurately processes stylized and decorative fonts
  • Optimized Performance: Fully supports modern NVIDIA GPUs
  • Large-scale Training:
    • 🧪 1.5M+ synthetic text samples
    • 📸 500K+ real-world text samples

🛠️ Hardware Requirements

Component    Requirement
GPU          NVIDIA GPUs
CUDA         12.2
Python       ≥ 3.9
OS           Linux (recommended)
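The requirements above can be checked before installing anything. The following is a minimal sketch and not part of the project: the helper names `meets_minimum` and `describe_environment` are hypothetical, and the CUDA check assumes PyTorch is already installed (it is skipped otherwise).

```python
import sys


def meets_minimum(version, minimum=(3, 9)):
    """Return True if a (major, minor, ...) version tuple is at least `minimum`."""
    return tuple(version[:2]) >= tuple(minimum)


def describe_environment():
    """Collect a few facts relevant to the requirements table."""
    info = {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": meets_minimum(sys.version_info),
        "platform": sys.platform,
    }
    try:
        import torch  # optional: only present once the dependencies are installed
    except ImportError:
        info["cuda_available"] = None  # unknown without PyTorch
    else:
        info["cuda_available"] = torch.cuda.is_available()
        info["torch_cuda_version"] = torch.version.cuda
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which may differ from the toolkit installed on the system.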

🔽 Model Download

Download pre-trained models from HuggingFace Hub:
https://huggingface.co/somos99/TextBPN-MLOCR
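The download can also be scripted. This is a sketch under assumptions rather than a documented workflow of the project: it assumes the `huggingface_hub` package is installed (`pip install huggingface_hub`), and it fetches the whole repository because the file names inside it have not been verified here. The `./models` target directory and the `download_weights` helper are illustrative choices.

```python
REPO_ID = "somos99/TextBPN-MLOCR"  # taken from the URL above


def download_weights(local_dir="./models"):
    """Download every file in the model repository into `local_dir`."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    print("Files saved under:", download_weights())
```

After the download, check which checkpoint files are actually present before pointing the detector at one of them.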

📦 Installation

Install the dependencies with pip:

pip install -r requirements.txt

Build the DCN (deformable convolution) extension with CUDA:

sh make.sh

🚀 Quick Start

import cv2
import torch

from ocr.ocr_detection import FrameOCR


if __name__ == "__main__":
    # Load the detector on GPU 0.
    torch.cuda.set_device(0)
    model_path = './models/TextBPN_deformable_resnet50_best2.pth'
    detect_model = FrameOCR(model_path, backbone="deformable_resnet50", use_gpu=True, need_layout=True, test_speed=False)

    # Read the test image; cv2.imread returns None if the file cannot be read.
    test_img = "test.jpg"
    raw_image = cv2.imread(test_img)
    if raw_image is None:
        raise FileNotFoundError(f"Could not read image: {test_img}")
    # Convert single-channel images to 3-channel BGR.
    if len(raw_image.shape) == 2:
        raw_image = cv2.cvtColor(raw_image, cv2.COLOR_GRAY2BGR)

    # detect() takes a list of images.
    outputs = detect_model.detect([raw_image])
    print(outputs)
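Because `detect` takes a list of images, a folder of images can be fed to it in chunks. The helpers below are a hypothetical sketch using only the standard library; `list_images`, `iter_batches`, and the suffix set are not part of the project. Whether `detect` accepts more than one image per call, or images of different sizes, is an assumption; with `batch_size=1` the call matches the example above.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set; adjust as needed


def list_images(folder):
    """Return image paths in `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Usage with the detector from the example above (left as comments because it
# needs the model weights and a GPU):
# for batch in iter_batches(list_images("./images"), batch_size=4):
#     images = [cv2.imread(str(p)) for p in batch]
#     results = detect_model.detect(images)
```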

🎨 Gradio

📖 References

@inproceedings{zhang2021adaptive,
  title={Adaptive boundary proposal network for arbitrary shape text detection},
  author={Zhang, Shi-Xue and Zhu, Xiaobin and Yang, Chun and Wang, Hongfa and Yin, Xu-Cheng},
  booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
  pages={1305--1314},
  year={2021}
}

@article{zhang2023arbitrary,
  title={Arbitrary shape text detection via boundary transformer},
  author={Zhang, Shi-Xue and Yang, Chun and Zhu, Xiaobin and Yin, Xu-Cheng},
  journal={IEEE Transactions on Multimedia},
  volume={26},
  pages={1747--1760},
  year={2023},
  publisher={IEEE}
}

⚖️ License

This project is licensed under the MIT License.

🙏 Acknowledgements

This project extends the original work from:

  • TextBPN++: GitHub Repository
  • Contributors to the TextBPN project

Contribute & Support

🌟 Star us on GitHub → https://github.com/somos99/TextBPN-MLOCR
🐛 Report issues → https://github.com/somos99/TextBPN-MLOCR/issues
📥 Pull requests welcome!