Publications Using OpenCC
June 2, 2026
OpenCC is widely used as a Chinese script conversion and text normalization tool in natural language processing, computational linguistics, machine translation, corpus construction, and language model evaluation.
This page lists selected academic publications that explicitly use, compare against, or cite OpenCC. The list is not exhaustive. Pull requests adding more publications are welcome.
Chinese Script Conversion
These papers directly study Simplified/Traditional Chinese conversion or use OpenCC as a baseline system.
| Year | Publication | Venue | OpenCC usage |
|---|---|---|---|
| 2024 | An Unsupervised Framework for Adaptive Context-aware Simplified-Traditional Chinese Conversion | LREC-COLING 2024 | Compares against OpenCC as a public conversion baseline. |
| 2020 | 2kenize: Tying Subword Sequences for Chinese Script Conversion | ACL 2020 | Uses OpenCC as an off-the-shelf script conversion baseline. |
| 2017 | Simplified-Traditional Chinese Conversion and Proofreading | IJCNLP 2017 | Compares OpenCC with other Simplified/Traditional Chinese conversion systems. |
Chinese NLP and Corpus Preprocessing
These papers use OpenCC to normalize Chinese corpora before training, evaluation, or downstream NLP experiments.
| Year | Publication | Venue | OpenCC usage |
|---|---|---|---|
| 2024 | Machine Translation Evaluation Benchmark for Wu Chinese: Workflow and Analysis | WMT 2024 | Uses OpenCC to convert Traditional Chinese data to Simplified Chinese during benchmark construction. |
| 2022 | ParaZh-22M: A Large-Scale Chinese Parabank via Machine Translation | COLING 2022 | Uses OpenCC to convert Traditional Chinese to Simplified Chinese in the data cleaning pipeline. |
| 2018 | Analogical Reasoning on Chinese Morphological and Semantic Relations | ACL 2018 | Uses OpenCC to convert Traditional Chinese characters into Simplified Chinese during preprocessing. |
| 2017 | Earth Mover’s Distance Minimization for Unsupervised Bilingual Lexicon Induction | EMNLP 2017 | Uses OpenCC in Chinese corpus preprocessing. |
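The preprocessing step described in these papers usually amounts to passing each corpus line through a converter. Below is a minimal sketch, assuming the OpenCC Python bindings are installed (for example via `pip install opencc`) and expose `opencc.OpenCC(config).convert` as in the project README. The helper names `load_converter` and `normalize_lines` are hypothetical and are not taken from any of the listed papers.

```python
from typing import Callable, Iterable, List


def load_converter(config: str = "t2s.json") -> Callable[[str], str]:
    """Return an OpenCC conversion function for the given config.

    Assumption: the OpenCC Python bindings are installed.
    "t2s.json" converts Traditional -> Simplified; "s2t.json" is the reverse.
    """
    # Imported lazily so normalize_lines can be used without OpenCC present.
    import opencc

    return opencc.OpenCC(config).convert


def normalize_lines(lines: Iterable[str], convert: Callable[[str], str]) -> List[str]:
    """Strip whitespace, drop empty lines, and convert the script of the rest."""
    normalized = []
    for line in lines:
        text = line.strip()
        if text:
            normalized.append(convert(text))
    return normalized
```

With OpenCC available, `normalize_lines(corpus, load_converter("t2s.json"))` would apply Traditional-to-Simplified conversion to every non-empty line. Passing the converter in as a plain callable keeps the cleaning logic testable without the library and lets the same helper serve the Simplified-to-Traditional direction by loading `s2t.json` instead.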
Chinese Spelling Correction and Grammatical Error Correction
These papers use OpenCC to normalize Traditional Chinese datasets, especially SIGHAN-style datasets, into Simplified Chinese for Chinese spelling correction or grammatical error correction experiments.
| Year | Publication | Venue | OpenCC usage |
|---|---|---|---|
| 2024 | Uncertainty Guidance for Multimodal Chinese Spelling Correction | LREC-COLING 2024 | Uses OpenCC in Chinese spelling correction data processing. |
| 2024 | Error-Robust Retrieval for Chinese Spelling Check | LREC-COLING 2024 | Uses OpenCC to preprocess Traditional Chinese data. |
| 2024 | EdaCSC: Two Easy Data Augmentation Methods for Chinese Spelling Correction | arXiv | Uses OpenCC to convert SIGHAN Traditional Chinese data into Simplified Chinese. |
| 2023 | General and Domain-adaptive Chinese Spelling Check with Error Consistent Pretraining | ACM TOIS | Uses OpenCC to convert Traditional Chinese datasets into Simplified Chinese. |
| 2021 | Read, Listen, and See: Leveraging Multimodal Information Helps Chinese Spell Checking | Findings of ACL-IJCNLP 2021 | Uses OpenCC to convert Traditional Chinese SIGHAN data into Simplified Chinese. |
| 2020 | Heterogeneous Recycle Generation for Chinese Grammatical Error Correction | COLING 2020 | Includes OpenCC in the preprocessing pipeline for Chinese grammatical error correction. |
Machine Translation, Low-Resource Chinese Varieties, and Multilingual Processing
These papers use OpenCC in machine translation, Cantonese/Wu Chinese processing, or cross-script normalization.
| Year | Publication | Venue | OpenCC usage |
|---|---|---|---|
| 2024 | Leveraging Mandarin as a Pivot Language for Low-Resource Machine Translation between Cantonese and English | LoResMT 2024 | Uses OpenCC to convert Simplified Chinese data to Traditional Chinese for transfer learning. |
| 2023 | Cantonese to Written Chinese Translation via HuggingFace Translation Pipeline | NLPIR 2023 | Uses OpenCC to convert Mandarin text into Traditional Chinese. |
| 2020 | Korean-to-Japanese Neural Machine Translation System using Hanja Information | WAT 2020 | Uses OpenCC in Hanja/Kanji-related text processing. |
Language Model Evaluation and Benchmarks
These papers use OpenCC to construct or normalize benchmark data for evaluating language models.
| Year | Publication | Venue | OpenCC usage |
|---|---|---|---|
| 2024 | Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages | ACL 2024 | Uses OpenCC to convert Simplified Chinese benchmark data into Traditional Chinese. |
| 2024 | TMMLU+: An Improved Traditional Chinese Evaluation Suite for LLMs | OpenReview / arXiv | Uses OpenCC to convert TMMLU+ questions and prompts between Traditional and Simplified Chinese. |
Adding Publications
To add a publication, please include:
- title
- authors, if available
- year and venue
- stable link, preferably a DOI or an ACL Anthology, ACM, arXiv, OpenReview, or publisher page
- a short note describing how OpenCC is used
Suggested format:
| YEAR | [TITLE](URL) | VENUE | Uses OpenCC to ... |
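A row in this format can also be generated programmatically, which helps avoid a broken table when a title or note contains a pipe character. This is a small sketch only: the function name `format_publication_row` and the sample values are hypothetical placeholders, not a real entry.

```python
def format_publication_row(year: int, title: str, url: str, venue: str, usage: str) -> str:
    """Build one Markdown table row in the suggested format.

    Pipe characters inside cells are backslash-escaped so they do not
    split the cell when the table is rendered.
    """

    def escape(cell: object) -> str:
        return str(cell).strip().replace("|", "\\|")

    return (
        f"| {escape(year)} | [{escape(title)}]({url}) "
        f"| {escape(venue)} | {escape(usage)} |"
    )


# Placeholder values for illustration only.
row = format_publication_row(
    2020,
    "Example Paper Title",
    "https://example.org/paper",
    "Example Venue 2020",
    "Uses OpenCC to normalize Traditional Chinese text.",
)
print(row)
```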
Citing OpenCC
If you use OpenCC in academic work, please cite the project repository:
@misc{opencc,
  title = {OpenCC: Open Chinese Convert},
  author = {Kuo, Carbo and contributors},
  howpublished = {\url{https://github.com/BYVoid/OpenCC}},
  note = {Accessed: YYYY-MM-DD}
}