Publications Using OpenCC

June 2, 2026

OpenCC is widely used as a Chinese script conversion and text normalization tool in natural language processing, computational linguistics, machine translation, corpus construction, and language model evaluation.

This page lists selected academic publications that explicitly use, compare against, or cite OpenCC. The list is not exhaustive. Pull requests adding more publications are welcome.

Chinese Script Conversion

These papers directly study Simplified/Traditional Chinese conversion or use OpenCC as a baseline system.

| Year | Publication | Venue | OpenCC usage |
| --- | --- | --- | --- |
| 2024 | An Unsupervised Framework for Adaptive Context-aware Simplified-Traditional Chinese Conversion | LREC-COLING 2024 | Compares against OpenCC as a public conversion baseline. |
| 2020 | 2kenize: Tying Subword Sequences for Chinese Script Conversion | ACL 2020 | Uses OpenCC as an off-the-shelf script conversion baseline. |
| 2017 | Simplified-Traditional Chinese Conversion and Proofreading | IJCNLP 2017 | Compares OpenCC with other Simplified/Traditional Chinese conversion systems. |

Chinese NLP and Corpus Preprocessing

These papers use OpenCC to normalize Chinese corpora before training, evaluation, or downstream NLP experiments.

| Year | Publication | Venue | OpenCC usage |
| --- | --- | --- | --- |
| 2024 | Machine Translation Evaluation Benchmark for Wu Chinese: Workflow and Analysis | WMT 2024 | Uses OpenCC to convert Traditional Chinese data to Simplified Chinese during benchmark construction. |
| 2022 | ParaZh-22M: A Large-Scale Chinese Parabank via Machine Translation | COLING 2022 | Uses OpenCC to convert Traditional Chinese to Simplified Chinese in the data cleaning pipeline. |
| 2018 | Analogical Reasoning on Chinese Morphological and Semantic Relations | ACL 2018 | Uses OpenCC to convert Traditional Chinese characters into Simplified Chinese during preprocessing. |
| 2017 | Earth Mover’s Distance Minimization for Unsupervised Bilingual Lexicon Induction | EMNLP 2017 | Uses OpenCC in Chinese corpus preprocessing. |
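
The Traditional-to-Simplified normalization step described in this section can be sketched as follows. This is a minimal illustration, not code taken from any of the listed papers. The function names `normalize_lines` and `make_opencc_converter` are hypothetical, and the sketch assumes the official `opencc` Python package is installed and that its `t2s.json` (Traditional to Simplified) configuration is available.

```python
from typing import Callable, Iterable, Iterator


def normalize_lines(
    lines: Iterable[str], convert: Callable[[str], str]
) -> Iterator[str]:
    """Strip whitespace, drop empty lines, and apply a script converter."""
    for line in lines:
        text = line.strip()
        if text:
            yield convert(text)


def make_opencc_converter(config: str = "t2s.json") -> Callable[[str], str]:
    """Return a conversion function backed by OpenCC.

    Assumption: the official `opencc` package is installed
    (for example via `pip install opencc`). It is imported lazily so that
    `normalize_lines` can also be used with any other converter.
    """
    import opencc

    return opencc.OpenCC(config).convert
```

A typical call would be `list(normalize_lines(open("corpus.txt", encoding="utf-8"), make_opencc_converter()))`, where `corpus.txt` is a placeholder file name. Passing the converter as a plain callable keeps the cleaning logic independent of the conversion configuration, so the same loop can serve a Simplified-to-Traditional pipeline by choosing a different OpenCC configuration.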

Chinese Spelling Correction and Grammatical Error Correction

These papers use OpenCC to normalize Traditional Chinese datasets, especially SIGHAN-style datasets, into Simplified Chinese for Chinese spelling correction or grammatical error correction experiments.

| Year | Publication | Venue | OpenCC usage |
| --- | --- | --- | --- |
| 2024 | Uncertainty Guidance for Multimodal Chinese Spelling Correction | LREC-COLING 2024 | Uses OpenCC in Chinese spelling correction data processing. |
| 2024 | Error-Robust Retrieval for Chinese Spelling Check | LREC-COLING 2024 | Uses OpenCC to preprocess Traditional Chinese data. |
| 2024 | EdaCSC: Two Easy Data Augmentation Methods for Chinese Spelling Correction | arXiv | Uses OpenCC to convert SIGHAN Traditional Chinese data into Simplified Chinese. |
| 2023 | General and Domain-adaptive Chinese Spelling Check with Error Consistent Pretraining | ACM TOIS | Uses OpenCC to convert Traditional Chinese datasets into Simplified Chinese. |
| 2021 | Read, Listen, and See: Leveraging Multimodal Information Helps Chinese Spell Checking | Findings of ACL-IJCNLP 2021 | Uses OpenCC to convert Traditional Chinese SIGHAN data into Simplified Chinese. |
| 2020 | Heterogeneous Recycle Generation for Chinese Grammatical Error Correction | COLING 2020 | Includes OpenCC in the preprocessing pipeline for Chinese grammatical error correction. |

Machine Translation, Low-Resource Chinese Varieties, and Multilingual Processing

These papers use OpenCC in machine translation, Cantonese/Wu Chinese processing, or cross-script normalization.

| Year | Publication | Venue | OpenCC usage |
| --- | --- | --- | --- |
| 2024 | Leveraging Mandarin as a Pivot Language for Low-Resource Machine Translation between Cantonese and English | LoResMT 2024 | Uses OpenCC to convert Simplified Chinese data to Traditional Chinese for transfer learning. |
| 2023 | Cantonese to Written Chinese Translation via HuggingFace Translation Pipeline | NLPIR 2023 | Uses OpenCC to convert Mandarin text into Traditional Chinese. |
| 2020 | Korean-to-Japanese Neural Machine Translation System using Hanja Information | WAT 2020 | Uses OpenCC in Hanja/Kanji-related text processing. |

Language Model Evaluation and Benchmarks

These papers use OpenCC to construct or normalize benchmark data for evaluating language models.

| Year | Publication | Venue | OpenCC usage |
| --- | --- | --- | --- |
| 2024 | Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages | ACL 2024 | Uses OpenCC to convert Simplified Chinese benchmark data into Traditional Chinese. |
| 2024 | TMMLU+: An Improved Traditional Chinese Evaluation Suite for LLMs | OpenReview / arXiv | Uses OpenCC to convert TMMLU+ questions and prompts between Traditional and Simplified Chinese. |

Adding Publications

To add a publication, please include:

  • title
  • authors, if available
  • year and venue
  • stable link, preferably DOI, ACL Anthology, ACM, arXiv, OpenReview, or publisher page
  • a short note describing how OpenCC is used

Suggested format:

| YEAR | [TITLE](URL) | VENUE | Uses OpenCC to ... |
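
For contributors who prepare several entries at once, a row in the suggested format can be generated with a small helper. The sketch below is only a convenience and not part of the OpenCC project. The function name `format_publication_row` and the example values are hypothetical. It escapes literal `|` characters so that a title or note cannot break the table columns.

```python
def format_publication_row(
    year: int, title: str, url: str, venue: str, usage: str
) -> str:
    """Build one Markdown table row: | YEAR | [TITLE](URL) | VENUE | usage |."""

    def cell(text: str) -> str:
        # A literal pipe would start a new column, so escape it.
        return text.strip().replace("|", "\\|")

    return f"| {year} | [{cell(title)}]({url}) | {cell(venue)} | {cell(usage)} |"


if __name__ == "__main__":
    # Placeholder values for illustration only, not a real publication.
    print(
        format_publication_row(
            2020,
            "Example Title",
            "https://example.org/paper",
            "Example Venue 2020",
            "Uses OpenCC to convert Traditional Chinese data to Simplified Chinese.",
        )
    )
```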

Citing OpenCC

If you use OpenCC in academic work, please cite the project repository:

@misc{opencc,
  title        = {OpenCC: Open Chinese Convert},
  author       = {Kuo, Carbo and contributors},
  howpublished = {\url{https://github.com/BYVoid/OpenCC}},
  note         = {Accessed: YYYY-MM-DD}
}