Automatic Label Error Correction Without Human Labor
April 10, 2026
Doc
https://guotong1988.github.io/core_research/2024/02/01/auto-re-label/
Run
Step-1: Train the model on the original training dataset (train.py)
Step-2: Predict on the training and dev datasets (predict.py)
Step-3: Prepare the candidate training datasets (get_dataset_list.py)
Step-4: Find the best candidate dataset by dev accuracy (explore_train.py)
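The four steps above can be sketched end to end on toy data. This is a minimal illustration, not the code in train.py, predict.py, get_dataset_list.py, or explore_train.py: it assumes a scikit-learn LogisticRegression in place of the transformer model, synthetic data with 20% injected label noise, and a hypothetical re-labeling rule (replace a label when the model disagrees with it at or above a confidence threshold). The function names and the threshold values are likewise assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def relabel(labels, probs, threshold):
    """Hypothetical rule: take the model's prediction where it disagrees with
    the given label and its confidence is at least `threshold`."""
    preds = probs.argmax(axis=1)
    confident = probs.max(axis=1) >= threshold
    return np.where(confident & (preds != labels), preds, labels)


def fit_and_score(X_train, y_train, X_dev, y_dev):
    """Train a classifier and return it together with its dev accuracy."""
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return model, model.score(X_dev, y_dev)


# Toy stand-in for a real dataset: binary labels with 20% flipped at random.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_dev, y_clean, y_dev = train_test_split(
    X, y, test_size=0.25, random_state=0
)
rng = np.random.default_rng(0)
flip = rng.random(len(y_clean)) < 0.2
y_train = np.where(flip, 1 - y_clean, y_clean)

# Step-1: train on the original (noisy) training labels.
base_model, _ = fit_and_score(X_train, y_train, X_dev, y_dev)

# Step-2: predict class probabilities for the training set.
train_probs = base_model.predict_proba(X_train)

# Step-3: build candidate label sets, one per confidence threshold
# (threshold values are illustrative assumptions).
candidates = {"original": y_train}
for threshold in (0.6, 0.7, 0.8, 0.9):
    candidates[f"threshold={threshold}"] = relabel(y_train, train_probs, threshold)

# Step-4: retrain on every candidate and keep the one with the best dev accuracy.
scores = {
    name: fit_and_score(X_train, labels, X_dev, y_dev)[1]
    for name, labels in candidates.items()
}
best_name = max(scores, key=scores.get)

for name, acc in scores.items():
    print(f"{name:>14}: dev accuracy = {acc:.4f}")
print("best candidate:", best_name)
```

Because the original labels are kept as one of the candidates, the selected dataset never scores below the baseline on the dev set in this sketch.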
Requirement
transformers 4.38.2 or 4.26.1
torch 2.2.1 or 1.11.0
scikit-learn 1.3.2
datasets 2.18.0
accelerate 0.27.2
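A quick way to compare an environment against the versions listed above is sketched below. The version sets are copied from the list; the helper names (installed_version, check_requirements) are hypothetical and not part of this repository. Stripping a local build suffix such as "+cu121" is an assumption about how torch wheels report their version.

```python
from importlib.metadata import PackageNotFoundError, version

# Accepted versions, taken from the Requirement list.
REQUIRED = {
    "transformers": {"4.38.2", "4.26.1"},
    "torch": {"2.2.1", "1.11.0"},
    "scikit-learn": {"1.3.2"},
    "datasets": {"2.18.0"},
    "accelerate": {"0.27.2"},
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_requirements(required=REQUIRED, lookup=installed_version):
    """Return one message per package whose version is not in the accepted set."""
    problems = []
    for package, accepted in required.items():
        found = lookup(package)
        # Drop a local build tag (e.g. "2.2.1+cu121") before comparing.
        base = found.split("+")[0] if found else None
        if base not in accepted:
            expected = " or ".join(sorted(accepted))
            problems.append(f"{package}: found {found}, expected {expected}")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("\n".join(issues) if issues else "All listed versions match.")
```

Other versions may also work; a mismatch reported here only means the environment differs from the listed one.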
Experiment Results


Related Work
Label Error Correction With Human Labor: The Re-Label Method For Data-Centric Machine Learning
Using LLMs To Re-Label: A Unified Framework for NLP Tasks by ReLabel Method
More Info
The methods proposed in this project (and its related works) can be applied to any machine learning / deep learning task whose dataset is annotated manually or by LLMs.
They are not limited to NLP tasks: they can also be extended efficiently to CV (computer vision), ASR (speech recognition), TTS (text-to-speech) tasks, and more.