Automatic Label Error Correction Without Human Labor

April 10, 2026

LICENSE 996.icu

Doc

https://guotong1988.github.io/core_research/2024/02/01/auto-re-label/

Run

Step 1: Train the model on the original training dataset (train.py)

Step 2: Predict on the training and dev datasets (predict.py)

Step 3: Prepare the candidate training datasets (get_dataset_list.py)

Step 4: Find the best candidate dataset by dev accuracy (explore_train.py)
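
The four steps above can be sketched end to end on toy data. This is only an illustration under stated assumptions, not the code in the repository's scripts: the nearest-centroid classifier, the margin-based confidence score, the threshold values, and every function name below are hypothetical stand-ins chosen so the example runs with the Python standard library alone. It also assumes the dev labels are trustworthy, and that a candidate dataset is built by replacing a training label with the model's prediction when the two disagree and the prediction is confident enough.

```python
import random

def train(points, labels):
    """Step 1 (stand-in): fit a nearest-centroid classifier, one mean point per class."""
    centroids = {}
    for cls in set(labels):
        members = [p for p, y in zip(points, labels) if y == cls]
        centroids[cls] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids

def predict(centroids, points):
    """Step 2 (stand-in): return (predicted_label, margin) for every point.

    The margin is the gap between the two smallest squared distances;
    it is an assumed confidence score, not the repository's.
    """
    results = []
    for p in points:
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, c)), cls)
            for cls, c in centroids.items()
        )
        (best_d, best_cls), (second_d, _) = dists[0], dists[1]
        results.append((best_cls, second_d - best_d))
    return results

def accuracy(centroids, points, labels):
    preds = predict(centroids, points)
    return sum(pred == y for (pred, _), y in zip(preds, labels)) / len(labels)

def candidate_label_sets(labels, predictions, thresholds):
    """Step 3 (assumed rule): relabel only confident disagreements, one candidate per threshold."""
    candidates = {None: list(labels)}  # None = keep the original labels as a baseline
    for t in thresholds:
        candidates[t] = [
            pred if pred != y and margin >= t else y
            for y, (pred, margin) in zip(labels, predictions)
        ]
    return candidates

def explore(points, candidates, dev_points, dev_labels):
    """Step 4: retrain on each candidate and keep the one with the best dev accuracy."""
    scores = {
        name: accuracy(train(points, labs), dev_points, dev_labels)
        for name, labs in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

def make_data(n, noise, rng):
    """Two Gaussian clusters; each label is flipped with probability `noise`."""
    points, labels = [], []
    for _ in range(n):
        cls = rng.randint(0, 1)
        center = 2.0 if cls else -2.0
        points.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        labels.append(1 - cls if rng.random() < noise else cls)
    return points, labels

rng = random.Random(0)
train_x, train_y = make_data(400, 0.2, rng)  # noisy training labels
dev_x, dev_y = make_data(200, 0.0, rng)      # dev labels assumed clean

model = train(train_x, train_y)
preds = predict(model, train_x)
candidates = candidate_label_sets(train_y, preds, thresholds=[0.0, 4.0, 16.0])
best, scores = explore(train_x, candidates, dev_x, dev_y)
print("best candidate:", best, "dev accuracy per candidate:", scores)
```

Because the original labels are kept as one of the candidates, the selected dataset can never score below the unmodified training set on the dev data.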

Requirement

transformers 4.38.2 or 4.26.1

torch 2.2.1 or 1.11.0

scikit-learn 1.3.2

datasets 2.18.0

accelerate 0.27.2
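
A quick way to compare an environment against the versions listed above is sketched below. The helper name `check_environment` and the `PINNED` table are hypothetical (they are not part of this project); the version strings are copied from the list, and the code only relies on the standard-library `importlib.metadata` module.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the requirement list; a package passes if it matches any listed version.
PINNED = {
    "transformers": {"4.38.2", "4.26.1"},
    "torch": {"2.2.1", "1.11.0"},
    "scikit-learn": {"1.3.2"},
    "datasets": {"2.18.0"},
    "accelerate": {"0.27.2"},
}

def check_environment(pinned=PINNED):
    """Return {package: (installed_version_or_None, matches_a_listed_version)}."""
    report = {}
    for name, allowed in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Some torch builds carry a local suffix such as "+cu121"; ignore it when comparing.
        base = installed.split("+")[0] if installed else None
        report[name] = (installed, base in allowed)
    return report

for name, (installed, ok) in check_environment().items():
    print(f"{name:14s} installed={installed} ok={ok}")
```

A package that is not installed is reported with `installed=None` and `ok=False` rather than raising an error.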

Experiment Results

table1

Related Work

Label Error Correction With Human Labor: The Re-Label Method For Data-Centric Machine Learning

Using LLMs To Re-Label: A Unified Framework for NLP Tasks by ReLabel Method

More Info

The methods proposed in this project and its related works can be applied to any machine learning or deep learning task whose dataset was annotated manually or by LLMs.

They are not limited to NLP tasks: they can also be extended efficiently to computer vision (CV), automatic speech recognition (ASR), text-to-speech (TTS), and other tasks.