Semi-Supervised Text Classification with Dual Pseudo Supervision

September 21, 2022

Dataset

Only a small sample of the data is included with this submission. The complete data can be downloaded through the following link:

Train the model with 100 labeled examples from the AGNews dataset:

python main.py --dataset AGNews --num_labeled 100 --num_unlabeled 20000 --batch-size 32 --max_len 64 --teacher_lr 0.0001

Train the model with 100 labeled examples from the Yelp dataset:

python main.py --dataset Yelp --num_labeled 100 --num_unlabeled 20000 --batch-size 4 --max_len 256 --threshold 0.95 --temperature 0.5 --drop 0.3

Adjust --num_labeled and --num_unlabeled to set how many labeled and unlabeled examples are used for training.
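
To compare several labeled-set sizes, the command line can be generated rather than typed by hand. The following is a minimal sketch, not part of this repository: the helper names build_command and sweep_commands and the sweep values 200 and 500 are hypothetical, and only the flags already shown above are used. It prints the commands as a dry run instead of launching training.

```python
import shlex


def build_command(dataset, num_labeled, num_unlabeled, extra=()):
    """Return the argv list for one run of main.py (hypothetical helper)."""
    return [
        "python", "main.py",
        "--dataset", dataset,
        "--num_labeled", str(num_labeled),
        "--num_unlabeled", str(num_unlabeled),
        *extra,
    ]


def sweep_commands(dataset, labeled_sizes, num_unlabeled, extra=()):
    """Return one shell-quoted command string per labeled-set size."""
    return [
        shlex.join(build_command(dataset, n, num_unlabeled, extra))
        for n in labeled_sizes
    ]


# Remaining flags copied from the AGNews example above.
AGNEWS_EXTRA = ["--batch-size", "32", "--max_len", "64", "--teacher_lr", "0.0001"]

# Assumption: 200 and 500 are illustrative values; this README only shows 100.
for line in sweep_commands("AGNews", [100, 200, 500], 20000, AGNEWS_EXTRA):
    print(line)  # dry run: prints each command without executing it
```

Each printed line can be pasted into a shell, or the argv list from build_command can be passed to subprocess.run to launch the runs one after another.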

Monitor training progress:

tensorboard --logdir results