Semi-Supervised Text Classification with Dual Pseudo Supervision

September 21, 2022

Dataset

Only a small sample of the data is included with this submission. The complete data can be downloaded through the following link:

Usage

Train the model with 100 labeled examples from the AGNews dataset:

python main.py --dataset AGNews --num_labeled 100 --num_unlabeled 20000 --batch-size 32 --max_len 64 --teacher_lr 0.0001 

Train the model with 100 labeled examples from the Yelp dataset:

python main.py --dataset Yelp --num_labeled 100 --num_unlabeled 20000 --batch-size 4 --max_len 256 --threshold 0.95 --temperature 0.5 --drop 0.3
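This README does not document what --threshold and --temperature do. In many pseudo-labeling methods, a temperature below 1 sharpens the predicted class distribution, and a threshold discards pseudo-labels whose confidence is too low. The sketch below illustrates that common interpretation only. It is an assumption, not code from this repository, and the function names sharpen and pseudo_label are hypothetical.

```python
import math


def sharpen(logits, temperature):
    """Softmax over logits divided by temperature (temperature < 1 sharpens)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(logits, threshold=0.95, temperature=0.5):
    """Return (label, confidence, keep) for one unlabeled example.

    Assumed behaviour: keep is True only when the sharpened confidence
    reaches the threshold, so low-confidence pseudo-labels can be masked out.
    """
    probs = sharpen(logits, temperature)
    confidence = max(probs)
    label = probs.index(confidence)
    return label, confidence, confidence >= threshold


if __name__ == "__main__":
    # A confident prediction versus an ambiguous one (4 classes, as in AGNews).
    print(pseudo_label([4.0, 0.0, 0.0, 0.0]))
    print(pseudo_label([1.0, 0.9, 0.0, 0.0]))
```

Under this reading, raising the threshold trades pseudo-label quantity for quality, which is why it is usually tuned together with the temperature.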

Change --num_labeled and --num_unlabeled to control how many labeled and unlabeled examples are used for training.
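To make the roles of --num_labeled and --num_unlabeled concrete, here is a minimal sketch of how a labeled/unlabeled split of that size might be drawn. It assumes class-balanced sampling for the labeled set, which this README does not confirm, and split_labeled_unlabeled is a hypothetical helper rather than a function from main.py.

```python
import random
from collections import defaultdict


def split_labeled_unlabeled(labels, num_labeled, num_unlabeled, seed=0):
    """Pick indices for a small labeled set and a larger unlabeled set.

    Assumption: the labeled set is class-balanced (num_labeled // num_classes
    examples per class). Raises ValueError if a class has too few examples.
    """
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    per_class = num_labeled // len(by_class)
    labeled = []
    for y in sorted(by_class):
        labeled.extend(rng.sample(by_class[y], per_class))

    # Draw the unlabeled pool from everything not already chosen as labeled.
    labeled_set = set(labeled)
    remaining = [i for i in range(len(labels)) if i not in labeled_set]
    unlabeled = rng.sample(remaining, min(num_unlabeled, len(remaining)))
    return labeled, unlabeled


if __name__ == "__main__":
    # Toy label list with 4 classes, standing in for a real dataset.
    toy_labels = [i % 4 for i in range(1000)]
    lab, unlab = split_labeled_unlabeled(toy_labels, num_labeled=100, num_unlabeled=500)
    print(len(lab), len(unlab))
```

Fixing the seed keeps the split reproducible across runs, which makes results with different --num_labeled values easier to compare.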

Monitor training progress:

tensorboard --logdir results

Requirements

  • python 3.6+
  • torch 1.7+
  • torchvision 0.8+
  • tensorboard
  • wandb
  • numpy
  • tqdm
  • scikit-learn
  • transformers
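
Before training, it can help to confirm that the packages listed above are importable. The following is a small sketch of such a check, not a script shipped with this repository. The module names are assumed to be the usual import names (scikit-learn is imported as sklearn), and it checks presence only, not the minimum versions.

```python
import importlib.util
import sys

# Import names assumed for the requirements listed above.
REQUIRED_MODULES = [
    "torch",
    "torchvision",
    "tensorboard",
    "wandb",
    "numpy",
    "tqdm",
    "sklearn",
    "transformers",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 6):
        print("Python 3.6 or newer is required.")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules were found.")
```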