JointBERT

June 21, 2021

(Unofficial) PyTorch implementation of JointBERT: BERT for Joint Intent Classification and Slot Filling

Model Architecture

  • Predict the intent and the slots at the same time from a single BERT model (a joint model)
  • total_loss = intent_loss + coef * slot_loss (set coef with the --slot_loss_coef option)
  • To use a CRF layer, pass the --use_crf option
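
The loss combination above can be sketched without any framework. This is only an illustration of the formula, not the repository's code: the function names, the token-averaged cross-entropy for the slot term, and the `ignore_label` value used to skip padded positions are all assumptions.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def joint_loss(intent_logits, intent_label, slot_logits, slot_labels,
               slot_loss_coef=1.0, ignore_label=-100):
    """total_loss = intent_loss + slot_loss_coef * slot_loss."""
    intent_loss = cross_entropy(intent_logits, intent_label)
    # Average the per-token slot loss over non-ignored positions only
    # (ignore_label is a hypothetical marker for padded tokens).
    token_losses = [cross_entropy(logits, label)
                    for logits, label in zip(slot_logits, slot_labels)
                    if label != ignore_label]
    slot_loss = sum(token_losses) / len(token_losses) if token_losses else 0.0
    return intent_loss + slot_loss_coef * slot_loss


# One sentence with two intent classes and two tokens; the second token is padding.
loss = joint_loss([0.0, 0.0], 0, [[0.0, 0.0], [5.0, 0.0]], [1, -100],
                  slot_loss_coef=2.0)
```

A larger coefficient weights slot filling more heavily relative to intent classification; a value of 1.0 weights the two terms equally.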

Dependencies

  • python>=3.6
  • torch==1.6.0
  • transformers==3.0.2
  • seqeval==0.0.12
  • pytorch-crf==0.7.2

Dataset

         Train    Dev    Test   Intent Labels   Slot Labels
ATIS     4,478    500    893    21              120
Snips    13,084   700    700    7               72
  • The number of labels is based on the train dataset.
  • Add UNK to the labels (for intent and slot labels that appear only in the dev and test datasets)
  • Add PAD to the slot labels
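
The label handling described above can be sketched as follows. This is a minimal illustration under assumptions: the helper names, the toy labels, and the position of PAD and UNK at the front of the vocabulary are hypothetical and may not match how the repository stores its labels.

```python
def build_label_vocab(train_labels, add_pad=False):
    """Collect labels seen in the training data, special entries first."""
    # Placing PAD and UNK at the front is an assumption for this sketch.
    vocab = ["PAD", "UNK"] if add_pad else ["UNK"]
    for label in train_labels:
        if label not in vocab:
            vocab.append(label)
    return vocab


def label_to_id(label, vocab):
    """Map a label to its index, falling back to UNK for unseen labels."""
    return vocab.index(label) if label in vocab else vocab.index("UNK")


# Toy training labels, used for illustration only.
intent_vocab = build_label_vocab(["PlayMusic", "GetWeather", "PlayMusic"])
slot_vocab = build_label_vocab(["O", "B-city", "I-city", "O"], add_pad=True)

# A label that appears only in dev or test maps to UNK.
unseen_id = label_to_id("BookRestaurant", intent_vocab)
```

Because the vocabulary is built from the train split only, any dev or test label missing from it resolves to the UNK index instead of raising an error.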

Training & Evaluation

$ python3 main.py --task {task_name} \
                  --model_type {model_type} \
                  --model_dir {model_dir_name} \
                  --do_train --do_eval \
                  --use_crf

# For ATIS
$ python3 main.py --task atis \
                  --model_type bert \
                  --model_dir atis_model \
                  --do_train --do_eval
# For Snips
$ python3 main.py --task snips \
                  --model_type bert \
                  --model_dir snips_model \
                  --do_train --do_eval

Prediction

$ python3 predict.py --input_file {INPUT_FILE_PATH} --output_file {OUTPUT_FILE_PATH} --model_dir {SAVED_CKPT_PATH}

Results

  • Trained for 5 to 10 epochs (the best result is recorded)
  • Only uncased models were tested
  • ALBERT xxlarge sometimes fails to converge well on slot prediction.
                            Intent acc (%)   Slot F1 (%)   Sentence acc (%)
Snips   BERT                99.14            96.90         93.00
        BERT + CRF          98.57            97.24         93.57
        DistilBERT          98.00            96.10         91.00
        DistilBERT + CRF    98.57            96.46         91.85
        ALBERT              98.43            97.16         93.29
        ALBERT + CRF        99.00            96.55         92.57
ATIS    BERT                97.87            95.59         88.24
        BERT + CRF          97.98            95.93         88.58
        DistilBERT          97.76            95.50         87.68
        DistilBERT + CRF    97.65            95.89         88.24
        ALBERT              97.64            95.78         88.13
        ALBERT + CRF        97.42            96.32         88.69
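
To make the Intent acc and Sentence acc columns concrete, here is a small sketch. It assumes the common definition of sentence-level accuracy, where a sentence counts as correct only if its intent and every slot tag match the gold annotation; the function names and toy data are hypothetical, and the repository's own evaluation code may differ in detail.

```python
def intent_accuracy(pred_intents, gold_intents):
    """Fraction of sentences whose predicted intent matches the gold intent."""
    correct = sum(p == g for p, g in zip(pred_intents, gold_intents))
    return correct / len(gold_intents)


def sentence_accuracy(pred_intents, gold_intents, pred_slots, gold_slots):
    """Fraction of sentences with the intent AND every slot tag correct."""
    correct = sum(
        pi == gi and ps == gs
        for pi, gi, ps, gs in zip(pred_intents, gold_intents,
                                  pred_slots, gold_slots)
    )
    return correct / len(gold_intents)


# Toy predictions for three sentences.
gold_intents = ["a", "b", "a"]
pred_intents = ["a", "b", "b"]
gold_slots = [["O", "B-x"], ["O"], ["O", "O"]]
pred_slots = [["O", "B-x"], ["B-x"], ["O", "O"]]

intent_acc = intent_accuracy(pred_intents, gold_intents)
sentence_acc = sentence_accuracy(pred_intents, gold_intents,
                                 pred_slots, gold_slots)
```

Under this definition, sentence accuracy can never exceed intent accuracy, which is consistent with the Sentence acc column being the lowest figure in every row.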

Updates

  • 2019/12/03: Add DistilBERT and RoBERTa results
  • 2019/12/14: Add ALBERT (large v1) result
  • 2019/12/22: Sentence prediction is now available
  • 2019/12/26: Add ALBERT (xxlarge v1) result
  • 2019/12/29: Add CRF option
  • 2019/12/30: Sentence-level semantic frame accuracy can now be checked
  • 2020/01/23: Show only the results for uncased models
  • 2020/04/03: Update with new prediction code

References