Transformer-Encoder-with-Char

May 4, 2019

Transformer Encoder with character-level information for text classification
This code is based on code by carpedm20 and DongjunLee.

1. Model structure

Each input word is represented by concatenating its Char-CNN embedding and its Word2vec embedding (64 dimensions each, 128 in total).
The encoder is the standard Transformer encoder from "Attention Is All You Need" (Vaswani et al., 2017).
The model stacks 7 Transformer encoder layers, each with 4 attention heads.
A global average pooling layer followed by a softmax predicts the class.
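The pieces of this pipeline can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's TensorFlow code: the attention function is a single head without learned projections (a standard encoder layer also has multiple heads, residual connections, layer normalization and a feed-forward sublayer), and the classifier assumes a linear layer maps the pooled vector to class logits before the softmax. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def concat_word_repr(char_vec: Vector, word_vec: Vector) -> Vector:
    """Word input = [Char-CNN vector ; Word2vec vector] (64 + 64 = 128 dims here)."""
    return char_vec + word_vec


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V for one head, with no learned projections."""
    d_k = len(k[0])
    out = []
    for q_row in q:
        # Similarity of this query with every key, scaled by sqrt(d_k)
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(scores)
        # Weighted sum of the value rows
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out


def global_average_pool(states: Matrix) -> Vector:
    """Average the encoder outputs over the sequence (time) axis."""
    n = len(states)
    return [sum(s[d] for s in states) / n for d in range(len(states[0]))]


def classify(states: Matrix, weights: Matrix, bias: Vector) -> Vector:
    """GAP -> assumed linear layer (weights: [num_classes][dim]) -> class probabilities."""
    pooled = global_average_pool(states)
    logits = [sum(w * p for w, p in zip(row, pooled)) + b for row, b in zip(weights, bias)]
    return softmax(logits)
```

For AG's News, `weights` would have 4 rows (one per class), each as wide as the encoder output, and `classify` returns 4 probabilities that sum to 1.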

2. Char CNN

The Char-CNN follows the character-level CNN of Yoon Kim et al. ("Character-Aware Neural Language Models", 2016).
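A Kim-style character CNN for one word can be sketched in plain Python: each filter slides over the word's character embeddings, a tanh is applied to each window score, and max-over-time pooling keeps one feature per filter. This is a minimal illustration, not the repository's implementation; the function name and the filter layout are hypothetical, and it omits the highway layers Kim et al. place after pooling.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def char_cnn_features(char_embs: Matrix, filters: Sequence[Matrix], biases: Sequence[float]) -> List[float]:
    """
    char_embs: [num_chars][emb_dim] embeddings of one word's characters.
    filters:   each filter is [width][emb_dim]; biases holds one bias per filter.
    Returns one feature per filter: max over positions of tanh(conv + bias).
    """
    features = []
    for filt, b in zip(filters, biases):
        width = len(filt)
        if len(char_embs) < width:
            raise ValueError("word shorter than filter width; pad the characters first")
        scores = []
        for start in range(len(char_embs) - width + 1):
            # Dot product between the filter and the current window of characters
            s = b
            for k in range(width):
                s += sum(f * c for f, c in zip(filt[k], char_embs[start + k]))
            scores.append(math.tanh(s))
        features.append(max(scores))  # max-over-time pooling
    return features
```

Using filters of several widths (for example 2, 3 and 4 characters) lets the word vector capture character n-grams of different lengths; the number of filters sets the Char-CNN output size.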

3. Prerequisites

TensorFlow 1.8.0
Python 3.6

4. Training

Clone the repository

$ git clone https://github.com/MSWon/Transformer-Encoder-with-Char.git

Unzip data.zip and embedding.zip

$ unzip data.zip
$ unzip embedding.zip

Train with your own settings (--char_mode is one of char_cnn, char_lstm or no_char)

$ python train.py --batch_size 128 --training_epochs 12 --char_mode char_cnn
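The flags in the command above can be illustrated with a hypothetical argument parser. This sketch accepts only the three flags shown, makes them all required (no defaults are assumed), and restricts `--char_mode` to the three listed values; it is not the repository's actual `train.py`, which may define other options or defaults.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring only the flags shown in the README command
    parser = argparse.ArgumentParser(description="Transformer encoder text classifier (sketch)")
    parser.add_argument("--batch_size", type=int, required=True)
    parser.add_argument("--training_epochs", type=int, required=True)
    parser.add_argument("--char_mode", choices=["char_cnn", "char_lstm", "no_char"], required=True)
    return parser


# Parse the same arguments as the example command
args = build_parser().parse_args(
    ["--batch_size", "128", "--training_epochs", "12", "--char_mode", "char_cnn"]
)
```

With `choices`, an unsupported value such as `--char_mode char_gru` makes argparse print a usage error and exit instead of silently training with an unknown mode.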

5. Experiments

5-1. Datasets

The AG’s News topic classification dataset is built by taking the 4 largest classes from the original AG news corpus.
The 4 classes are ‘world’, ‘sports’, ‘business’ and ‘science/technology’.
Each class contains 30,000 training samples and 1,900 test samples.
In total there are 120,000 training samples and 7,600 test samples.
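The dataset sizes and the training command above determine how many optimizer steps a run takes. The sketch below does that arithmetic, assuming one step per mini-batch. Because 120,000 / 128 = 937.5, the count depends on whether the final partial batch is kept or dropped, which the README does not specify, so both cases are covered. The helper name is hypothetical.

```python
import math

NUM_CLASSES = 4
TRAIN_PER_CLASS = 30_000
TEST_PER_CLASS = 1_900

train_total = NUM_CLASSES * TRAIN_PER_CLASS  # 120,000
test_total = NUM_CLASSES * TEST_PER_CLASS    # 7,600


def steps_per_epoch(num_samples: int, batch_size: int, drop_remainder: bool = False) -> int:
    # Assumption: one optimizer step per batch; the last partial batch is kept unless dropped
    if drop_remainder:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


# Values from the example command: --batch_size 128 --training_epochs 12
per_epoch = steps_per_epoch(train_total, 128)
total_steps = per_epoch * 12
```

Keeping the partial batch gives 938 steps per epoch (11,256 over 12 epochs); dropping it gives 937 (11,244).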

5-2. Test loss graph

5-3. Performance table
