Transformer-Encoder-with-Char
May 4, 2019
- Transformer Encoder with character-level information for text classification
- This code was written with reference to code by carpedm20 and DongjunLee
1. Model structure

- Each input word is represented by concatenating its Char-CNN and Word2vec vectors (64 dimensions each)
- The standard Transformer Encoder from "Attention Is All You Need" is used
- The model is composed of 7 Transformer Encoder layers with 4 attention heads
- A Global Average Pooling layer followed by softmax is used at the end to predict the class
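The input and output ends of this structure can be sketched in plain Python. This is only an illustration of the shapes involved, not the repository's TensorFlow code: the 7 encoder layers are skipped entirely, the weights are made-up constants, and the helper names (`concat_features`, `global_average_pool`, `classify`) are hypothetical. The 4 output classes are taken from the AG's news dataset described in section 5-1.

```python
import math

CHAR_DIM = 64      # size of the Char-CNN vector per word (from the README)
WORD_DIM = 64      # size of the Word2vec vector per word (from the README)
NUM_CLASSES = 4    # AG's news has 4 classes

def concat_features(char_vec, word_vec):
    """Join the character-level and word-level vectors of one token."""
    return list(char_vec) + list(word_vec)

def global_average_pool(sequence):
    """Average a list of equal-length token vectors over the time axis."""
    n = len(sequence)
    dim = len(sequence[0])
    return [sum(tok[d] for tok in sequence) / n for d in range(dim)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(sequence, weights, bias):
    """Pool the token vectors, apply a linear layer, return class probabilities."""
    pooled = global_average_pool(sequence)
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

# Toy input: 5 tokens, each a 64 + 64 = 128 dimensional vector.
# In the real model these vectors would first pass through the encoder layers.
tokens = [concat_features([0.1] * CHAR_DIM, [0.2] * WORD_DIM) for _ in range(5)]

# Made-up linear layer: one weight row per class (an assumption for the demo).
weights = [[0.01 * (c + 1)] * (CHAR_DIM + WORD_DIM) for c in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES

probs = classify(tokens, weights, bias)
```

Averaging over the time axis gives a fixed-size vector regardless of sentence length, which is why a single linear layer with softmax is enough to produce the class probabilities.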
2. Char CNN
- Char CNN implemented by Yoon Kim
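As a rough illustration of the idea behind a character-level CNN, the sketch below slides convolution kernels over the character embeddings of one word, applies tanh, and keeps the maximum activation per kernel (max-over-time pooling). It is a minimal pure-Python sketch under assumptions: the function names are hypothetical, the toy embeddings and kernels are invented, the highway layers of Yoon Kim's model are left out, and the repository's actual implementation may differ.

```python
import math

def conv1d_max_over_time(char_embeddings, kernel):
    """Slide one kernel over a word's characters and keep the max activation.

    char_embeddings: per-character vectors, shape [word_len][char_dim]
    kernel: weights for `width` consecutive characters, shape [width][char_dim]
    Assumes the word has at least `width` characters (pad shorter words first).
    """
    width = len(kernel)
    activations = []
    for start in range(len(char_embeddings) - width + 1):
        window = char_embeddings[start:start + width]
        total = sum(w * x
                    for k_row, c_row in zip(kernel, window)
                    for w, x in zip(k_row, c_row))
        activations.append(math.tanh(total))
    return max(activations)

def char_cnn_word_vector(char_embeddings, kernels):
    """Build a word vector with one feature per kernel."""
    return [conv1d_max_over_time(char_embeddings, k) for k in kernels]

# Toy word of 3 characters with 2-dimensional character embeddings.
word = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Two invented kernels: one of width 2 and one of width 3.
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],
]

word_vector = char_cnn_word_vector(word, kernels)
```

With 64 kernels in total, a layer like this would yield the 64-dimensional character-level vector that is concatenated with the Word2vec vector.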
3. Prerequisite
- TensorFlow 1.8.0
- Python 3.6
4. Training
- Clone the repository
$ git clone https://github.com/MSWon/Transformer-Encoder-with-Char.git
- Unzip data.zip and embedding.zip
$ unzip data.zip
$ unzip embedding.zip
- Train with user settings (char_mode is one of char_cnn, char_lstm, no_char)
$ python train.py --batch_size 128 --training_epochs 12 --char_mode char_cnn
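The flags in the command above could be parsed along the following lines. This is a hedged sketch using the standard `argparse` module, not the contents of the repository's `train.py`: the flag names and the three `char_mode` values come from this README, while the defaults, help strings, and the `build_parser` helper are assumptions made for the example.

```python
import argparse

# Allowed values for --char_mode, as listed in the README.
CHAR_MODES = ("char_cnn", "char_lstm", "no_char")

def build_parser():
    """Build a parser for the three training flags (defaults are assumptions)."""
    parser = argparse.ArgumentParser(
        description="Train a Transformer Encoder text classifier")
    parser.add_argument("--batch_size", type=int, default=128,
                        help="number of samples per training batch")
    parser.add_argument("--training_epochs", type=int, default=12,
                        help="number of passes over the training set")
    parser.add_argument("--char_mode", choices=CHAR_MODES, default="char_cnn",
                        help="how character information is encoded")
    return parser

# Parse the same arguments as the example command in this section.
args = build_parser().parse_args(
    ["--batch_size", "128", "--training_epochs", "12", "--char_mode", "char_cnn"])
```

Using `choices` makes the parser reject any `char_mode` other than the three listed values before training starts.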
5. Experiments
5-1. Datasets
- The AG’s news topic classification dataset is constructed by choosing the 4 largest classes from the original news corpus
- The 4 classes are ‘world’, ‘sports’, ‘business’ and ‘science/technology’
- Each class contains 30,000 training samples and 1,900 test samples
- In total there are 120,000 training samples and 7,600 test samples
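The dataset sizes follow directly from the per-class counts, as the short check below shows. The steps-per-epoch figure is an extra illustration that assumes the batch size of 128 from the training command and that the final partial batch is kept; the repository may handle the last batch differently.

```python
import math

CLASSES = ["world", "sports", "business", "science/technology"]
TRAIN_PER_CLASS = 30_000
TEST_PER_CLASS = 1_900
BATCH_SIZE = 128  # value used in the example training command

total_train = len(CLASSES) * TRAIN_PER_CLASS   # 4 * 30,000
total_test = len(CLASSES) * TEST_PER_CLASS     # 4 * 1,900

# Assumption: the last, smaller batch is kept rather than dropped.
steps_per_epoch = math.ceil(total_train / BATCH_SIZE)

print(total_train, total_test, steps_per_epoch)
```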
5-2. Test loss graph

5-3. Performance table
