mantis

November 13, 2016 · View on GitHub

Deep learning models of machine translation using attention and structural bias. This is build on top of the cnn neural network library, using C++. Please refer to the cnn github page for more details, including some issues with compiling and running with the library.

This code is an implementation of the following paper:

Incorporating Structural Alignment Biases into an Attentional Neural Translation Model. 
Trevor Cohn, Cong Duy Vu Hoang, Ekaterina Vymolova, Kaisheng Yao, Chris Dyer and Gholamreza Haffari. 
In Proceedings of NAACL-16, 2016.

Please cite the above paper if you use or extend this code.

Dependencies

Before compiling cnn, you need:

Eigen, using the development version (not release), e.g. 3.3.beta2
cuda version 7.5 or higher
boost, e.g., 1.58 using libboost-all-dev ubuntu package
cmake, e.g., 3.5.1 using cmake ubuntu package

Building

First, clone the repository

git clone https://github.com/trevorcohn/mantis.git

Next pull down the submodules (cnn)

cd mantis
git submodule init 
git submodule update

As mentioned above, you'll need the latest development version of eigen

hg clone https://bitbucket.org/eigen/eigen/

CPU build

Compiling to execute on a CPU is as follows

mkdir build_cpu
cd build_cpu
cmake .. -DEIGEN3_INCLUDE_DIR=eigen
make -j 2

MKL support. If you have Intel's MKL library installed on your machine, you can speed up the computation on the CPU by:

cmake .. -DEIGEN3_INCLUDE_DIR=EIGEN -DMKL=TRUE -DMKL_ROOT=MKL

substituting in different paths to EIGEN and MKL if you have placed them in different directories.

This will build the two binaries

build_cpu/src/attentional
build_cpu/src/biattentional

GPU build

Building on the GPU uses the Nvidia CUDA library, currently tested against version 7.5. The process is as follows

mkdir build_gpu
cd build_gpu
cmake .. -DBACKEND=cuda -DEIGEN3_INCLUDE_DIR=EIGEN -DCUDA_TOOLKIT_ROOT_DIR=CUDA
make -j 2

substituting in your Eigen and CUDA folders, as appropriate.

This will result in the two binaries

build_gpu/src/attentional
build_gpu/src/biattentional

Using the model

The model can be run as follows

./build_cpu/src/attentional -t sample-data/train.de-en.unk.cap -d sample-data/dev.de-en.unk.cap

which will train a small model on a tiny training set, i.e.,

(CPU)
[cnn] random seed: 978201625
[cnn] allocating memory: 512MB
[cnn] memory allocation done.
Reading training data from sample-data/train.de-en.unk.cap...
5000 lines, 117998 & 105167 tokens (s & t), 2738 & 2326 types
Reading dev data from sample-data/dev.de-en.unk.cap...
100 lines, 1800 & 1840 tokens (s & t), 2738 & 2326 types
Parameters will be written to: am_1_64_32_RNN_b0_g000_d0-pid48778.params
%% Using RNN recurrent units
**SHUFFLE
[epoch=0 eta=0.1 clips=50 updates=50]  E = 5.77713 ppl=322.832 [completed in 192.254 ms]
[epoch=0 eta=0.1 clips=50 updates=50]  E = 5.12047 ppl=167.415 [completed in 188.866 ms]
[epoch=0 eta=0.1 clips=50 updates=50]  E = 5.36808 ppl=214.451 [completed in 153.08 ms]
...

(GPU)
[cnn] initializing CUDA
Request for 1 GPU ...
[cnn] Device Number: 0
[cnn]   Device name: GeForce GTX TITAN X
[cnn]   Memory Clock Rate (KHz): 3505000
[cnn]   Memory Bus Width (bits): 384
[cnn]   Peak Memory Bandwidth (GB/s): 336.48
[cnn]   Memory Free (GB): 0.0185508/12.8847
[cnn]
[cnn] Device Number: 1
[cnn]   Device name: GeForce GTX TITAN X
[cnn]   Memory Clock Rate (KHz): 3505000
[cnn]   Memory Bus Width (bits): 384
[cnn]   Peak Memory Bandwidth (GB/s): 336.48
[cnn]   Memory Free (GB): 6.31144/12.8847
[cnn]
[cnn] Device Number: 2
[cnn]   Device name: GeForce GTX TITAN X
[cnn]   Memory Clock Rate (KHz): 3505000
[cnn]   Memory Bus Width (bits): 384
[cnn]   Peak Memory Bandwidth (GB/s): 336.48
[cnn]   Memory Free (GB): 0.0185508/12.8847
[cnn] ...
[cnn] Device(s) selected: 6
[cnn] random seed: 2080175584
[cnn] allocating memory: 512MB
[cnn] memory allocation done.
Reading training data from sample-data/train.de-en.unk.cap...
5000 lines, 117998 & 105167 tokens (s & t), 2738 & 2326 types
Reading dev data from sample-data/dev.de-en.unk.cap...
100 lines, 1800 & 1840 tokens (s & t), 2738 & 2326 types
Parameters will be written to: am_1_64_32_RNN_b0_g000_d0-pid14453.params
%% Using RNN recurrent units
**SHUFFLE
[epoch=0 eta=0.01 clips=0 updates=50]  E = 6.12625 ppl=457.718 [completed in 724.351 ms]
[epoch=0 eta=0.01 clips=0 updates=50]  E = 5.23731 ppl=188.163 [completed in 714.797 ms]
[epoch=0 eta=0.01 clips=0 updates=50]  E = 5.37111 ppl=215.102 [completed in 796.774 ms]
...

Every so often the development performance is measured, and the best scoring model will be saved to disk.

If you want to build a large network, you will need to indicate the memory usage (--cnn-mem FORWARD_MEM,BACKWARD_MEM,PARAMETERS_MEM) for cnn backend, e.g.,

./build_cpu/src/attentional --cnn-mem 3000 -t sample-data/train.de-en.unk.cap -d sample-data/dev.de-en.unk.cap

./build_cpu/src/attentional --cnn-mem 1000,1000,2000 -t sample-data/train.de-en.unk.cap -d sample-data/dev.de-en.unk.cap

The binaries have command line help, and their usage is illustrated in the scripts/ folder. This includes decoding.

Contacts

Trevor Cohn, Hoang Cong Duy Vu and Reza Haffari

Updated October 2016