Analyzing ASR Representations
February 12, 2018
This repository contains code for our paper on analyzing speech representations in end-to-end automatic speech recognition models:
"Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems", Yonatan Belinkov and James Glass, NIPS 2017.
Requirements
Instructions
- First, prepare a dataset in LMDB format according to the instructions in deepspeech.torch. We provide a custom MakeLMDBTimes.lua file to process a dataset with time segmentation such as TIMIT.
- Run train.lua with the following arguments:
  - loadPath: DeepSpeech-2 model trained with deepspeech.torch
  - trainingSetLMDBPath, validationSetLMDBPath, testSetLMDBPath: top folders for the LMDB training/validation/test sets
  - reprLayer: representation layer name (input, cnn1, cnn2, rnn1, rnn2, etc.)
  - predFile: file to save predictions
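As a rough illustration of how the arguments above fit together, the following Python sketch assembles a command line for train.lua. It is not part of this repository. It assumes the script is launched with Torch's `th` launcher and that options are passed in the `-name value` form used by Torch's CmdLine parser. Check train.lua for the exact option spelling. The helper name `build_train_command` and all paths are hypothetical placeholders.

```python
import shlex


def build_train_command(load_path, train_lmdb, valid_lmdb, test_lmdb,
                        repr_layer, pred_file):
    """Return an argv list for running train.lua.

    ASSUMPTION: `th` is the launcher and options use the `-name value`
    form; neither detail is confirmed by this README.
    """
    return [
        "th", "train.lua",
        "-loadPath", load_path,
        "-trainingSetLMDBPath", train_lmdb,
        "-validationSetLMDBPath", valid_lmdb,
        "-testSetLMDBPath", test_lmdb,
        "-reprLayer", repr_layer,
        "-predFile", pred_file,
    ]


# All paths below are hypothetical placeholders.
cmd = build_train_command(
    load_path="models/deepspeech.t7",
    train_lmdb="data/timit_lmdb/train",
    valid_lmdb="data/timit_lmdb/valid",
    test_lmdb="data/timit_lmdb/test",
    repr_layer="rnn1",
    pred_file="predictions/rnn1.txt",
)

# Print a shell-quoted version that can be inspected before running it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the command instead of executing it makes it easy to verify the paths and the layer name first. The resulting list can be passed to `subprocess.run` once it looks right.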
See train.lua for more options, such as controlling convolution strides, using a window of features around the frame or predicting phone classes.
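To make the "window of features around the frame" option more concrete, here is a minimal Python sketch of the general idea: each frame's feature vector is concatenated with those of its neighbours before classification. This illustrates the concept only and is not the implementation in train.lua. The function name `window_features`, the `context` parameter, and the zero-padding at the sequence edges are assumptions made for this example.

```python
def window_features(frames, context):
    """Concatenate each frame with `context` neighbours on each side.

    frames: list of equal-length feature vectors, one per time step.
    Positions outside the sequence are zero-padded (an assumption of
    this sketch; the repository may handle edges differently).
    """
    if context < 0:
        raise ValueError("context must be non-negative")
    if not frames:
        return []
    zero = [0.0] * len(frames[0])
    windowed = []
    for t in range(len(frames)):
        row = []
        for offset in range(-context, context + 1):
            i = t + offset
            row.extend(frames[i] if 0 <= i < len(frames) else zero)
        windowed.append(row)
    return windowed


# Three frames with two features each, one neighbour on each side.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
windowed = window_features(frames, context=1)
```

With `context=1`, every output vector is three times as long as an input frame, so a classifier trained on windowed features sees a short span of time instead of a single frame.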
Citing
If you use this code, please consider citing our paper:
@InProceedings{belinkov:2017:nips,
author = {Belinkov, Yonatan and Glass, James},
title = {Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems},
booktitle = {Advances in Neural Information Processing Systems (NIPS)},
month = {December},
year = {2017}
}
Acknowledgements
This project uses code from deepspeech.torch.