HAN-pytorch

January 11, 2018


Deprecated code

A faster, up-to-date implementation is available in my other repo.


Batched implementation of the paper "Hierarchical Attention Networks for Document Classification".

Requirements

  • PyTorch (>= 0.2)
  • spaCy (for tokenizing)
  • Gensim (for building word vectors)
  • tqdm (for progress bars)

Scripts:

  • prepare_data.py transforms the gzip files found on Julian McAuley's Amazon product data page into lists of (user, item, review, rating) tuples, and builds word vectors if the --create-emb option is specified.
  • main.py trains a Hierarchical Model.
  • Data.py holds data managing objects.
  • Nets.py holds networks.
  • beer2json.py is a helper script if you happen to have the RateBeer/BeerAdvocate datasets.
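
To make the data format concrete, here is a minimal sketch of the kind of transformation prepare_data.py is described as performing. It is not the script's actual code. It assumes the input is a gzipped file with one strict-JSON object per line, and that the fields are named reviewerID, asin, reviewText and overall, as in the commonly distributed Amazon review files; the function name parse_reviews is hypothetical.

```python
import gzip
import json


def parse_reviews(path):
    """Yield (user, item, review, rating) tuples from a gzipped JSON-lines file.

    Assumption: one strict-JSON object per line, with the field names
    reviewerID, asin, reviewText and overall. Adjust the keys if your
    files use a different schema.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            yield (
                record["reviewerID"],
                record["asin"],
                record.get("reviewText", ""),  # some reviews may have no text
                float(record["overall"]),
            )
```

Calling list(parse_reviews("reviews.json.gz")) on such a file (the file name is only an example) would give a list of tuples in the shape described above.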

Note:

The whole dataset is used to create the word embeddings, which can be an issue: reviews later used for evaluation also shape the embeddings.
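
One possible way around this, sketched below under stated assumptions, is to split the tuples first and build the embedding corpus from the training split only. This is not what the repository does. The helper names split_reviews and embedding_corpus are hypothetical, and the whitespace tokenizer is a stand-in for the spaCy tokenization listed in the requirements.

```python
import random


def split_reviews(reviews, test_fraction=0.2, seed=0):
    """Shuffle (user, item, review, rating) tuples and return (train, test)."""
    shuffled = list(reviews)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def embedding_corpus(train_reviews):
    """Return tokenized review texts taken from the training split only.

    Assumption: lowercased whitespace tokenization, used here only to keep
    the sketch dependency-free.
    """
    return [review.lower().split() for _, _, review, _ in train_reviews]
```

The resulting list of token lists could then be handed to a word-vector trainer such as Gensim's Word2Vec, so that held-out reviews never influence the embeddings.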