awesome-sentence-embedding

December 9, 2020

A curated list of pretrained sentence and word embedding models

About This Repo

  • There are already some awesome lists for word embeddings and sentence embeddings, but all of them are outdated and, more importantly, incomplete.
  • This repo will also be incomplete, but I'll try my best to find and include all the papers with pretrained models.
  • This is not a typical awesome list because it uses tables, but I think that's OK and much better than one huge list.
  • If you find a mistake, another paper, or anything else worth adding, please send a pull request and help me keep this list up to date.
  • Enjoy!

General Framework

  • Almost all sentence embedding methods work like this:
  • Given some word embeddings and an optional encoder (for example, an LSTM), they obtain contextualized word embeddings.
  • Then they define some sort of pooling (it can be as simple as last pooling, i.e. taking the vector of the final token).
  • Based on that, they either use the pooled vector directly for a supervised classification task (like InferSent) or generate a target sequence (like SkipThought).
  • So, in general, there are many sentence embeddings that you have never heard of: you can simply do mean-pooling over any word embedding and the result is a sentence embedding!
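
The pooling step above can be sketched in a few lines of plain Python. This is only an illustration, not code from any of the listed papers: the 3-dimensional `TOY_TABLE` and its values are made up, there is no encoder, and the helper names (`lookup`, `mean_pool`, `max_pool`, `last_pool`) are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical 3-dimensional embedding table; the values are invented for illustration.
# In practice you would load pretrained vectors (GloVe, fastText, ...) instead.
TOY_TABLE: Dict[str, Vector] = {
    "cats": [1.0, 0.0, 2.0],
    "chase": [0.0, 2.0, 1.0],
    "mice": [2.0, 1.0, 0.0],
}


def lookup(tokens: Sequence[str], table: Dict[str, Vector]) -> List[Vector]:
    """Return the vectors of all in-vocabulary tokens, skipping unknown words."""
    return [table[token] for token in tokens if token in table]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average every dimension over all word vectors."""
    if not vectors:
        raise ValueError("cannot pool an empty sentence")
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def max_pool(vectors: List[Vector]) -> Vector:
    """Take the per-dimension maximum over all word vectors."""
    if not vectors:
        raise ValueError("cannot pool an empty sentence")
    return [max(dim) for dim in zip(*vectors)]


def last_pool(vectors: List[Vector]) -> Vector:
    """Use the vector of the final token as the sentence vector."""
    if not vectors:
        raise ValueError("cannot pool an empty sentence")
    return list(vectors[-1])


vectors = lookup("cats chase mice".split(), TOY_TABLE)
print(mean_pool(vectors))  # [1.0, 1.0, 1.0]
print(max_pool(vectors))   # [2.0, 2.0, 2.0]
print(last_pool(vectors))  # [2.0, 1.0, 0.0]
```

Without an encoder, mean and max pooling ignore word order, while last pooling only becomes meaningful once an encoder such as an LSTM has folded the earlier tokens into the final state.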

Word Embeddings

  • Note: don't worry about the language of the code. You can almost always (except for the subword models) just use the pretrained embedding table in the framework of your choice and ignore the training code.
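
As a rough sketch of what "just use the pretrained embedding table" can look like, the snippet below parses the plain-text vector format that word2vec and GloVe releases commonly use (one word per line followed by its values, with an optional `<vocab_size> <dim>` header in word2vec-style files). The function name `load_text_vectors` and the two-word `SAMPLE` are hypothetical, and real releases may differ in delimiter or encoding, so check the files you download.

```python
import io
from typing import Dict, List, TextIO, Tuple


def load_text_vectors(handle: TextIO) -> Tuple[Dict[str, int], List[List[float]]]:
    """Parse 'word v1 v2 ...' lines into a word-to-row index and a list of rows."""
    vocab: Dict[str, int] = {}
    matrix: List[List[float]] = []
    for line_no, line in enumerate(handle):
        parts = line.rstrip().split(" ")
        # word2vec text files start with a "<vocab_size> <dim>" header line;
        # GloVe text files do not. Skip the header when it is present.
        if line_no == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vocab[parts[0]] = len(matrix)
        matrix.append([float(value) for value in parts[1:]])
    return vocab, matrix


# Made-up sample in word2vec text format: a header line plus two 3-dimensional vectors.
SAMPLE = "2 3\nking 0.1 0.2 0.3\nqueen 0.4 0.5 0.6\n"

vocab, matrix = load_text_vectors(io.StringIO(SAMPLE))
print(vocab)                   # {'king': 0, 'queen': 1}
print(matrix[vocab["queen"]])  # [0.4, 0.5, 0.6]
```

The resulting `matrix` can be copied into the embedding layer of whatever framework you use. Subword models are the exception noted above: they also need the n-gram machinery, so a plain lookup table is not enough for out-of-vocabulary words.
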
| date | paper | citation count | training code | pretrained models |
|---|---|---|---|---|
| - | WebVectors: A Toolkit for Building Web Interfaces for Vector Semantic Models | N/A | - | RusVectōrēs |
| 2013/01 | Efficient Estimation of Word Representations in Vector Space | 999+ | C | Word2Vec |
| 2014/12 | Word Representations via Gaussian Embedding | 221 | Cython | - |
| 2014/?? | A Probabilistic Model for Learning Multi-Prototype Word Embeddings | 127 | DMTK | - |
| 2014/?? | Dependency-Based Word Embeddings | 719 | C++ | word2vecf |
| 2014/?? | GloVe: Global Vectors for Word Representation | 999+ | C | GloVe |
| 2015/06 | Sparse Overcomplete Word Vector Representations | 129 | C++ | - |
| 2015/06 | From Paraphrase Database to Compositional Paraphrase Model and Back | 3 | Theano | PARAGRAM |
| 2015/06 | Non-distributional Word Vector Representations | 68 | Python | WordFeat |
| 2015/?? | Joint Learning of Character and Word Embeddings | 195 | C | - |
| 2015/?? | SensEmbed: Learning Sense Embeddings for Word and Relational Similarity | 249 | - | SensEmbed |
| 2015/?? | Topical Word Embeddings | 292 | Cython | |
| 2016/02 | Swivel: Improving Embeddings by Noticing What's Missing | 61 | TF | - |
| 2016/03 | Counter-fitting Word Vectors to Linguistic Constraints | 232 | Python | counter-fitting (broken) |
| 2016/05 | Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec | 91 | Chainer | - |
| 2016/06 | Siamese CBOW: Optimizing Word Embeddings for Sentence Representations | 166 | Theano | Siamese CBOW |
| 2016/06 | Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations | 58 | Go | lexvec |
| 2016/07 | Enriching Word Vectors with Subword Information | 999+ | C++ | fastText |
| 2016/08 | Morphological Priors for Probabilistic Neural Word Embeddings | 34 | Theano | - |
| 2016/11 | A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks | 359 | C++ | charNgram2vec |
| 2016/12 | ConceptNet 5.5: An Open Multilingual Graph of General Knowledge | 604 | Python | Numberbatch |
| 2016/?? | Learning Word Meta-Embeddings | 58 | - | Meta-Emb (broken) |
| 2017/02 | Offline bilingual word vectors, orthogonal transformations and the inverted softmax | 336 | Python | - |
| 2017/04 | Multimodal Word Distributions | 57 | TF | word2gm |
| 2017/05 | Poincaré Embeddings for Learning Hierarchical Representations | 413 | Pytorch | - |
| 2017/06 | Context encoders as a simple but powerful extension of word2vec | 13 | Python | - |
| 2017/06 | Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints | 99 | TF | Attract-Repel |
| 2017/08 | Learning Chinese Word Representations From Glyphs Of Characters | 44 | C | - |
| 2017/08 | Making Sense of Word Embeddings | 92 | Python | sensegram |
| 2017/09 | Hash Embeddings for Efficient Word Representations | 25 | Keras | - |
| 2017/10 | BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages | 91 | Gensim | BPEmb |
| 2017/11 | SPINE: SParse Interpretable Neural Embeddings | 48 | Pytorch | SPINE |
| 2017/?? | AraVec: A set of Arabic Word Embedding Models for use in Arabic NLP | 161 | Gensim | AraVec |
| 2017/?? | Ngram2vec: Learning Improved Word Representations from Ngram Co-occurrence Statistics | 25 | C | - |
| 2017/?? | Dict2vec : Learning Word Embeddings using Lexical Dictionaries | 49 | C++ | Dict2vec |
| 2017/?? | Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components | 63 | C | - |
| 2018/04 | Representation Tradeoffs for Hyperbolic Embeddings | 120 | Pytorch | h-MDS |
| 2018/04 | Dynamic Meta-Embeddings for Improved Sentence Representations | 60 | Pytorch | DME/CDME |
| 2018/05 | Analogical Reasoning on Chinese Morphological and Semantic Relations | 128 | - | ChineseWordVectors |
| 2018/06 | Probabilistic FastText for Multi-Sense Word Embeddings | 39 | C++ | Probabilistic FastText |
| 2018/09 | Incorporating Syntactic and Semantic Information in Word Embeddings using Graph Convolutional Networks | 3 | TF | SynGCN |
| 2018/09 | FRAGE: Frequency-Agnostic Word Representation | 64 | Pytorch | - |
| 2018/12 | Wikipedia2Vec: An Optimized Tool for Learning Embeddings of Words and Entities from Wikipedia | 17 | Cython | Wikipedia2Vec |
| 2018/?? | Directional Skip-Gram: Explicitly Distinguishing Left and Right Context for Word Embeddings | 106 | - | ChineseEmbedding |
| 2018/?? | cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information | 45 | C++ | - |
| 2019/02 | VCWE: Visual Character-Enhanced Word Embeddings | 5 | Pytorch | VCWE |
| 2019/05 | Learning Cross-lingual Embeddings from Twitter via Distant Supervision | 2 | Text | - |
| 2019/08 | An Unsupervised Character-Aware Neural Approach to Word and Context Representation Learning | 5 | TF | - |
| 2019/08 | ViCo: Word Embeddings from Visual Co-occurrences | 7 | Pytorch | ViCo |
| 2019/11 | Spherical Text Embedding | 25 | C | - |
| 2019/?? | Unsupervised word embeddings capture latent knowledge from materials science literature | 150 | Gensim | - |

OOV Handling

Contextualized Word Embeddings

  • Note: all the unofficial implementations can load the official pretrained models.
| date | paper | citation count | code | pretrained models |
|---|---|---|---|---|
| - | Language Models are Unsupervised Multitask Learners | N/A | TF; Pytorch, TF2.0; Keras | GPT-2 (117M, 124M, 345M, 355M, 774M, 1558M) |
| 2017/08 | Learned in Translation: Contextualized Word Vectors | 524 | Pytorch; Keras | CoVe |
| 2018/01 | Universal Language Model Fine-tuning for Text Classification | 167 | Pytorch | ULMFit (English, Zoo) |
| 2018/02 | Deep contextualized word representations | 999+ | Pytorch; TF | ELMO (AllenNLP, TF-Hub) |
| 2018/04 | Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling | 26 | Pytorch | LD-Net |
| 2018/07 | Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation | 120 | Pytorch | ELMo |
| 2018/08 | Direct Output Connection for a High-Rank Language Model | 24 | Pytorch | DOC |
| 2018/10 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | 999+ | TF; Keras; Pytorch, TF2.0; MXNet; PaddlePaddle; TF; Keras | BERT (BERT, ERNIE, KoBERT) |
| 2018/?? | Contextual String Embeddings for Sequence Labeling | 486 | Pytorch | Flair |
| 2018/?? | Improving Language Understanding by Generative Pre-Training | 999+ | TF; Keras; Pytorch, TF2.0 | GPT |
| 2019/01 | Multi-Task Deep Neural Networks for Natural Language Understanding | 364 | Pytorch | MT-DNN |
| 2019/01 | BioBERT: pre-trained biomedical language representation model for biomedical text mining | 634 | TF | BioBERT |
| 2019/01 | Cross-lingual Language Model Pretraining | 639 | Pytorch; Pytorch, TF2.0 | XLM |
| 2019/01 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | 754 | TF; Pytorch; Pytorch, TF2.0 | Transformer-XL |
| 2019/02 | Efficient Contextual Representation Learning Without Softmax Layer | 2 | Pytorch | - |
| 2019/03 | SciBERT: Pretrained Contextualized Embeddings for Scientific Text | 124 | Pytorch, TF | SciBERT |
| 2019/04 | Publicly Available Clinical BERT Embeddings | 229 | Text | clinicalBERT |
| 2019/04 | ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission | 84 | Pytorch | ClinicalBERT |
| 2019/05 | ERNIE: Enhanced Language Representation with Informative Entities | 210 | Pytorch | ERNIE |
| 2019/05 | Unified Language Model Pre-training for Natural Language Understanding and Generation | 278 | Pytorch | UniLMv1 (unilm1-large-cased, unilm1-base-cased) |
| 2019/05 | HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization | 81 | - | |
| 2019/06 | Pre-Training with Whole Word Masking for Chinese BERT | 98 | Pytorch, TF | BERT-wwm |
| 2019/06 | XLNet: Generalized Autoregressive Pretraining for Language Understanding | 999+ | TF; Pytorch, TF2.0 | XLNet |
| 2019/07 | ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | 107 | PaddlePaddle | ERNIE 2.0 |
| 2019/07 | SpanBERT: Improving Pre-training by Representing and Predicting Spans | 282 | Pytorch | SpanBERT |
| 2019/07 | RoBERTa: A Robustly Optimized BERT Pretraining Approach | 999+ | Pytorch; Pytorch, TF2.0 | RoBERTa |
| 2019/09 | Subword ELMo | 1 | Pytorch | - |
| 2019/09 | Knowledge Enhanced Contextual Word Representations | 115 | - | |
| 2019/09 | TinyBERT: Distilling BERT for Natural Language Understanding | 129 | - | |
| 2019/09 | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | 136 | Pytorch | Megatron-LM (BERT-345M, GPT-2-345M) |
| 2019/09 | MultiFiT: Efficient Multi-lingual Language Model Fine-tuning | 29 | Pytorch | - |
| 2019/09 | Extreme Language Model Compression with Optimal Subwords and Shared Projections | 32 | - | |
| 2019/09 | MULE: Multimodal Universal Language Embedding | 5 | - | |
| 2019/09 | Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks | 51 | - | |
| 2019/09 | K-BERT: Enabling Language Representation with Knowledge Graph | 59 | - | |
| 2019/09 | UNITER: Learning UNiversal Image-TExt Representations | 60 | - | |
| 2019/09 | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | 803 | TF | - |
| 2019/10 | BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | 349 | Pytorch | BART (bart.base, bart.large, bart.large.mnli, bart.large.cnn, bart.large.xsum) |
| 2019/10 | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | 481 | Pytorch, TF2.0 | DistilBERT |
| 2019/10 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 696 | TF | T5 |
| 2019/11 | CamemBERT: a Tasty French Language Model | 102 | - | CamemBERT |
| 2019/11 | ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations | 15 | Pytorch | - |
| 2019/11 | Unsupervised Cross-lingual Representation Learning at Scale | 319 | Pytorch | XLM-R (XLM-RoBERTa) (xlmr.large, xlmr.base) |
| 2020/01 | ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training | 35 | Pytorch | ProphetNet (ProphetNet-large-16GB, ProphetNet-large-160GB) |
| 2020/02 | CodeBERT: A Pre-Trained Model for Programming and Natural Languages | 25 | Pytorch | CodeBERT |
| 2020/02 | UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training | 33 | Pytorch | - |
| 2020/03 | ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | 203 | TF | ELECTRA (ELECTRA-Small, ELECTRA-Base, ELECTRA-Large) |
| 2020/04 | MPNet: Masked and Permuted Pre-training for Language Understanding | 5 | Pytorch | MPNet |
| 2020/05 | ParsBERT: Transformer-based Model for Persian Language Understanding | 1 | Pytorch | ParsBERT |
| 2020/05 | Language Models are Few-Shot Learners | 382 | - | - |
| 2020/07 | InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training | 12 | Pytorch | - |

Pooling Methods

Encoders

| date | paper | citation count | code | model name |
|---|---|---|---|---|
| - | Incremental Domain Adaptation for Neural Machine Translation in Low-Resource Settings | N/A | Python | AraSIF |
| 2014/05 | Distributed Representations of Sentences and Documents | 999+ | Pytorch; Python | Doc2Vec |
| 2014/11 | Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models | 849 | Theano; Pytorch | VSE |
| 2015/06 | Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books | 795 | Theano; TF; Pytorch, Torch | SkipThought |
| 2015/11 | Order-Embeddings of Images and Language | 354 | Theano | order-embedding |
| 2015/11 | Towards Universal Paraphrastic Sentence Embeddings | 411 | Theano | ParagramPhrase |
| 2015/?? | From Word Embeddings to Document Distances | 999+ | C, Python | Word Mover's Distance |
| 2016/02 | Learning Distributed Representations of Sentences from Unlabelled Data | 363 | Python | FastSent |
| 2016/07 | Charagram: Embedding Words and Sentences via Character n-grams | 144 | Theano | Charagram |
| 2016/11 | Learning Generic Sentence Representations Using Convolutional Neural Networks | 76 | Theano | ConvSent |
| 2017/03 | Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features | 319 | C++ | Sent2Vec |
| 2017/04 | Learning to Generate Reviews and Discovering Sentiment | 293 | TF; Pytorch; Pytorch | Sentiment Neuron |
| 2017/05 | Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings | 60 | Theano | GRAN |
| 2017/05 | Supervised Learning of Universal Sentence Representations from Natural Language Inference Data | 999+ | Pytorch | InferSent |
| 2017/07 | VSE++: Improving Visual-Semantic Embeddings with Hard Negatives | 132 | Pytorch | VSE++ |
| 2017/08 | Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm | 357 | Keras; Pytorch | DeepMoji |
| 2017/09 | StarSpace: Embed All The Things! | 129 | C++ | StarSpace |
| 2017/10 | DisSent: Learning Sentence Representations from Explicit Discourse Relations | 47 | Pytorch | DisSent |
| 2017/11 | Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations | 128 | Theano | para-nmt |
| 2017/11 | Dual-Path Convolutional Image-Text Embedding with Instance Loss | 44 | Matlab | Image-Text-Embedding |
| 2018/03 | An efficient framework for learning sentence representations | 183 | TF | Quick-Thought |
| 2018/03 | Universal Sentence Encoder | 564 | TF-Hub | USE |
| 2018/04 | End-Task Oriented Textual Entailment via Deep Explorations of Inter-Sentence Interactions | 14 | Theano | DEISTE |
| 2018/04 | Learning general purpose distributed sentence representations via large scale multi-task learning | 198 | Pytorch | GenSen |
| 2018/06 | Embedding Text in Hyperbolic Spaces | 50 | TF | HyperText |
| 2018/07 | Representation Learning with Contrastive Predictive Coding | 736 | Keras | CPC |
| 2018/08 | Context Mover’s Distance & Barycenters: Optimal transport of contexts for building representations | 8 | Python | CMD |
| 2018/09 | Learning Universal Sentence Representations with Mean-Max Attention Autoencoder | 14 | TF | Mean-MaxAAE |
| 2018/10 | Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model | 35 | TF-Hub | USE-xling |
| 2018/10 | Improving Sentence Representations with Consensus Maximisation | 4 | - | Multi-view |
| 2018/10 | BioSentVec: creating sentence embeddings for biomedical texts | 70 | Python | BioSentVec |
| 2018/11 | Word Mover's Embedding: From Word2Vec to Document Embedding | 47 | C, Python | WordMoversEmbeddings |
| 2018/11 | A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks | 76 | Pytorch | HMTL |
| 2018/12 | Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond | 238 | Pytorch | LASER |
| 2018/?? | Convolutional Neural Network for Universal Sentence Embeddings | 6 | Theano | CSE |
| 2019/01 | No Training Required: Exploring Random Encoders for Sentence Classification | 54 | Pytorch | randsent |
| 2019/02 | CBOW Is Not All You Need: Combining CBOW with the Compositional Matrix Space Model | 4 | Pytorch | CMOW |
| 2019/07 | GLOSS: Generative Latent Optimization of Sentence Representations | 1 | - | GLOSS |
| 2019/07 | Multilingual Universal Sentence Encoder | 52 | TF-Hub | MultilingualUSE |
| 2019/08 | Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks | 261 | Pytorch | Sentence-BERT |
| 2020/02 | SBERT-WK: A Sentence Embedding Method By Dissecting BERT-based Word Models | 11 | Pytorch | SBERT-WK |
| 2020/06 | DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations | 4 | Pytorch | DeCLUTR |
| 2020/07 | Language-agnostic BERT Sentence Embedding | 5 | TF-Hub | LaBSE |
| 2020/11 | On the Sentence Embeddings from Pre-trained Language Models | 0 | TF | BERT-flow |

Evaluation

Misc

Vector Mapping

Articles