NLP 101: a Resource Repository for Deep Learning and Natural Language Processing

May 12, 2021

This document is for anyone enthusiastic about deep learning in natural language processing. If you have good recommendations or suggestions, I will try to add them.

This document follows these rules:

  • Materials that cover the same ground are not listed more than once.
  • Among materials of a similar level of difficulty, only one is listed.
  • Materials of different levels of difficulty, which require prerequisites or additional learning, are all listed.

Language: Korean | English


Mathematics

Statistics and Probabilities

| Source | Description |
| --- | --- |
| Statistics 110 | A lecture on probability that non-engineering majors can follow easily. |
| Brandon Foltz's Statistics | Brandon Foltz's probability and statistics lectures are posted on YouTube. They are fairly short, so they are easy to watch during a daily commute. |

Linear Algebra

| Source | Description |
| --- | --- |
| Essence of Linear Algebra | A linear algebra lecture series on the YouTube channel 3Blue1Brown. It can be a big help for those planning to take undergraduate-level linear algebra, since it gives an overall understanding of the subject. Its visual aids make the picture of linear algebra intuitive. |
| Linear Algebra | A legendary lecture by Professor Gilbert Strang. |
| Matrix methods in Data Analysis and Machine Learning | Professor Gilbert Strang's lecture on applied linear algebra. Because linear algebra is a prerequisite, it is quite difficult to follow, yet it is a great lecture for learning how linear algebra is actually applied in machine learning. |

Basic mathematics & Overview

| Source | Description |
| --- | --- |
| Essence of calculus | A calculus lecture series by the 3Blue1Brown channel mentioned above, likewise helpful for those who want an overview of calculus. |
| Calculus | A textbook on calculus written by Professor Gilbert Strang. There is no need to go through the whole book, but chapters 2-4, 11-13, and 15-16 are well worth studying. |
| Mathematics for Machine Learning | A book covering the mathematics that accompanies machine learning. College-level mathematical background in the natural sciences or engineering is preferable, as the explanations are mostly broad-brush. |

Deep Learning and Natural Language Processing

Deep Learning

| Source | Description |
| --- | --- |
| CS230 | A deep learning lecture by the renowned Professor Andrew Ng, who has recently founded a startup on AI education. |
| Deep Learning Book | A book written by Ian Goodfellow, the father of GANs, and other renowned professors. |
| Dive into Deep Learning | While the Deep Learning Book above offers theoretical explanations, this book also includes code that shows how the concepts are actually implemented. |
| Grokking Deep Learning | Teaches readers how to write the basic elements of a neural network with NumPy, without using deep learning frameworks. Also good material for studying how high-level APIs work under the hood. |

Natural Language Processing

| Source | Description |
| --- | --- |
| Neural Network Methods for NLP | An NLP book using deep learning, written by Yoav Goldberg. Its witty explanations lead the reader to the fundamentals. |
| Eisenstein's NLP Note | An awesome read that covers not only NLP with machine learning but also the basic linguistic knowledge needed to understand it. Eisenstein's book Introduction to Natural Language Processing was published based on this note. |
| CS224N | An awesome NLP lecture from Stanford. A 2019 version is available, covering the latest trends. |
| CS224U | An NLP lecture that has gained renewed attention since the advent of the GLUE benchmark. It is best taken after CS224N, and its merit is that it provides exercises in PyTorch. |
| Code-First Intro to Natural Language Processing | A code-first NLP lecture by Rachel Thomas, the co-founder of fast.ai. The motivation that Rachel Thomas gives is mind-blowing. |
| Natural Language Processing with PyTorch | An NLP book from O'Reilly, a publisher known for numerous high-quality data science books. It is PyTorch-friendly, as all the code is written in PyTorch. |
| Linguistic Fundamentals for Natural Language Processing | A linguistics book written by the linguist Emily Bender, known for the Bender Rule. Although not related to deep learning, it is a great beginner's book on linguistic domain knowledge. |

| Source | Description |
| --- | --- |
| NumPy | Stanford's lecture CS231N covers NumPy, which is fundamental to machine learning computation. |
| TensorFlow | A tutorial provided by TensorFlow. It explains the basics well, with visual aids. |
| PyTorch | An awesome, high-quality tutorial on PyTorch provided by Facebook. |
| tensor2tensor | A sequence-to-sequence toolkit by Google, written in TensorFlow. |
| fairseq | A sequence-to-sequence toolkit by Facebook, written in PyTorch. |
| Hugging Face Transformers | A Transformer-based library provided by Hugging Face that allows easy access to pre-trained models. One of the key NLP libraries for researchers as well as developers. |
| Hugging Face Tokenizers | A tokenizer library maintained by Hugging Face. It boasts fast operations, as the key functions are written in Rust. The latest tokenizers, such as BPE, can be tried out with it. |
| spaCy | A tutorial written by Ines, a core developer of the noteworthy spaCy. |
| torchtext | A tutorial on torchtext, a package that makes data preprocessing handy. It has more detail than the official documentation. |
| SentencePiece | Google's open-source library that builds a BPE-based vocabulary using subword information. |
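
Several of the libraries above mention BPE (byte-pair encoding). As a rough illustration of the idea only, the sketch below learns a few merge rules from a toy word-frequency table in plain Python. It does not use the API of Hugging Face Tokenizers or SentencePiece; the function names (`count_pairs`, `merge_pair`, `learn_bpe`), the toy corpus, and the tie-breaking rule are all assumptions made for this example.

```python
from collections import Counter


def count_pairs(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(words, pair):
    """Rewrite every word so each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a {word: frequency} dict."""
    # Start from single characters as the initial symbols.
    words = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself
        # (an arbitrary but deterministic choice made for this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words


# Hypothetical toy corpus: word -> frequency.
toy_corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, segmented = learn_bpe(toy_corpus, num_merges=3)
print(merges)
print(segmented)
```

Real tokenizer libraries add details this sketch leaves out, such as word-boundary markers, byte-level fallback, and far more efficient pair counting, so treat it as a reading aid and not as a replacement for them.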

Useful materials


AWESOME blogs

| Blog | Article you should read |
| --- | --- |
| Christopher Olah's Blog | Understanding LSTM Networks |
| Jay Alammar's Blog | Illustrated Word2vec |
| Sebastian Ruder's Blog | Tracking Progress in Natural Language Processing |
| Chris McCormick's Blog | Word2Vec Tutorial - The Skip-Gram Model |
| The Gradient | Evaluation Metrics for Language Modeling |
| Distill.pub | Visualizing memorization in RNNs |
| Thomas Wolf's Blog | The Current Best of Universal Word Embeddings and Sentence Embeddings |
| dair.ai | A Light Introduction to Transfer Learning for NLP |
| Machine Learning Mastery | How to Develop a Neural Machine Translation System from Scratch |

NLP Specialists You should remember

(not enumerated by rank)

| Name | Description | Known for |
| --- | --- | --- |
| Kyunghyun Cho | Professor @NYU | GRU |
| Yejin Choi | Professor @Univ. of Washington | Grover |
| Yoon Kim | Ph.D. Candidate @Harvard Univ. | CNN for NLP |
| Minjoon Seo | Researcher @Clova AI, Allen AI | BiDAF |
| Kyubyong Park | Researcher @Kakao Brain | Paper implementation & NLP with Korean language |
| Tomas Mikolov | Researcher @FAIR | Word2vec |
| Omer Levy | Researcher @FAIR | Various Word Embedding techniques |
| Jason Weston | Researcher @FAIR | Memory Networks |
| Yinhan Liu | Researcher @FAIR | RoBERTa |
| Guillaume Lample | Researcher @FAIR | XLM |
| Alexis Conneau | Researcher @FAIR | XLM-R |
| Mike Lewis | Researcher @FAIR | BART |
| Ashish Vaswani | Researcher @Google | Transformer |
| Jacob Devlin | Researcher @Google | BERT |
| Kenton Lee | Researcher @Google | E2E Coref |
| Matthew Peters | Researcher @Allen AI | ELMo |
| Alec Radford | Researcher @OpenAI | GPT-2 |
| Sebastian Ruder | Researcher @DeepMind | NLP Progress |
| Richard Socher | Researcher @Salesforce | GloVe |
| Jeremy Howard | Co-founder @fast.ai | ULMFiT |
| Thomas Wolf | Lead Engineer @Hugging Face | pytorch-transformers |
| Luke Zettlemoyer | Professor @Univ. of Washington | ELMo |
| Yoav Goldberg | Professor @Bar Ilan Univ. | Neural Net Methods for NLP |
| Chris Manning | Professor @Stanford Univ. | CS224N |
| Dan Jurafsky | Professor @Stanford Univ. | Speech and Language Processing |
| Graham Neubig | Professor @CMU | Neural Nets for NLP |
| Sam Bowman | Professor @NYU | NLI Benchmark |
| Nikita Kitaev | Ph.D. Candidate @UC Berkeley | Reformer |
| Zihang Dai | Ph.D. Candidate @CMU | Transformer-XL |
| Zhilin Yang | Ph.D. Candidate @CMU | XLNet |
| Abigail See | Ph.D. Candidate @Stanford Univ. | Pointer Generator |
| Kevin Clark | Ph.D. Candidate @Stanford Univ. | ELECTRA |
| Eric Wallace | Ph.D. Candidate @UC Berkeley | AllenNLP Interpret |

Research Conferences