Anti-hype LLM reading list

February 7, 2024

Goals: Collect links that give reasonable, clear explanations of how things work. No hype and, if possible, no vendor content. Practical first-hand accounts of models in prod are eagerly sought.

Foundational Concepts

Pre-Transformer Models

The Illustrated Word2vec - A Gentle Intro to Word Embeddings in Machine Learning (YouTube)
Transformers as Support Vector Machines
Survey of LLMs
Deep Learning Systems
Fundamental ML Reading List

Building Blocks

What are embeddings
Concepts from Operating Systems that Found Their Way into LLMs
Talking about Large Language Models
Language Modeling is Compression
Vector Search - Long-Term Memory in AI
Eight things to know about large language models
The Bitter Lesson
The Hardware Lottery
The Scaling Hypothesis
Tokenization
LLM Course

Foundational Deep Learning Papers (in semi-chronological order)

Seq2Seq
Attention Is All You Need
BERT
GPT-1
Scaling Laws for Neural Language Models
T5
GPT-2: Language Models are Unsupervised Multi-Task Learners
InstructGPT: Training Language Models to Follow Instructions
GPT-3: Language Models are Few-Shot Learners

The Transformer Architecture

Transformers from Scratch
Transformer Math
Five Years of GPT Progress
Lost in the Middle: How Language Models Use Long Contexts

Attention

Self-attention and transformer networks
Attention
Understanding and Coding the Attention Mechanism
Attention Mechanisms
Keys, Queries, and Values

GPT

What is ChatGPT doing and why does it work
My own notes from a few months back.
Karpathy's The State of GPT (YouTube)
OpenAI Cookbook

Significant OSS Models

Llama2
Mistral7B
- Mixtral
Phi2
Falcon7B

LLMs in 2023

Catching up on the weird world of LLMs
How open are open architectures?
Building an LLM from Scratch
Large Language Models in 2023 and Slides
Timeline of Transformer Models
Large Language Model Evolutionary Tree

Training Data

What's in my Big Data
"The “it” in AI models is the dataset."
Extracting Training Data from ChatGPT

Pre-Training

Why host your own LLM?
How to train your own LLMs
Hugging Face Resources on Training Your Own
Training Compute-Optimal Large Language Models
OPT-175B Logbook

RLHF and DPO

RLHF
- Supervised Fine-tuning
- How Abilities in LLMs Are Affected by SFT
Instruction-tuning for LLMs: Survey
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
RLHF and DPO Compared

Fine-Tuning and Compression

The Complete Guide to LLM Fine-tuning
LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language - Really great overview of SOTA fine-tuning techniques
On the Structural Pruning of Large Language Models
Quantization
PEFT

Small and Local LLMs

How is LlamaCPP Possible?
How to beat GPT-4 with a 13B Model
Efficient LLM Inference on CPUs
Tiny Language Models Come of Age
Efficiency LLM Spectrum
TinyML at MIT

Deployment and Production

Building LLM Applications for Production
Challenges and Applications of Large Language Models
All the Hard Stuff Nobody talks about when building products with LLMs
Scaling Kubernetes to run ChatGPT
Numbers every LLM Developer should know
Against LLM Maximalism
A Guide to Inference and Performance
(InThe)WildChat: 570K ChatGPT Interaction Logs In The Wild
The State of Production LLMs in 2023
Machine Learning Engineering for successful training of large language models and multi-modal models.
Fine-tuning RedPajama on Slack Data

LLM Inference and K-V Cache

LLM Inference Performance Engineering: Best Practices
How to Make LLMs go Fast
Transformer Inference Arithmetic
Which serving technology to use for LLMs?
Speeding up the K-V cache
Large Transformer Model Inference Optimization

Prompt Engineering and RAG

On Prompt Engineering
Prompt Engineering Versus Blind Prompting
Building RAG-Based Applications for Production
Full Fine-Tuning, PEFT, or RAG?
Prompt Engineering Guide

GPUs

The Best GPUs for Deep Learning 2023
Making Deep Learning Go Brr from First Principles
Everything about Distributed Training and Efficient Finetuning
Training LLMs at Scale with AMD MI250 GPUs
GPU Programming

Evaluation

Evaluating ChatGPT
ChatGPT: Jack of All Trades, Master of None
What's Going on with the Open LLM Leaderboard
Challenges in Evaluating AI Systems
LLM Evaluation Papers
Evaluating LLMs is a Minefield

Eval Frameworks

HELM
LM Eval Harness
LMSYS Chatbot Arena

UX

Generative Interfaces Beyond Chat (YouTube)
Why Chatbots are not the Future
The Future of Search is Boutique
As a Large Language Model, I
Natural Language is an Unnatural Interface

What's Next?

Thanks to everyone who added suggestions on Twitter, Mastodon, and Bluesky.