Tinker Tutorials
April 11, 2026
A guided introduction to Tinker, from your first API call to building custom RL training pipelines.
These tutorials are marimo notebooks — reactive Python notebooks stored as .py files.
Prerequisites
- Python 3.10+
- A Tinker API key (get one here)
Setup
uv pip install tinker tinker-cookbook marimo
export TINKER_API_KEY="your-api-key-here"
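Before launching a notebook, it can help to confirm the key is actually visible to Python. A minimal stdlib-only check (the variable name `TINKER_API_KEY` comes from the export above; the helper name is our own):

```python
import os


def tinker_key_present() -> bool:
    """Return True if TINKER_API_KEY is set to a non-empty value."""
    return bool(os.environ.get("TINKER_API_KEY", "").strip())


if __name__ == "__main__":
    if tinker_key_present():
        print("TINKER_API_KEY is set.")
    else:
        print("TINKER_API_KEY is missing -- run the export above first.")
```

Remember that `export` only applies to the current shell session; add it to your shell profile if you want it to persist.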
Running a tutorial
git clone https://github.com/thinking-machines-lab/tinker-cookbook.git
cd tinker-cookbook
marimo edit tutorials/101_hello_tinker.py
This opens the notebook in your browser with an interactive editor. Rendered versions are also available on the Tinker docs site.
Alternatively, you can try notebooks online in molab, using the links below.
Tutorials
Basics (1xx)
| # | Notebook | What you'll learn | Try on molab |
|---|---|---|---|
| 101 | Hello Tinker | Architecture overview, client hierarchy, sampling from a model | |
| 102 | Your First SFT | Renderers, datum construction, training loop | |
| 103 | Async Patterns | Concurrent futures, num_samples, batch evaluation throughput | |
| 104 | First RL | GRPO on GSM8K: reward functions, group-relative advantages | |
Core Concepts (2xx)
| # | Notebook | What you'll learn | Try on molab |
|---|---|---|---|
| 201 | Rendering | Renderers, tokenization, vision inputs, TrainOnWhat | |
| 202 | Loss Functions | cross_entropy, IS, PPO, CISPO, custom loss | |
| 203 | Completers | TokenCompleter vs MessageCompleter, LLM-as-judge | |
| 204 | Weights | Checkpoint lifecycle, save/load/download/TTL | |
| 205 | Evaluations | Custom evaluators, NLL, Inspect AI | |
Cookbook Abstractions (3xx)
| # | Notebook | What you'll learn | Try on molab |
|---|---|---|---|
| 301 | Cookbook Abstractions | Env, EnvGroupBuilder, RLDataset, ProblemEnv | |
| 302 | Custom Environment | Build your own ProblemEnv subclass and RLDataset | |
| 303 | SFT with Config | train.Config, ChatDatasetBuilder, train.main() | |
| 304 | RL with Config | RLDatasetBuilder, RL training pipeline | |
Advanced (4xx)
| # | Notebook | What you'll learn | Try on molab |
|---|---|---|---|
| 401 | SL Hyperparameters | LR scaling, rank selection, sweeps | |
| 402 | RL Hyperparameters | KL penalty, group size, advantages | |
| 403 | DPO & Preferences | Comparison, DPO loss, PreferenceModel | |
| 404 | Sequence Extension | Multi-turn RL, conversation masks | |
| 405 | Multi-Agent RL | MessageEnv, self-play, group rewards | |
| 406 | Prompt Distillation | Teacher/student, context distillation | |
| 407 | RLHF Pipeline | 3-stage SFT, preference model, RL | |
Deployment (5xx)
| # | Notebook | What you'll learn | Try on molab |
|---|---|---|---|
| 501 | Export to HF | Merge LoRA into full model | |
| 502 | Build LoRA Adapter | PEFT format for vLLM/SGLang | |
| 503 | Publish to Hub | Upload to HuggingFace with model card | |
Work through them in order — each builds on concepts from the previous one.
After the tutorials
- Production recipes with logging, checkpointing, and evaluation: see tinker_cookbook/recipes/
- Full documentation: see the Tinker docs site
- API reference: see the Tinker API reference