Tinker Tutorials

April 11, 2026

A guided introduction to Tinker, from your first API call to building custom RL training pipelines.

These tutorials are marimo notebooks — reactive Python notebooks stored as .py files.

Prerequisites

Setup

uv pip install tinker tinker-cookbook marimo
export TINKER_API_KEY="your-api-key-here"
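The `export` above only applies to your current shell session. Before launching marimo, you can confirm the variable is actually visible to Python — a minimal sketch using only the standard library; the `check_api_key` helper is our own illustration, not part of the Tinker API:

```python
import os

def check_api_key(env=os.environ) -> bool:
    """Return True if TINKER_API_KEY is set to a non-empty value."""
    return bool(env.get("TINKER_API_KEY", "").strip())

if __name__ == "__main__":
    print("TINKER_API_KEY set:", check_api_key())
```

If this prints `False`, re-run the `export` in the same shell you use to start marimo.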

Running a tutorial

git clone https://github.com/thinking-machines-lab/tinker-cookbook.git
cd tinker-cookbook
marimo edit tutorials/101_hello_tinker.py

This opens the notebook in your browser with an interactive editor. Rendered versions are also available on the Tinker docs site.

Alternatively, you can try notebooks online in molab, using the links below.

Tutorials

Basics (1xx)

| # | Notebook | What you'll learn | Try on molab |
|---|----------|-------------------|--------------|
| 101 | Hello Tinker | Architecture overview, client hierarchy, sampling from a model | Open in molab |
| 102 | Your First SFT | Renderers, datum construction, training loop | Open in molab |
| 103 | Async Patterns | Concurrent futures, num_samples, batch evaluation throughput | Open in molab |
| 104 | First RL | GRPO on GSM8K: reward functions, group-relative advantages | Open in molab |

Core Concepts (2xx)

| # | Notebook | What you'll learn | Try on molab |
|---|----------|-------------------|--------------|
| 201 | Rendering | Renderers, tokenization, vision inputs, TrainOnWhat | Open in molab |
| 202 | Loss Functions | cross_entropy, IS, PPO, CISPO, custom loss | Open in molab |
| 203 | Completers | TokenCompleter vs MessageCompleter, LLM-as-judge | Open in molab |
| 204 | Weights | Checkpoint lifecycle, save/load/download/TTL | Open in molab |
| 205 | Evaluations | Custom evaluators, NLL, Inspect AI | Open in molab |

Cookbook Abstractions (3xx)

| # | Notebook | What you'll learn | Try on molab |
|---|----------|-------------------|--------------|
| 301 | Cookbook Abstractions | Env, EnvGroupBuilder, RLDataset, ProblemEnv | Open in molab |
| 302 | Custom Environment | Build your own ProblemEnv subclass and RLDataset | Open in molab |
| 303 | SFT with Config | train.Config, ChatDatasetBuilder, train.main() | Open in molab |
| 304 | RL with Config | RLDatasetBuilder, RL training pipeline | Open in molab |

Advanced (4xx)

| # | Notebook | What you'll learn | Try on molab |
|---|----------|-------------------|--------------|
| 401 | SL Hyperparameters | LR scaling, rank selection, sweeps | Open in molab |
| 402 | RL Hyperparameters | KL penalty, group size, advantages | Open in molab |
| 403 | DPO & Preferences | Comparison, DPO loss, PreferenceModel | Open in molab |
| 404 | Sequence Extension | Multi-turn RL, conversation masks | Open in molab |
| 405 | Multi-Agent RL | MessageEnv, self-play, group rewards | Open in molab |
| 406 | Prompt Distillation | Teacher/student, context distillation | Open in molab |
| 407 | RLHF Pipeline | 3 stages: SFT, preference model, RL | Open in molab |

Deployment (5xx)

| # | Notebook | What you'll learn | Try on molab |
|---|----------|-------------------|--------------|
| 501 | Export to HF | Merge LoRA into full model | Open in molab |
| 502 | Build LoRA Adapter | PEFT format for vLLM/SGLang | Open in molab |
| 503 | Publish to Hub | Upload to HuggingFace with model card | Open in molab |

Work through them in order — each builds on concepts from the previous one.

After the tutorials