Tinker Tutorials

April 11, 2026

A guided introduction to Tinker, from your first API call to building custom RL training pipelines.

These tutorials are marimo notebooks — reactive Python notebooks stored as .py files.

Prerequisites

Setup

uv pip install tinker tinker-cookbook marimo
export TINKER_API_KEY="your-api-key-here"
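The `export` above only applies to your current shell session. Before launching marimo, you can confirm the variable is actually visible to Python — a minimal sketch using only the standard library; the `check_api_key` helper is our own illustration, not part of the Tinker API:

```python
import os

def check_api_key(env=os.environ) -> bool:
    """Return True if TINKER_API_KEY is set to a non-empty value."""
    return bool(env.get("TINKER_API_KEY", "").strip())

if __name__ == "__main__":
    print("TINKER_API_KEY set:", check_api_key())
```

If this prints `False`, re-run the `export` in the same shell you use to start marimo.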

Running a tutorial

git clone https://github.com/thinking-machines-lab/tinker-cookbook.git
cd tinker-cookbook
marimo edit tutorials/101_hello_tinker.py

This opens the notebook in your browser with an interactive editor. Rendered versions are also available on the Tinker docs site.

Alternatively, you can try notebooks online in molab, using the links below.

Tutorials

Basics (1xx)

| # | Notebook | What you'll learn | Try on molab |
|---|----------|-------------------|--------------|
| 101 | Hello Tinker | Architecture overview, client hierarchy, sampling from a model | Open in molab |
| 102 | Your First SFT | Renderers, datum construction, training loop | Open in molab |
| 103 | Async Patterns | Concurrent futures, num_samples, batch evaluation throughput | Open in molab |
| 104 | First RL | GRPO on GSM8K: reward functions, group-relative advantages | Open in molab |

Core Concepts (2xx)

| # | Notebook | What you'll learn | Try on molab |
|---|----------|-------------------|--------------|
| 201 | Rendering | Renderers, tokenization, vision inputs, TrainOnWhat | Open in molab |
| 202 | Loss Functions | cross_entropy, IS, PPO, CISPO, custom loss | Open in molab |
| 203 | Completers | TokenCompleter vs MessageCompleter, LLM-as-judge | Open in molab |
| 204 | Weights | Checkpoint lifecycle, save/load/download/TTL | Open in molab |
| 205 | Evaluations | Custom evaluators, NLL, Inspect AI | Open in molab |

Cookbook Abstractions (3xx)

| # | Notebook | What you'll learn | Try on molab |
|---|----------|-------------------|--------------|
| 301 | Cookbook Abstractions | Env, EnvGroupBuilder, RLDataset, ProblemEnv | Open in molab |
| 302 | Custom Environment | Build your own ProblemEnv subclass and RLDataset | Open in molab |
| 303 | SFT with Config | train.Config, ChatDatasetBuilder, train.main() | Open in molab |
| 304 | RL with Config | RLDatasetBuilder, RL training pipeline | Open in molab |

Advanced (4xx)

| # | Notebook | What you'll learn | Try on molab |
|---|----------|-------------------|--------------|
| 401 | SL Hyperparameters | LR scaling, rank selection, sweeps | Open in molab |
| 402 | RL Hyperparameters | KL penalty, group size, advantages | Open in molab |
| 403 | DPO & Preferences | Comparison, DPO loss, PreferenceModel | Open in molab |
| 404 | Sequence Extension | Multi-turn RL, conversation masks | Open in molab |
| 405 | Multi-Agent RL | MessageEnv, self-play, group rewards | Open in molab |
| 406 | Prompt Distillation | Teacher/student, context distillation | Open in molab |
| 407 | RLHF Pipeline | 3 stages: SFT, preference model, RL | Open in molab |

Deployment (5xx)

| # | Notebook | What you'll learn | Try on molab |
|---|----------|-------------------|--------------|
| 501 | Export to HF | Merge LoRA into full model | Open in molab |
| 502 | Build LoRA Adapter | PEFT format for vLLM/SGLang | Open in molab |
| 503 | Publish to Hub | Upload to HuggingFace with model card | Open in molab |

Work through them in order — each builds on concepts from the previous one.

After the tutorials