February 20, 2026
Democratizing Agentic Reinforcement Learning as a Service
Paper · Project Page · DeepWiki · Slack · WeChat
🚀 Quick Start
Choose an example below to get started. Each example includes step-by-step instructions for setup, training, and inference.
| Task | Description | Performance |
|---|---|---|
| LLM Single-Turn Math | Mathematical problem solving | wandb |
| LLM Multi-Turn Math | Multi-turn mathematical problem solving with tool calling | wandb |
| LLM Single-LoRA Single-Turn Math | Single-turn math, trained with LoRA | wandb |
| VLM Single-Turn Math | Geometry3K math problem solving | wandb |
| VLM Multi-Turn Math | Geometry3K math problem solving with tool calling | wandb |
| LLM Gomoku Agent | A multi-turn Gomoku agent | wandb |
| LLM AlfWorld Agent | A multi-turn AlfWorld agent | wandb |
| LLM Android World Agent | A multi-turn Android World agent | |
📦 Installation
🔹 Common Setup (Client and Server)
Clone the Repository
git clone --recurse-submodules https://github.com/open-tinker/OpenTinker.git
cd OpenTinker
Install OpenTinker
pip install -e .
Install verl (core package)
cd verl
pip install -e .
cd ..
💻 Client Setup
After completing the Common Setup, no additional steps are needed.
Note: The client currently relies on a small subset of functions from verl. This dependency is transitional: in future releases, the client will be fully decoupled from verl, allowing it to remain completely lightweight and independent of training-related code.
🖥️ Server Setup
In addition to the Common Setup, the server requires the verl dependencies.
You can choose one of the following two approaches.
Option 1: Docker Installation (Recommended)
# Pull the verl Docker image
docker pull verlai/verl@sha256:3ce56ff018516b28ab9c4f4fc09d3aa67589074495ace75e2674b720aa4d0e5d
# Create and run container
docker run -dit \
--gpus all \
--restart=no \
--entrypoint /bin/bash \
--net=host \
--shm-size=10g \
--cap-add=SYS_ADMIN \
-v .:/workspace/dev \
--name tinker \
verlai/verl@sha256:3ce56ff018516b28ab9c4f4fc09d3aa67589074495ace75e2674b720aa4d0e5d
Option 2: Manual Installation
You can install the verl dependencies manually instead. After completing the Common Setup, run:
cd verl
pip install -r requirements.txt
cd ..
This installs all GPU and training-related dependencies required by the server.
⚠️ Warning: Manual installation may introduce version conflicts. For better stability and reproducibility, we recommend using the Docker-based setup whenever possible.
🔐 Authentication
OpenTinker includes a built-in authentication system to secure access to the scheduler API.
Configuration
Edit opentinker/scheduler/config/scheduler.yaml:
enable_auth: true  # set to true to enable authentication, false to disable
user_db_path: "scheduler_users.db"
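As a sketch of how a config like this can be consumed, the snippet below round-trips the two fields through PyYAML and applies defaults for missing keys. `load_scheduler_config` is a hypothetical helper for illustration only, not part of OpenTinker:

```python
import yaml  # PyYAML; assumed available, as most ML stacks ship it

# Hypothetical helper (illustration only, not OpenTinker's API):
# parse scheduler.yaml and fall back to safe defaults for missing keys.
def load_scheduler_config(path):
    with open(path) as f:
        cfg = yaml.safe_load(f) or {}
    cfg.setdefault("enable_auth", False)                  # auth is opt-in
    cfg.setdefault("user_db_path", "scheduler_users.db")
    return cfg

# Write a file matching the snippet above, then read it back.
with open("scheduler.yaml", "w") as f:
    f.write("enable_auth: true\nuser_db_path: scheduler_users.db\n")

cfg = load_scheduler_config("scheduler.yaml")
print(cfg)  # {'enable_auth': True, 'user_db_path': 'scheduler_users.db'}
```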
Quick Registration
Run the interactive script to register a user and get an API key:
python opentinker/scheduler/register_user_example.py
For advanced usage (REST API registration, using the key) and detailed configuration, see the Scheduler & Dashboard Guide.
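Once you have a key, a common pattern is to attach it to every scheduler request as a bearer token. The header name below is an assumption for illustration; see the Scheduler & Dashboard Guide for the header OpenTinker actually expects:

```python
# Sketch: carry the API key on each scheduler request.
# "Authorization: Bearer ..." is an assumed convention, not confirmed
# from OpenTinker's docs -- check the Scheduler & Dashboard Guide.

def auth_headers(api_key: str) -> dict:
    """Build HTTP headers carrying the API key as a bearer token."""
    if not api_key:
        raise ValueError("missing API key")
    return {"Authorization": f"Bearer {api_key}"}

headers = auth_headers("sk-demo-key")
print(headers["Authorization"])  # Bearer sk-demo-key
# e.g. requests.get(scheduler_url, headers=headers)
```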
🎮 Environments
OpenTinker provides a flexible environment design framework that supports diverse training scenarios. Our architecture accommodates two orthogonal dimensions:
- Data Source: Data-Dependent environments load structured datasets (e.g., parquet files) to provide prompts, while Data-Free environments generate prompts dynamically from simulators or game engines.
- Interaction Mode: Single-Turn environments involve one-shot model responses, while Multi-Turn environments enable iterative interactions with tool calls and feedback loops.
This 2×2 design space enables four distinct paradigms, each suited to different learning objectives:
| Paradigm | Data Source | Interaction | Example Use Case |
|---|---|---|---|
| Data-Dependent × Single-Turn | Dataset | One-shot | Math reasoning, QA tasks |
| Data-Dependent × Multi-Turn | Dataset | Iterative | Tool-assisted problem solving |
| Data-Free × Single-Turn | Simulator | One-shot | Bandit |
| Data-Free × Multi-Turn | Simulator | Iterative | Complex game playing, dialogue agents |
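The two dimensions compose naturally in code. The toy sketch below (hypothetical class names, not OpenTinker's real interfaces) separates the prompt *source* (data-dependent vs. data-free) from the *interaction mode*, here a single-turn rollout:

```python
import random

# Data-dependent source: prompts come from a fixed dataset (e.g. parquet rows).
class DatasetSource:
    def __init__(self, rows):
        self.rows = list(rows)
    def next_prompt(self):
        return random.choice(self.rows)

# Data-free source: prompts are generated on the fly by a simulator.
class SimulatorSource:
    def next_prompt(self):
        a, b = random.randint(1, 9), random.randint(1, 9)
        return {"question": f"{a}+{b}=?", "answer": str(a + b)}

# Single-turn mode: one model response, scored against the reference answer.
class SingleTurnEnv:
    def __init__(self, source):
        self.source = source
    def rollout(self, policy):
        item = self.source.next_prompt()
        return 1.0 if policy(item["question"]) == item["answer"] else 0.0

# Data-free x single-turn, with an oracle policy that answers correctly.
env = SingleTurnEnv(SimulatorSource())
oracle = lambda q: str(eval(q.rstrip("=?")))
print(env.rollout(oracle))  # 1.0 for the oracle policy
```

Swapping `SimulatorSource` for `DatasetSource` (or `SingleTurnEnv` for a multi-turn loop with tool calls) moves between quadrants without touching the other dimension, which is the point of keeping the two axes orthogonal.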
📚 Documentation
- Scheduler & Dashboard Guide - Configuration, Usage, and Web Dashboard
📖 Citation
@misc{zhu2026opentinkerseparatingconcernsagentic,
title={OpenTinker: Separating Concerns in Agentic Reinforcement Learning},
author={Siqi Zhu and Jiaxuan You},
year={2026},
eprint={2601.07376},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2601.07376},
}