How to Train Your Advisor: Steering Black-Box LLMs with Advisor Models
February 5, 2026 · View on GitHub
Authors: Parth Asawa*, Alan Zhu*, Abby O'Neill, Matei Zaharia, Alexandros G. Dimakis, Joseph E. Gonzalez
*Equal contribution.
Paper: https://arxiv.org/pdf/2510.02453
Setup
Run uv sync to install local development dependencies. Activate your virtual environment with source .venv/bin/activate.
To set up the (separate) training virtual environment for all example environments, run the following commands:
cd SkyRL/skyrl-train
uv sync --extra vllm
source .venv/bin/activate
Example training scripts are provided in the advisor_models directory, along with templates for new environments. You will also need to set OPENAI_API_KEY and WANDB_API_KEY in your environment.
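As a rough illustration of what an environment template involves, here is a minimal sketch. The interface below (AdvisorEnv, get_prompt, compute_reward) is hypothetical and only stands in for whatever structure the actual templates in advisor_models define; consult those templates before writing your own environment.

```python
# Hypothetical sketch of an advisor-model environment template.
# The class and method names are illustrative only, not the
# actual interface used by the training scripts in this repo.

class AdvisorEnv:
    """One training environment: turns a task into an advisor prompt
    and scores the black-box model's final answer."""

    def get_prompt(self, task: dict) -> str:
        # The advisor sees the task and produces guidance for the
        # black-box model (e.g., style hints, decomposition steps).
        return (
            "You are an advisor. Write brief guidance to help a "
            f"stronger model solve this task:\n{task['question']}"
        )

    def compute_reward(self, task: dict, final_answer: str) -> float:
        # Reward the advisor based on the black-box model's output;
        # here, a simple exact-match check against a reference answer.
        return 1.0 if final_answer.strip() == task["answer"] else 0.0
```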
Advisor Models

Overview
Customizing powerful black-box models is a major challenge: most practitioners are limited to static prompting.
We propose a framework that trains a small open-source "advisor" model to guide black-box models via feedback, using RL to optimize the advisor for your specific environment, task, or users.
We show that Advisor Models are highly effective at personalizing and adapting black-box models to specific environments. We additionally test the system on reasoning-intensive tasks, identifying the conditions under which Advisor Models work best and demonstrating the system's robustness across models and environments.
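To make the flow concrete, here is a minimal inference-time sketch, assuming the advisor is a small model served behind an OpenAI-compatible endpoint (e.g., via vLLM) and the black-box model is reached through the OpenAI API. The model names and the advisor URL below are placeholders, not values from this repo.

```python
# Minimal sketch of Advisor Model inference under the assumptions
# stated above; model names and the advisor endpoint are placeholders.
from openai import OpenAI

# Small open-source advisor served locally via an OpenAI-compatible
# server (e.g., vLLM); this is the model trained with RL.
advisor = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
# Black-box model whose weights we cannot touch.
blackbox = OpenAI()  # reads OPENAI_API_KEY from the environment

def solve(task: str) -> str:
    # 1) The advisor turns the task into tailored guidance.
    advice = advisor.chat.completions.create(
        model="advisor-model",  # placeholder name
        messages=[{"role": "user",
                   "content": f"Write brief guidance for solving:\n{task}"}],
    ).choices[0].message.content

    # 2) The black-box model answers with the advice prepended.
    answer = blackbox.chat.completions.create(
        model="gpt-4o",  # placeholder black-box model
        messages=[{"role": "system", "content": advice},
                  {"role": "user", "content": task}],
    ).choices[0].message.content
    return answer

# During training, a reward on the answer updates only the advisor's
# weights with RL; the black-box model stays frozen.
```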
Example

License
Advisor Models is Apache 2.0 licensed, making it suitable for both academic and commercial use.
Contact
Please feel free to reach out at pgasawa@berkeley.edu & aczhu@berkeley.edu!
Citation
@article{asawa2025trainadvisorsteeringblackbox,
  title={How to Train Your Advisor: Steering Black-Box LLMs with Advisor Models},
  author={Parth Asawa and Alan Zhu and Abby O'Neill and Matei Zaharia and Alexandros G. Dimakis and Joseph E. Gonzalez},
  year={2025},
  journal={arXiv preprint arXiv:2510.02453},
}