May 9, 2026
RouteProfile: Elucidating the Design Space of LLM Profiles for Routing
Overview
RouteProfile is a general framework for designing LLM profiles for routing. It formulates LLM profiling as a structured information integration problem over heterogeneous interaction histories, enabling more principled and effective routing across queries, domains, and models.
Highlights:
- General profile design space: Define LLM profiles along four dimensions: organizational form, representation type, aggregation depth, and learning configuration.
- Comprehensive evaluation: Evaluate LLM profiles across three representative routers under both standard routing and new-LLM routing settings.
Links
Get Started
Pipeline Overview
Step 1: Data Collection → profile_data/ (manual / provided)
Step 2: Build Data Graph → results/result_data_graph/{mode}/
Step 3: Build Profile → results/model_profile_result/{mode}/
Step 4: Route & Evaluate → results/routing_result/{mode}/
Two routing settings:
| Mode | Description |
|---|---|
| standard | Standard routing with a known set of candidate LLMs |
| newllm | Generalisation to newly introduced, unseen LLMs |
Installation
pip install routeprofile
For Text-GNN profiles (requires vLLM):
pip install "routeprofile[text-gnn]"
Install from source (editable):
git clone https://github.com/your-org/RouteProfile.git
cd RouteProfile
pip install -e .
Profiling Methods
| Method | File | Org. form | Repr. type | Agg. depth | Learning |
|---|---|---|---|---|---|
| flat | flat.npz | Flat | Text | 0 | Training-free |
| index | index.npz | Flat | Embedding | 0 | Training-free |
| emb_gnn | emb_gnn.npz | Structured | Embedding | Multi-hop | Training-free |
| text_gnn | text_gnn.npz | Structured | Text | Multi-hop | Training-free |
| trainable | trainable_gnn.npz | Structured | Embedding | Multi-hop | Trainable |
Python Usage
All functions are importable directly from routeprofile:
import routeprofile
print(routeprofile.__version__) # "0.1.0"
Step 2: Build Data Graphs
from routeprofile import (
build_task_graph,
build_query_graph,
build_query_task_graph,
build_task_domain_graph,
build_query_task_domain_graph,
)
# Uses default profile_data/ inputs; outputs to results/result_data_graph/standard/
build_task_graph(mode="standard")
# Override any input/output path
build_query_task_domain_graph(
mode="standard",
json="profile_data/model_feature_standard.json",
arch="profile_data/model_family_feature.json",
dataset="profile_data/task_feature.json",
query="profile_data/task_queries_standard.json",
domain_map="profile_data/domain_task_map.json",
domain_feat="profile_data/domain_feature.json",
save="results/result_data_graph/standard/query_task_domain_graph_full.pt",
)
Step 3a: Training-Free Profiles
from routeprofile import (
build_flat_profile,
build_emb_gnn_profile,
build_index_profile,
build_text_gnn_profile,
)
# Flat: Longformer encoding of model text + sampled neighbours
build_flat_profile(mode="standard")
# → results/model_profile_result/standard/flat.npz
# Index: random vector baseline (no text or graph)
build_index_profile(mode="standard")
# → results/model_profile_result/standard/index.npz
# Emb-GNN: K-hop neighbourhood propagation (training-free)
build_emb_gnn_profile(
mode="standard",
graph="results/result_data_graph/standard/task_graph_full.pt",
K=2,
norm="sym", # "sym" | "rw" | "none"
save="results/model_profile_result/standard/emb_gnn.npz",
)
# Text-GNN: LLM-based text aggregation per hop (requires vLLM)
build_text_gnn_profile(
mode="standard",
graph="results/result_data_graph/standard/query_task_domain_graph_full.pt",
K=1,
model="Qwen/Qwen2.5-7B-Instruct",
tp=1, # tensor parallel size (number of GPUs)
gpu_memory_utilization=0.6, # fraction of GPU memory for vLLM
keep=[], # [] = save all models; None = TARGET_MODELS only
emb_save="results/model_profile_result/standard/text_gnn.npz",
)
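Each training-free builder above writes an .npz profile file. As a quick sanity check you can open one with NumPy; the snippet below fabricates a toy file with an assumed layout (the key names `model_names` and `embeddings` are hypothetical, not guaranteed to match RouteProfile's actual output) and lists its contents the same way you would inspect a real profile:

```python
import os
import tempfile

import numpy as np

# Toy stand-in for a Step-3 profile file. The key names are an assumption
# for illustration; inspect a real file from results/model_profile_result/
# the same way to see its actual keys.
names = np.array(["qwen2.5-7b-instruct", "llama-3.1-8b-instruct"])
embeddings = np.random.default_rng(0).normal(size=(2, 128)).astype(np.float32)

path = os.path.join(tempfile.gettempdir(), "toy_profile.npz")
np.savez(path, model_names=names, embeddings=embeddings)

# List every stored array and its shape.
with np.load(path, allow_pickle=True) as data:
    for key in data.files:
        print(key, data[key].shape)
```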
Step 3b: Trainable GNN Profile (HANConv)
from routeprofile import build_trainable_gnn_profile
build_trainable_gnn_profile(
mode="standard",
graph="results/result_data_graph/standard/task_graph_full.pt",
hidden_dim=256,
out_dim=128,
epochs=100,
save_emb="results/model_profile_result/standard/trainable_gnn.npz",
save_ckpt="results/trained_trainable_gnn/standard/pretrain_ckpt.pt",
)
Step 4: Routing Evaluation
from routeprofile import call_simrouter, call_mlprouter, call_graphrouter
# SimRouter: training-free cosine similarity routing
call_simrouter(
model_profile_path="results/model_profile_result/standard/flat.npz",
routing_data_path="route_data/routing_test_data.json",
output_path="results/routing_result/standard/SimRouter_results.json",
)
# MLPRouter: pairwise-ranking MLP
call_mlprouter(
model_profile_path="results/model_profile_result/standard/emb_gnn.npz",
training_data_path="route_data/pairwise_training_data_standard.json",
testing_data_path="route_data/routing_test_data.json",
output_path="results/routing_result/standard/MLPRouter_results.json",
save_ckpt="results/trained_MLPRouter/standard/mlp_router_ckpt.pt",
epochs=50,
)
# GraphRouter: bipartite GAT
call_graphrouter(
model_profile_path="results/model_profile_result/standard/trainable_gnn.npz",
training_data_path="route_data/pairwise_training_data_standard.json",
testing_data_path="route_data/routing_test_data.json",
output_path="results/routing_result/standard/GraphRouter_results.json",
save_ckpt="results/trained_GraphRouter/standard/graphrouter_ckpt.pt",
epochs=50,
)
You can also import the router classes directly:
from routeprofile import SimRouter, MLPRouter, GraphRouter
CLI Usage
After installation, every step is available as a command-line tool:
# Step 2: Build graphs (outputs to results/result_data_graph/{mode}/)
routeprofile-build-task-graph --mode standard
routeprofile-build-query-graph --mode standard
routeprofile-build-query-task-graph --mode standard
routeprofile-build-task-domain-graph --mode standard
routeprofile-build-query-task-domain-graph --mode standard
# Step 3a: Training-free profiles (outputs to results/model_profile_result/{mode}/)
routeprofile-flat-profile --mode standard
routeprofile-index-profile --mode standard
routeprofile-emb-gnn-profile --mode standard --K 2
routeprofile-trainable-gnn-profile --mode standard --epochs 100
# Step 4: Routing (outputs to results/routing_result/{mode}/)
routeprofile-sim-router \
--model-profile-path results/model_profile_result/standard/flat.npz \
--routing-data-path route_data/routing_test_data.json
routeprofile-mlp-router \
--model-profile-path results/model_profile_result/standard/emb_gnn.npz \
--training-data-path route_data/pairwise_training_data_standard.json \
--testing-data-path route_data/routing_test_data.json \
--save-ckpt results/trained_MLPRouter/standard/mlp_router_ckpt.pt
routeprofile-graph-router \
--model-profile-path results/model_profile_result/standard/trainable_gnn.npz \
--training-data-path route_data/pairwise_training_data_standard.json \
--testing-data-path route_data/routing_test_data.json \
--save-ckpt results/trained_GraphRouter/standard/graphrouter_ckpt.pt
All commands accept --help for full usage.
Shell Scripts Usage
# Build all graphs (standard mode)
bash routeprofile/scripts/step2_build_data_graph.sh standard
# All training-free profiles
bash routeprofile/scripts/step3a_training_free_profile.sh standard all
# Text-GNN (requires vLLM + GPU)
bash routeprofile/scripts/step3a_training_free_profile.sh standard text_gnn
# Trainable GNN
bash routeprofile/scripts/step3b_trainable_profile.sh standard
# Routing evaluation
bash routeprofile/scripts/step4_routing_evaluation.sh standard sim flat.npz
bash routeprofile/scripts/step4_routing_evaluation.sh standard all flat.npz
Extra Information
Directory Structure
RouteProfile/
├── profile_data/                         # Input data (read-only)
│   ├── model_feature_standard.json       # Model metadata (standard routing)
│   ├── model_feature_newllm.json         # Model metadata (newllm routing)
│   ├── model_family_feature.json         # Architecture family descriptions
│   ├── task_queries_standard.json        # Queries per benchmark (standard)
│   ├── task_queries_newllm.json          # Queries per benchmark (newllm)
│   ├── task_feature.json                 # Benchmark task descriptions
│   ├── domain_feature.json               # Task domain descriptions
│   ├── domain_task_map.json              # Domain → benchmark mapping
│   └── candidate_models.json             # Candidate LLM metadata
│
├── route_data/                           # Pre-computed routing data
│   ├── routing_test_data.json            # Test queries with model responses
│   ├── pairwise_training_data_standard.json  # Pairwise training data (standard)
│   └── pairwise_training_data_newllm.json    # Pairwise training data (newllm)
│
├── routeprofile/                         # Library source
│   ├── build_data_graph/                 # Step 2: graph construction
│   ├── get_model_profile/
│   │   ├── training_free/                # flat, index, emb_gnn, text_gnn
│   │   └── trainable/                    # HANConv self-supervised
│   ├── routing_evaluation/               # SimRouter, MLPRouter, GraphRouter
│   └── scripts/                          # Shell scripts for batch runs
│
├── results/                              # All generated outputs (git ignored)
│   ├── result_data_graph/{standard,newllm}/      # Built graphs (.pt)
│   ├── model_profile_result/{standard,newllm}/   # Model profiles (.npz)
│   ├── routing_result/{standard,newllm}/         # Routing evaluation results (.json)
│   ├── trained_trainable_gnn/{standard,newllm}/  # HANConv checkpoints
│   ├── trained_MLPRouter/{standard,newllm}/      # MLP router checkpoints
│   └── trained_GraphRouter/{standard,newllm}/    # Graph router checkpoints
│
├── tests/                                # pytest test suite
└── pyproject.toml
Data Formats
profile_data/model_feature_{standard|newllm}.json
Main model metadata. Primary input to all graph builders.
{
"model-name": {
"size": "7B",
"feature": "Natural language description of the model...",
"architecture": "Qwen2ForCausalLM",
"detailed_scores": {
"ifeval": 75.85, "bbh": 53.94, "math": 50.0,
"gpqa": 29.11, "musr": 40.2, "mmlu_pro": 42.87
},
"parameters": 7.616,
"input_price": 0.2,
"output_price": 0.2,
"model": "qwen/qwen2.5-7b-instruct",
"service": "NVIDIA",
"api_endpoint": "https://integrate.api.nvidia.com/v1",
"average_score": 35.2
}
}
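Since every graph builder reads this file, a quick structural check before running the pipeline can save a failed run. The helper below is hypothetical (not part of the routeprofile API); the required field set is taken directly from the example entry above:

```python
# Required keys per model entry, as shown in the documented example.
REQUIRED = {
    "size", "feature", "architecture", "detailed_scores", "parameters",
    "input_price", "output_price", "model", "service", "api_endpoint",
    "average_score",
}

def validate_model_features(features: dict) -> list[str]:
    """Return names of models whose entries are missing required keys."""
    return [name for name, entry in features.items()
            if not REQUIRED <= entry.keys()]

sample = {
    "qwen2.5-7b-instruct": {
        "size": "7B",
        "feature": "Natural language description of the model...",
        "architecture": "Qwen2ForCausalLM",
        "detailed_scores": {"ifeval": 75.85, "bbh": 53.94},
        "parameters": 7.616,
        "input_price": 0.2,
        "output_price": 0.2,
        "model": "qwen/qwen2.5-7b-instruct",
        "service": "NVIDIA",
        "api_endpoint": "https://integrate.api.nvidia.com/v1",
        "average_score": 35.2,
    }
}
print(validate_model_features(sample))  # [] -> all entries complete
```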
profile_data/model_family_feature.json
Architecture family descriptions used as architecture node features.
{
"Qwen2ForCausalLM": "A family of decoder-only Transformer-based large language models developed by Alibaba Cloud...",
"LlamaForCausalLM": "A family of autoregressive large language models developed by Meta AI..."
}
profile_data/task_feature.json
Natural language description of each benchmark task.
{
"ifeval": "IFEval (Instruction-Following Evaluation) is a benchmark designed to evaluate the ability of large language models to follow explicit natural language instructions...",
"bbh": "BBH (BIG-Bench Hard) is a challenging subset of the BIG-Bench benchmark..."
}
profile_data/domain_task_map.json
Maps broad task domains to specific benchmarks.
{
"knowledge": ["mmlu", "mmlu_pro", "C-Eval", "AGIEval English", "SQuAD", "gpqa"],
"reasoning": ["bbh", "TheoremQA", "WinoGrande"],
"math": ["math", "gsm8k", "TheoremQA"],
"coding": ["human_eval", "mbpp"]
}
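Note that a benchmark can belong to more than one domain (TheoremQA appears under both reasoning and math), so the inverse mapping is benchmark → list of domains rather than a one-to-one lookup. A small sketch of that inversion, using the mapping above:

```python
# domain_task_map.json maps each domain to several benchmarks; invert it to
# look up which domain(s) a benchmark belongs to.
domain_task_map = {
    "knowledge": ["mmlu", "mmlu_pro", "C-Eval", "AGIEval English", "SQuAD", "gpqa"],
    "reasoning": ["bbh", "TheoremQA", "WinoGrande"],
    "math": ["math", "gsm8k", "TheoremQA"],
    "coding": ["human_eval", "mbpp"],
}

task_domain_map: dict[str, list[str]] = {}
for domain, tasks in domain_task_map.items():
    for task in tasks:
        task_domain_map.setdefault(task, []).append(domain)

print(task_domain_map["TheoremQA"])  # ['reasoning', 'math']
```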
profile_data/domain_feature.json
Natural language description of each task domain.
{
"knowledge": "Knowledge tasks test factual recall and information retrieval...",
"reasoning": "Reasoning tasks require multi-step logical inference...",
"math": "Math tasks evaluate quantitative and symbolic problem solving..."
}
profile_data/candidate_models.json
Candidate model metadata including API endpoints and aggregate scores.
{
"qwen2.5-7b-instruct": {
"size": "7B",
"feature": "Qwen2.5-7B-Instruct represents an upgraded version...",
"input_price": 0.2,
"output_price": 0.2,
"model": "qwen/qwen2.5-7b-instruct",
"service": "NVIDIA",
"api_endpoint": "https://integrate.api.nvidia.com/v1",
"average_score": 35.2,
"detailed_scores": { "ifeval": 75.85, "bbh": 53.94 },
"parameters": 7.616,
"architecture": "Qwen2ForCausalLM"
}
}
profile_data/task_queries_{standard|newllm}.json
Per-benchmark query lists used to build query nodes.
{
"ifeval": ["Instruction 1...", "Instruction 2...", ...],
"bbh": ["Question 1...", "Question 2...", ...]
}
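When query nodes are built from this file, the per-benchmark lists are typically flattened into (task, query) pairs. A minimal sketch with abbreviated sample data:

```python
# Abbreviated task_queries data in the documented shape.
task_queries = {
    "ifeval": ["Instruction 1...", "Instruction 2..."],
    "bbh": ["Question 1..."],
}

# Flatten to (task, query) pairs, one per query node.
pairs = [(task, q) for task, queries in task_queries.items() for q in queries]
print(len(pairs))  # 3
```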
route_data/routing_test_data.json
Pre-computed model responses for test queries.
[
{
"task_name": "ifeval",
"query": "Follow these instructions...",
"ground_truth": "A",
"metric": "em_mc",
"choices": "{'text': ['A', 'B', 'C', 'D'], 'labels': ['A', 'B', 'C', 'D']}",
"model_performance": {
"qwen2.5-7b-instruct": { "response": "A", "task_performance": 1.0, "success": true }
}
}
]
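Because every entry records per-model `task_performance`, a useful baseline is the oracle score: always picking the best model per query, which upper-bounds any router. A sketch over two abbreviated entries in the documented format:

```python
# Two abbreviated routing_test_data entries (other documented fields omitted).
test_data = [
    {
        "task_name": "ifeval",
        "model_performance": {
            "qwen2.5-7b-instruct": {"task_performance": 1.0, "success": True},
            "llama-3.1-8b-instruct": {"task_performance": 0.0, "success": True},
        },
    },
    {
        "task_name": "bbh",
        "model_performance": {
            "qwen2.5-7b-instruct": {"task_performance": 0.0, "success": True},
            "llama-3.1-8b-instruct": {"task_performance": 0.5, "success": True},
        },
    },
]

# Oracle: take the best model's score on each query, then average.
oracle = sum(
    max(m["task_performance"] for m in entry["model_performance"].values())
    for entry in test_data
) / len(test_data)
print(oracle)  # 0.75
```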
route_data/pairwise_training_data_{standard|newllm}.json
Pairwise training data for MLPRouter and GraphRouter. Each entry records which model outperforms which on a given query.
{
"task_data_count": {
"agentverse-logicgrid": 1352,
"gsm8k": 741
},
"pairwise_data": [
{
"task_name": "agentverse-logicgrid",
"query": "Q: There are 4 houses...",
"ground_truth": "B",
"metric": "em_mc",
"choices": "{'text': ['1', '2', '3', '4'], 'labels': ['A', 'B', 'C', 'D']}",
"task_id": null,
"better_model": "mistral-small-24b-instruct-2501-bf16",
"worse_model": "mixtral-8x22b-instruct-v0.1"
}
]
}
Note: Use `pairwise_training_data_{mode}.json` as `training_data_path` for MLPRouter and GraphRouter; `routing_test_data.json` is used as `testing_data_path`.
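A quick way to sanity-check pairwise data is to tally per-model win rates from the `better_model` / `worse_model` fields. A sketch over three abbreviated entries (real entries also carry the query, metric, etc. as shown above):

```python
from collections import Counter

# Abbreviated pairwise entries: only the comparison fields are kept.
pairwise_data = [
    {"better_model": "mistral-small-24b-instruct-2501-bf16",
     "worse_model": "mixtral-8x22b-instruct-v0.1"},
    {"better_model": "mistral-small-24b-instruct-2501-bf16",
     "worse_model": "qwen2.5-7b-instruct"},
    {"better_model": "qwen2.5-7b-instruct",
     "worse_model": "mixtral-8x22b-instruct-v0.1"},
]

wins = Counter(p["better_model"] for p in pairwise_data)
losses = Counter(p["worse_model"] for p in pairwise_data)

# Win rate per model across all comparisons it appears in.
win_rate = {m: wins[m] / (wins[m] + losses[m]) for m in wins | losses}
print(win_rate)
```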
Candidate Models
The default set of 8 candidate models:
| Model | Size | Architecture |
|---|---|---|
| qwen2.5-7b-instruct | 7B | Qwen2ForCausalLM |
| gemma-2-9b-it | 9B | Gemma2ForCausalLM |
| llama-3.1-8b-instruct | 8B | LlamaForCausalLM |
| mixtral-8x7b-instruct-v0.1 | 46.7B | MixtralForCausalLM |
| mixtral-8x22b-instruct-v0.1 | 141B | MixtralForCausalLM |
| llama-3.2-3b-instruct | 3B | LlamaForCausalLM |
| mistral-small-24b-instruct-2501-bf16 | 24B | MistralForCausalLM |
| llama-3.3-70b-instruct | 70B | LlamaForCausalLM |
Router Methods
| Router | Type | Description |
|---|---|---|
| SimRouter | Training-free | Cosine similarity between query and model embeddings |
| MLPRouter | Trainable | Pairwise ranking loss; query + model encoders |
| GraphRouter | Trainable | Bipartite GAT with edge prediction (BCE loss) |
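SimRouter's training-free scheme can be sketched in a few lines: normalise the query embedding and each model profile, score by cosine similarity, and pick the argmax. This is an illustrative sketch with toy vectors, not RouteProfile's actual implementation; real profiles come from the .npz files built in Step 3.

```python
import numpy as np

def cosine_route(query_emb: np.ndarray, profiles: np.ndarray) -> int:
    """Return the index of the model whose profile is most similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    p = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
    return int(np.argmax(p @ q))

# Three toy model profiles in a 2-D embedding space.
profiles = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
print(cosine_route(np.array([0.9, 0.1]), profiles))  # 0
```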
Citation
If you use RouteProfile in your research, please cite:
@article{routeprofile2025,
title = {RouteProfile: Elucidating the Design Space of LLM Profiles for Routing},
author = {Xu, Jingjun and Pu, Hongji and Feng, Tao and Zhang, Haozhen and You, Jiaxuan and Liu, Ge},
journal = {arXiv preprint arXiv:2605.00180},
year = {2026},
url = {https://arxiv.org/abs/2605.00180}
}