May 9, 2026
RouteProfile: Elucidating the Design Space of LLM Profiles for Routing
Overview
RouteProfile is a general framework for designing LLM profiles for routing. It formulates LLM profiling as a structured information integration problem over heterogeneous interaction histories, enabling more principled and effective routing across queries, domains, and models.
Highlights:
- General profile design space: Define LLM profiles along four dimensions: organizational form, representation type, aggregation depth, and learning configuration.
- Comprehensive evaluation: Evaluate LLM profiles across three representative routers under both standard routing and new-LLM routing settings.
Links
Get Started
Pipeline Overview
Step 1: Data Collection → profile_data/ (manual / provided)
Step 2: Build Data Graph → results/result_data_graph/{mode}/
Step 3: Build Profile → results/model_profile_result/{mode}/
Step 4: Route & Evaluate → results/routing_result/{mode}/
Two routing settings:
| Mode | Description |
|---|---|
| standard | Standard routing with a known set of candidate LLMs |
| newllm | Generalisation to newly introduced, unseen LLMs |
Installation
pip install routeprofile
For Text-GNN profiles (requires vLLM):
pip install "routeprofile[text-gnn]"
Install from source (editable):
git clone https://github.com/your-org/RouteProfile.git
cd RouteProfile
pip install -e .
Profiling Methods
| Method | File | Org. form | Repr. type | Agg. depth | Learning |
|---|---|---|---|---|---|
| flat | flat.npz | Flat | Text | 0 | Training-free |
| index | index.npz | Flat | Embedding | 0 | Training-free |
| emb_gnn | emb_gnn.npz | Structured | Embedding | Multi-hop | Training-free |
| text_gnn | text_gnn.npz | Structured | Text | Multi-hop | Training-free |
| trainable | trainable_gnn.npz | Structured | Embedding | Multi-hop | Trainable |
Python Usage
All functions are importable directly from routeprofile:
import routeprofile
print(routeprofile.__version__) # "0.1.0"
Step 2: Build Data Graphs
from routeprofile import (
build_task_graph,
build_query_graph,
build_query_task_graph,
build_task_domain_graph,
build_query_task_domain_graph,
)
# Uses default profile_data/ inputs; outputs to results/result_data_graph/standard/
build_task_graph(mode="standard")
# Override any input/output path
build_query_task_domain_graph(
mode="standard",
json="profile_data/model_feature_standard.json",
arch="profile_data/model_family_feature.json",
dataset="profile_data/task_feature.json",
query="profile_data/task_queries_standard.json",
domain_map="profile_data/domain_task_map.json",
domain_feat="profile_data/domain_feature.json",
save="results/result_data_graph/standard/query_task_domain_graph_full.pt",
)
Step 3a: Training-Free Profiles
from routeprofile import (
build_flat_profile,
build_emb_gnn_profile,
build_index_profile,
build_text_gnn_profile,
)
# Flat: Longformer encoding of model text + sampled neighbours
build_flat_profile(mode="standard")
# → results/model_profile_result/standard/flat.npz
# Index: random vector baseline (no text or graph)
build_index_profile(mode="standard")
# → results/model_profile_result/standard/index.npz
# Emb-GNN: K-hop neighbourhood propagation (training-free)
build_emb_gnn_profile(
mode="standard",
graph="results/result_data_graph/standard/task_graph_full.pt",
K=2,
norm="sym", # "sym" | "rw" | "none"
save="results/model_profile_result/standard/emb_gnn.npz",
)
# Text-GNN: LLM-based text aggregation per hop (requires vLLM)
build_text_gnn_profile(
mode="standard",
graph="results/result_data_graph/standard/query_task_domain_graph_full.pt",
K=1,
model="Qwen/Qwen2.5-7B-Instruct",
tp=1, # tensor parallel size (number of GPUs)
gpu_memory_utilization=0.6, # fraction of GPU memory for vLLM
keep=[], # [] = save all models; None = TARGET_MODELS only
emb_save="results/model_profile_result/standard/text_gnn.npz",
)
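Each training-free builder above writes an .npz profile file. As a quick sanity check you can open one with NumPy; the snippet below fabricates a toy file with an assumed layout (the key names `model_names` and `embeddings` are hypothetical, not guaranteed to match RouteProfile's actual output) and lists its contents the same way you would inspect a real profile:

```python
import os
import tempfile

import numpy as np

# Toy stand-in for a Step-3 profile file. The key names are an assumption
# for illustration; inspect a real file from results/model_profile_result/
# the same way to see its actual keys.
names = np.array(["qwen2.5-7b-instruct", "llama-3.1-8b-instruct"])
embeddings = np.random.default_rng(0).normal(size=(2, 128)).astype(np.float32)

path = os.path.join(tempfile.gettempdir(), "toy_profile.npz")
np.savez(path, model_names=names, embeddings=embeddings)

# List every stored array and its shape.
with np.load(path, allow_pickle=True) as data:
    for key in data.files:
        print(key, data[key].shape)
```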
Step 3b: Trainable GNN Profile (HANConv)
from routeprofile import build_trainable_gnn_profile
build_trainable_gnn_profile(
mode="standard",
graph="results/result_data_graph/standard/task_graph_full.pt",
hidden_dim=256,
out_dim=128,
epochs=100,
save_emb="results/model_profile_result/standard/trainable_gnn.npz",
save_ckpt="results/trained_trainable_gnn/standard/pretrain_ckpt.pt",
)
Step 4: Routing Evaluation
from routeprofile import call_simrouter, call_mlprouter, call_graphrouter
# SimRouter: training-free cosine similarity routing
call_simrouter(
model_profile_path="results/model_profile_result/standard/flat.npz",
routing_data_path="route_data/routing_test_data.json",
output_path="results/routing_result/standard/SimRouter_results.json",
)
# MLPRouter: pairwise-ranking MLP
call_mlprouter(
model_profile_path="results/model_profile_result/standard/emb_gnn.npz",
training_data_path="route_data/pairwise_training_data_standard.json",
testing_data_path="route_data/routing_test_data.json",
output_path="results/routing_result/standard/MLPRouter_results.json",
save_ckpt="results/trained_MLPRouter/standard/mlp_router_ckpt.pt",
epochs=50,
)
# GraphRouter: bipartite GAT
call_graphrouter(
model_profile_path="results/model_profile_result/standard/trainable_gnn.npz",
training_data_path="route_data/pairwise_training_data_standard.json",
testing_data_path="route_data/routing_test_data.json",
output_path="results/routing_result/standard/GraphRouter_results.json",
save_ckpt="results/trained_GraphRouter/standard/graphrouter_ckpt.pt",
epochs=50,
)
You can also import the router classes directly:
from routeprofile import SimRouter, MLPRouter, GraphRouter
CLI Usage
After installation, every step is available as a command-line tool:
# Step 2: Build graphs (outputs to results/result_data_graph/{mode}/)
routeprofile-build-task-graph --mode standard
routeprofile-build-query-graph --mode standard
routeprofile-build-query-task-graph --mode standard
routeprofile-build-task-domain-graph --mode standard
routeprofile-build-query-task-domain-graph --mode standard
# Step 3a: Training-free profiles (outputs to results/model_profile_result/{mode}/)
routeprofile-flat-profile --mode standard
routeprofile-index-profile --mode standard
routeprofile-emb-gnn-profile --mode standard --K 2
routeprofile-trainable-gnn-profile --mode standard --epochs 100
# Step 4: Routing (outputs to results/routing_result/{mode}/)
routeprofile-sim-router \
--model-profile-path results/model_profile_result/standard/flat.npz \
--routing-data-path route_data/routing_test_data.json
routeprofile-mlp-router \
--model-profile-path results/model_profile_result/standard/emb_gnn.npz \
--training-data-path route_data/pairwise_training_data_standard.json \
--testing-data-path route_data/routing_test_data.json \
--save-ckpt results/trained_MLPRouter/standard/mlp_router_ckpt.pt
routeprofile-graph-router \
--model-profile-path results/model_profile_result/standard/trainable_gnn.npz \
--training-data-path route_data/pairwise_training_data_standard.json \
--testing-data-path route_data/routing_test_data.json \
--save-ckpt results/trained_GraphRouter/standard/graphrouter_ckpt.pt
All commands accept --help for full usage.
Shell Scripts Usage
# Build all graphs (standard mode)
bash routeprofile/scripts/step2_build_data_graph.sh standard
# All training-free profiles
bash routeprofile/scripts/step3a_training_free_profile.sh standard all
# Text-GNN (requires vLLM + GPU)
bash routeprofile/scripts/step3a_training_free_profile.sh standard text_gnn
# Trainable GNN
bash routeprofile/scripts/step3b_trainable_profile.sh standard
# Routing evaluation
bash routeprofile/scripts/step4_routing_evaluation.sh standard sim flat.npz
bash routeprofile/scripts/step4_routing_evaluation.sh standard all flat.npz
Extra Information
Directory Structure
RouteProfile/
├── profile_data/                         # Input data (read-only)
│   ├── model_feature_standard.json       # Model metadata (standard routing)
│   ├── model_feature_newllm.json         # Model metadata (newllm routing)
│   ├── model_family_feature.json         # Architecture family descriptions
│   ├── task_queries_standard.json        # Queries per benchmark (standard)
│   ├── task_queries_newllm.json          # Queries per benchmark (newllm)
│   ├── task_feature.json                 # Benchmark task descriptions
│   ├── domain_feature.json               # Task domain descriptions
│   ├── domain_task_map.json              # Domain → benchmark mapping
│   └── candidate_models.json             # Candidate LLM metadata
│
├── route_data/                           # Pre-computed routing data
│   ├── routing_test_data.json            # Test queries with model responses
│   ├── pairwise_training_data_standard.json  # Pairwise training data (standard)
│   └── pairwise_training_data_newllm.json    # Pairwise training data (newllm)
│
├── routeprofile/                         # Library source
│   ├── build_data_graph/                 # Step 2: graph construction
│   ├── get_model_profile/
│   │   ├── training_free/                # flat, index, emb_gnn, text_gnn
│   │   └── trainable/                    # HANConv self-supervised
│   ├── routing_evaluation/               # SimRouter, MLPRouter, GraphRouter
│   └── scripts/                          # Shell scripts for batch runs
│
├── results/                              # All generated outputs (git ignored)
│   ├── result_data_graph/{standard,newllm}/      # Built graphs (.pt)
│   ├── model_profile_result/{standard,newllm}/   # Model profiles (.npz)
│   ├── routing_result/{standard,newllm}/         # Routing evaluation results (.json)
│   ├── trained_trainable_gnn/{standard,newllm}/  # HANConv checkpoints
│   ├── trained_MLPRouter/{standard,newllm}/      # MLP router checkpoints
│   └── trained_GraphRouter/{standard,newllm}/    # Graph router checkpoints
│
├── tests/                                # pytest test suite
└── pyproject.toml
Data Formats
profile_data/model_feature_{standard|newllm}.json
Main model metadata. Primary input to all graph builders.
{
"model-name": {
"size": "7B",
"feature": "Natural language description of the model...",
"architecture": "Qwen2ForCausalLM",
"detailed_scores": {
"ifeval": 75.85, "bbh": 53.94, "math": 50.0,
"gpqa": 29.11, "musr": 40.2, "mmlu_pro": 42.87
},
"parameters": 7.616,
"input_price": 0.2,
"output_price": 0.2,
"model": "qwen/qwen2.5-7b-instruct",
"service": "NVIDIA",
"api_endpoint": "https://integrate.api.nvidia.com/v1",
"average_score": 35.2
}
}
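Since every graph builder reads this file, a quick structural check before running the pipeline can save a failed run. The helper below is hypothetical (not part of the routeprofile API); the required field set is taken directly from the example entry above:

```python
# Required keys per model entry, as shown in the documented example.
REQUIRED = {
    "size", "feature", "architecture", "detailed_scores", "parameters",
    "input_price", "output_price", "model", "service", "api_endpoint",
    "average_score",
}

def validate_model_features(features: dict) -> list[str]:
    """Return names of models whose entries are missing required keys."""
    return [name for name, entry in features.items()
            if not REQUIRED <= entry.keys()]

sample = {
    "qwen2.5-7b-instruct": {
        "size": "7B",
        "feature": "Natural language description of the model...",
        "architecture": "Qwen2ForCausalLM",
        "detailed_scores": {"ifeval": 75.85, "bbh": 53.94},
        "parameters": 7.616,
        "input_price": 0.2,
        "output_price": 0.2,
        "model": "qwen/qwen2.5-7b-instruct",
        "service": "NVIDIA",
        "api_endpoint": "https://integrate.api.nvidia.com/v1",
        "average_score": 35.2,
    }
}
print(validate_model_features(sample))  # [] -> all entries complete
```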
profile_data/model_family_feature.json
Architecture family descriptions used as architecture node features.
{
"Qwen2ForCausalLM": "A family of decoder-only Transformer-based large language models developed by Alibaba Cloud...",
"LlamaForCausalLM": "A family of autoregressive large language models developed by Meta AI..."
}
profile_data/task_feature.json
Natural language description of each benchmark task.
{
"ifeval": "IFEval (Instruction-Following Evaluation) is a benchmark designed to evaluate the ability of large language models to follow explicit natural language instructions...",
"bbh": "BBH (BIG-Bench Hard) is a challenging subset of the BIG-Bench benchmark..."
}
profile_data/domain_task_map.json
Maps broad task domains to specific benchmarks.
{
"knowledge": ["mmlu", "mmlu_pro", "C-Eval", "AGIEval English", "SQuAD", "gpqa"],
"reasoning": ["bbh", "TheoremQA", "WinoGrande"],
"math": ["math", "gsm8k", "TheoremQA"],
"coding": ["human_eval", "mbpp"]
}
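Note that a benchmark can belong to more than one domain (TheoremQA appears under both reasoning and math), so the inverse mapping is benchmark → list of domains rather than a one-to-one lookup. A small sketch of that inversion, using the mapping above:

```python
# domain_task_map.json maps each domain to several benchmarks; invert it to
# look up which domain(s) a benchmark belongs to.
domain_task_map = {
    "knowledge": ["mmlu", "mmlu_pro", "C-Eval", "AGIEval English", "SQuAD", "gpqa"],
    "reasoning": ["bbh", "TheoremQA", "WinoGrande"],
    "math": ["math", "gsm8k", "TheoremQA"],
    "coding": ["human_eval", "mbpp"],
}

task_domain_map: dict[str, list[str]] = {}
for domain, tasks in domain_task_map.items():
    for task in tasks:
        task_domain_map.setdefault(task, []).append(domain)

print(task_domain_map["TheoremQA"])  # ['reasoning', 'math']
```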
profile_data/domain_feature.json
Natural language description of each task domain.
{
"knowledge": "Knowledge tasks test factual recall and information retrieval...",
"reasoning": "Reasoning tasks require multi-step logical inference...",
"math": "Math tasks evaluate quantitative and symbolic problem solving..."
}
profile_data/candidate_models.json
Candidate model metadata including API endpoints and aggregate scores.
{
"qwen2.5-7b-instruct": {
"size": "7B",
"feature": "Qwen2.5-7B-Instruct represents an upgraded version...",
"input_price": 0.2,
"output_price": 0.2,
"model": "qwen/qwen2.5-7b-instruct",
"service": "NVIDIA",
"api_endpoint": "https://integrate.api.nvidia.com/v1",
"average_score": 35.2,
"detailed_scores": { "ifeval": 75.85, "bbh": 53.94 },
"parameters": 7.616,
"architecture": "Qwen2ForCausalLM"
}
}
profile_data/task_queries_{standard|newllm}.json
Per-benchmark query lists used to build query nodes.
{
"ifeval": ["Instruction 1...", "Instruction 2...", ...],
"bbh": ["Question 1...", "Question 2...", ...]
}
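When query nodes are built from this file, the per-benchmark lists are typically flattened into (task, query) pairs. A minimal sketch with abbreviated sample data:

```python
# Abbreviated task_queries data in the documented shape.
task_queries = {
    "ifeval": ["Instruction 1...", "Instruction 2..."],
    "bbh": ["Question 1..."],
}

# Flatten to (task, query) pairs, one per query node.
pairs = [(task, q) for task, queries in task_queries.items() for q in queries]
print(len(pairs))  # 3
```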
route_data/routing_test_data.json
Pre-computed model responses for test queries.
[
{
"task_name": "ifeval",
"query": "Follow these instructions...",
"ground_truth": "A",
"metric": "em_mc",
"choices": "{'text': ['A', 'B', 'C', 'D'], 'labels': ['A', 'B', 'C', 'D']}",
"model_performance": {
"qwen2.5-7b-instruct": { "response": "A", "task_performance": 1.0, "success": true }
}
}
]
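Because every entry records per-model `task_performance`, a useful baseline is the oracle score: always picking the best model per query, which upper-bounds any router. A sketch over two abbreviated entries in the documented format:

```python
# Two abbreviated routing_test_data entries (other documented fields omitted).
test_data = [
    {
        "task_name": "ifeval",
        "model_performance": {
            "qwen2.5-7b-instruct": {"task_performance": 1.0, "success": True},
            "llama-3.1-8b-instruct": {"task_performance": 0.0, "success": True},
        },
    },
    {
        "task_name": "bbh",
        "model_performance": {
            "qwen2.5-7b-instruct": {"task_performance": 0.0, "success": True},
            "llama-3.1-8b-instruct": {"task_performance": 0.5, "success": True},
        },
    },
]

# Oracle: take the best model's score on each query, then average.
oracle = sum(
    max(m["task_performance"] for m in entry["model_performance"].values())
    for entry in test_data
) / len(test_data)
print(oracle)  # 0.75
```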
route_data/pairwise_training_data_{standard|newllm}.json
Pairwise training data for MLPRouter and GraphRouter. Each entry records which model outperforms which on a given query.
{
"task_data_count": {
"agentverse-logicgrid": 1352,
"gsm8k": 741
},
"pairwise_data": [
{
"task_name": "agentverse-logicgrid",
"query": "Q: There are 4 houses...",
"ground_truth": "B",
"metric": "em_mc",
"choices": "{'text': ['1', '2', '3', '4'], 'labels': ['A', 'B', 'C', 'D']}",
"task_id": null,
"better_model": "mistral-small-24b-instruct-2501-bf16",
"worse_model": "mixtral-8x22b-instruct-v0.1"
}
]
}
Note: Use `pairwise_training_data_{mode}.json` as `training_data_path` for MLPRouter and GraphRouter; `routing_test_data.json` is used as `testing_data_path`.
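A quick way to sanity-check pairwise data is to tally per-model win rates from the `better_model` / `worse_model` fields. A sketch over three abbreviated entries (real entries also carry the query, metric, etc. as shown above):

```python
from collections import Counter

# Abbreviated pairwise entries: only the comparison fields are kept.
pairwise_data = [
    {"better_model": "mistral-small-24b-instruct-2501-bf16",
     "worse_model": "mixtral-8x22b-instruct-v0.1"},
    {"better_model": "mistral-small-24b-instruct-2501-bf16",
     "worse_model": "qwen2.5-7b-instruct"},
    {"better_model": "qwen2.5-7b-instruct",
     "worse_model": "mixtral-8x22b-instruct-v0.1"},
]

wins = Counter(p["better_model"] for p in pairwise_data)
losses = Counter(p["worse_model"] for p in pairwise_data)

# Win rate per model across all comparisons it appears in.
win_rate = {m: wins[m] / (wins[m] + losses[m]) for m in wins | losses}
print(win_rate)
```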
Candidate Models
The default set of 8 candidate models:
| Model | Size | Architecture |
|---|---|---|
| qwen2.5-7b-instruct | 7B | Qwen2ForCausalLM |
| gemma-2-9b-it | 9B | Gemma2ForCausalLM |
| llama-3.1-8b-instruct | 8B | LlamaForCausalLM |
| mixtral-8x7b-instruct-v0.1 | 46.7B | MixtralForCausalLM |
| mixtral-8x22b-instruct-v0.1 | 141B | MixtralForCausalLM |
| llama-3.2-3b-instruct | 3B | LlamaForCausalLM |
| mistral-small-24b-instruct-2501-bf16 | 24B | MistralForCausalLM |
| llama-3.3-70b-instruct | 70B | LlamaForCausalLM |
Router Methods
| Router | Type | Description |
|---|---|---|
| SimRouter | Training-free | Cosine similarity between query and model embeddings |
| MLPRouter | Trainable | Pairwise ranking loss; query + model encoders |
| GraphRouter | Trainable | Bipartite GAT with edge prediction (BCE loss) |
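SimRouter's training-free scheme can be sketched in a few lines: normalise the query embedding and each model profile, score by cosine similarity, and pick the argmax. This is an illustrative sketch with toy vectors, not RouteProfile's actual implementation; real profiles come from the .npz files built in Step 3.

```python
import numpy as np

def cosine_route(query_emb: np.ndarray, profiles: np.ndarray) -> int:
    """Return the index of the model whose profile is most similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    p = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
    return int(np.argmax(p @ q))

# Three toy model profiles in a 2-D embedding space.
profiles = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
print(cosine_route(np.array([0.9, 0.1]), profiles))  # 0
```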
Citation
If you use RouteProfile in your research, please cite:
@article{routeprofile2025,
title = {RouteProfile: Elucidating the Design Space of LLM Profiles for Routing},
author = {Xu, Jingjun and Pu, Hongji and Feng, Tao and Zhang, Haozhen and You, Jiaxuan and Liu, Ge},
journal = {arXiv preprint arXiv:2605.00180},
year = {2026},
url = {https://arxiv.org/abs/2605.00180}
}