Graph Router (GNN-Based Router)
December 29, 2025
Overview
The Graph Router uses Graph Neural Networks (GNNs) to make routing decisions by modeling queries and LLMs as nodes in a heterogeneous graph. It learns routing patterns by propagating information through the graph structure, capturing complex relationships between queries, LLMs, and their performance characteristics.
Paper Reference
This router implements the GraphRouter approach:
- GraphRouter: A Graph-based Router for LLM Selections (2024). arXiv:2410.03834.
  - Constructs a heterogeneous graph with task, query, and LLM nodes for routing.
- GNN Foundations: Kipf, T. N., & Welling, M. (2017). "Semi-supervised classification with graph convolutional networks." ICLR.
- Application: Treats LLM routing as link prediction in a bipartite query-model graph.
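For background, the layer-wise propagation rule of the graph convolutional network from the Kipf and Welling paper cited above is reproduced here. This document does not state which layer type the router actually uses, so read this as context on GNN message passing rather than a description of the implementation.

```latex
H^{(l+1)} = \sigma\!\left( \tilde{D}^{-1/2} \, \tilde{A} \, \tilde{D}^{-1/2} \, H^{(l)} \, W^{(l)} \right),
\qquad \tilde{A} = A + I,
\qquad \tilde{D}_{ii} = \sum_j \tilde{A}_{ij}
```

Here $A$ is the adjacency matrix, $H^{(l)}$ holds the node features at layer $l$, $W^{(l)}$ is a learned weight matrix, and $\sigma$ is a nonlinearity such as ReLU.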
How It Works
Graph Structure
Query Nodes ─── edges(performance) ──→ LLM Nodes
↓
GNN Message Passing
↓
Predictions
Node Types:
- Query Nodes: Each query is a node with Longformer embedding features
- LLM Nodes: Each LLM is a node with learned/provided embeddings
- Edges: Connect queries to all LLMs, weighted by performance scores
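As a rough illustration of the node and edge types listed above, the sketch below builds a tiny bipartite graph in plain Python. The feature vectors and scores are made-up placeholders (real query features would be Longformer embeddings), and `build_bipartite_graph` is a hypothetical helper, not part of LLMRouter.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]  # (query index, LLM index, performance score)


def build_bipartite_graph(
    query_feats: List[List[float]],
    llm_feats: List[List[float]],
    scores: Dict[Tuple[int, int], float],
) -> List[Edge]:
    """Connect every query to every LLM; missing scores default to 0."""
    edges = []
    for q in range(len(query_feats)):
        for m in range(len(llm_feats)):
            edges.append((q, m, scores.get((q, m), 0.0)))
    return edges


# Toy inputs (placeholders, not real embeddings or benchmark results).
query_feats = [[0.1, 0.9], [0.8, 0.2]]
llm_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
scores = {(0, 0): 0.7, (0, 1): 0.4, (0, 2): 0.9}  # query 1 is new: no scores

edges = build_bipartite_graph(query_feats, llm_feats, scores)
```

Every query gets one edge per LLM, so the edge count is the number of queries times the number of LLMs.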
Routing Mechanism
- Graph Construction:
  - Create bipartite graph: queries on one side, LLMs on the other
  - Add edges from each query to all LLMs
  - Edge features: performance scores (or 0 for new queries)
- GNN Forward Pass:
  - Aggregate information from neighboring nodes
  - Update node representations using message passing
  - Apply graph attention or convolution layers
- Prediction:
  - For each query-LLM edge, predict suitability score
  - Select LLM with highest predicted score
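The forward pass and prediction steps above can be sketched in a few lines of plain Python. This is a deliberately simplified stand-in: it uses a fixed 50/50 mean aggregator instead of learned attention or convolution layers, scores edges with a dot product, and assumes query and LLM features share one dimension. All function names are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Edge = Tuple[int, int, float]  # (query index, LLM index, edge weight)


def weighted_mean(vectors: List[Vector], weights: List[float], dim: int) -> Vector:
    """Weighted average of neighbor features; zeros if total weight is 0."""
    total = sum(weights)
    if total == 0:
        return [0.0] * dim
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(dim)]


def message_passing_step(query_h, llm_h, edges):
    """One synchronous round: each node mixes its own features 50/50
    with the weighted mean of its neighbors on the other side."""
    dim = len(query_h[0])

    def update(own, neighbors, pairs):
        agg = weighted_mean([neighbors[j] for j, _ in pairs],
                            [w for _, w in pairs], dim)
        return [0.5 * a + 0.5 * b for a, b in zip(own, agg)]

    new_q = [update(h, llm_h, [(m, w) for qi, m, w in edges if qi == q])
             for q, h in enumerate(query_h)]
    new_l = [update(h, query_h, [(qi, w) for qi, m, w in edges if m == k])
             for k, h in enumerate(llm_h)]
    return new_q, new_l


def route(query_h, llm_h, q):
    """Score every LLM for query q by dot product and pick the best."""
    scores = [sum(a * b for a, b in zip(query_h[q], h)) for h in llm_h]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy example: one query, two LLMs, uniform edge weights.
query_h = [[1.0, 0.0]]
llm_h = [[1.0, 0.0], [0.0, 1.0]]
edges = [(0, 0, 1.0), (0, 1, 1.0)]
query_h2, llm_h2 = message_passing_step(query_h, llm_h, edges)
best, scores = route(query_h2, llm_h2, 0)
```

A trained GNN would replace the fixed mixing weights with learned parameters and usually stack several such rounds.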
Training Strategy
The router is trained with edge masking:
- Mask a portion of the edges (e.g., 30%)
- Train the GNN to predict the performance scores on the masked edges
- Evaluate on the validation set using a different set of masked edges
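A minimal sketch of the masking step, assuming edges are stored as `(query, llm, score)` tuples. The helper `split_masked_edges` is hypothetical and only illustrates the idea: hidden edges become prediction targets while the remaining edges stay visible to the model.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (query index, LLM index, performance score)


def split_masked_edges(edges: List[Edge], mask_rate: float, seed: int):
    """Randomly hide a fraction of edges; return (visible, masked)."""
    rng = random.Random(seed)
    n_mask = int(round(len(edges) * mask_rate))
    masked_idx = set(rng.sample(range(len(edges)), n_mask))
    visible = [e for i, e in enumerate(edges) if i not in masked_idx]
    masked = [e for i, e in enumerate(edges) if i in masked_idx]
    return visible, masked


# Toy graph: 5 queries x 2 LLMs, placeholder scores.
edges = [(q, m, 0.5) for q in range(5) for m in range(2)]
visible, masked = split_masked_edges(edges, mask_rate=0.3, seed=42)

# The scores on masked edges serve as training labels.
labels = {(q, m): s for q, m, s in masked}
```

Using a different seed for validation gives a different set of masked edges, in line with the evaluation step described above.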
Configuration Parameters
Training Hyperparameters (hparam in config)
| Parameter | Type | Default | Description |
|---|---|---|---|
| hidden_dim | int | 64 | Hidden layer dimension for GNN. Controls model capacity. Range: 32-256. |
| learning_rate | float | 0.001 | Learning rate for AdamW optimizer. Range: 0.0001-0.01. |
| weight_decay | float | 0.0001 | Weight decay coefficient for AdamW. Helps prevent overfitting. |
| train_epoch | int | 100 | Number of training epochs. Increase for larger graphs. |
| batch_size | int | 4 | Number of masked samples per gradient step. |
| train_mask_rate | float | 0.3 | Fraction of edges to mask during training (0.0-1.0). |
| val_split_ratio | float | 0.2 | Ratio of training data used for validation. |
| random_state | int | 42 | Random seed for reproducibility. |
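One way to keep these defaults and basic sanity checks in one place is a small dataclass, sketched below. `GraphRouterHParams` and `hparams_from_config` are hypothetical names for illustration. The checks cover only obvious constraints, and the ranges in the table are treated as guidance rather than hard limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphRouterHParams:
    # Defaults mirror the table above.
    hidden_dim: int = 64
    learning_rate: float = 0.001
    weight_decay: float = 0.0001
    train_epoch: int = 100
    batch_size: int = 4
    train_mask_rate: float = 0.3
    val_split_ratio: float = 0.2
    random_state: int = 42

    def __post_init__(self):
        if not 0.0 <= self.train_mask_rate <= 1.0:
            raise ValueError("train_mask_rate must be in [0, 1]")
        if not 0.0 <= self.val_split_ratio < 1.0:
            raise ValueError("val_split_ratio must be in [0, 1)")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        for name in ("hidden_dim", "train_epoch", "batch_size"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")


def hparams_from_config(config: dict) -> GraphRouterHParams:
    """Build hyperparameters from the `hparam` section of a parsed config."""
    return GraphRouterHParams(**config.get("hparam", {}))


hp = hparams_from_config({"hparam": {"hidden_dim": 128}})
```

Unset keys fall back to the defaults, so a config only needs to list the values it changes.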
Data Paths
| Parameter | Description |
|---|---|
| routing_data_train | Training query-LLM performance data (JSONL) |
| query_embedding_data | Pre-computed Longformer query embeddings (PyTorch tensor) |
| llm_data | LLM information with optional embeddings (JSON) |
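To give a feel for the JSONL routing data, the sketch below groups per-query scores by model. The field names `query`, `model_name`, and `performance` are assumptions chosen for illustration; check the actual files for the real schema. `load_routing_scores` is a hypothetical helper.

```python
import io
import json
from collections import defaultdict


def load_routing_scores(lines):
    """Parse JSONL records into {query: {model_name: performance}}."""
    scores = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        scores[rec["query"]][rec["model_name"]] = float(rec["performance"])
    return dict(scores)


# In-memory stand-in for a JSONL file (assumed field names, made-up values).
sample = io.StringIO(
    '{"query": "q1", "model_name": "model-a", "performance": 0.9}\n'
    '{"query": "q1", "model_name": "model-b", "performance": 0.4}\n'
    '\n'
    '{"query": "q2", "model_name": "model-a", "performance": 0.2}\n'
)
scores = load_routing_scores(sample)
```

The same function accepts an open file handle, since it only iterates over lines.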
Model Paths
| Parameter | Purpose |
|---|---|
| save_model_path | Where to save trained GNN model |
| load_model_path | Model to load for inference |
CLI Usage
The Graph Router can be used via the llmrouter command-line interface:
Training
# Train the Graph router (GPU recommended)
llmrouter train --router graphrouter --config configs/model_config_train/graphrouter.yaml --device cuda
# Train with quiet mode
llmrouter train --router graphrouter --config configs/model_config_train/graphrouter.yaml --device cuda --quiet
Inference
# Route a single query
llmrouter infer --router graphrouter --config configs/model_config_test/graphrouter.yaml \
--query "Explain quantum mechanics"
# Route queries from a file
llmrouter infer --router graphrouter --config configs/model_config_test/graphrouter.yaml \
--input queries.jsonl --output results.json
# Route only (without calling LLM API)
llmrouter infer --router graphrouter --config configs/model_config_test/graphrouter.yaml \
--query "What is machine learning?" --route-only
Interactive Chat
# Launch chat interface
llmrouter chat --router graphrouter --config configs/model_config_test/graphrouter.yaml
# Launch with custom port
llmrouter chat --router graphrouter --config configs/model_config_test/graphrouter.yaml --port 8080
# Create a public shareable link
llmrouter chat --router graphrouter --config configs/model_config_test/graphrouter.yaml --share
Usage Examples
Training
from llmrouter.models import GraphRouter, GraphRouterTrainer
router = GraphRouter(yaml_path="configs/model_config_train/graphrouter.yaml")
trainer = GraphRouterTrainer(router=router, device="cuda")
trainer.train()
Inference
from llmrouter.models import GraphRouter
router = GraphRouter(yaml_path="configs/model_config_test/graphrouter.yaml")
query = {"query": "Explain quantum mechanics"}
result = router.route_single(query)
print(f"Selected: {result['model_name']}")
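To route several queries and see how selections are distributed, a thin wrapper around `route_single` is enough. The sketch below assumes only what the example above shows: `route_single` takes a dict with a `query` key and returns a dict containing `model_name`. `StubRouter` is a hypothetical stand-in so the snippet runs without LLMRouter installed; swap in a real `GraphRouter` instance in practice.

```python
from collections import Counter


def route_batch(router, queries):
    """Route each query via router.route_single and count selections."""
    results = [router.route_single({"query": q}) for q in queries]
    counts = Counter(r["model_name"] for r in results)
    return results, counts


class StubRouter:
    """Hypothetical stand-in: picks a model by query length."""

    def route_single(self, query):
        name = "model-a" if len(query["query"]) < 20 else "model-b"
        return {"query": query["query"], "model_name": name}


results, counts = route_batch(
    StubRouter(),
    ["Hi", "Explain quantum mechanics in detail"],
)
```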
YAML Configuration Example
data_path:
  routing_data_train: 'data/example_data/routing_data/default_routing_train_data.jsonl'
  query_embedding_data: 'data/example_data/routing_data/query_embeddings_longformer.pt'
  llm_data: 'data/example_data/llm_candidates/default_llm.json'
model_path:
  save_model_path: 'saved_models/graphrouter/graphrouter.pt'
hparam:
  hidden_dim: 64
  learning_rate: 0.001
  weight_decay: 0.0001
  train_epoch: 100
  batch_size: 4
  train_mask_rate: 0.3
  val_split_ratio: 0.2
metric:
  weights:
    performance: 1
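The `metric.weights` section suggests that several metrics can be blended into one training target. How LLMRouter combines them is not stated here. The sketch below assumes a plain weighted sum, and both `combined_score` and the `cost` metric are hypothetical.

```python
def combined_score(metrics, weights):
    """Weighted sum over the metrics that have a configured weight."""
    return sum(w * metrics.get(name, 0.0) for name, w in weights.items())


weights = {"performance": 1}                   # as in the YAML above
metrics = {"performance": 0.8, "cost": 0.02}   # made-up values
score = combined_score(metrics, weights)
```

With only `performance` weighted, metrics without a weight (here `cost`) do not affect the result.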
Advantages
- ✅ Relational Learning: Captures complex query-model relationships
- ✅ Graph Structure: Leverages network effects and transitivity
- ✅ Flexible: Can incorporate additional node/edge features
- ✅ Semi-Supervised: Can predict on partially observed data
Limitations
- ❌ Computational Cost: GNN training is slower than simpler methods
- ❌ Graph Construction: Requires building the full bipartite graph (every query connected to every LLM)
- ❌ Cold Start: New queries or models require rebuilding the graph
- ❌ Hyperparameter Sensitivity: Many architectural choices to tune
When to Use Graph Router
Good Use Cases:
- You have a large dataset with rich relational structure
- Query-model relationships exhibit network effects
- You have LLM embeddings or features beyond performance scores
- You want to model higher-order interactions
Alternatives:
- Simple relationships → Use MLP/SVM Router
- Small datasets → Use KNN Router
- Need fast training → Use ELO Router
Related Routers
- RouterDC: Also uses structured learning but with contrastive loss
- MF Router: Learns latent spaces but without graph structure
- MLP Router: Standard neural network, no graph
For questions or issues, please refer to the main LLMRouter documentation or open an issue on GitHub.