KNN Multi-Round Router

December 29, 2025 · View on GitHub

Overview

The KNN Multi-Round Router extends the standard KNN router with a multi-round pipeline: it decomposes complex queries into sub-queries, routes each sub-query using KNN, executes them with the routed models, and aggregates responses into a final answer.

Paper Reference

This router implements multi-round routing as described in:

Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning
- Zhang, H., Feng, T., & You, J. (2025). arXiv:2506.09033.
- Proposes multi-round routing with decomposition and aggregation.

Combines K-Nearest Neighbors with query decomposition:

KNN: Instance-based learning, no training required
Query Decomposition: Break complex queries into simpler sub-tasks
Multi-Agent: Delegate sub-queries to specialized models

How It Works

Architecture

Query → Decomposition → [Sub-Query 1, Sub-Query 2, ...] 
           ↓               ↓ (KNN Route)      ↓ (KNN Route)
      Base LLM          Model A Execute    Model B Execute
           ↓                     ↓                 ↓
       Aggregation ← [Response 1, Response 2, ...]
           ↓
    Final Answer

Pipeline

Decomposition: Local LLM breaks query into 1-4 sub-queries
Routing: Each sub-query routed via KNN to best-matching model
Execution: Sub-queries executed with routed models via API
Aggregation: Base LLM combines sub-responses into final answer

Configuration Parameters

KNN Hyperparameters (`hparam`)

Parameter	Type	Default	Description
`n_neighbors`	int	`5`	Number of nearest neighbors
`weights`	str	`"distance"`	Weight function: `"uniform"` or `"distance"`
`metric`	str	`"cosine"`	Distance metric for KNN

Multi-Round Configuration

Parameter	Type	Default	Description
`base_model`	str	`"Qwen/Qwen2.5-3B-Instruct"`	Base model for decomposition/aggregation
`use_local_llm`	bool	`false`	Use local vLLM (true) or API (false)
`api_endpoint`	str	-	API endpoint for sub-query execution

CLI Usage

The KNN Multi-Round Router can be used via the llmrouter command-line interface:

Training

# Train the KNN Multi-Round router (builds KNN index)
llmrouter train --router knnmultiroundrouter --config configs/model_config_train/knnmultiroundrouter.yaml

# Train with quiet mode
llmrouter train --router knnmultiroundrouter --config configs/model_config_train/knnmultiroundrouter.yaml --quiet

Inference

# Route a single query with decomposition
llmrouter infer --router knnmultiroundrouter --config configs/model_config_test/knnmultiroundrouter.yaml \
    --query "Explain climate change and its causes"

# Route queries from a file
llmrouter infer --router knnmultiroundrouter --config configs/model_config_test/knnmultiroundrouter.yaml \
    --input queries.jsonl --output results.json

# Route only (without calling LLM API)
llmrouter infer --router knnmultiroundrouter --config configs/model_config_test/knnmultiroundrouter.yaml \
    --query "What causes earthquakes?" --route-only

Interactive Chat

# Launch chat interface
llmrouter chat --router knnmultiroundrouter --config configs/model_config_test/knnmultiroundrouter.yaml

# Launch with custom port
llmrouter chat --router knnmultiroundrouter --config configs/model_config_test/knnmultiroundrouter.yaml --port 8080

# Create a public shareable link
llmrouter chat --router knnmultiroundrouter --config configs/model_config_test/knnmultiroundrouter.yaml --share

Usage Examples

Inference (Chat Mode)

from llmrouter.models import KNNMultiRoundRouter

router = KNNMultiRoundRouter(yaml_path="configs/model_config_test/knnmultiroundrouter.yaml")

# Simple string query returns response only
response = router.route_single("Explain climate change and its causes")
print(response)

Inference (Evaluation Mode)

# Dict query returns full metrics
query = {
    "query": "What causes earthquakes and how are they measured?",
    "ground_truth": "...",
    "task_name": "general"
}
result = router.route_single(query)

print(f"Response: {result['response']}")
print(f"Tokens: {result['prompt_tokens'] + result['completion_tokens']}")
print(f"Performance: {result.get('task_performance', 'N/A')}")

Advantages

✅ No Training: KNN requires no training, just load data
✅ Decomposition: Handles complex multi-faceted queries
✅ Specialized Routing: Each sub-query gets optimal model
✅ Flexible: Supports both local and API-based execution

Limitations

❌ High Latency: Multiple API calls increase response time
❌ High Cost: Decomposition + routing + aggregation tokens
❌ Complexity: More moving parts than simple routing
❌ Local LLM Option: Requires vLLM and GPU if use_local_llm=true

When to Use

Good For:

Complex queries requiring multi-step reasoning
Diverse sub-tasks benefiting from specialized models
Have training data for KNN routing

Alternatives:

Simple queries → Standard KNN Router
No decomposition needed → Single-round routers
Need LLM-based decomposition → LLM Multi-Round Router

LLM Multi-Round Router: Uses LLM for routing instead of KNN
KNN Router: Single-round KNN without decomposition
Router-R1: Agentic multi-round with different approach

For questions or issues, please refer to the main LLMRouter documentation or open an issue on GitHub.