06.ModelsAsTools.md

January 29, 2026

Overview

Use Foundry Local to treat AI models as modular, customizable tools that run directly on the device. This session focuses on practical workflows for privacy-preserving, low-latency inference and shows how to integrate these tools through SDKs, APIs, or the CLI. You will also learn how to scale to Azure AI Foundry when needed.

🔄 Updated for the modern SDK: This module has been aligned with the latest Microsoft Foundry-Local repository patterns and matches the intelligent routing implementation in samples/06/. The examples now use the modern foundry-local-sdk and advanced model selection strategies.

🏗️ Architecture highlights:

Intelligent model routing: keyword-based selection among general, reasoning, code, and creative models
Modern SDK integration: uses FoundryLocalManager with automatic service discovery
Environment configuration: flexible model assignment through environment variables
Health monitoring: service validation and model availability checks
Production ready: comprehensive error handling and fallback mechanisms

📁 Local implementation:

samples/06/router.py - intelligent model router with keyword-based selection
samples/06/model_router.ipynb - interactive examples and benchmarks
samples/06/README.md - configuration and usage instructions

References:

Foundry Local docs: https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-local/
Integrate with inference SDKs: https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-local/how-to/how-to-integrate-with-inference-sdks
Compile Hugging Face models: https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-local/how-to/how-to-compile-hugging-face-models

Learning objectives

Build on-device design patterns for using models as tools
Integrate through the OpenAI-compatible REST API or SDKs
Customize models for domain-specific use
Plan hybrid scaling to Azure AI Foundry

Part 1: Intelligent Model Router (Modern Implementation)

Objective: Implement intelligent model selection that routes each request automatically based on the content of the query.

📋 Note: This implementation matches the pattern used in samples/06/router.py, which features advanced keyword-based model selection.

Step 1) Define the modern model router with FoundryLocalManager

# router/intelligent_router.py
import json
import os
from typing import Any, Dict, Optional

from openai import OpenAI
# FoundryLocalManager is imported inside _initialize_client so that the manual
# fallback still works when foundry-local-sdk is not installed.

class ModelRouter:
    """Intelligent model router that selects appropriate models for different task types."""
    
    def __init__(self):
        self.client = None
        self.base_url = None
        self.tools = self._load_tool_registry()
        self._initialize_client()
    
    def _load_tool_registry(self) -> Dict[str, Dict[str, Any]]:
        """Load tool registry from environment or use defaults."""
        default_tools = {
            "general": {
                "model": os.environ.get("GENERAL_MODEL", "phi-4-mini"),
                "notes": "Fast general-purpose chat and Q&A",
                "temperature": 0.7
            },
            "reasoning": {
                "model": os.environ.get("REASONING_MODEL", "deepseek-r1-7b"),
                "notes": "Step-by-step analysis and logical reasoning",
                "temperature": 0.3
            },
            "code": {
                "model": os.environ.get("CODE_MODEL", "qwen2.5-7b"),
                "notes": "Code generation, debugging, and technical tasks",
                "temperature": 0.2
            },
            "creative": {
                "model": os.environ.get("CREATIVE_MODEL", "phi-4-mini"),
                "notes": "Creative writing and storytelling",
                "temperature": 0.9
            }
        }
        
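        # Merge an optional JSON override on top of the defaults, so a partial
        # registry keeps the remaining tools and any fields it leaves out.
        tools_env = os.environ.get("TOOL_REGISTRY")
        if tools_env:
            try:
                overrides = json.loads(tools_env)
                for name, override in overrides.items():
                    default_tools.setdefault(name, {}).update(override)
            except (ValueError, AttributeError, TypeError):
                print("Warning: Invalid TOOL_REGISTRY JSON, using defaults")
        
        return default_tools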

Step 2) Initialize the client with the modern SDK and service discovery

    def _initialize_client(self):
        """Initialize OpenAI client with Foundry Local or fallback configuration."""
        try:
            from foundry_local import FoundryLocalManager
            # Try to use any available model for client initialization
            first_model = next(iter(self.tools.values()))["model"]
            manager = FoundryLocalManager(first_model)
            
            self.client = OpenAI(
                base_url=manager.endpoint,
                api_key=manager.api_key
            )
            self.base_url = manager.endpoint
            print(f"✅ Foundry Local SDK initialized")
        except Exception as e:
            print(f"Warning: Could not use Foundry SDK ({e}), falling back to manual configuration")
            # Fallback to manual configuration
            self.base_url = os.environ.get("BASE_URL", "http://localhost:8000")
            api_key = os.environ.get("API_KEY", "")
            
            self.client = OpenAI(
                base_url=f"{self.base_url}/v1",
                api_key=api_key
            )
            print(f"Initialized manual configuration at {self.base_url}")
    
    def select_tool(self, user_query: str) -> str:
        """Select the most appropriate tool based on the user query."""
        query_lower = user_query.lower()
        
        # Code-related keywords
        code_keywords = ["code", "python", "function", "class", "method", "bug", "debug", 
                        "programming", "script", "algorithm", "implementation", "refactor"]
        if any(keyword in query_lower for keyword in code_keywords):
            return "code"
        
        # Reasoning keywords
        reasoning_keywords = ["why", "how", "explain", "step-by-step", "reason", "analyze", 
                             "think", "logic", "because", "cause", "compare", "evaluate"]
        if any(keyword in query_lower for keyword in reasoning_keywords):
            return "reasoning"
        
        # Creative keywords
        creative_keywords = ["story", "poem", "creative", "imagine", "write", "tale", 
                           "narrative", "fiction", "character", "plot"]
        if any(keyword in query_lower for keyword in creative_keywords):
            return "creative"
        
        # Default to general
        return "general"
    
    def chat(self, model: str, content: str, max_tokens: int = 300, temperature: Optional[float] = None) -> str:
        """Send chat completion request to the specified model."""
        try:
            params = {
                "model": model,
                "messages": [{"role": "user", "content": content}],
                "max_tokens": max_tokens
            }
            
            if temperature is not None:
                params["temperature"] = temperature
            
            response = self.client.chat.completions.create(**params)
            return response.choices[0].message.content
        except Exception as e:
            return f"Error generating response with model {model}: {str(e)}"
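
The `select_tool` method above matches keywords as substrings, so "show" counts as "how" and "description" counts as "script". The following is a minimal sketch of a stricter variant, assuming whole-word matching is the desired behavior. `KEYWORDS`, `PATTERNS`, and `select_tool_strict` are hypothetical names that are not part of samples/06/router.py, and the keyword lists are abbreviated.

```python
import re
from typing import Dict, List

# Hypothetical keyword table mirroring the lists in select_tool (abbreviated).
KEYWORDS: Dict[str, List[str]] = {
    "code": ["code", "python", "function", "class", "bug", "debug", "script", "refactor"],
    "reasoning": ["why", "how", "explain", "step-by-step", "analyze", "compare", "evaluate"],
    "creative": ["story", "poem", "creative", "imagine", "write", "fiction", "plot"],
}

# One compiled pattern per tool; \b anchors every keyword at word boundaries.
PATTERNS = {
    tool: re.compile(r"\b(?:" + "|".join(re.escape(word) for word in words) + r")\b")
    for tool, words in KEYWORDS.items()
}


def select_tool_strict(user_query: str) -> str:
    """Return the first tool whose keyword appears as a whole word, else 'general'."""
    query_lower = user_query.lower()
    # Dict insertion order preserves the code > reasoning > creative priority.
    for tool, pattern in PATTERNS.items():
        if pattern.search(query_lower):
            return tool
    return "general"
```

With this variant, "Show me the menu" and "Give me a description of the product" both fall through to the general tool, while the four demo queries used in this module keep their original routes.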

Step 3) Implement intelligent routing and execution (see samples/06/router.py)

    def route_and_run(self, prompt: str) -> Dict[str, Any]:
        """Route the prompt to the appropriate model and generate response."""
        tool_key = self.select_tool(prompt)
        tool_config = self.tools[tool_key]
        model = tool_config["model"]
        temperature = tool_config.get("temperature", 0.7)
        
        print(f"🎯 Selected tool: {tool_key} (model: {model})")
        
        answer = self.chat(
            model=model, 
            content=prompt, 
            max_tokens=400, 
            temperature=temperature
        )
        
        return {
            "tool": tool_key,
            "model": model,
            # .get() tolerates TOOL_REGISTRY entries that omit "notes"
            "tool_description": tool_config.get("notes", ""),
            "temperature": temperature,
            "answer": answer
        }
    
    def check_service_health(self) -> Dict[str, Any]:
        """Check Foundry Local service health and available models."""
        try:
            models_response = self.client.models.list()
            available_models = [model.id for model in models_response.data]
            
            return {
                "status": "healthy",
                "base_url": self.base_url,
                "available_models": available_models,
                "tools_configured": list(self.tools.keys())
            }
        except Exception as e:
            return {
                "status": "error",
                "base_url": self.base_url,
                "error": str(e)
            }

if __name__ == "__main__":
    # Ensure: foundry model run phi-4-mini
    router = ModelRouter()
    
    # Check health
    health = router.check_service_health()
    print(f"Service Health: {json.dumps(health, indent=2)}")
    
    # Test different query types
    queries = [
        "Write a Python function to calculate fibonacci numbers",  # -> code
        "Explain step-by-step why the sky is blue",  # -> reasoning
        "Tell me a creative story about AI",  # -> creative
        "What's the weather like today?"  # -> general
    ]
    
    for query in queries:
        result = router.route_and_run(query)
        print(f"\nQuery: {query}")
        print(f"Selected: {result['tool']} -> {result['model']}")
        print(f"Answer: {result['answer'][:100]}...")
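
The architecture highlights mention fallback mechanisms, and `check_service_health` already returns the list of models the service reports. A minimal sketch of combining the two follows. `resolve_model` is a hypothetical helper, and it assumes the registry stores the same identifiers that the service lists, which may not hold if the service reports variant-specific IDs.

```python
from typing import Any, Dict, List, Optional


def resolve_model(tool_key: str,
                  tools: Dict[str, Dict[str, Any]],
                  available_models: List[str],
                  fallback_key: str = "general") -> Optional[str]:
    """Pick a usable model for tool_key, falling back when it is not available."""
    preferred = tools[tool_key]["model"]
    if preferred in available_models:
        return preferred

    # Second choice: the model configured for the fallback tool.
    fallback = tools.get(fallback_key, {}).get("model")
    if fallback in available_models:
        return fallback

    # Last resort: any model the service reports, or None if there are none.
    return available_models[0] if available_models else None
```

A router could call this with `health["available_models"]` before `chat`, and return an explicit error when the result is `None` instead of sending a request that is bound to fail.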

Part 2: Modern SDK Integration (Step by Step)

Objective: Integrate seamlessly by using the Foundry Local SDK together with the OpenAI Python SDK.

Step 1) Install dependencies

cd Module08
.\.venv\Scripts\activate
pip install foundry-local-sdk openai

Step 2) Configure the environment (optional - see samples/06/README.md)

REM Override default models per tool
set GENERAL_MODEL=phi-4-mini
set REASONING_MODEL=deepseek-r1-7b
set CODE_MODEL=qwen2.5-7b
REM Or provide a full JSON registry
set TOOL_REGISTRY={"general":{"model":"phi-4-mini"},"reasoning":{"model":"deepseek-r1-7b"}}
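
A registry supplied through TOOL_REGISTRY may omit optional fields such as `notes` and `temperature`, as the JSON example above does. The sketch below shows one way to validate such a value before use. `normalize_registry` and `DEFAULT_FIELDS` are hypothetical names, and the 0.7 default simply mirrors the fallback temperature used in `route_and_run`.

```python
import json
from typing import Any, Dict

# Assumed defaults for optional fields; 0.7 mirrors the default in route_and_run.
DEFAULT_FIELDS: Dict[str, Any] = {"notes": "", "temperature": 0.7}


def normalize_registry(raw_json: str) -> Dict[str, Dict[str, Any]]:
    """Parse a TOOL_REGISTRY string, require 'model', and fill optional fields."""
    registry = json.loads(raw_json)
    if not isinstance(registry, dict):
        raise ValueError("TOOL_REGISTRY must be a JSON object")

    normalized: Dict[str, Dict[str, Any]] = {}
    for name, config in registry.items():
        if not isinstance(config, dict) or "model" not in config:
            raise ValueError(f"Tool '{name}' needs a 'model' field")
        # Values from the override win over the defaults.
        normalized[name] = {**DEFAULT_FIELDS, **config}
    return normalized
```

Failing early with a clear message is usually easier to debug than a `KeyError` raised in the middle of a request.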

Step 3) Integrate the modern SDK

# modern_sdk_demo.py
from foundry_local import FoundryLocalManager
from openai import OpenAI
import sys

def main():
    """Demonstrate modern SDK integration."""
    try:
        # Initialize with FoundryLocalManager
        alias = "phi-4-mini"
        manager = FoundryLocalManager(alias)
        
        # Create OpenAI client using Foundry Local endpoint
        client = OpenAI(
            base_url=manager.endpoint,
            api_key=manager.api_key
        )
        
        # Get model info
        model_info = manager.get_model_info(alias)
        print(f"Using model: {model_info.id}")
        
        # Make request with streaming
        stream = client.chat.completions.create(
            model=model_info.id,
            messages=[{"role": "user", "content": "Explain edge AI benefits in one paragraph."}],
            stream=True,
            max_tokens=200
        )
        
        print("Response: ", end="")
        for chunk in stream:
            if chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end="", flush=True)
        print()
        
    except Exception as e:
        print(f"Error: {e}")
        print("Ensure Foundry Local is running with: foundry model run phi-4-mini")
        sys.exit(1)

if __name__ == "__main__":
    main()
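
The streaming loop above prints each delta as it arrives. When the full text is needed afterwards, the deltas can be accumulated instead. The sketch below also skips chunks whose `choices` list is empty, which OpenAI-compatible servers can emit (for example a final usage chunk). `collect_stream` and `fake_chunk` are hypothetical helpers, and `fake_chunk` exists only so the example runs without a model.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def collect_stream(stream: Iterable) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in stream:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # content is None on role-only or final chunks
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Stand-in for a streamed chunk, used only to exercise collect_stream offline."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


demo = [fake_chunk("Edge "), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk("AI")]
print(collect_stream(demo))  # prints: Edge AI
```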

Part 3: Domain Customization (Step by Step)

Objective: Tailor outputs to a domain by using prompt templates and a JSON schema.

Step 1) Create a domain prompt template

# domain/templates.py
BUSINESS_ANALYST_SYSTEM = """
You are a senior business analyst. Provide:
1) Key insights
2) Risks
3) Next steps
Respond in valid JSON with fields: insights, risks, next_steps.
"""

Step 2) Enforce JSON output

# domain/analyst.py
import json
import os

import requests

from domain.templates import BUSINESS_ANALYST_SYSTEM

BASE_URL = os.getenv("OPENAI_BASE_URL", "http://localhost:8000/v1")
API_KEY = os.getenv("OPENAI_API_KEY", "local-key")
HEADERS = {"Content-Type": "application/json", "Authorization": f"Bearer {API_KEY}"}

def analyze(text: str) -> dict:
    """Ask the local model for a JSON analysis of the given business text."""
    messages = [
        {"role": "system", "content": BUSINESS_ANALYST_SYSTEM},
        {"role": "user", "content": f"Analyze this business text:\n{text}"}
    ]
    r = requests.post(f"{BASE_URL}/chat/completions", json={
        "model": "phi-4-mini",
        "messages": messages,
        "response_format": {"type": "json_object"},
        "temperature": 0.3
    }, headers=HEADERS, timeout=60)
    r.raise_for_status()
    # Parse JSON content
    content = r.json()["choices"][0]["message"]["content"]
    return json.loads(content)

if __name__ == "__main__":
    print(analyze("Sales dipped 12% in Q3 due to supply constraints and marketing cuts."))
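
The `analyze` function above trusts the model to return exactly the requested JSON. Small local models sometimes add surrounding text or leave out a field, so a defensive parser can help. This is a minimal sketch; `parse_analysis` and `REQUIRED_FIELDS` are hypothetical names, and the field list is taken from the system prompt in domain/templates.py.

```python
import json
from typing import Any, Dict

# Fields requested by BUSINESS_ANALYST_SYSTEM.
REQUIRED_FIELDS = ("insights", "risks", "next_steps")


def parse_analysis(content: str) -> Dict[str, Any]:
    """Extract the JSON object from model output and check the expected fields."""
    # Tolerate text or code fences around the object by slicing from the
    # first '{' to the last '}'.
    start, end = content.find("{"), content.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in model output")

    data = json.loads(content[start:end + 1])  # raises ValueError on bad JSON
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"Missing fields: {', '.join(missing)}")
    return data
```

Calling `parse_analysis(content)` in place of `json.loads(content)` gives the caller a single exception type, `ValueError`, to handle for both malformed and incomplete responses.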

Part 4: Offline and Security Posture (Step by Step)

Objective: Ensure privacy and resilience when running models locally as tools.

Step 1) Pre-warm and validate the local endpoint

foundry model run phi-4-mini
curl http://localhost:8000/v1/models
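
The same validation can be scripted. The sketch below queries the models route with the standard library only and returns an empty list when the service is unreachable. `list_local_models` is a hypothetical helper; the default URL copies the curl example above and may differ on your machine, and the response shape (`{"data": [{"id": ...}]}`) is the OpenAI-compatible format assumed throughout this module.

```python
import json
import urllib.request
from typing import List


def list_local_models(base_url: str = "http://localhost:8000/v1",
                      timeout: float = 5.0) -> List[str]:
    """Return model IDs from the /models route, or [] if the service is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers connection failures and HTTP errors (URLError subclasses it);
        # ValueError covers a body that is not valid JSON.
        return []

    if not isinstance(payload, dict):
        return []
    return [m["id"] for m in payload.get("data", []) if isinstance(m, dict) and "id" in m]
```

An empty result before the first request is a signal to start the model with `foundry model run` rather than let the first user query pay the loading cost.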

Step 2) Sanitize inputs

# security/sanitize.py
import re
EMAIL_RE = re.compile(r"[\w\.-]+@[\w\.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s\-]{7,}\d")

def sanitize(text: str) -> str:
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    text = PHONE_RE.sub("[REDACTED_PHONE]", text)
    return text
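
A quick check of the sanitizer on a sample string shows what gets redacted. The patterns and function are copied from security/sanitize.py above so that the snippet runs on its own; the sample text is made up for illustration.

```python
import re

# Patterns copied from security/sanitize.py above.
EMAIL_RE = re.compile(r"[\w\.-]+@[\w\.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s\-]{7,}\d")


def sanitize(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    text = PHONE_RE.sub("[REDACTED_PHONE]", text)
    return text


sample = "Contact jane.doe@example.com or +1 555-123-4567 about the Q3 report"
print(sanitize(sample))
# → Contact [REDACTED_EMAIL] or [REDACTED_PHONE] about the Q3 report
```

Note that the phone pattern matches any run of nine or more digits, so an order number such as 1234567890 is redacted as a phone number too. Treat these patterns as a first line of defense, not as complete PII detection.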

Step 3) Local-only flag and logging

# security/local_only.py
import os, json, time
LOG = os.getenv("MODELS_AS_TOOLS_LOG", "./tools_logs.jsonl")

def record(event: dict):
    with open(LOG, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")

# Usage before each call
def before_call(tool_name, payload):
    record({"ts": time.time(), "tool": tool_name, "event": "before_call"})

# After each call
def after_call(tool_name, result):
    record({"ts": time.time(), "tool": tool_name, "event": "after_call"})
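
The before and after hooks above have to be called by hand around every request. One way to make that automatic is a decorator, sketched below. `logged_tool` is a hypothetical name, and the `status` and `duration_s` fields are additions for illustration. Like the original hooks, it logs only metadata and never the payload or the result, which keeps prompts out of the log file.

```python
import functools
import json
import os
import time

LOG = os.getenv("MODELS_AS_TOOLS_LOG", "./tools_logs.jsonl")


def record(event: dict, path: str = LOG) -> None:
    """Append one JSON event per line to the log file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def logged_tool(tool_name: str, path: str = LOG):
    """Hypothetical decorator: log before/after events plus status and duration."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            record({"ts": time.time(), "tool": tool_name, "event": "before_call"}, path)
            start = time.perf_counter()
            status = "error"  # overwritten only if the call returns normally
            try:
                result = func(*args, **kwargs)
                status = "ok"
                return result
            finally:
                record({"ts": time.time(), "tool": tool_name, "event": "after_call",
                        "status": status,
                        "duration_s": round(time.perf_counter() - start, 4)}, path)
        return wrapper
    return decorator
```

Decorating a function with `@logged_tool("general")` then writes two lines per call, and the `after_call` line is written even when the wrapped function raises.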

Part 5: Production Deployment and Scaling

Objective: Deploy the intelligent router with monitoring and Azure AI Foundry integration.

📋 Note: The local implementation in samples/06/model_router.ipynb includes comprehensive examples of production deployment patterns.

Step 1) Production router with monitoring (see samples/06/router.py)

# production/router.py
import json
import sys
import time
from typing import Any, Dict

from router.intelligent_router import ModelRouter

class ProductionModelRouter(ModelRouter):
    """Production-ready model router with monitoring and logging."""
    
    def __init__(self):
        super().__init__()
        self.request_count = 0
        self.error_count = 0
        self.start_time = time.time()
    
    def route_and_run_with_monitoring(self, prompt: str) -> Dict[str, Any]:
        """Route with comprehensive monitoring and error handling."""
        start_time = time.time()
        self.request_count += 1
        
        try:
            result = self.route_and_run(prompt)
            processing_time = time.time() - start_time
            
            # Log successful request
            self._log_request({
                "status": "success",
                "tool": result["tool"],
                "model": result["model"],
                "processing_time": processing_time,
                "timestamp": time.strftime("%Y-%m-%d %H:%M:%S")
            })
            
            result["processing_time"] = processing_time
            return result
            
        except Exception as e:
            self.error_count += 1
            error_result = {
                "status": "error",
                "error": str(e),
                "processing_time": time.time() - start_time,
                "timestamp": time.strftime("%Y-%m-%d %H:%M:%S")
            }
            
            self._log_request(error_result)
            return error_result
    
    def _log_request(self, data: Dict[str, Any]):
        """Log request data for monitoring."""
        print(f"📊 {json.dumps(data)}")
    
    def get_stats(self) -> Dict[str, Any]:
        """Get router statistics."""
        uptime = time.time() - self.start_time
        return {
            "uptime_seconds": uptime,
            "total_requests": self.request_count,
            "error_count": self.error_count,
            "success_rate": (self.request_count - self.error_count) / max(1, self.request_count),
            "requests_per_minute": self.request_count / max(1, uptime / 60)
        }

def main():
    """Production router demo."""
    router = ProductionModelRouter()
    
    # Health check
    health = router.check_service_health()
    if health["status"] == "error":
        print(f"❌ Service health check failed: {health['error']}")
        sys.exit(1)
    
    print(f"✅ Service healthy with {len(health['available_models'])} models")
    
    # Process user query
    user_prompt = " ".join(sys.argv[1:]) or "Write three benefits of on-device AI in JSON format."
    print(f"\n🎯 Processing: {user_prompt}")
    
    result = router.route_and_run_with_monitoring(user_prompt)
    
    if result.get("status") == "error":
        print(f"❌ Error: {result['error']}")
    else:
        print(f"\n📋 Result:")
        print(f"Tool: {result['tool']} -> Model: {result['model']}")
        print(f"Processing Time: {result['processing_time']:.2f}s")
        print(f"Answer: {result['answer']}")
    
    # Show stats
    stats = router.get_stats()
    print(f"\n📊 Statistics: {json.dumps(stats, indent=2)}")

if __name__ == "__main__":
    main()
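
The production router above always talks to the local service. For the hybrid scaling this part has in view, one possible policy is to stay local by default and use a cloud endpoint only when the local service is down or the prompt is too large. The sketch below illustrates that policy under stated assumptions: `choose_backend`, the `CLOUD_BASE_URL` and `CLOUD_API_KEY` variable names, and the character threshold are all hypothetical and are not taken from Azure AI Foundry documentation or from samples/06/.

```python
import os
from typing import Dict, Mapping, Optional


def choose_backend(local_healthy: bool,
                   prompt: str,
                   local_base_url: str = "http://localhost:8000/v1",
                   max_local_chars: int = 8000,
                   env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return connection settings for the local service or a cloud endpoint."""
    env = os.environ if env is None else env
    cloud_url = env.get("CLOUD_BASE_URL")  # hypothetical variable name
    cloud_key = env.get("CLOUD_API_KEY")   # hypothetical variable name
    cloud_ready = bool(cloud_url and cloud_key)

    # Leave the device only when local cannot serve the request.
    needs_cloud = (not local_healthy) or len(prompt) > max_local_chars
    if needs_cloud and cloud_ready:
        return {"target": "cloud", "base_url": cloud_url, "api_key": cloud_key}
    if local_healthy:
        return {"target": "local", "base_url": local_base_url,
                "api_key": env.get("API_KEY", "")}
    raise RuntimeError("No backend available: local service is down "
                       "and no cloud endpoint is configured")
```

Because a cloud route sends the prompt off the device, it is worth running the Part 4 sanitizer on the prompt before any request that this function marks as `cloud`.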

Practical checklist

Implement the intelligent model router with keyword-based selection (samples/06/router.py)
Configure multiple specialized models (general, reasoning, code, creative)
Test the interactive Jupyter notebook (samples/06/model_router.ipynb)
Set up environment-based model configuration
Implement service health monitoring and error handling
Deploy the production router with comprehensive logging

Local sample integration

Run the full implementation:

cd Module08
.\.venv\Scripts\activate

REM Start required models
foundry model run phi-4-mini
foundry model run qwen2.5-7b
foundry model run deepseek-r1-7b

REM Test the intelligent router
python samples\06\router.py "Write a Python function to sort a list"
python samples\06\router.py "Explain step-by-step how bubble sort works"
python samples\06\router.py "Tell me a creative story about robots"

REM Explore the interactive notebook
jupyter notebook samples/06/model_router.ipynb

References and next steps

Local implementation: samples/06/ - complete intelligent router with multi-model support
Microsoft samples: Hello Foundry Local
Integration docs: Integrate with inference SDKs
Advanced patterns: explore function calling and multi-agent orchestration in Module 5

Conclusion

Foundry Local enables robust on-device AI in which models become intelligent, specialized tools. With automatic model selection, comprehensive monitoring, and production-ready patterns, teams can build sophisticated AI applications that adapt to different task types while maintaining privacy and performance. The intelligent router pattern demonstrated here provides a foundation for building complex AI systems that can scale from local development to production deployment.

Disclaimer:
This document was translated using the AI translation service Co-op Translator. We strive for accuracy, but please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.