Microsoft Foundry Models Chat Application
March 16, 2026
Learning Path: Intermediate ⭐⭐ | Time: 35-45 minutes | Cost: $50-200/month
A complete Microsoft Foundry Models chat application deployed using Azure Developer CLI (azd). This example demonstrates gpt-4.1 deployment, secure API access, and a simple chat interface.
🎯 What You'll Learn
- Deploy Microsoft Foundry Models Service with the gpt-4.1 model
- Secure OpenAI API keys with Azure Key Vault
- Build a simple chat interface with Python
- Monitor token usage and costs
- Implement rate limiting and error handling
📦 What's Included
✅ Microsoft Foundry Models Service - gpt-4.1 model deployment
✅ Python Chat App - Simple command-line chat interface
✅ Key Vault Integration - Secure API key storage
✅ Bicep Templates - Complete infrastructure as code
✅ Cost Monitoring - Token usage tracking
✅ Rate Limiting - Prevent quota exhaustion
Architecture
graph TD
App[Python Chat Application<br/>Local/Cloud<br/>Command-line interface<br/>Conversation history<br/>Token usage tracking] -- "HTTPS (API Key)" --> Foundry[Microsoft Foundry Models Service<br/>gpt-4.1 Model<br/>20K tokens/min capacity<br/>Multi-region failover]
Foundry --> KV[Azure Key Vault<br/>OpenAI API Key<br/>Endpoint URL]
Foundry -. Managed Identity .-> KV
Prerequisites
Required
- Azure Developer CLI (azd) - Install guide
- Azure subscription with OpenAI access - Request access
- Python 3.9+ - Install Python
- Azure CLI (az) - Used by the verification, monitoring, and troubleshooting commands
Verify Prerequisites
# Check azd version (need 1.5.0 or higher)
azd version
# Log in to Azure (azd auth login --check-status only reports the current login)
azd auth login
# Check Python version
python --version # or python3 --version
# Verify OpenAI access (check in Azure Portal)
az cognitiveservices account list-skus \
--kind OpenAI \
--location eastus
⚠️ Important: Microsoft Foundry Models requires application approval. If you haven't applied, visit aka.ms/oai/access. Approval typically takes 1-2 business days.
⏱️ Deployment Timeline
| Phase | Duration | What Happens |
|---|---|---|
| Prerequisites check | 2-3 minutes | Verify OpenAI quota availability |
| Deploy infrastructure | 8-12 minutes | Create OpenAI, Key Vault, model deployment |
| Configure application | 2-3 minutes | Set up environment and dependencies |
| Total | 12-18 minutes | Ready to chat with gpt-4.1 |
Note: First-time OpenAI deployment may take longer due to model provisioning.
Quick Start
# Navigate to the example
cd examples/azure-openai-chat
# Initialize environment
azd env new myopenai
# Deploy everything (infrastructure + configuration)
azd up
# You'll be prompted to:
# 1. Select Azure subscription
# 2. Choose location with OpenAI availability (e.g., eastus, eastus2, westus)
# 3. Wait 12-18 minutes for deployment
# Install Python dependencies (the application code lives in src/)
cd src
pip install -r requirements.txt
# Start chatting!
python chat.py
Expected Output:
🤖 Microsoft Foundry Models Chat Application
Connected to: gpt-4.1 (eastus)
Type your message (or 'quit' to exit)
You: Hello! Tell me about Microsoft Foundry Models.
Assistant: Microsoft Foundry Models Service provides REST API access to OpenAI's powerful language models including gpt-4.1, GPT-3.5-Turbo, and Embeddings...
[Tokens used: 145 | Estimated cost: $0.0044]
✅ Verify Deployment
Step 1: Check Azure Resources
# View deployed resources
azd show
# Expected output shows:
# - OpenAI Service: (resource name)
# - Key Vault: (resource name)
# - Deployment: gpt-4.1
# - Location: eastus (or your selected region)
Step 2: Test OpenAI API
# Get OpenAI endpoint and key
OPENAI_ENDPOINT=$(azd env get-value AZURE_OPENAI_ENDPOINT)
OPENAI_KEY=$(azd env get-value AZURE_OPENAI_API_KEY)
# Test API call
curl "${OPENAI_ENDPOINT%/}/openai/deployments/gpt-4.1/chat/completions?api-version=2024-08-01-preview" \
-H "Content-Type: application/json" \
-H "api-key: $OPENAI_KEY" \
-d '{
"messages": [{"role": "user", "content": "Say hello!"}],
"max_tokens": 50
}'
Expected Response:
{
"choices": [
{
"message": {
"role": "assistant",
"content": "Hello! How can I assist you today?"
}
}
],
"usage": {
"prompt_tokens": 8,
"completion_tokens": 9,
"total_tokens": 17
}
}
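The same request can be sent from Python using only the standard library. This is a minimal sketch, not the code in chat.py: `build_chat_request` and `send_chat_request` are hypothetical helper names, and the sketch reuses the `gpt-4.1` deployment name, the `api-key` header, and the `2024-08-01-preview` api-version from the curl example above.

```python
import json
import urllib.request

API_VERSION = "2024-08-01-preview"  # same api-version as the curl example


def build_chat_request(endpoint: str, api_key: str, deployment: str,
                       messages: list, max_tokens: int = 50) -> urllib.request.Request:
    """Build (but do not send) a chat completions POST request."""
    # Strip a trailing slash so the path does not start with "//openai".
    url = (f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"
           f"/chat/completions?api-version={API_VERSION}")
    body = json.dumps({"messages": messages, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def send_chat_request(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the values from `azd env get-value`, `send_chat_request(build_chat_request(endpoint, key, "gpt-4.1", [{"role": "user", "content": "Say hello!"}]))` should return a dictionary shaped like the expected response, so `data["usage"]["total_tokens"]` gives the token count.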
Step 3: Verify Key Vault Access
# List secrets in Key Vault
KV_NAME=$(azd env get-value AZURE_KEY_VAULT_NAME)
az keyvault secret list \
--vault-name $KV_NAME \
--query "[].name" \
--output table
Expected Secrets:
openai-api-key
openai-endpoint
Success Criteria:
- ✅ OpenAI service deployed with gpt-4.1
- ✅ API call returns valid completion
- ✅ Secrets stored in Key Vault
- ✅ Token usage tracking works
Project Structure
azure-openai-chat/
├── README.md ✅ This guide
├── azure.yaml ✅ AZD configuration
├── infra/ ✅ Infrastructure as Code
│ ├── main.bicep ✅ Main Bicep template
│ ├── main.parameters.json ✅ Parameters
│ └── openai.bicep ✅ OpenAI resource definition
├── src/ ✅ Application code
│ ├── chat.py ✅ Chat interface
│ ├── config.py ✅ Configuration loader
│ └── requirements.txt ✅ Python dependencies
└── .gitignore ✅ Git ignore rules
Application Features
Chat Interface (chat.py)
The chat application includes:
- Conversation History - Maintains context across messages
- Token Counting - Tracks usage and estimates costs
- Error Handling - Graceful handling of rate limits and API errors
- Cost Estimation - Real-time cost calculation per message
- Streaming Support - Optional streaming responses
Commands
While chatting, you can use:
- quit or exit - End the session
- clear - Clear conversation history
- tokens - Show total token usage
- cost - Show estimated total cost
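The features and commands above (conversation history, token counting, cost estimation, and the built-in commands) can be sketched as a small session object. This is an illustration under assumptions, not the actual chat.py: `ChatSession` and `handle_command` are hypothetical names, the rates are the illustrative gpt-4.1 figures from the Token Pricing table, and input and output tokens are priced separately, so its estimates may differ from the ones chat.py prints.

```python
from dataclasses import dataclass, field

# Illustrative gpt-4.1 rates from the Token Pricing table (USD per 1K tokens).
INPUT_RATE_PER_1K = 0.03
OUTPUT_RATE_PER_1K = 0.06


@dataclass
class ChatSession:
    """Hypothetical session state: conversation history plus token totals."""
    system_prompt: str = "You are a helpful assistant."
    messages: list = field(default_factory=list)
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def __post_init__(self) -> None:
        self.clear()

    def clear(self) -> None:
        # Keep only the system prompt; token totals are not reset.
        self.messages = [{"role": "system", "content": self.system_prompt}]

    def add_user(self, text: str) -> None:
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text: str, usage: dict) -> None:
        # `usage` has the same shape as the "usage" object in the API response.
        self.messages.append({"role": "assistant", "content": text})
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    @property
    def estimated_cost(self) -> float:
        return (self.prompt_tokens * INPUT_RATE_PER_1K
                + self.completion_tokens * OUTPUT_RATE_PER_1K) / 1000


def handle_command(session: ChatSession, line: str) -> tuple:
    """Return (handled, reply, keep_running) for the built-in commands."""
    cmd = line.strip().lower()
    if cmd in ("quit", "exit"):
        return True, "Goodbye!", False
    if cmd == "clear":
        session.clear()
        return True, "Conversation history cleared.", True
    if cmd == "tokens":
        return True, f"Total tokens: {session.total_tokens}", True
    if cmd == "cost":
        return True, f"Estimated cost: ${session.estimated_cost:.4f}", True
    return False, None, True  # not a command: send the text to the model
```

Sending the whole `messages` list on every request is what lets the model keep context across turns; it is also why long sessions cost more, since earlier turns are billed again as prompt tokens.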
Configuration (config.py)
Loads configuration from environment variables:
AZURE_OPENAI_ENDPOINT # From Key Vault
AZURE_OPENAI_API_KEY # From Key Vault
AZURE_OPENAI_MODEL # Default: gpt-4.1
AZURE_OPENAI_MAX_TOKENS # Default: 800
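A configuration loader along these lines might look like the following minimal sketch. It assumes the values have already been exported as environment variables (for example from `azd env get-value`); it does not read Key Vault itself, and `OpenAIConfig` and `load_config` are hypothetical names rather than the contents of config.py. The defaults match the ones listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class OpenAIConfig:
    endpoint: str
    api_key: str
    model: str = "gpt-4.1"   # AZURE_OPENAI_MODEL default
    max_tokens: int = 800    # AZURE_OPENAI_MAX_TOKENS default


def load_config(env: Optional[Mapping[str, str]] = None) -> OpenAIConfig:
    """Read settings from environment variables, applying the documented defaults."""
    env = os.environ if env is None else env
    # Fail early with a clear message instead of a KeyError deep in the app.
    missing = [name for name in ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY")
               if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    max_tokens = int(env.get("AZURE_OPENAI_MAX_TOKENS", "800"))
    if max_tokens <= 0:
        raise ValueError("AZURE_OPENAI_MAX_TOKENS must be a positive integer")
    return OpenAIConfig(
        endpoint=env["AZURE_OPENAI_ENDPOINT"],
        api_key=env["AZURE_OPENAI_API_KEY"],
        model=env.get("AZURE_OPENAI_MODEL", "gpt-4.1"),
        max_tokens=max_tokens,
    )
```

Accepting an optional mapping instead of reading `os.environ` directly keeps the loader easy to test with plain dictionaries.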
Usage Examples
Basic Chat
python chat.py
Chat with Custom Model
export AZURE_OPENAI_MODEL=gpt-35-turbo # must match an existing deployment name (see Update Model Deployment)
python chat.py
Chat with Streaming
python chat.py --stream
Example Conversation
You: Explain Microsoft Foundry Models Service in 3 sentences.
Assistant: Microsoft Foundry Models Service is Microsoft Azure's cloud platform offering
that provides access to OpenAI's powerful language models. It enables developers
to integrate capabilities like gpt-4.1 into their applications with enterprise-grade
security and compliance. The service includes features for content filtering,
abuse monitoring, and responsible AI practices.
[Tokens used: 89 | Estimated cost: $0.0027]
You: What models are available?
Assistant: Microsoft Foundry Models Service offers several model families including gpt-4.1
(most capable), GPT-3.5-Turbo (faster and cost-effective), and Embeddings models
for vector search. Each model has different capabilities, pricing, and token limits.
[Tokens used: 67 | Estimated cost: $0.0020]
Total session: 156 tokens | $0.0047
Cost Management
Token Pricing (illustrative rates; check the Azure OpenAI pricing page for current prices in your region)
| Model | Input (per 1K tokens) | Output (per 1K tokens) |
|---|---|---|
| gpt-4.1 | $0.03 | $0.06 |
| GPT-3.5-Turbo | $0.0015 | $0.002 |
Estimated Monthly Costs
Based on usage patterns:
| Usage Level | Messages/Day | Tokens/Day | Monthly Cost |
|---|---|---|---|
| Light | 20 messages | 3,000 tokens | $3-5 |
| Moderate | 100 messages | 15,000 tokens | $15-25 |
| Heavy | 500 messages | 75,000 tokens | $75-125 |
Base Infrastructure Cost: $1-2/month (Key Vault + minimal compute)
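The monthly figures follow from tokens per day × 30 days × the per-1K rate. The sketch below reproduces that arithmetic with the illustrative rates from the Token Pricing table; billing every token at the input rate gives a lower bound and billing every token at the output rate gives an upper bound. `RATES_PER_1K`, `call_cost`, and `monthly_cost_range` are hypothetical names, and real bills depend on the actual input/output mix and current prices.

```python
# Illustrative USD rates per 1K tokens, copied from the Token Pricing table.
RATES_PER_1K = {
    "gpt-4.1": {"input": 0.03, "output": 0.06},
    "gpt-35-turbo": {"input": 0.0015, "output": 0.002},
}


def call_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost of one request, pricing input and output tokens separately."""
    rates = RATES_PER_1K[model]
    return (prompt_tokens * rates["input"] + completion_tokens * rates["output"]) / 1000


def monthly_cost_range(model: str, tokens_per_day: int, days: int = 30) -> tuple:
    """Bounds if every token were billed at the input rate (low) or output rate (high)."""
    rates = RATES_PER_1K[model]
    monthly_tokens = tokens_per_day * days
    return (monthly_tokens * rates["input"] / 1000,
            monthly_tokens * rates["output"] / 1000)


for level, tokens in (("Light", 3_000), ("Moderate", 15_000), ("Heavy", 75_000)):
    low, high = monthly_cost_range("gpt-4.1", tokens)
    print(f"{level:<9} ${low:,.2f} - ${high:,.2f} per month")
```

For gpt-4.1 this yields $2.70-$5.40 (Light), $13.50-$27.00 (Moderate), and $67.50-$135.00 (Heavy); the ranges in the table fall inside these bounds.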
Cost Optimization Tips
# 1. Use GPT-3.5-Turbo for simpler tasks (20-30x cheaper at the rates above; requires a gpt-35-turbo deployment)
export AZURE_OPENAI_MODEL=gpt-35-turbo
# 2. Reduce max tokens for shorter responses
export AZURE_OPENAI_MAX_TOKENS=400
# 3. Monitor token usage
python chat.py --show-tokens
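# 4. Create a monthly cost budget (set --end-date to suit; configure alert recipients under Cost Management → Budgets)
az consumption budget create \
--budget-name "openai-budget" \
--amount 50 \
--category cost \
--time-grain monthly \
--start-date $(date +%Y-%m-01) \
--end-date 2027-12-31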
Monitoring
View Token Usage
# In Azure Portal:
# OpenAI Resource → Metrics → Select "Token Transaction"
# Or via Azure CLI (uses GNU date; on macOS use: date -u -v-1H '+%Y-%m-%dT%H:%M:%S'):
az monitor metrics list \
--resource $(azd env get-value AZURE_OPENAI_RESOURCE_ID) \
--metric "TokenTransaction" \
--start-time $(date -u -d '1 hour ago' '+%Y-%m-%dT%H:%M:%S') \
--interval PT1M
View API Logs
# Send Audit diagnostic logs to a Log Analytics workspace
az monitor diagnostic-settings create \
--resource $(azd env get-value AZURE_OPENAI_RESOURCE_ID) \
--name openai-logs \
--logs '[{"category": "Audit", "enabled": true}]' \
--workspace $(azd env get-value LOG_ANALYTICS_WORKSPACE_ID)
# Query logs
az monitor log-analytics query \
--workspace $(azd env get-value LOG_ANALYTICS_WORKSPACE_ID) \
--analytics-query "AzureDiagnostics | where Category == 'Audit' | top 10 by TimeGenerated"
Troubleshooting
Issue: "Access Denied" Error
Symptoms: 403 Forbidden when calling API
Solutions:
# 1. Verify OpenAI access is approved
az cognitiveservices account show \
--name $(azd env get-value AZURE_OPENAI_NAME) \
--resource-group $(azd env get-value AZURE_RESOURCE_GROUP)
# 2. Check API key is correct
azd env get-value AZURE_OPENAI_API_KEY
# 3. Verify endpoint URL format
azd env get-value AZURE_OPENAI_ENDPOINT
# Should be: https://[name].openai.azure.com/
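A common cause of 403/404 errors is a malformed endpoint, such as `http://` instead of `https://` or a full deployment URL pasted where only the base URL belongs. The sketch below checks an endpoint against the pattern documented above. `check_endpoint` is a hypothetical helper, and it only knows the `<name>.openai.azure.com` pattern; if your resource was created with a different hostname, treat that particular message as a warning rather than an error.

```python
from urllib.parse import urlparse


def check_endpoint(endpoint: str) -> list:
    """Return a list of problems with an endpoint URL (an empty list means it looks OK)."""
    problems = []
    parsed = urlparse(endpoint.strip())
    if parsed.scheme != "https":
        problems.append("endpoint must start with https://")
    host = parsed.hostname or ""
    if not host.endswith(".openai.azure.com"):
        problems.append("host does not match <name>.openai.azure.com")
    # The app appends /openai/deployments/... itself, so only the base URL belongs here.
    if parsed.path not in ("", "/"):
        problems.append(f"unexpected path {parsed.path!r}; use only the base URL")
    return problems
```

For example, `check_endpoint("https://myopenai.openai.azure.com/")` returns an empty list.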
Issue: "Rate Limit Exceeded"
Symptoms: 429 Too Many Requests
Solutions:
# 1. Check current quota
az cognitiveservices account deployment show \
--name $(azd env get-value AZURE_OPENAI_NAME) \
--resource-group $(azd env get-value AZURE_RESOURCE_GROUP) \
--deployment-name gpt-4.1
# 2. Request quota increase (if needed)
# Go to Azure Portal → OpenAI Resource → Quotas → Request Increase
# 3. Implement retry logic (already in chat.py)
# The application automatically retries with exponential backoff
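Retrying with exponential backoff can be sketched as follows. This is not the retry code in chat.py. It is a standard-library sketch that assumes requests are made through a zero-argument `send` callable (for example one wrapping `urllib.request.urlopen`) that raises `urllib.error.HTTPError`. It retries only on 429 and 5xx, honors a numeric `Retry-After` header when the service sends one, and otherwise waits 1s, 2s, 4s, and so on with a little random jitter. `with_retries` is a hypothetical name; `sleep` is injectable so the behavior can be tested without waiting.

```python
import random
import time
import urllib.error


def with_retries(send, max_attempts: int = 5, base_delay: float = 1.0,
                 max_delay: float = 30.0, sleep=time.sleep):
    """Call send(); on HTTP 429 or 5xx, wait and retry with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except urllib.error.HTTPError as err:
            retryable = err.code == 429 or 500 <= err.code < 600
            if not retryable or attempt == max_attempts:
                raise  # client errors (400, 401, 404...) or out of attempts
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.strip().isdigit():
                delay = float(retry_after)  # honor the server's hint
            else:
                # 1s, 2s, 4s, ... capped at max_delay, plus up to 10% random jitter
                delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                delay += random.uniform(0, delay * 0.1)
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```

The jitter spreads retries from several clients over time so they do not all hit the quota again at the same moment.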
Issue: "Model Not Found"
Symptoms: 404 error for deployment
Solutions:
# 1. List available deployments
az cognitiveservices account deployment list \
--name $(azd env get-value AZURE_OPENAI_NAME) \
--resource-group $(azd env get-value AZURE_RESOURCE_GROUP)
# 2. Verify model name in environment
echo $AZURE_OPENAI_MODEL
# 3. Update to correct deployment name
export AZURE_OPENAI_MODEL=gpt-4.1 # or gpt-35-turbo
Issue: High Latency
Symptoms: Slow response times (>5 seconds)
Solutions:
# 1. Check regional latency
# Deploy to region closest to users
# 2. Reduce max_tokens for faster responses
export AZURE_OPENAI_MAX_TOKENS=400
# 3. Use streaming for better UX
python chat.py --stream
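Streaming improves perceived latency because text is shown as it is generated. When a chat completions request sets `"stream": true`, the response arrives as server-sent events: `data: {...}` lines carrying partial deltas, ending with `data: [DONE]`. The sketch below extracts the text fragments from such lines. `iter_stream_text` is a hypothetical helper, not how chat.py's `--stream` flag is necessarily implemented; it skips chunks that carry no choices, which can occur in Azure responses (for example chunks that only report content filter results).

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from chat completion server-sent-event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end of stream
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):  # may be empty for some chunks
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

With a `urllib` response object, `iter_stream_text(line.decode("utf-8") for line in resp)` can be printed fragment by fragment with `print(text, end="", flush=True)`.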
Security Best Practices
1. Protect API Keys
# Never commit keys to source control
# Use Key Vault (already configured)
# Rotate keys regularly (afterwards, store the new key in the openai-api-key Key Vault secret)
az cognitiveservices account keys regenerate \
--name $(azd env get-value AZURE_OPENAI_NAME) \
--resource-group $(azd env get-value AZURE_RESOURCE_GROUP) \
--key-name key1
2. Implement Content Filtering
# Microsoft Foundry Models includes built-in content filtering
# Configure in Azure Portal:
# OpenAI Resource → Content Filters → Create Custom Filter
# Categories: Hate, Sexual, Violence, Self-harm
# Levels: Low, Medium, High filtering
3. Use Managed Identity (Production)
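# For production deployments, authenticate with Microsoft Entra ID (managed identity)
# instead of API keys (requires app hosting on Azure):
# 1. Enable a managed identity on the app's host (e.g., App Service or Container Apps)
# 2. Grant that identity the "Cognitive Services OpenAI User" role on the OpenAI resource
# 3. Optionally set properties.disableLocalAuth: true in infra/openai.bicep to turn off key auth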
Development
Run Locally
# Install dependencies
pip install -r src/requirements.txt
# Set environment variables
export AZURE_OPENAI_ENDPOINT="https://[name].openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_MODEL="gpt-4.1"
# Run application
python src/chat.py
Run Tests
# Install test dependencies
pip install pytest pytest-cov
# Run tests
pytest tests/ -v
# With coverage
pytest tests/ --cov=src --cov-report=html
Update Model Deployment
# Add a deployment for another model (list available models and versions with az cognitiveservices account list-models)
az cognitiveservices account deployment create \
--name $(azd env get-value AZURE_OPENAI_NAME) \
--resource-group $(azd env get-value AZURE_RESOURCE_GROUP) \
--deployment-name gpt-35-turbo \
--model-name gpt-35-turbo \
--model-version "0613" \
--model-format OpenAI \
--sku-capacity 20 \
--sku-name "Standard"
Clean Up
# Delete all Azure resources
azd down --force --purge
# This removes:
# - OpenAI Service
# - Key Vault (--purge also removes it from soft-delete retention)
# - Resource Group
# - All deployments and configurations
Next Steps
Expand This Example
- Add Web Interface - Build React/Vue frontend
  - Add frontend service to azure.yaml
  - Deploy to Azure Static Web Apps
- Implement RAG - Add document search with Azure AI Search
  - Integrate Azure AI Search (formerly Azure Cognitive Search)
  - Upload documents and create a vector index
- Add Function Calling - Enable tool use
  - Define functions in chat.py
  - Let gpt-4.1 call external APIs
- Multi-Model Support - Deploy multiple models
  - Add gpt-35-turbo and embeddings models
  - Implement model routing logic
Related Examples
- Retail Multi-Agent - Advanced multi-agent architecture
- Database App - Add persistent storage
- Container Apps - Deploy as containerized service
Learning Resources
- 📚 AZD For Beginners Course - Main course home
- 📚 Microsoft Foundry Models Documentation - Official docs
- 📚 OpenAI API Reference - API details
- 📚 Responsible AI - Best practices
Additional Resources
Documentation
- Microsoft Foundry Models Service - Complete guide
- gpt-4.1 Models - Model capabilities
- Content Filtering - Safety features
- Azure Developer CLI - azd reference
Tutorials
- OpenAI Quickstart - First deployment
- Chat Completions - Building chat apps
- Function Calling - Advanced features
Tools
- Microsoft Foundry Models Studio - Web-based playground
- Prompt Engineering Guide - Writing better prompts
- Token Calculator - Estimate token usage
Community
- Azure AI Discord - Get help from community
- GitHub Discussions - Q&A forum
- Azure Blog - Latest updates
🎉 Success! You've deployed Microsoft Foundry Models and built a working chat application. Start exploring gpt-4.1's capabilities and experiment with different prompts and use cases.
Questions? Open an issue or check the FAQ
Cost Alert: Remember to run azd down when done testing to avoid ongoing charges (~$50-100/month for active usage).