Microsoft Foundry Models Chat Application

March 16, 2026 · View on GitHub

Learning Path: Intermediate ⭐⭐ | Time: 35-45 minutes | Cost: $50-200/month

A complete Microsoft Foundry Models chat application deployed using Azure Developer CLI (azd). This example demonstrates gpt-4.1 deployment, secure API access, and a simple chat interface.

🎯 What You'll Learn

  • Deploy Microsoft Foundry Models Service with gpt-4.1 model
  • Secure OpenAI API keys with Key Vault
  • Build a simple chat interface with Python
  • Monitor token usage and costs
  • Implement rate limiting and error handling

📦 What's Included

Microsoft Foundry Models Service - gpt-4.1 model deployment
Python Chat App - Simple command-line chat interface
Key Vault Integration - Secure API key storage
ARM Templates - Complete infrastructure as code
Cost Monitoring - Token usage tracking
Rate Limiting - Prevent quota exhaustion

Architecture

graph TD
    App[Python Chat Application<br/>Local/Cloud<br/>Command-line interface<br/>Conversation history<br/>Token usage tracking] -- "HTTPS (API Key)" --> Foundry[Microsoft Foundry Models Service<br/>gpt-4.1 Model<br/>20K tokens/min capacity<br/>Multi-region failover]
    Foundry --> KV[Azure Key Vault<br/>OpenAI API Key<br/>Endpoint URL]
    Foundry -. Managed Identity .-> KV

Prerequisites

Required

Verify Prerequisites

# Check azd version (need 1.5.0 or higher)
azd version

# Verify Azure login
azd auth login

# Check Python version
python --version  # or python3 --version

# Verify OpenAI access (check in Azure Portal)
az cognitiveservices account list-skus \
  --kind OpenAI \
  --location eastus

⚠️ Important: Microsoft Foundry Models requires application approval. If you haven't applied, visit aka.ms/oai/access. Approval typically takes 1-2 business days.

⏱️ Deployment Timeline

PhaseDurationWhat Happens
Prerequisites check2-3 minutesVerify OpenAI quota availability
Deploy infrastructure8-12 minutesCreate OpenAI, Key Vault, model deployment
Configure application2-3 minutesSet up environment and dependencies
Total12-18 minutesReady to chat with gpt-4.1

Note: First-time OpenAI deployment may take longer due to model provisioning.

Quick Start

# Navigate to the example
cd examples/azure-openai-chat

# Initialize environment
azd env new myopenai

# Deploy everything (infrastructure + configuration)
azd up
# You'll be prompted to:
# 1. Select Azure subscription
# 2. Choose location with OpenAI availability (e.g., eastus, eastus2, westus)
# 3. Wait 12-18 minutes for deployment

# Install Python dependencies
pip install -r requirements.txt

# Start chatting!
python chat.py

Expected Output:

🤖 Microsoft Foundry Models Chat Application
Connected to: gpt-4.1 (eastus)
Type your message (or 'quit' to exit)

You: Hello! Tell me about Microsoft Foundry Models.
Assistant: Microsoft Foundry Models Service provides REST API access to OpenAI's powerful language models including gpt-4.1, GPT-3.5-Turbo, and Embeddings...

[Tokens used: 145 | Estimated cost: \$0.0044]

✅ Verify Deployment

Step 1: Check Azure Resources

# View deployed resources
azd show

# Expected output shows:
# - OpenAI Service: (resource name)
# - Key Vault: (resource name)
# - Deployment: gpt-4.1
# - Location: eastus (or your selected region)

Step 2: Test OpenAI API

# Get OpenAI endpoint and key
OPENAI_ENDPOINT=$(azd env get-value AZURE_OPENAI_ENDPOINT)
OPENAI_KEY=$(azd env get-value AZURE_OPENAI_API_KEY)

# Test API call
curl "$OPENAI_ENDPOINT/openai/deployments/gpt-4.1/chat/completions?api-version=2024-08-01-preview" \
  -H "Content-Type: application/json" \
  -H "api-key: $OPENAI_KEY" \
  -d '{
    "messages": [{"role": "user", "content": "Say hello!"}],
    "max_tokens": 50
  }'

Expected Response:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 9,
    "total_tokens": 17
  }
}

Step 3: Verify Key Vault Access

# List secrets in Key Vault
KV_NAME=$(azd env get-value AZURE_KEY_VAULT_NAME)

az keyvault secret list \
  --vault-name $KV_NAME \
  --query "[].name" \
  --output table

Expected Secrets:

  • openai-api-key
  • openai-endpoint

Success Criteria:

  • ✅ OpenAI service deployed with gpt-4.1
  • ✅ API call returns valid completion
  • ✅ Secrets stored in Key Vault
  • ✅ Token usage tracking works

Project Structure

azure-openai-chat/
├── README.md                   ✅ This guide
├── azure.yaml                  ✅ AZD configuration
├── infra/                      ✅ Infrastructure as Code
│   ├── main.bicep             ✅ Main Bicep template
│   ├── main.parameters.json   ✅ Parameters
│   └── openai.bicep           ✅ OpenAI resource definition
├── src/                        ✅ Application code
│   ├── chat.py                ✅ Chat interface
│   ├── config.py              ✅ Configuration loader
│   └── requirements.txt       ✅ Python dependencies
└── .gitignore                  ✅ Git ignore rules

Application Features

Chat Interface (chat.py)

The chat application includes:

  • Conversation History - Maintains context across messages
  • Token Counting - Tracks usage and estimates costs
  • Error Handling - Graceful handling of rate limits and API errors
  • Cost Estimation - Real-time cost calculation per message
  • Streaming Support - Optional streaming responses

Commands

While chatting, you can use:

  • quit or exit - End the session
  • clear - Clear conversation history
  • tokens - Show total token usage
  • cost - Show estimated total cost

Configuration (config.py)

Loads configuration from environment variables:

AZURE_OPENAI_ENDPOINT  # From Key Vault
AZURE_OPENAI_API_KEY   # From Key Vault
AZURE_OPENAI_MODEL     # Default: gpt-4.1
AZURE_OPENAI_MAX_TOKENS # Default: 800

Usage Examples

Basic Chat

python chat.py

Chat with Custom Model

export AZURE_OPENAI_MODEL=gpt-35-turbo
python chat.py

Chat with Streaming

python chat.py --stream

Example Conversation

You: Explain Microsoft Foundry Models Service in 3 sentences.
Assistant: Microsoft Foundry Models Service is Microsoft Azure's cloud platform offering 
that provides access to OpenAI's powerful language models. It enables developers 
to integrate capabilities like gpt-4.1 into their applications with enterprise-grade 
security and compliance. The service includes features for content filtering, 
abuse monitoring, and responsible AI practices.

[Tokens used: 89 | Estimated cost: \$0.0027]

You: What models are available?
Assistant: Microsoft Foundry Models Service offers several model families including gpt-4.1 
(most capable), GPT-3.5-Turbo (faster and cost-effective), and Embeddings models 
for vector search. Each model has different capabilities, pricing, and token limits.

[Tokens used: 67 | Estimated cost: \$0.0020]

Total session: 156 tokens | \$0.0047

Cost Management

Token Pricing (gpt-4.1)

ModelInput (per 1K tokens)Output (per 1K tokens)
gpt-4.1$0.03$0.06
GPT-3.5-Turbo$0.0015$0.002

Estimated Monthly Costs

Based on usage patterns:

Usage LevelMessages/DayTokens/DayMonthly Cost
Light20 messages3,000 tokens$3-5
Moderate100 messages15,000 tokens$15-25
Heavy500 messages75,000 tokens$75-125

Base Infrastructure Cost: $1-2/month (Key Vault + minimal compute)

Cost Optimization Tips

# 1. Use GPT-3.5-Turbo for simpler tasks (20x cheaper)
export AZURE_OPENAI_MODEL=gpt-35-turbo

# 2. Reduce max tokens for shorter responses
export AZURE_OPENAI_MAX_TOKENS=400

# 3. Monitor token usage
python chat.py --show-tokens

# 4. Set up budget alerts
az consumption budget create \
  --budget-name "openai-budget" \
  --amount 50 \
  --time-grain Monthly

Monitoring

View Token Usage

# In Azure Portal:
# OpenAI Resource → Metrics → Select "Token Transaction"

# Or via Azure CLI:
az monitor metrics list \
  --resource $(azd env get-value AZURE_OPENAI_RESOURCE_ID) \
  --metric "TokenTransaction" \
  --start-time $(date -u -d '1 hour ago' '+%Y-%m-%dT%H:%M:%S') \
  --interval PT1M

View API Logs

# Stream diagnostic logs
az monitor diagnostic-settings create \
  --resource $(azd env get-value AZURE_OPENAI_RESOURCE_ID) \
  --name openai-logs \
  --logs '[{"category": "Audit", "enabled": true}]' \
  --workspace $(azd env get-value LOG_ANALYTICS_WORKSPACE_ID)

# Query logs
az monitor log-analytics query \
  --workspace $(azd env get-value LOG_ANALYTICS_WORKSPACE_ID) \
  --analytics-query "AzureDiagnostics | where Category == 'Audit' | top 10 by TimeGenerated"

Troubleshooting

Issue: "Access Denied" Error

Symptoms: 403 Forbidden when calling API

Solutions:

# 1. Verify OpenAI access is approved
az cognitiveservices account show \
  --name $(azd env get-value AZURE_OPENAI_NAME) \
  --resource-group $(azd env get-value AZURE_RESOURCE_GROUP)

# 2. Check API key is correct
azd env get-value AZURE_OPENAI_API_KEY

# 3. Verify endpoint URL format
azd env get-value AZURE_OPENAI_ENDPOINT
# Should be: https://[name].openai.azure.com/

Issue: "Rate Limit Exceeded"

Symptoms: 429 Too Many Requests

Solutions:

# 1. Check current quota
az cognitiveservices account deployment show \
  --name $(azd env get-value AZURE_OPENAI_NAME) \
  --resource-group $(azd env get-value AZURE_RESOURCE_GROUP) \
  --deployment-name gpt-4.1

# 2. Request quota increase (if needed)
# Go to Azure Portal → OpenAI Resource → Quotas → Request Increase

# 3. Implement retry logic (already in chat.py)
# The application automatically retries with exponential backoff

Issue: "Model Not Found"

Symptoms: 404 error for deployment

Solutions:

# 1. List available deployments
az cognitiveservices account deployment list \
  --name $(azd env get-value AZURE_OPENAI_NAME) \
  --resource-group $(azd env get-value AZURE_RESOURCE_GROUP)

# 2. Verify model name in environment
echo $AZURE_OPENAI_MODEL

# 3. Update to correct deployment name
export AZURE_OPENAI_MODEL=gpt-4.1  # or gpt-35-turbo

Issue: High Latency

Symptoms: Slow response times (>5 seconds)

Solutions:

# 1. Check regional latency
# Deploy to region closest to users

# 2. Reduce max_tokens for faster responses
export AZURE_OPENAI_MAX_TOKENS=400

# 3. Use streaming for better UX
python chat.py --stream

Security Best Practices

1. Protect API Keys

# Never commit keys to source control
# Use Key Vault (already configured)

# Rotate keys regularly
az cognitiveservices account keys regenerate \
  --name $(azd env get-value AZURE_OPENAI_NAME) \
  --resource-group $(azd env get-value AZURE_RESOURCE_GROUP) \
  --key-name key1

2. Implement Content Filtering

# Microsoft Foundry Models includes built-in content filtering
# Configure in Azure Portal:
# OpenAI Resource → Content Filters → Create Custom Filter

# Categories: Hate, Sexual, Violence, Self-harm
# Levels: Low, Medium, High filtering

3. Use Managed Identity (Production)

# For production deployments, use managed identity
# instead of API keys (requires app hosting on Azure)

# Update infra/openai.bicep to include:
# identity: { type: 'SystemAssigned' }

Development

Run Locally

# Install dependencies
pip install -r src/requirements.txt

# Set environment variables
export AZURE_OPENAI_ENDPOINT="https://[name].openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_MODEL="gpt-4.1"

# Run application
python src/chat.py

Run Tests

# Install test dependencies
pip install pytest pytest-cov

# Run tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=src --cov-report=html

Update Model Deployment

# Deploy different model version
az cognitiveservices account deployment create \
  --name $(azd env get-value AZURE_OPENAI_NAME) \
  --resource-group $(azd env get-value AZURE_RESOURCE_GROUP) \
  --deployment-name gpt-35-turbo \
  --model-name gpt-35-turbo \
  --model-version "0613" \
  --model-format OpenAI \
  --sku-capacity 20 \
  --sku-name "Standard"

Clean Up

# Delete all Azure resources
azd down --force --purge

# This removes:
# - OpenAI Service
# - Key Vault (with 90-day soft delete)
# - Resource Group
# - All deployments and configurations

Next Steps

Expand This Example

  1. Add Web Interface - Build React/Vue frontend

    # Add frontend service to azure.yaml
    # Deploy to Azure Static Web Apps
    
  2. Implement RAG - Add document search with Azure AI Search

    # Integrate Azure Cognitive Search
    # Upload documents and create vector index
    
  3. Add Function Calling - Enable tool use

    # Define functions in chat.py
    # Let gpt-4.1 call external APIs
    
  4. Multi-Model Support - Deploy multiple models

    # Add gpt-35-turbo, embeddings models
    # Implement model routing logic
    

Learning Resources

Additional Resources

Documentation

Tutorials

Tools

Community


🎉 Success! You've deployed Microsoft Foundry Models and built a working chat application. Start exploring gpt-4.1's capabilities and experiment with different prompts and use cases.

Questions? Open an issue or check the FAQ

Cost Alert: Remember to run azd down when done testing to avoid ongoing charges (~$50-100/month for active usage).