Samples: APIM Costing & Showback
May 1, 2026 ยท View on GitHub
This sample demonstrates how to track and allocate API costs using Azure API Management with Azure Monitor, Application Insights, Log Analytics, and Cost Management. It supports three complementary approaches: subscription-based tracking (using APIM subscription keys), Entra ID application tracking (using the emit-metric policy with JWT appid claims), and AI Gateway token/PTU tracking (using the ApiManagementGatewayLlmLog diagnostic to capture per-request token consumption when APIM acts as an AI Gateway, joined with ApiManagementGatewayLogs on CorrelationId for business unit attribution). All approaches share a single Azure Monitor Workbook with tabbed views.
โ๏ธ Supported infrastructures: All infrastructures (or bring your own existing APIM deployment)
๐ Expected Run All runtime (excl. infrastructure prerequisite): ~15 minutes
๐ฏ Objectives
- Track API usage by caller - Use APIM subscription keys to identify business units, departments, or applications
- Track API usage by Entra ID application - Use the
emit-metricpolicy to extractappid/azpJWT claims and emit per-caller custom metrics - Capture request metrics - Log subscriptionId, apiName, operationName, and status codes
- Aggregate cost data - Combine API usage metrics with Azure Cost Management data
- Visualize showback data - Create Azure Monitor Workbooks with tabbed views for both approaches
- Enable cost governance - Establish patterns for consistent tagging and naming conventions
- Enable budget alerts - Create scheduled query alerts when callers exceed configurable thresholds
- Track AI token consumption per client - When APIM is used as an AI Gateway, capture prompt, completion, and total token usage per calling application, enabling per-client cost attribution for PTU or pay-as-you-go OpenAI deployments
- Real AOAI interactions via Foundry (optional) - Deploy a full Microsoft Foundry environment (Hub + Project + Azure AI Services) and route real Azure OpenAI traffic through APIM across both the Chat Completions and Responses APIs, demonstrating accurate token tracking for non-streaming, streaming (SSE), and stateless (
store: false) requests
Note on non-OpenAI models: This sample deploys an Azure OpenAI model only (default:
gpt-5-mini). Other model families on Azure AI Services - such as Anthropic Claude via the Azure Marketplace - are gated by separate quota that is granted through a manual approval process, which puts them beyond the scope of a self-service sample. If you have approved quota for another provider, you can extend the sample by adding a second deployment inmain.bicep; the token-tracking policy and workbook queries are model-agnostic.
โ Prerequisites
Beyond the general prerequisites (Azure subscription, CLI, Python environment), this sample requires additional Azure RBAC role assignments.
Azure RBAC Permissions
The signed-in user needs the following role assignments:
| Role | Scope | Purpose |
|---|---|---|
| Contributor | Resource Group | Deploy Bicep resources (App Insights, Log Analytics, Storage, Workbook, Diagnostic Settings) |
| Cost Management Contributor | Subscription | Create Cost Management export |
| Storage Blob Data Contributor | Storage Account | Write cost export data (auto-assigned by the notebook) |
| Cognitive Services Contributor | Resource Group | Deploy Azure AI Services when enable_foundry = True (not needed for mock path) |
For Workbook Consumers
Users who only need to view the deployed Azure Monitor Workbook (not deploy the sample) need:
| Role | Scope | Purpose |
|---|---|---|
| Monitoring Reader | Resource Group | Open and view the workbook |
| Log Analytics Reader | Log Analytics Workspace | Execute the Kusto queries that power the workbook |
๐ก If a user can open the workbook but sees empty visualizations, they are likely missing Log Analytics Reader on the workspace.
๐ Scenario
Organizations often need to allocate the cost of shared API Management infrastructure to different consumers (business units, departments, applications, or customers). This sample addresses:
- Cost Transparency: Understanding which teams or applications drive API consumption
- Chargeback/Showback: Producing data that can inform internal billing or cost awareness
- Resource Optimization: Identifying high-cost consumers and opportunities for optimization
- Budget Planning: Historical usage patterns to forecast future costs
Key Principle: Cost Determination, Not Billing
This sample focuses on producing cost data, not implementing billing processes. You determine costs; how you use that information (showback reports, chargeback, budgeting) is a separate business decision.
Three Tracking Approaches
| Aspect | Subscription-Based | Entra ID Application | AI Gateway Token/PTU |
|---|---|---|---|
| Caller identification | APIM subscription key (ApimSubscriptionId) | JWT appid/azp claim | JWT appid/azp claim |
| Data source | ApiManagementGatewayLogs in Log Analytics | customMetrics in Application Insights | ApiManagementGatewayLlmLog in Log Analytics |
| Tracking mechanism | Built-in APIM logging | emit-metric policy | APIM diagnostic setting (zero-buffering) |
| Metric name | N/A (built-in logs) | caller-requests | N/A (per-request diagnostic log) |
| Cost Management export | Yes (storage account) | No (metrics-based) | No (metrics-based) |
| Best for | Dedicated subscriptions per BU | OAuth client-credentials flows, shared subscriptions | AI Gateway scenarios (Azure OpenAI, PTU capacity planning) |
All three approaches are deployed together. Toggle enable_entraid_tracking and enable_token_tracking in the notebook to include or exclude each flow. Setting enable_foundry = True adds a real Azure OpenAI backend so token tracking uses actual model responses instead of mock data.
Streaming Support
When enable_foundry = True, the notebook demonstrates both non-streaming and streaming (SSE) chat completions. For streaming, half the requests explicitly send stream_options.include_usage = true and half intentionally omit it so the pf-ensure-stream-include-usage.xml policy fragment can prove when APIM had to inject the flag (when force_stream_include_usage is enabled). Token counts are captured by the APIM ApiManagementGatewayLlmLog diagnostic setting with zero response buffering, and proof of the policy mutation is recorded in ApiManagementGatewayLogs.TraceRecords.
- Non-streaming: The gateway logs exact token counts from the JSON response
- Streaming (SSE): The gateway reads token counts from the final SSE chunk (requires
stream_options.include_usage = true; the sample proves when APIM had to add it)
The workbook surfaces both streaming variants side-by-side so you can see exactly how each request acquired the usage object:
- Streaming (client-supplied usage) โ the client already set
stream_options.include_usage = true; APIM forwards the request unchanged. - Streaming (policy-injected usage) โ the client omitted the flag; the APIM policy fragment injected it and emitted a trace into
ApiManagementGatewayLogs.TraceRecords(look forIncludeUsageInjected).
The AI Gateway tab's Streaming vs Non-Streaming Breakdown and the Per-Request Detail tab's AI Delivery Mode + Usage Provenance columns both render this distinction, so you can confirm token capture works regardless of whether the client or APIM supplied the usage option.
AI Surface Coverage (Chat Completions + Responses API)
The notebook exercises six AI request modes per business unit per model so you can see APIM token tracking work across both Azure OpenAI surfaces and every streaming variant. Mode is chosen by j % 6 for the j-th request within a business unit, giving a deterministic, even mix:
| Mode | API surface | Streaming | Notes |
|---|---|---|---|
| 0 | Chat Completions | No | Baseline non-streaming chat. |
| 1 | Chat Completions | Yes | Client sends stream_options.include_usage = true; APIM forwards unchanged. |
| 2 | Chat Completions | Yes | Client omits stream_options; the pf-ensure-stream-include-usage.xml fragment injects it and emits an IncludeUsageInjected trace. |
| 3 | Responses API | No | Stateful (store defaults to true); uses input + max_output_tokens. |
| 4 | Responses API | Yes | Streaming Responses; the policy fragment is a no-op for this surface. |
| 5 | Responses API | No | Stateless variant with store: false to demonstrate ephemeral usage. |
The Chat Completions and Responses APIs use different api-versions (2024-10-21 vs 2025-03-01-preview), different routes (/deployments/{id}/chat/completions vs /responses), and different request shapes (messages + max_completion_tokens vs input + max_output_tokens). They share the same aoai-backend and the same APIM AI logger, so ApiManagementGatewayLlmLog rows from both surfaces flow into the same workspace and are split by OperationId (chat-completions-create vs responses-create) in the workbook.
The pf-ensure-stream-include-usage.xml fragment short-circuits for the Responses API: it only inspects the body when messages is present, so Responses requests pass through untouched. The workbook's Streaming vs Non-Streaming Breakdown, Token Counts by Business Unit & Delivery Mode table, and Per-Request Detail tab all surface an API Surface column / slice (Chat vs Responses) so you can verify each mode produced its expected rows.
Business unit attribution: Join
ApiManagementGatewayLlmLogwithApiManagementGatewayLogsonCorrelationIdto map token counts toApimSubscriptionId(business unit). Seebu-token-usage.kqlfor a ready-to-use query.
Context Propagation
The token tracking policy forwards two headers to the backend:
| Header | Value | Purpose |
|---|---|---|
x-business-unit | Extracted callerId from JWT appid | Correlate backend logs with APIM caller metrics |
x-ms-client-request-id | context.RequestId | End-to-end correlation ID across APIM and backend logs |
๐ฉ๏ธ Lab Components
This lab deploys and configures:
- Application Insights - Receives APIM diagnostic logs for request tracking
- Log Analytics Workspace - Stores
ApiManagementGatewayLogswith detailed request metadata (resource-specific mode) - Storage Account - Receives Azure Cost Management exports
- Cost Management Export - Automated export of cost data (configurable frequency)
- Diagnostic Settings - Links APIM to Log Analytics with
logAnalyticsDestinationType: Dedicatedfor resource-specific tables - Sample API & Subscriptions - 4 subscriptions representing different business units
- Entra ID Tracking API (optional) - A second API with the
emit-metricpolicy that extractsappidfrom JWT tokens and emitscaller-requestscustom metrics - AI Gateway Token Tracking API (optional) - A third API with inbound caller identity propagation and
stream_options.include_usageenforcement; token counts are captured by theApiManagementGatewayLlmLogdiagnostic setting and correlated to business units viaCorrelationIdjoin withApiManagementGatewayLogs - AOAI Gateway API (optional, requires
enable_foundry) - A fourth API that routes real Azure OpenAI chat completions through APIM using a managed-identity-authenticated backend, enabling accurate token tracking against a live model deployment - Microsoft Foundry (optional) - When
enable_foundry = True, deploys an Azure AI Foundry Hub, Project, Azure AI Services account with agpt-5-minimodel deployment, and an APIM backend with managed identity authentication (Cognitive Services OpenAI Userrole) - Azure Monitor Workbook - Pre-built tabbed dashboard with:
- Subscription-Based Costing tab: Cost allocation table (base + variable cost per BU), base vs variable cost stacked bar chart, cost breakdown by API, request count and distribution charts, success/error rate analysis, response code distribution, business unit drill-down
- Entra ID Application Costing tab: Usage by caller ID (bar chart + table), cost allocation by caller (table + pie chart), hourly request trend by caller
- AI Gateway Token/PTU tab: Summary tiles grouped under APIM Inbound (AI Requests across all subs, AI Requests per BU) and AI Backend (a Successful row with
Successful (all 2xx),Successful (2xx, with tokens),Successful (no tokens), and an Errors row withThrottled (429),Client Errors (4xx),Server Errors (5xx)), then a Tokens row (total tokens), followed by a request-funnel table, a Token Coverage Investigation drill-in forSuccessful (no tokens), scope-reconciliation explainer + table, token cost allocation table with configurable per-1K-token rates, model and streaming pie charts, streaming vs non-streaming breakdown table, token-share pie, and hourly token-type trend chart
- SKU-Based Pricing - Automatically derives base monthly cost, overage rate, and included request allowance from the deployed APIM SKU using built-in pricing data (sourced from the Azure API Management pricing page, March 2026)
- Budget Alerts (optional) - Per-BU scheduled query alerts when request thresholds are exceeded
Workbook Query Optimization
Azure Monitor Workbook query items execute independently โ there is no native mechanism to share a materialized table across query items. The workbook applies two patterns to minimise data scanned:
| Pattern | Where applied | Effect |
|---|---|---|
materialize() for multi-reference let bindings | Subscription-Based and Entra ID tabs (any query that derives both a toscalar(count) total and a per-BU summarize from the same base set) | Log Analytics computes the base set once per query execution instead of scanning the underlying table twice |
| Column-project before joins | AI Gateway tab (all ApiManagementGatewayLlmLog โ ApiManagementGatewayLogs joins) | Each query projects only the columns it needs from both sides of the join, reducing the join's memory and network footprint |
Why not a single base query for the AI Gateway tab? Workbooks cannot share a materialized table across query items. Merge items can combine two already-computed result sets but cannot perform arbitrary re-aggregation. Each AI Gateway visual therefore runs its own join, but column-projecting both sides keeps each join as lean as possible.
Cost Allocation Model
| Component | Formula |
|---|---|
| Base Cost Share | Base Monthly Cost x (BU Requests / Total Requests) |
| Variable Cost | BU Requests x (Rate per 1K / 1000) |
| Total Allocated | Base Cost Share + Variable Cost |
What Gets Logged
| Field | Description |
|---|---|
ApimSubscriptionId | Identifies the caller (BU / department / app) |
ApiId | Which API was called |
OperationId | Specific operation within the API |
ResponseCode | Success / failure indication |
| Request count | Number of requests (primary cost metric) |
Important: The API must have
subscriptionRequired: trueforApimSubscriptionIdto be populated in logs. This sample configures it automatically.
โ๏ธ Configuration
Quick Setup Checklist
Follow these steps to prepare and run the costing sample:
-
Choose an infrastructure
- Select one from the Infrastructure Architectures (or use an existing APIM deployment)
- If your chosen infrastructure does not yet exist, navigate to its folder under infrastructure and follow its README to deploy it first
-
Configure user parameters (in the notebook's first code cell, under
USER CONFIGURATION)- Deployment: Match
deployment,rg_location, andindexto your chosen infrastructure - Features to deploy: Toggle
enable_entraid_tracking,enable_token_tracking, andenable_foundryto control which cost-tracking approaches are set up - Traffic to run: Use
run_regular_requestsandrun_ai_requeststo skip phases if iterating on workbook logic - Optional: For real Entra ID token testing, set
use_real_jwt = Trueand populate JWT credentials (see Getting Started) - Alerts: Customize
alert_threshold,alert_email, andcost_export_frequencyif desired
- Deployment: Match
-
Run all cells (
Run Allin Jupyter)- Deployment takes ~3โ5 minutes (longer if
enable_foundry = True) - Traffic generation takes ~2โ3 minutes
- At the end, the notebook prints Azure portal links โ click the workbook link to view your cost dashboard
- Deployment takes ~3โ5 minutes (longer if
What Each Configuration Toggle Does
| Toggle | Purpose | Impact if disabled |
|---|---|---|
enable_entraid_tracking | Deploy Entra ID JWT tracking API | No caller-requests metrics in Entra ID workbook tab |
enable_token_tracking | Deploy AI Gateway token tracking API | No per-caller token/PTU data in AI Gateway workbook tab |
enable_foundry | Deploy real Azure OpenAI via Foundry | D1 skipped; D2 uses mock instead (adds ~5 min if enabled) |
run_regular_requests | Generate BU + Entra ID traffic | Workbook Subscription and Entra ID tabs show no data |
run_ai_requests | Generate AI traffic (real or mock) | Workbook AI Gateway tab shows no data |
create_budget_alerts | Deploy per-BU request thresholds | No budget alerts (Cell B4 creates zero alerts) |
๐ผ๏ธ Expected Results
After running the notebook, you will have:
- Application Insights showing real-time API requests and
caller-requestscustom metrics (Entra ID) - Log Analytics with queryable
ApiManagementGatewayLogs(resource-specific table) - Storage Account receiving cost export data
- Azure Monitor Workbook with tabbed views for both subscription-based and Entra ID cost allocation
- Portal links printed in the notebook's final cell for quick access
Cost Management Export
The cost export is configured automatically using a system-assigned managed identity with Storage Blob Data Contributor access.


Azure Monitor Workbook Dashboard
The deployed workbook provides a comprehensive view of API cost allocation and usage analytics across business units.






Entra ID Application Costing Tab
The Entra ID tab shows cost attribution by calling application, using the emit-metric policy's caller-requests custom metric.



AI Gateway Token/PTU Tab
The AI Gateway tab shows per-client token consumption and estimated costs when APIM is used as an AI Gateway in front of Azure OpenAI or other LLM backends. It uses the ApiManagementGatewayLlmLog diagnostic data (PromptTokens, CompletionTokens, TotalTokens, ModelName) joined with ApiManagementGatewayLogs via CorrelationId for ApimSubscriptionId-based business unit attribution.





Per-Request Detail Tab
The Per-Request Detail tab provides a row-level drill-in across every AI request, joining gateway logs with LLM diagnostic data so you can inspect a single call end to end. The AI Delivery Mode and Usage Provenance columns make it easy to confirm whether a streaming request supplied its own usage chunk or relied on the APIM policy fragment to inject one.

Streaming vs Non-Streaming Verification
When enable_foundry = True, the multi-caller traffic phase alternates between non-streaming and streaming chat completions for every business unit. The AI Gateway tab includes a Streaming vs Non-Streaming Breakdown group with:
- A pie chart showing overall request distribution across delivery modes
- A color-coded table showing per-BU request counts and prompt, completion, and total token counts split by delivery mode
This makes it easy to confirm that token tracking works identically for both modes. The streaming visuals also distinguish between client-supplied usage (the caller already set stream_options.include_usage = true) and APIM-injected usage (the policy fragment added the flag and logged proof into TraceRecords), so you can verify policy behavior end to end. The same split is available per-request on the Per-Request Detail tab via the AI Delivery Mode and Usage Provenance columns.
๐งน Clean Up
To remove all resources created by this sample, open and run clean-up.ipynb. This deletes:
- Sample API and subscriptions from APIM
- Application Insights, Log Analytics, Storage Account
- Azure Monitor Workbook
- Cost Management export
- Microsoft Foundry Hub, Project, Azure AI Services (when
enable_foundry = True)
The clean-up notebook does not delete your APIM instance or resource group.
๐ Additional Resources
- Azure API Management Pricing
- Azure Retail Prices API
- Azure Cost Management Documentation
- Log Analytics Kusto Query Language
- Azure Monitor Workbooks
- APIM Diagnostic Settings
- APIM emit-metric policy
- Application Insights custom metrics
- Microsoft Entra ID application model
- Azure OpenAI usage and token metrics
- PTU provisioned throughput concepts
- Azure OpenAI streaming with usage
- APIM azure-openai-emit-token-metric policy
- Azure AI Foundry documentation
- Tracking every token (Tech Community blog)