Samples: APIM Costing & Showback

May 1, 2026 ยท View on GitHub

This sample demonstrates how to track and allocate API costs using Azure API Management with Azure Monitor, Application Insights, Log Analytics, and Cost Management. It supports three complementary approaches: subscription-based tracking (using APIM subscription keys), Entra ID application tracking (using the emit-metric policy with JWT appid claims), and AI Gateway token/PTU tracking (using the ApiManagementGatewayLlmLog diagnostic to capture per-request token consumption when APIM acts as an AI Gateway, joined with ApiManagementGatewayLogs on CorrelationId for business unit attribution). All approaches share a single Azure Monitor Workbook with tabbed views.

โš™๏ธ Supported infrastructures: All infrastructures (or bring your own existing APIM deployment)

๐Ÿ‘Ÿ Expected Run All runtime (excl. infrastructure prerequisite): ~15 minutes

๐ŸŽฏ Objectives

  1. Track API usage by caller - Use APIM subscription keys to identify business units, departments, or applications
  2. Track API usage by Entra ID application - Use the emit-metric policy to extract appid/azp JWT claims and emit per-caller custom metrics
  3. Capture request metrics - Log subscriptionId, apiName, operationName, and status codes
  4. Aggregate cost data - Combine API usage metrics with Azure Cost Management data
  5. Visualize showback data - Create Azure Monitor Workbooks with tabbed views for both approaches
  6. Enable cost governance - Establish patterns for consistent tagging and naming conventions
  7. Enable budget alerts - Create scheduled query alerts when callers exceed configurable thresholds
  8. Track AI token consumption per client - When APIM is used as an AI Gateway, capture prompt, completion, and total token usage per calling application, enabling per-client cost attribution for PTU or pay-as-you-go OpenAI deployments
  9. Real AOAI interactions via Foundry (optional) - Deploy a full Microsoft Foundry environment (Hub + Project + Azure AI Services) and route real Azure OpenAI traffic through APIM across both the Chat Completions and Responses APIs, demonstrating accurate token tracking for non-streaming, streaming (SSE), and stateless (store: false) requests

Note on non-OpenAI models: This sample deploys an Azure OpenAI model only (default: gpt-5-mini). Other model families on Azure AI Services - such as Anthropic Claude via the Azure Marketplace - are gated by separate quota that is granted through a manual approval process, which puts them beyond the scope of a self-service sample. If you have approved quota for another provider, you can extend the sample by adding a second deployment in main.bicep; the token-tracking policy and workbook queries are model-agnostic.

โœ… Prerequisites

Beyond the general prerequisites (Azure subscription, CLI, Python environment), this sample requires additional Azure RBAC role assignments.

Azure RBAC Permissions

The signed-in user needs the following role assignments:

RoleScopePurpose
ContributorResource GroupDeploy Bicep resources (App Insights, Log Analytics, Storage, Workbook, Diagnostic Settings)
Cost Management ContributorSubscriptionCreate Cost Management export
Storage Blob Data ContributorStorage AccountWrite cost export data (auto-assigned by the notebook)
Cognitive Services ContributorResource GroupDeploy Azure AI Services when enable_foundry = True (not needed for mock path)

For Workbook Consumers

Users who only need to view the deployed Azure Monitor Workbook (not deploy the sample) need:

RoleScopePurpose
Monitoring ReaderResource GroupOpen and view the workbook
Log Analytics ReaderLog Analytics WorkspaceExecute the Kusto queries that power the workbook

๐Ÿ’ก If a user can open the workbook but sees empty visualizations, they are likely missing Log Analytics Reader on the workspace.

๐Ÿ“ Scenario

Organizations often need to allocate the cost of shared API Management infrastructure to different consumers (business units, departments, applications, or customers). This sample addresses:

  • Cost Transparency: Understanding which teams or applications drive API consumption
  • Chargeback/Showback: Producing data that can inform internal billing or cost awareness
  • Resource Optimization: Identifying high-cost consumers and opportunities for optimization
  • Budget Planning: Historical usage patterns to forecast future costs

Key Principle: Cost Determination, Not Billing

This sample focuses on producing cost data, not implementing billing processes. You determine costs; how you use that information (showback reports, chargeback, budgeting) is a separate business decision.

Three Tracking Approaches

AspectSubscription-BasedEntra ID ApplicationAI Gateway Token/PTU
Caller identificationAPIM subscription key (ApimSubscriptionId)JWT appid/azp claimJWT appid/azp claim
Data sourceApiManagementGatewayLogs in Log AnalyticscustomMetrics in Application InsightsApiManagementGatewayLlmLog in Log Analytics
Tracking mechanismBuilt-in APIM loggingemit-metric policyAPIM diagnostic setting (zero-buffering)
Metric nameN/A (built-in logs)caller-requestsN/A (per-request diagnostic log)
Cost Management exportYes (storage account)No (metrics-based)No (metrics-based)
Best forDedicated subscriptions per BUOAuth client-credentials flows, shared subscriptionsAI Gateway scenarios (Azure OpenAI, PTU capacity planning)

All three approaches are deployed together. Toggle enable_entraid_tracking and enable_token_tracking in the notebook to include or exclude each flow. Setting enable_foundry = True adds a real Azure OpenAI backend so token tracking uses actual model responses instead of mock data.

Streaming Support

When enable_foundry = True, the notebook demonstrates both non-streaming and streaming (SSE) chat completions. For streaming, half the requests explicitly send stream_options.include_usage = true and half intentionally omit it so the pf-ensure-stream-include-usage.xml policy fragment can prove when APIM had to inject the flag (when force_stream_include_usage is enabled). Token counts are captured by the APIM ApiManagementGatewayLlmLog diagnostic setting with zero response buffering, and proof of the policy mutation is recorded in ApiManagementGatewayLogs.TraceRecords.

  • Non-streaming: The gateway logs exact token counts from the JSON response
  • Streaming (SSE): The gateway reads token counts from the final SSE chunk (requires stream_options.include_usage = true; the sample proves when APIM had to add it)

The workbook surfaces both streaming variants side-by-side so you can see exactly how each request acquired the usage object:

  • Streaming (client-supplied usage) โ€” the client already set stream_options.include_usage = true; APIM forwards the request unchanged.
  • Streaming (policy-injected usage) โ€” the client omitted the flag; the APIM policy fragment injected it and emitted a trace into ApiManagementGatewayLogs.TraceRecords (look for IncludeUsageInjected).

The AI Gateway tab's Streaming vs Non-Streaming Breakdown and the Per-Request Detail tab's AI Delivery Mode + Usage Provenance columns both render this distinction, so you can confirm token capture works regardless of whether the client or APIM supplied the usage option.

AI Surface Coverage (Chat Completions + Responses API)

The notebook exercises six AI request modes per business unit per model so you can see APIM token tracking work across both Azure OpenAI surfaces and every streaming variant. Mode is chosen by j % 6 for the j-th request within a business unit, giving a deterministic, even mix:

ModeAPI surfaceStreamingNotes
0Chat CompletionsNoBaseline non-streaming chat.
1Chat CompletionsYesClient sends stream_options.include_usage = true; APIM forwards unchanged.
2Chat CompletionsYesClient omits stream_options; the pf-ensure-stream-include-usage.xml fragment injects it and emits an IncludeUsageInjected trace.
3Responses APINoStateful (store defaults to true); uses input + max_output_tokens.
4Responses APIYesStreaming Responses; the policy fragment is a no-op for this surface.
5Responses APINoStateless variant with store: false to demonstrate ephemeral usage.

The Chat Completions and Responses APIs use different api-versions (2024-10-21 vs 2025-03-01-preview), different routes (/deployments/{id}/chat/completions vs /responses), and different request shapes (messages + max_completion_tokens vs input + max_output_tokens). They share the same aoai-backend and the same APIM AI logger, so ApiManagementGatewayLlmLog rows from both surfaces flow into the same workspace and are split by OperationId (chat-completions-create vs responses-create) in the workbook.

The pf-ensure-stream-include-usage.xml fragment short-circuits for the Responses API: it only inspects the body when messages is present, so Responses requests pass through untouched. The workbook's Streaming vs Non-Streaming Breakdown, Token Counts by Business Unit & Delivery Mode table, and Per-Request Detail tab all surface an API Surface column / slice (Chat vs Responses) so you can verify each mode produced its expected rows.

Business unit attribution: Join ApiManagementGatewayLlmLog with ApiManagementGatewayLogs on CorrelationId to map token counts to ApimSubscriptionId (business unit). See bu-token-usage.kql for a ready-to-use query.

Context Propagation

The token tracking policy forwards two headers to the backend:

HeaderValuePurpose
x-business-unitExtracted callerId from JWT appidCorrelate backend logs with APIM caller metrics
x-ms-client-request-idcontext.RequestIdEnd-to-end correlation ID across APIM and backend logs

๐Ÿ›ฉ๏ธ Lab Components

This lab deploys and configures:

  • Application Insights - Receives APIM diagnostic logs for request tracking
  • Log Analytics Workspace - Stores ApiManagementGatewayLogs with detailed request metadata (resource-specific mode)
  • Storage Account - Receives Azure Cost Management exports
  • Cost Management Export - Automated export of cost data (configurable frequency)
  • Diagnostic Settings - Links APIM to Log Analytics with logAnalyticsDestinationType: Dedicated for resource-specific tables
  • Sample API & Subscriptions - 4 subscriptions representing different business units
  • Entra ID Tracking API (optional) - A second API with the emit-metric policy that extracts appid from JWT tokens and emits caller-requests custom metrics
  • AI Gateway Token Tracking API (optional) - A third API with inbound caller identity propagation and stream_options.include_usage enforcement; token counts are captured by the ApiManagementGatewayLlmLog diagnostic setting and correlated to business units via CorrelationId join with ApiManagementGatewayLogs
  • AOAI Gateway API (optional, requires enable_foundry) - A fourth API that routes real Azure OpenAI chat completions through APIM using a managed-identity-authenticated backend, enabling accurate token tracking against a live model deployment
  • Microsoft Foundry (optional) - When enable_foundry = True, deploys an Azure AI Foundry Hub, Project, Azure AI Services account with a gpt-5-mini model deployment, and an APIM backend with managed identity authentication (Cognitive Services OpenAI User role)
  • Azure Monitor Workbook - Pre-built tabbed dashboard with:
    • Subscription-Based Costing tab: Cost allocation table (base + variable cost per BU), base vs variable cost stacked bar chart, cost breakdown by API, request count and distribution charts, success/error rate analysis, response code distribution, business unit drill-down
    • Entra ID Application Costing tab: Usage by caller ID (bar chart + table), cost allocation by caller (table + pie chart), hourly request trend by caller
    • AI Gateway Token/PTU tab: Summary tiles grouped under APIM Inbound (AI Requests across all subs, AI Requests per BU) and AI Backend (a Successful row with Successful (all 2xx), Successful (2xx, with tokens), Successful (no tokens), and an Errors row with Throttled (429), Client Errors (4xx), Server Errors (5xx)), then a Tokens row (total tokens), followed by a request-funnel table, a Token Coverage Investigation drill-in for Successful (no tokens), scope-reconciliation explainer + table, token cost allocation table with configurable per-1K-token rates, model and streaming pie charts, streaming vs non-streaming breakdown table, token-share pie, and hourly token-type trend chart
  • SKU-Based Pricing - Automatically derives base monthly cost, overage rate, and included request allowance from the deployed APIM SKU using built-in pricing data (sourced from the Azure API Management pricing page, March 2026)
  • Budget Alerts (optional) - Per-BU scheduled query alerts when request thresholds are exceeded

Workbook Query Optimization

Azure Monitor Workbook query items execute independently โ€” there is no native mechanism to share a materialized table across query items. The workbook applies two patterns to minimise data scanned:

PatternWhere appliedEffect
materialize() for multi-reference let bindingsSubscription-Based and Entra ID tabs (any query that derives both a toscalar(count) total and a per-BU summarize from the same base set)Log Analytics computes the base set once per query execution instead of scanning the underlying table twice
Column-project before joinsAI Gateway tab (all ApiManagementGatewayLlmLog โŸ• ApiManagementGatewayLogs joins)Each query projects only the columns it needs from both sides of the join, reducing the join's memory and network footprint

Why not a single base query for the AI Gateway tab? Workbooks cannot share a materialized table across query items. Merge items can combine two already-computed result sets but cannot perform arbitrary re-aggregation. Each AI Gateway visual therefore runs its own join, but column-projecting both sides keeps each join as lean as possible.

Cost Allocation Model

ComponentFormula
Base Cost ShareBase Monthly Cost x (BU Requests / Total Requests)
Variable CostBU Requests x (Rate per 1K / 1000)
Total AllocatedBase Cost Share + Variable Cost

What Gets Logged

FieldDescription
ApimSubscriptionIdIdentifies the caller (BU / department / app)
ApiIdWhich API was called
OperationIdSpecific operation within the API
ResponseCodeSuccess / failure indication
Request countNumber of requests (primary cost metric)

Important: The API must have subscriptionRequired: true for ApimSubscriptionId to be populated in logs. This sample configures it automatically.

โš™๏ธ Configuration

Quick Setup Checklist

Follow these steps to prepare and run the costing sample:

  1. Choose an infrastructure

    • Select one from the Infrastructure Architectures (or use an existing APIM deployment)
    • If your chosen infrastructure does not yet exist, navigate to its folder under infrastructure and follow its README to deploy it first
  2. Configure user parameters (in the notebook's first code cell, under USER CONFIGURATION)

    • Deployment: Match deployment, rg_location, and index to your chosen infrastructure
    • Features to deploy: Toggle enable_entraid_tracking, enable_token_tracking, and enable_foundry to control which cost-tracking approaches are set up
    • Traffic to run: Use run_regular_requests and run_ai_requests to skip phases if iterating on workbook logic
    • Optional: For real Entra ID token testing, set use_real_jwt = True and populate JWT credentials (see Getting Started)
    • Alerts: Customize alert_threshold, alert_email, and cost_export_frequency if desired
  3. Run all cells (Run All in Jupyter)

    • Deployment takes ~3โ€“5 minutes (longer if enable_foundry = True)
    • Traffic generation takes ~2โ€“3 minutes
    • At the end, the notebook prints Azure portal links โ€” click the workbook link to view your cost dashboard

What Each Configuration Toggle Does

TogglePurposeImpact if disabled
enable_entraid_trackingDeploy Entra ID JWT tracking APINo caller-requests metrics in Entra ID workbook tab
enable_token_trackingDeploy AI Gateway token tracking APINo per-caller token/PTU data in AI Gateway workbook tab
enable_foundryDeploy real Azure OpenAI via FoundryD1 skipped; D2 uses mock instead (adds ~5 min if enabled)
run_regular_requestsGenerate BU + Entra ID trafficWorkbook Subscription and Entra ID tabs show no data
run_ai_requestsGenerate AI traffic (real or mock)Workbook AI Gateway tab shows no data
create_budget_alertsDeploy per-BU request thresholdsNo budget alerts (Cell B4 creates zero alerts)

๐Ÿ–ผ๏ธ Expected Results

After running the notebook, you will have:

  1. Application Insights showing real-time API requests and caller-requests custom metrics (Entra ID)
  2. Log Analytics with queryable ApiManagementGatewayLogs (resource-specific table)
  3. Storage Account receiving cost export data
  4. Azure Monitor Workbook with tabbed views for both subscription-based and Entra ID cost allocation
  5. Portal links printed in the notebook's final cell for quick access

Cost Management Export

The cost export is configured automatically using a system-assigned managed identity with Storage Blob Data Contributor access.

Cost Report - Export Overview

Cost Report - Export Details

Azure Monitor Workbook Dashboard

The deployed workbook provides a comprehensive view of API cost allocation and usage analytics across business units.

Dashboard - Cost Allocation Overview

Dashboard - Cost Breakdown by Business Unit

Dashboard - Request Distribution

Dashboard - Usage Analytics

Dashboard - Response Code Analysis

Dashboard - Drill-Down Details

Entra ID Application Costing Tab

The Entra ID tab shows cost attribution by calling application, using the emit-metric policy's caller-requests custom metric.

Entra ID - Usage by Caller ID

Entra ID - Cost Allocation

Entra ID - Request Trend

AI Gateway Token/PTU Tab

The AI Gateway tab shows per-client token consumption and estimated costs when APIM is used as an AI Gateway in front of Azure OpenAI or other LLM backends. It uses the ApiManagementGatewayLlmLog diagnostic data (PromptTokens, CompletionTokens, TotalTokens, ModelName) joined with ApiManagementGatewayLogs via CorrelationId for ApimSubscriptionId-based business unit attribution.

AI Gateway - Token Consumption by Client

AI Gateway - Token Cost Allocation

AI Gateway - Token Trends & PTU Utilization

AI Gateway - Model & Caller Breakdown

AI Gateway - Token & PTU Summary

Per-Request Detail Tab

The Per-Request Detail tab provides a row-level drill-in across every AI request, joining gateway logs with LLM diagnostic data so you can inspect a single call end to end. The AI Delivery Mode and Usage Provenance columns make it easy to confirm whether a streaming request supplied its own usage chunk or relied on the APIM policy fragment to inject one.

Per-Request Detail

Streaming vs Non-Streaming Verification

When enable_foundry = True, the multi-caller traffic phase alternates between non-streaming and streaming chat completions for every business unit. The AI Gateway tab includes a Streaming vs Non-Streaming Breakdown group with:

  • A pie chart showing overall request distribution across delivery modes
  • A color-coded table showing per-BU request counts and prompt, completion, and total token counts split by delivery mode

This makes it easy to confirm that token tracking works identically for both modes. The streaming visuals also distinguish between client-supplied usage (the caller already set stream_options.include_usage = true) and APIM-injected usage (the policy fragment added the flag and logged proof into TraceRecords), so you can verify policy behavior end to end. The same split is available per-request on the Per-Request Detail tab via the AI Delivery Mode and Usage Provenance columns.

๐Ÿงน Clean Up

To remove all resources created by this sample, open and run clean-up.ipynb. This deletes:

  • Sample API and subscriptions from APIM
  • Application Insights, Log Analytics, Storage Account
  • Azure Monitor Workbook
  • Cost Management export
  • Microsoft Foundry Hub, Project, Azure AI Services (when enable_foundry = True)

The clean-up notebook does not delete your APIM instance or resource group.

๐Ÿ”— Additional Resources