NIST AI Risk Management Framework (AI RMF 1.0)

May 26, 2026

Disclaimer: This document is an internal self-assessment mapping, NOT a validated certification or third-party audit. It documents how the toolkit's capabilities align with the referenced standard. Organizations must perform their own compliance assessments with qualified auditors.

Agent Governance Toolkit (AGT)
Document Version: 1.0
Date: 2026-07-14
Classification: Public
Framework Reference: NIST AI 100-1, Artificial Intelligence Risk Management Framework


Table of Contents

  1. Executive Summary
  2. Methodology
  3. GOVERN — Policies, Processes, and Procedures
  4. MAP — Context and Risk Identification
  5. MEASURE — Assessment, Analysis, and Tracking
  6. MANAGE — Risk Response and Monitoring
  7. Coverage Summary Matrix
  8. Gap Analysis and Recommended Actions
  9. Cross-References to Other Compliance Frameworks

1. Executive Summary

The Agent Governance Toolkit (AGT) is an open-source, multi-language governance framework for AI agent systems. This document provides a systematic alignment assessment of AGT against all 19 subcategories of the NIST AI Risk Management Framework (AI RMF 1.0), covering the four core functions: GOVERN, MAP, MEASURE, and MANAGE.

Scorecard

| Metric | Value |
| --- | --- |
| Total subcategories assessed | 19 |
| Fully Addressed | 13 (68%) |
| Partially Addressed | 6 (32%) |
| Gaps (Not Addressed) | 0 (0%) |
| Strongest areas | GOVERN 1 (Policy), MANAGE 1 (Risk Response), MANAGE 4 (Monitoring) |
| Areas for improvement | MAP 5 (Individual Impacts), MEASURE 4 (Measurement Feedback), MANAGE 2 (Benefit Maximization) |

AGT addresses every subcategory at least partially across all four RMF functions. Its strongest capabilities are policy infrastructure (10+ PolicyEngine implementations across Python, .NET, and TypeScript), risk response mechanisms (circuit breakers, kill switches, saga compensation), and observability (OpenTelemetry, fleet monitoring, rogue agent detection). The main improvement opportunities are bias and fairness evaluation, compliance trend analysis, and a formal benefit-maximization framing.


2. Methodology

This assessment maps AGT capabilities to each of the 19 NIST AI RMF subcategories using the following evidence types:

  • Code artifacts — Source files, classes, functions, and configuration schemas
  • Documentation — Architecture docs, threat models, and compliance mappings
  • Benchmarks — Performance measurements quantifying governance overhead
  • Templates — Policy-as-code YAML templates for common regulatory patterns

Coverage levels are assigned as:

| Level | Criteria |
| --- | --- |
| ✅ Fully Addressed | Subcategory requirements are met by production-ready code with tests and documentation |
| ⚠️ Partially Addressed | Core capabilities exist but with documented gaps or limitations |
| Gap | No code or documentation addresses this subcategory |

3. GOVERN — Policies, Processes, and Procedures

GOVERN 1: Policies Reflecting Risk Management Are in Place

Coverage: ✅ FULLY ADDRESSED

AGT implements a multi-layered, declarative policy system with schema validation, versioning, conflict resolution, and multiple backend support.

| Component | File | Key Class/Function |
| --- | --- | --- |
| Core policy evaluator | agent-governance-python/agent-os/src/agent_os/policies/evaluator.py | PolicyEvaluator |
| Async policy evaluator | agent-governance-python/agent-os/src/agent_os/policies/async_evaluator.py | AsyncPolicyEvaluator |
| Shared/cross-project policies | agent-governance-python/agent-os/src/agent_os/policies/shared.py | SharedPolicyEvaluator |
| AgentMesh policy engine | agent-governance-python/agent-mesh/src/agentmesh/governance/policy.py:317 | PolicyEngine |
| AgentMesh policy evaluator | agent-governance-python/agent-mesh/src/agentmesh/governance/policy_evaluator.py:33 | PolicyEvaluator |
| .NET policy engine | agent-governance-dotnet/src/AgentGovernance/Policy/PolicyEngine.cs:16 | PolicyEngine |
| TypeScript MCP policy engine | agent-governance-python/agent-os/extensions/mcp-server/src/services/policy-engine.ts:208 | PolicyEngine |
| VS Code policy engine | agent-governance-typescript/agent-os-vscode/src/policyEngine.ts:51 | PolicyEngine |
| Contextual policy engine | agent-governance-python/agent-os/src/agent_os/execution_context_policy.py:62 | ContextualPolicyEngine |
| Semantic policy engine | agent-governance-python/agent-os/src/agent_os/semantic_policy.py:248 | SemanticPolicyEngine |
| IATP policy engine | agent-governance-python/agent-os/modules/iatp/iatp/policy_engine.py:78 | IATPPolicyEngine |
| Control-plane policy engine | agent-governance-python/agent-os/modules/control-plane/src/agent_control_plane/policy_engine.py:178 | PolicyEngine |
| Conflict resolution | agent-governance-python/agent-os/src/agent_os/policies/conflict_resolution.py | ResolutionResult |
| Policy schema (JSON) | agent-governance-python/agent-os/src/agent_os/policies/policy_schema.json | JSON Schema |
| OPA integration | agent-governance-python/agent-mesh/src/agentmesh/governance/opa.py | OPA/Rego backend |
| Cedar integration | agent-governance-python/agent-mesh/src/agentmesh/governance/cedar.py | Cedar backend |
| Policy templates | agent-governance-python/agent-os/templates/policies/*.yaml | GDPR, production, enterprise, data-protection, content-safety |

How AGT addresses this subcategory: Policy-as-code with YAML templates supports declarative governance across environments. Multiple backend engines (native, OPA Rego, Cedar) allow organizations to use existing policy infrastructure. Schema validation, versioning (PolicyVersion), diff tracking, and conflict detection provide lifecycle management. Three enforcement modes (strict, permissive, audit) enable progressive policy rollout.
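
The three enforcement modes can be illustrated with a minimal sketch. The `Mode`, `Decision`, and `enforce` names are hypothetical and are not taken from AGT's code, and the exact difference between permissive and audit behavior shown here is an assumption made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    STRICT = "strict"          # assumed: violations block the action
    PERMISSIVE = "permissive"  # assumed: violations pass but raise a warning
    AUDIT = "audit"            # assumed: violations are only recorded


@dataclass
class Decision:
    allowed: bool
    logged: bool
    reason: str


def enforce(violation: bool, mode: Mode) -> Decision:
    """Map a rule violation to an outcome under the given mode (hypothetical semantics)."""
    if not violation:
        return Decision(allowed=True, logged=False, reason="no violation")
    if mode is Mode.STRICT:
        return Decision(allowed=False, logged=True, reason="blocked by strict mode")
    if mode is Mode.PERMISSIVE:
        return Decision(allowed=True, logged=True, reason="allowed with warning")
    return Decision(allowed=True, logged=True, reason="recorded for audit only")
```

A progressive rollout under these assumptions would start a new policy in audit mode, review what it would have blocked, and only then switch it to strict.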

Gaps: None identified.


GOVERN 2: Accountability Structures Are in Place

Coverage: ✅ FULLY ADDRESSED

AGT provides cryptographic audit trails, Merkle hash chains, Shapley-value fault attribution, and joint liability tracking.

| Component | File | Key Class/Function |
| --- | --- | --- |
| Merkle audit chain | agent-governance-python/agent-mesh/src/agentmesh/governance/audit.py:153 | MerkleAuditChain |
| Flight recorder (control-plane) | agent-governance-python/agent-os/modules/control-plane/src/agent_control_plane/flight_recorder.py:33 | FlightRecorder |
| Flight recorder (IATP) | agent-governance-python/agent-os/modules/iatp/iatp/telemetry/__init__.py:21 | FlightRecorder |
| Flight recorder (Lightning) | agent-governance-python/agent-lightning/src/agent_lightning_gov/emitter.py:56 | FlightRecorderEmitter |
| Hypervisor audit | agent-governance-python/agent-hypervisor/audit/delta.py | DeltaEngine |
| Shapley attribution | agent-governance-python/agent-hypervisor/src/hypervisor/liability/attribution.py | Shapley-value fault attribution |
| Joint liability | agent-governance-python/agent-hypervisor/src/hypervisor/liability/__init__.py | Joint liability module |
| Liability ledger | agent-governance-python/agent-hypervisor/src/hypervisor/liability/ledger.py | Liability tracking |
| Quarantine system | agent-governance-python/agent-hypervisor/src/hypervisor/liability/quarantine.py | Agent quarantine |
| RBAC | agent-governance-python/agent-os/src/agent_os/integrations/rbac.py | 4 roles: READER, WRITER, ADMIN, AUDITOR |
| DID-based attribution | agent-governance-python/agent-mesh/src/agentmesh/governance/audit.py | agent_did field per entry |

How AGT addresses this subcategory: Merkle hash chains provide tamper-evident audit trails where each entry is cryptographically linked to its predecessor. Shapley-value attribution enables mathematical fault attribution across multi-agent systems — a capability rare in governance toolkits. RBAC with four predefined roles (READER, WRITER, ADMIN, AUDITOR) enforces least-privilege access. DID-based agent identity ensures every action is traceable to a specific agent.
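
The tamper-evidence property described above can be sketched with a plain linear hash chain. This is an illustration only, not the MerkleAuditChain implementation: the function names, the entry layout, and the choice of SHA-256 over a JSON encoding are all assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry's predecessor


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the predecessor's hash together with a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Append an entry linked to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})


def verify(chain: list) -> bool:
    """Recompute every link; any edited payload or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"agent_did": "did:example:agent-1", "action": "read"})
append(log, {"agent_did": "did:example:agent-2", "action": "write"})
```

Changing any stored payload after the fact makes `verify(log)` return `False`, which is the property an auditor relies on.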

Gaps: None identified.
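
For readers unfamiliar with the Shapley-value attribution listed under GOVERN 2, the following sketch computes exact Shapley values by enumerating agent orderings. It is a textbook formulation, not AGT's attribution module; the agent names and the `damage` function are hypothetical, and exact enumeration is only practical for a handful of agents.

```python
from itertools import permutations


def shapley(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a number."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the agents before it.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orders) for p, total in totals.items()}


def damage(coalition):
    """Hypothetical fault model: the incident needs both planner and executor."""
    return 1.0 if {"planner", "executor"} <= coalition else 0.0


shares = shapley(["planner", "executor", "logger"], damage)
```

Under this fault model the planner and executor split the blame equally and the logger receives none, and the shares sum to the total damage.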


GOVERN 3: Workforce Diversity and Expertise

Coverage: ⚠️ PARTIALLY ADDRESSED

AGT has community governance documentation but no code-level enforcement of diversity, expertise requirements, or contributor roles.

| Component | File | Notes |
| --- | --- | --- |
| Contributing guide | CONTRIBUTING.md | Contribution process, DCO, PR workflow |
| Code of conduct | CODE_OF_CONDUCT.md | Microsoft Open Source Code of Conduct |
| Community guide | COMMUNITY.md | Community structure, communication channels |
| Security policy | SECURITY.md | Vulnerability reporting process |

How AGT addresses this subcategory: Community documentation establishes contribution norms, inclusive conduct standards, and security reporting processes. The Microsoft Open Source Code of Conduct provides an organizational commitment to diversity and inclusion.

Gaps: No machine-readable role definitions, no expertise verification mechanisms, no diversity tracking. This is primarily an organizational obligation typically outside the scope of a governance toolkit.


GOVERN 4: Organizational Practices with Third-Party Entities

Coverage: ✅ FULLY ADDRESSED

AGT implements comprehensive supply chain security including plugin signing, trust tiers, MCP gateway controls, AI-BOM, and dependency confusion protection.

| Component | File | Key Class/Function |
| --- | --- | --- |
| MCP security scanner | agent-governance-python/agent-os/src/agent_os/mcp_security.py:324 | MCPSecurityScanner |
| MCP gateway | agent-governance-python/agent-os/src/agent_os/mcp_gateway.py:99 | MCPGateway |
| Plugin signing | agent-governance-python/agent-marketplace/src/agent_marketplace/signing.py:22 | PluginSigner (Ed25519) |
| Plugin manifest | agent-governance-python/agent-marketplace/src/agent_marketplace/manifest.py:36 | PluginManifest |
| MCP trust proxy | agent-governance-python/agent-mesh/packages/mcp-proxy/ | TypeScript proxy with policy enforcement |
| Trust tiers | agent-governance-python/agent-marketplace/src/agent_marketplace/trust_tiers.py | filter_capabilities() |
| Usage trust scoring | agent-governance-python/agent-marketplace/src/agent_marketplace/usage_trust.py:48 | UsageTrustScorer |
| Marketplace policy | agent-governance-python/agent-marketplace/src/agent_marketplace/marketplace_policy.py | MCPServerPolicy |
| Egress policy | agent-governance-python/agent-os/src/agent_os/egress_policy.py:50 | EgressPolicy |
| AI-BOM | agent-governance-python/agent-mesh/docs/RFC_AGENT_SBOM.md | AI Bill of Materials v2.0 |
| Federation | agent-governance-python/agent-mesh/src/agentmesh/governance/federation.py | Cross-org federation |

How AGT addresses this subcategory: Ed25519-signed plugins and manifest validation ensure supply chain integrity. The five-tier trust scoring system (0–1000) with filter_capabilities() restricts third-party agents to appropriate privilege levels. MCP gateway allowlist/blocklist controls, security scanning (tool poisoning and injection detection), and egress policies manage third-party data flows. AI-BOM v2.0 provides model provenance, dataset lineage, and weights versioning.
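
A rough sketch of how a tiered trust score could gate capabilities follows. Only the 0–1000 scale, the five tiers, and the end-point tier names come from this document; the tier boundaries, the middle tier names, the capability names, and the function `filter_capabilities_sketch` are hypothetical and do not describe the real filter_capabilities().

```python
from bisect import bisect_right

# Hypothetical boundaries: five equal bands across the 0-1000 scale.
TIER_FLOORS = [0, 200, 400, 600, 800]
# Only the first and last names appear in the document; the rest are placeholders.
TIER_NAMES = ["Untrusted", "Tier 2", "Tier 3", "Tier 4", "Verified Partner"]

# Hypothetical minimum tier index required for each capability.
REQUIRED_TIER = {"read_docs": 0, "call_api": 2, "write_db": 3, "deploy": 4}


def tier_index(score: int) -> int:
    """Return the index (0-4) of the tier that contains the score."""
    if not 0 <= score <= 1000:
        raise ValueError("score must be in 0..1000")
    return bisect_right(TIER_FLOORS, score) - 1


def filter_capabilities_sketch(score: int, requested: list) -> list:
    """Keep only the requested capabilities the score's tier is entitled to."""
    idx = tier_index(score)
    # Unknown capabilities default to a tier that no score can reach.
    return [c for c in requested if REQUIRED_TIER.get(c, len(TIER_NAMES)) <= idx]
```

Denying unknown capabilities by default is a deliberate choice in this sketch: a third-party plugin cannot gain access simply by requesting something the table does not list.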

Gaps: None identified.


GOVERN 5: Risk Management Processes Are Defined and Implemented

Coverage: ✅ FULLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| EU AI Act risk classifier | agent-governance-python/agent-mesh/src/agentmesh/governance/eu_ai_act.py | RiskLevel, RiskClassifier, AgentRiskProfile |
| Compliance framework | agent-governance-python/agent-mesh/src/agentmesh/governance/compliance.py | Multi-framework compliance |
| Control-plane compliance | agent-governance-python/agent-os/modules/control-plane/src/agent_control_plane/compliance.py | Compliance engine |
| Rogue agent detector | agent-governance-python/agent-sre/src/agent_sre/anomaly/rogue_detector.py:304 | RogueAgentDetector |

How AGT addresses this subcategory: EU AI Act four-tier risk classification (UNACCEPTABLE, HIGH, LIMITED, MINIMAL) provides structured risk assessment. AgentRiskProfile aggregates risk signals per agent. The compliance engine supports multi-framework verification, allowing organizations to define and enforce risk management processes declaratively.
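
The four-tier classification and per-agent aggregation can be pictured as follows. The class names carry a `Sketch` suffix to make clear they are not AGT's RiskLevel or AgentRiskProfile, and the rule that an agent's overall level is the highest signal seen is an assumption.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class RiskLevelSketch(IntEnum):
    MINIMAL = 0
    LIMITED = 1
    HIGH = 2
    UNACCEPTABLE = 3


@dataclass
class AgentRiskProfileSketch:
    agent_id: str
    signals: list = field(default_factory=list)

    def add(self, level: RiskLevelSketch) -> None:
        """Record one risk signal for this agent."""
        self.signals.append(level)

    @property
    def overall(self) -> RiskLevelSketch:
        """Assumed aggregation rule: the worst signal wins; no signals means MINIMAL."""
        return max(self.signals, default=RiskLevelSketch.MINIMAL)


profile = AgentRiskProfileSketch("agent-1")
profile.add(RiskLevelSketch.LIMITED)
profile.add(RiskLevelSketch.HIGH)
```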

Gaps: None identified.


GOVERN 6: Policies and Procedures Aligned with Applicable Requirements

Coverage: ✅ FULLY ADDRESSED

AGT maintains dedicated compliance mapping documents for seven major frameworks.

| Framework | File | Status |
| --- | --- | --- |
| OWASP Agentic Top 10 | docs/compliance/owasp-agentic-top10-architecture.md | All ASI risk categories mapped |
| EU AI Act | docs/compliance/eu-ai-act-checklist.md | 9/11 articles addressed |
| SOC 2 Type II | docs/compliance/soc2-mapping.md | 4/5 criteria addressed |
| ATF Conformance | docs/compliance/atf-conformance-assessment.md | 25/25 requirements (7 partial) |
| OWASP LLM Top 10 | docs/compliance/owasp-llm-top10-mapping.md | Full mapping |
| NIST RFI (2026) | docs/compliance/nist-rfi-2026-00206.md | Question-by-question mapping |
| South Korea AI Framework Act | agent-governance-python/agent-compliance/docs/compliance/south-korea-ai-framework-act.md | Mapped |

How AGT addresses this subcategory: Each compliance document systematically maps AGT capabilities to specific regulatory requirements, identifies gaps, and provides code citations. This document (NIST AI RMF alignment) extends coverage to the eighth framework.

Gaps: None identified.


4. MAP — Context and Risk Identification

MAP 1: Context Is Established

Coverage: ✅ FULLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| Execution context | agent-governance-python/agent-os/src/agent_os/execution_context_policy.py:62 | ContextualPolicyEngine |
| Stateless kernel context | agent-governance-python/agent-os/src/agent_os/stateless.py | ExecutionContext |
| Governance tiers | agent-governance-python/agent-hypervisor/src/hypervisor/models.py | Ring 0–3 privilege separation |
| Policy modes | agent-governance-python/agent-os/src/agent_os/policies/schema.py:34-41 | strict, permissive, audit |
| Context budget | agent-governance-python/agent-os/src/agent_os/context_budget.py | ContextScheduler |

How AGT addresses this subcategory: ContextualPolicyEngine binds policy evaluation to rich execution context including governance tiers, environment type, and operational mode. The four-ring privilege model (Ring 0: kernel through Ring 3: untrusted) establishes operational boundaries for each agent. ContextScheduler manages token budgets and resource allocation within context.
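
The four-ring boundary can be sketched as a simple comparison, where a lower ring number means more privilege. The `Ring` enum, the action names, and the action-to-ring table below are hypothetical and only illustrate the idea of ring-scoped permissions.

```python
from enum import IntEnum


class Ring(IntEnum):
    RING_0 = 0  # kernel, most privileged
    RING_1 = 1
    RING_2 = 2
    RING_3 = 3  # untrusted, least privileged


# Hypothetical table: the least-privileged ring still allowed to perform each action.
MAX_RING_FOR_ACTION = {
    "modify_policy": Ring.RING_0,
    "write_file": Ring.RING_2,
    "read_public": Ring.RING_3,
}


def is_permitted(agent_ring: Ring, action: str) -> bool:
    """Allow the action only if the agent's ring is at least as privileged as required."""
    limit = MAX_RING_FOR_ACTION.get(action)
    return limit is not None and agent_ring <= limit
```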

Gaps: None identified.


MAP 2: Categorization of AI Systems

Coverage: ✅ FULLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| EU AI Act risk classifier | agent-governance-python/agent-mesh/src/agentmesh/governance/eu_ai_act.py | RiskLevel enum |
| Agent risk profile | agent-governance-python/agent-mesh/src/agentmesh/governance/eu_ai_act.py | AgentRiskProfile dataclass |
| Compliance checker example | agent-governance-python/agent-mesh/examples/06-eu-ai-act-compliance/compliance_checker.py | Demo risk classifier |
| Trust tiers (5-tier) | docs/ARCHITECTURE.md | 0–1000 scale: Untrusted → Verified Partner |
| Execution rings (4-tier) | agent-governance-python/agent-hypervisor/src/hypervisor/models.py | Ring 0 (kernel) → Ring 3 (untrusted) |

How AGT addresses this subcategory: Dual categorization systems — EU AI Act risk levels (UNACCEPTABLE, HIGH, LIMITED, MINIMAL) and the five-tier trust score (0–1000) — enable AI systems to be categorized by both regulatory risk and behavioral trust. The four-ring execution model further segments agents by privilege level.

Gaps: None identified.


MAP 3: Benefits and Costs Assessed

Coverage: ⚠️ PARTIALLY ADDRESSED

AGT provides comprehensive performance benchmarks quantifying governance overhead but lacks formal cost-benefit frameworks.

| Component | File | Key Metric |
| --- | --- | --- |
| Policy benchmarks | BENCHMARKS.md | 0.011 ms p50 (single rule), 47K ops/sec at 1K agents |
| Kernel benchmarks | agent-governance-python/agent-os/benchmarks/bench_kernel.py | 0.103 ms p50 full enforcement path |
| Audit benchmarks | agent-governance-python/agent-os/benchmarks/bench_audit.py | 2 µs per audit write |
| Adapter overhead | BENCHMARKS.md | 0.005–0.007 ms per adapter check |
| Circuit breaker | BENCHMARKS.md | 0.0005 ms (1.83M ops/sec) |
| SRE benchmarks | agent-governance-python/agent-sre/src/agent_sre/benchmarks/__init__.py | SRE-specific benchmarks |

How AGT addresses this subcategory: Governance overhead is quantified in latency and throughput terms. The reported sub-millisecond policy evaluation and microsecond-level audit writes suggest that governance adds little latency in the benchmarked scenarios.
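
A p50 latency figure of the kind reported above can be obtained with a small timing harness. This is a generic sketch, not the code in AGT's benchmark files; `measure_p50_ms` and `toy_policy_check` are hypothetical names, and the toy check stands in for a real policy evaluation.

```python
import statistics
import time


def measure_p50_ms(fn, iterations: int = 1000) -> float:
    """Time repeated calls to fn and return the median latency in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def toy_policy_check() -> bool:
    """Placeholder for a single-rule policy evaluation."""
    return "delete" not in ("read", "list")


p50 = measure_p50_ms(toy_policy_check)
# Single-threaded throughput implied by the median latency.
ops_per_sec = 1000.0 / p50 if p50 > 0 else float("inf")
```

Numbers from a harness like this depend heavily on hardware and warm-up, so they are best compared only against runs on the same machine.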

Gaps: No formal ROI model or cost-benefit analysis framework. Overhead is quantified in technical terms (latency/throughput) but not in business value terms (risk reduction, compliance cost savings, incident prevention value).


MAP 4: Risks and Impacts Identified

Coverage: ✅ FULLY ADDRESSED

| Component | File | Key Content |
| --- | --- | --- |
| STRIDE threat model | docs/security/threat-model.md | 4 trust boundaries, 6 attack surfaces, STRIDE analysis |
| OWASP Agentic Top 10 | docs/compliance/owasp-agentic-top10-architecture.md | All ASI risk categories mapped with mitigations |
| Blast radius containment | agent-governance-python/agent-hypervisor/src/hypervisor/models.py | Ring isolation, Ring 0–3 |
| Cascade detection | agent-governance-python/agent-sre/src/agent_sre/cascade/circuit_breaker.py:223 | CascadeDetector |
| Ring breach detection | agent-governance-python/agent-hypervisor/rings/breach_detector.py | Sliding-window anomaly detection |
| Prompt injection detector | agent-governance-python/agent-os/src/agent_os/prompt_injection.py:357 | PromptInjectionDetector (12+ patterns) |
| Memory guard | agent-governance-python/agent-os/src/agent_os/memory_guard.py:170 | MemoryGuard (memory poisoning defense) |
| Adversarial evaluator | agent-governance-python/agent-sre/src/agent_sre/chaos/adversarial.py | Adversarial testing |
| Chaos testing | agent-governance-python/agent-sre/src/agent_sre/chaos/engine.py | Chaos engineering library |

How AGT addresses this subcategory: STRIDE-based threat modeling systematically identifies risks across four trust boundaries and six attack surfaces. Prompt injection detection (12+ pattern families), memory poisoning defense, and cascade detection provide defense-in-depth. Chaos engineering and adversarial evaluation proactively discover risks before production deployment.

Gaps: None identified.


MAP 5: Impacts to Individuals, Groups, and Communities

Coverage: ⚠️ PARTIALLY ADDRESSED

AGT has PII/PHI protection via regex patterns and GDPR policy templates but lacks ML-based bias detection or fairness evaluation.

| Component | File | Key Class/Function |
| --- | --- | --- |
| GDPR policy template | agent-governance-python/agent-os/templates/policies/gdpr.yaml | 10+ PII pattern categories, right to erasure, data minimization |
| Data protection template | agent-governance-python/agent-os/templates/policies/data-protection.yaml | Data protection rules |
| PII detection policy | agent-governance-python/agent-os/examples/shared-policies/no-pii.yaml | Shareable PII blocking policy |
| Memory guard PII redaction | agent-governance-python/agent-os/src/agent_os/memory_guard.py | PII redaction in context |
| Content governance | agent-governance-python/agent-os/src/agent_os/content_governance.py:78 | ContentQualityEvaluator |
| HIPAA example | agent-governance-python/agent-os/tutorials/hipaa-compliant-agent/demo.py | Healthcare compliance demo |
| Healthcare HIPAA example | agent-governance-python/agent-mesh/examples/03-healthcare-hipaa/main.py | PHI protection demo |

How AGT addresses this subcategory: GDPR policy templates provide declarative PII protection across 10+ categories with right-to-erasure and data minimization controls. Memory guard actively redacts PII from agent context. HIPAA-compliant agent tutorials demonstrate PHI protection patterns.
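
Regex-based PII redaction of the kind described here can be sketched in a few lines. The two patterns and the `redact` function are illustrative assumptions and are not copied from gdpr.yaml or the memory guard; real templates cover more categories.

```python
import re

# Hypothetical patterns for two PII categories.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace every match of each pattern with a bracketed category label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

The sketch also shows the limitation of a regex-only approach: a sentence such as "Jane Doe lives in Berlin" passes through unchanged, because names and locations have no fixed lexical shape.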

Gaps:

  • No ML-based NER (e.g., Presidio) for PII/PHI — regex-only detection
  • No bias detection algorithms or fairness metrics
  • No demographic parity or equalized odds evaluation
  • No consent management system
  • No Data Subject Access Request (DSAR) workflow automation

5. MEASURE — Assessment, Analysis, and Tracking

MEASURE 1: Metrics Identified and Applied

Coverage: ✅ FULLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| SLO engine | agent-governance-python/agent-sre/src/agent_sre/slo/objectives.py:167 | SLO, ErrorBudget, SLOStatus |
| SLO spec | agent-governance-python/agent-sre/src/agent_sre/slo/spec.py:51 | SLOSpec, ErrorBudgetPolicy |
| SLO dashboard | agent-governance-python/agent-sre/src/agent_sre/slo/dashboard.py:73 | SLODashboard, SLOSnapshot |
| SLO validator | agent-governance-python/agent-sre/src/agent_sre/slo/validator.py:33 | SLODiff |
| .NET SLO engine | agent-governance-dotnet/src/AgentGovernance/Sre/SloEngine.cs | ErrorBudgetPolicy, ErrorBudgetTracker |
| SLO VS Code panel | agent-governance-typescript/agent-os-vscode/src/views/sloDashboardView.ts:38 | SLODashboardProvider |
| Trust score (AgentMesh) | agent-governance-python/agent-mesh/src/agentmesh/governance/ | 0–1000 scale, 5 tiers |
| Shift-left metrics | agent-governance-python/agent-os/src/agent_os/shift_left_metrics.py | ShiftLeftTracker, ViolationStage, ViolationRecord |
| Usage trust scorer | agent-governance-python/agent-marketplace/src/agent_marketplace/usage_trust.py:48 | UsageTrustScorer |
| OTel metrics | agent-governance-python/agent-sre/src/agent_sre/integrations/otel/metrics.py | OpenTelemetry metrics export |
| MCP metrics | agent-governance-python/agent-os/src/agent_os/_mcp_metrics.py | MCP-specific metrics |
| Langfuse SLO scores | agent-governance-python/agent-sre/src/agent_sre/integrations/langfuse/exporter.py:56 | SLOScore |

How AGT addresses this subcategory: SLI/SLO/error budget engine provides structured quantitative metrics with dashboard visualization. Trust scoring (0–1000, five tiers) quantifies agent trustworthiness. Shift-left metrics track governance violations by lifecycle stage (pre-commit, PR, CI, runtime). OpenTelemetry integration exports metrics to industry-standard observability platforms.
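
The error-budget arithmetic behind an SLO can be worked through with a short function. This follows the standard SRE definition rather than AGT's ErrorBudget class; the function name and signature are hypothetical.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Return the fraction of the error budget still unspent (negative if overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = (1.0 - slo_target) * total
    return 1.0 - failed / allowed_failures


# A 99% SLO over 10,000 requests allows about 100 failures; 25 failures spend a quarter of the budget.
remaining = error_budget_remaining(0.99, 10_000, 25)
```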

Gaps: None identified.


MEASURE 2: AI Systems Evaluated

Coverage: ⚠️ PARTIALLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| Content quality evaluator | agent-governance-python/agent-os/src/agent_os/content_governance.py:78 | ContentQualityEvaluator |
| Plugin quality assessor | agent-governance-python/agent-marketplace/src/agent_marketplace/quality_assessment.py:120 | QualityAssessor |
| Red team dataset | agent-governance-python/agent-os/modules/control-plane/benchmark/red_team_dataset.py | Red-team benchmark data |
| Policy benchmark suite | agent-governance-python/agent-os/benchmarks/bench_policy.py | 30-scenario OWASP benchmark |
| CMVK verification | agent-governance-python/agent-os/modules/cmvk/src/cmvk/constitutional.py | Cross-Model Verification Kernel |

How AGT addresses this subcategory: Content quality evaluation and plugin quality assessment provide governance-level evaluation. Red-team datasets and 30-scenario OWASP benchmarks test governance enforcement under adversarial conditions. The Cross-Model Verification Kernel (CMVK) enables constitutional AI checks across models.

Gaps: No formal model accuracy or correctness evaluation pipeline. Quality assessment focuses on governance and content safety rather than model performance metrics (e.g., accuracy, calibration, hallucination rate).


MEASURE 3: Mechanisms for Tracking Identified AI Risks

Coverage: ✅ FULLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| Behavioral baseline | agent-governance-python/agent-sre/src/agent_sre/anomaly/detector.py:68 | BehaviorBaseline |
| Rogue agent detector | agent-governance-python/agent-sre/src/agent_sre/anomaly/rogue_detector.py:304 | RogueAgentDetector |
| Drift detector (Agent OS) | agent-governance-python/agent-os/src/agent_os/integrations/drift_detector.py:93 | DriftDetector, DriftType enum |
| MCP drift detector (SRE) | agent-governance-python/agent-sre/src/agent_sre/integrations/mcp/__init__.py:169 | DriftDetector |
| Flight recorder (control-plane) | agent-governance-python/agent-os/modules/control-plane/src/agent_control_plane/flight_recorder.py:33 | FlightRecorder |
| Ring breach detection | agent-governance-python/agent-hypervisor/rings/breach_detector.py | Sliding-window anomaly detection |
| Fleet monitoring | agent-governance-python/agent-sre/src/agent_sre/fleet/__init__.py | Fleet-wide health with AgentState.DEGRADED |

How AGT addresses this subcategory: Behavioral baselines establish normal operating patterns per agent. Drift detectors identify deviations from expected behavior. The rogue agent detector classifies agents exhibiting anomalous patterns. Flight recorders provide forensic-grade telemetry for post-incident analysis. Fleet monitoring aggregates health across agent populations.
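
One common way to detect deviation from a behavioral baseline is a z-score test against previously observed values. The sketch below uses that technique as an assumption; it does not reproduce BehaviorBaseline or DriftDetector, and the class name, threshold, and minimum sample count are hypothetical.

```python
import statistics


class BaselineSketch:
    """In-memory baseline of one numeric metric, for example tool calls per minute."""

    def __init__(self, threshold: float = 3.0, min_samples: int = 5):
        self.threshold = threshold
        self.min_samples = min_samples
        self.samples = []

    def observe(self, value: float) -> bool:
        """Return True if value is anomalous; only normal values join the baseline."""
        anomalous = False
        if len(self.samples) >= self.min_samples:
            mean = statistics.fmean(self.samples)
            stdev = statistics.pstdev(self.samples)
            if stdev > 0:
                anomalous = abs(value - mean) / stdev > self.threshold
            else:
                anomalous = value != mean
        if not anomalous:
            self.samples.append(value)
        return anomalous
```

Because the samples live only in the object, the baseline in this sketch disappears when the process exits.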

Limitation: Behavioral baselines are in-memory only — no durable cross-session persistence. Baselines are lost when agent sessions terminate.


MEASURE 4: Feedback About Efficacy of Measurement

Coverage: ⚠️ PARTIALLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| Shift-left tracker | agent-governance-python/agent-os/src/agent_os/shift_left_metrics.py | ShiftLeftTracker (violations by lifecycle stage) |
| SLO dashboard | agent-governance-python/agent-sre/src/agent_sre/slo/dashboard.py:73 | SLODashboard snapshots |
| VS Code SLO panel | agent-governance-typescript/agent-os-vscode/src/webviews/sidebar/panels/SLOSummary.tsx | Real-time SLO summary |
| OTel governance export | agent-governance-python/agent-mesh/src/agentmesh/observability/otel_governance.py | Governance telemetry |
| Langfuse exporter | agent-governance-python/agent-sre/src/agent_sre/integrations/langfuse/exporter.py | SLO scores to Langfuse |
| OpenLit integration | agent-governance-python/agent-sre/src/agent_sre/integrations/openlit.py | OpenLit observability |
OpenLit integrationagent-governance-python/agent-sre/src/agent_sre/integrations/openlit.pyOpenLit observability

How AGT addresses this subcategory: Shift-left metrics track violations by lifecycle stage (pre-commit, PR, CI, runtime), enabling measurement of where governance catches issues. SLO dashboards provide point-in-time compliance snapshots. Integration with Langfuse and OpenLit enables external measurement platforms.

Gaps: No time-series compliance trend analysis, no measurement-of-measurement loops, no formal reports on metric effectiveness. The toolkit provides raw measurement capabilities but does not yet evaluate whether those measurements are themselves effective.


6. MANAGE — Risk Response and Monitoring

MANAGE 1: Risks Prioritized and Responded To

Coverage: ✅ FULLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| Circuit breaker (SRE) | agent-governance-python/agent-sre/src/agent_sre/cascade/circuit_breaker.py:90 | CircuitBreaker (trip/open/half-open) |
| Circuit breaker (incidents) | agent-governance-python/agent-sre/src/agent_sre/incidents/circuit_breaker.py:59 | CircuitBreaker, CircuitBreakerRegistry |
| Circuit breaker (Agent OS) | agent-governance-python/agent-os/src/agent_os/_circuit_breaker_impl.py:82 | CircuitBreaker, CascadeDetector |
| .NET circuit breaker | agent-governance-dotnet/src/AgentGovernance/Sre/CircuitBreaker.cs:62 | CircuitBreaker |
| Kill switch | agent-governance-python/agent-hypervisor/src/hypervisor/security/kill_switch.py:69 | KillSwitch.kill() (6 kill reasons) |
| Rate limiter (hypervisor) | agent-governance-python/agent-hypervisor/src/hypervisor/security/rate_limiter.py:86 | AgentRateLimiter |
| Rate limiter (Agent Mesh) | agent-governance-python/agent-mesh/src/agentmesh/services/rate_limiter.py:93 | RateLimiter |
| Rate limiter (MCP sliding) | agent-governance-python/agent-os/src/agent_os/mcp_sliding_rate_limiter.py:17 | MCPSlidingRateLimiter |
| Rate limiter (TypeScript) | agent-governance-python/agent-mesh/packages/mcp-proxy/src/rate-limiter.ts:19 | RateLimiter |
| .NET rate limiter | agent-governance-dotnet/src/AgentGovernance/RateLimiting/RateLimiter.cs:11 | RateLimiter |
| Approval workflow | agent-governance-python/agent-os/extensions/mcp-server/src/services/approval-workflow.ts:18 | ApprovalWorkflow (quorum, expiration) |
| Saga orchestrator | agent-governance-python/agent-hypervisor/saga/orchestrator.py | SagaOrchestrator (rollback compensation) |
| Reversibility registry | agent-governance-python/agent-hypervisor/reversibility/registry.py | Undo/rollback registry |

How AGT addresses this subcategory: Multi-tier risk response: circuit breakers (with trip/open/half-open state machine) prevent cascade failures; kill switches provide immediate agent termination for six enumerated risk categories; rate limiters (sliding window, token bucket) control throughput across all language packages. Approval workflows with quorum requirements add human oversight. Saga orchestrators enable compensating transactions to roll back multi-step operations upon failure.
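
The circuit-breaker state machine mentioned above can be sketched in its conventional closed/open/half-open form. This is a generic illustration, not any of AGT's CircuitBreaker classes; the class name, thresholds, and injectable clock are assumptions.

```python
import time


class CircuitBreakerSketch:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so tests can control time
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def allow(self) -> bool:
        """Return True if a call may proceed; an open breaker reopens for probes after the timeout."""
        if self.state == "open" and self.clock() - self.opened_at >= self.reset_timeout:
            self.state = "half-open"  # probe calls are allowed until one succeeds or fails
        return self.state != "open"

    def record_success(self) -> None:
        """A success closes the breaker and clears the failure count."""
        self.state = "closed"
        self.failures = 0

    def record_failure(self) -> None:
        """Trip on a failed probe, or once failures reach the threshold."""
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

Wrapping calls to a downstream agent in `allow()` and the two `record_*` methods keeps a failing dependency from being hammered while it recovers.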

Gaps: None identified.


MANAGE 2: Strategies to Maximize AI Benefits

Coverage: ⚠️ PARTIALLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| Trust scoring (0–1000) | agent-governance-python/agent-mesh/src/agentmesh/governance/ | 5 tiers: Untrusted → Verified Partner |
| Trust decay | agent-governance-python/agent-mesh/ | Scores degrade without positive signals |
| Capability delegation | agent-governance-python/agent-mesh/identity/agent_id.py | delegate(), capability narrowing |
| Graduated rings | agent-governance-python/agent-hypervisor/src/hypervisor/models.py | Ring 0–3 privilege escalation/demotion |
| Ring demotion | agent-governance-python/agent-hypervisor/session/__init__.py | update_ring() |
| Trust-tier filtering | agent-governance-python/agent-marketplace/src/agent_marketplace/trust_tiers.py | filter_capabilities() |
| Progressive delivery | agent-governance-python/agent-sre/src/agent_sre/delivery/ | Canary deploys, GitOps |
| NoOp fallbacks | agent-governance-python/agent-os/src/agent_os/compat.py:37 | NoOpPolicyEvaluator |
| RL training governance | agent-governance-python/agent-lightning/ | Policy rewards for RL training |

How AGT addresses this subcategory: Trust-based capability delegation (child ≤ parent) ensures agents earn expanded privileges through demonstrated trustworthy behavior. Progressive delivery (canary deploys) minimizes risk when introducing governance changes. Trust decay ensures agents maintain good behavior to retain capabilities.
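
Capability narrowing (child ≤ parent) and trust decay can both be expressed compactly. The functions below are hypothetical illustrations rather than AGT's delegate() or its decay logic; in particular, the exponential half-life model is an assumption, since the document only states that scores degrade without positive signals.

```python
def delegate_sketch(parent_caps, requested):
    """Grant a child only the requested capabilities that the parent already holds."""
    return frozenset(parent_caps) & frozenset(requested)


def decayed_score(score: float, idle_days: float, half_life_days: float = 30.0) -> float:
    """Assumed decay model: the score halves for every half-life spent without positive signals."""
    return score * 0.5 ** (idle_days / half_life_days)


parent = {"read", "write"}
child = delegate_sketch(parent, {"write", "deploy"})
```

Here the child ends up with only `write`: the `deploy` request is dropped because the parent never held it.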

Gaps: No formal "benefit maximization" framework. Trust-based capability delegation exists but is framed as security controls rather than benefit optimization. No documented strategy for balancing governance overhead against agent utility.


MANAGE 3: Risks from Third-Party Entities Managed

Coverage: ✅ FULLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| MCP security scanner | agent-governance-python/agent-os/src/agent_os/mcp_security.py:324 | MCPSecurityScanner (tool poisoning, injection detection) |
| MCP gateway | agent-governance-python/agent-os/src/agent_os/mcp_gateway.py:99 | MCPGateway (allowlist/blocklist) |
| MCP trust proxy | agent-governance-python/agent-mesh/packages/mcp-proxy/ | TypeScript proxy with policy enforcement |
| Plugin signing | agent-governance-python/agent-marketplace/src/agent_marketplace/signing.py:22 | PluginSigner (Ed25519) |
| Plugin manifest validation | agent-governance-python/agent-marketplace/src/agent_marketplace/manifest.py:36 | PluginManifest (Pydantic validation) |
| Marketplace policy | agent-governance-python/agent-marketplace/src/agent_marketplace/marketplace_policy.py | MCPServerPolicy, org-level policies |
| Trust tiers | agent-governance-python/agent-marketplace/src/agent_marketplace/trust_tiers.py | Plugin trust tier filtering |
| AI-BOM v2.0 | agent-governance-python/agent-mesh/docs/RFC_AGENT_SBOM.md | Model provenance, dataset lineage |
| Egress policy | agent-governance-python/agent-os/src/agent_os/egress_policy.py:50 | EgressPolicy (domain allow/deny) |
| Schema adapters | agent-governance-python/agent-marketplace/src/agent_marketplace/schema_adapters.py | Copilot/Claude manifest normalization |

How AGT addresses this subcategory: Defense-in-depth for third-party risks: MCP security scanner detects tool poisoning and injection; gateway enforces allowlist/blocklist policies; plugin signing (Ed25519) and manifest validation prevent supply chain attacks. AI-BOM v2.0 tracks model provenance and dataset lineage. Egress policies control outbound data flows to authorized domains only.
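
A domain allow/deny egress check can be sketched as follows. The class is a hypothetical stand-in for EgressPolicy, and the rules that deny entries take precedence and that subdomains inherit their parent's entry are assumptions of this sketch.

```python
from urllib.parse import urlsplit


class EgressPolicySketch:
    def __init__(self, allow, deny=()):
        self.allow = {d.lower() for d in allow}
        self.deny = {d.lower() for d in deny}

    @staticmethod
    def _matches(host: str, domain: str) -> bool:
        """Match the domain itself or any subdomain, but not look-alike suffixes."""
        return host == domain or host.endswith("." + domain)

    def permits(self, url: str) -> bool:
        """Deny wins over allow; anything not explicitly allowed is blocked."""
        host = (urlsplit(url).hostname or "").lower()
        if any(self._matches(host, d) for d in self.deny):
            return False
        return any(self._matches(host, d) for d in self.allow)


policy = EgressPolicySketch(allow=["example.com"], deny=["internal.example.com"])
```

Matching on a leading dot is what keeps hosts such as `evil-example.com` or `example.com.evil.net` from slipping past a naive substring check.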

Gaps: None identified.


MANAGE 4: Risks Monitored

Coverage: ✅ FULLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| Rogue agent detector | agent-governance-python/agent-sre/src/agent_sre/anomaly/rogue_detector.py:304 | RogueAgentDetector (scoring, classification) |
| Fleet monitoring | agent-governance-python/agent-sre/src/agent_sre/fleet/__init__.py | Fleet-wide health, AgentState enum |
| OTel tracing (SRE) | agent-governance-python/agent-sre/src/agent_sre/tracing/spans.py | Distributed tracing spans |
| OTel metrics (SRE) | agent-governance-python/agent-sre/src/agent_sre/tracing/metrics.py | Metrics instrumentation |
| OTel exporters | agent-governance-python/agent-sre/src/agent_sre/tracing/exporters.py | OTLP/Jaeger/Zipkin exporters |
| OTel governance SDK | agent-governance-python/agent-mesh/src/agentmesh/observability/otel_sdk.py | Governance-aware OTel |
| OTel governance enrichment | agent-governance-python/agent-mesh/src/agentmesh/observability/otel_governance.py | Policy events as OTel spans |
| OTel saga sink | agent-governance-python/agent-sre/src/agent_sre/integrations/otel/saga_sink.py | Saga lifecycle as OTel spans |
| OTel events | agent-governance-python/agent-sre/src/agent_sre/integrations/otel/events.py | Governance event export |
| OpenLit integration | agent-governance-python/agent-sre/src/agent_sre/integrations/openlit.py | OpenLit observability |
| Agent OS observability | agent-governance-python/agent-os/modules/observability/src/agent_os_observability/tracer.py | Agent OS tracing |
| Hypervisor event bus | agent-governance-python/agent-hypervisor/src/hypervisor/observability/event_bus.py | Internal event bus |
| Cascade detector | agent-governance-python/agent-sre/src/agent_sre/cascade/circuit_breaker.py:223 | CascadeDetector |

How AGT addresses this subcategory: AGT provides a deep observability stack. OpenTelemetry instrumentation across all packages (spans, metrics, events) exports to OTLP, Jaeger, and Zipkin. The rogue agent detector uses behavioral scoring to classify anomalous agents. Fleet monitoring provides population-level health dashboards. Governance-enriched OTel spans embed policy evaluation results directly in distributed traces, enabling governance-aware debugging.
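To make "behavioral scoring" concrete, here is a minimal sketch of one common approach: scoring an observation by its z-score against a baseline window and mapping the score to a class. This is an assumption for illustration only. The source does not describe the algorithm `RogueAgentDetector` actually uses, and the function names and thresholds below are hypothetical.

```python
from statistics import mean, pstdev


def anomaly_score(baseline: list, observed: float) -> float:
    """Absolute z-score of an observation against a baseline window."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        # A perfectly constant baseline: any deviation is maximally anomalous.
        return 0.0 if observed == mu else float("inf")
    return abs(observed - mu) / sigma


def classify(score: float, warn: float = 2.0, rogue: float = 4.0) -> str:
    """Map a score to a class; the thresholds are illustrative assumptions."""
    if score >= rogue:
        return "rogue"
    if score >= warn:
        return "suspicious"
    return "normal"


# Example: tool calls per minute observed for one agent.
calls_per_minute = [8, 10, 12, 10]
labels = [classify(anomaly_score(calls_per_minute, x)) for x in (10, 13, 40)]
```

In this example an agent at its usual rate is classed as normal, a modest excursion as suspicious, and a large spike as rogue. A production detector would likely combine several behavioral signals rather than a single metric.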

Gaps: None identified.


7. Coverage Summary Matrix

| # | Subcategory | Coverage | Evidence Strength | Key Artifacts |
|---|---|---|---|---|
| 1 | GOVERN 1: Policies | ✅ Full | Strong | 10+ PolicyEngine implementations, OPA/Cedar backends |
| 2 | GOVERN 2: Accountability | ✅ Full | Strong | Merkle audit, Shapley attribution, RBAC, DID |
| 3 | GOVERN 3: Workforce | ⚠️ Partial | Moderate | CONTRIBUTING.md, CODE_OF_CONDUCT.md |
| 4 | GOVERN 4: Third-party practices | ✅ Full | Strong | Plugin signing, MCP scanner, AI-BOM, egress policy |
| 5 | GOVERN 5: Risk processes | ✅ Full | Strong | EU AI Act classifier, compliance engine |
| 6 | GOVERN 6: Requirements alignment | ✅ Full | Strong | 7 framework compliance mappings |
| 7 | MAP 1: Context | ✅ Full | Strong | ExecutionContext, 4-ring model, 3 policy modes |
| 8 | MAP 2: Categorization | ✅ Full | Strong | RiskLevel enum, AgentRiskProfile, 5-tier trust |
| 9 | MAP 3: Benefits/costs | ⚠️ Partial | Moderate | Latency/throughput benchmarks; no ROI model |
| 10 | MAP 4: Risks identified | ✅ Full | Strong | STRIDE threat model, OWASP 10/10, chaos testing |
| 11 | MAP 5: Individual impacts | ⚠️ Partial | Moderate | GDPR template, PII regex; no bias/fairness |
| 12 | MEASURE 1: Metrics | ✅ Full | Strong | SLO engine, trust scoring, shift-left, OTel |
| 13 | MEASURE 2: Evaluation | ⚠️ Partial | Moderate | Content quality, red team; no model eval pipeline |
| 14 | MEASURE 3: Risk tracking | ✅ Full | Strong | Drift detection, baselines, flight recorder |
| 15 | MEASURE 4: Measurement feedback | ⚠️ Partial | Moderate | Shift-left tracker, SLO dashboard |
| 16 | MANAGE 1: Risk response | ✅ Full | Strong | Circuit breakers, kill switch, rate limiters, sagas |
| 17 | MANAGE 2: Maximize benefits | ⚠️ Partial | Moderate | Trust scoring, graduated autonomy |
| 18 | MANAGE 3: Third-party risks | ✅ Full | Strong | MCP scanner, plugin signing, trust tiers, AI-BOM |
| 19 | MANAGE 4: Monitoring | ✅ Full | Strong | OTel, rogue detector, fleet monitoring, cascade |

Totals: 12 Fully Addressed · 7 Partially Addressed · 0 Gaps


8. Gap Analysis and Recommended Actions

Priority 1: HIGH

| Gap | Subcategory | Current State | Recommended Action |
|---|---|---|---|
| No bias/fairness evaluation | MAP 5 | Regex-only PII detection; no algorithmic bias testing | Integrate ML-based NER (e.g., Presidio); add `FairnessEvaluator` with demographic parity and equalized odds metrics |
| No consent/DSAR management | MAP 5 | GDPR template has data minimization but no consent workflow | Implement consent management and DSAR automation in `agent-compliance` |
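The two fairness metrics named in the MAP 5 recommendation can be sketched in a few lines. This is an illustrative computation only: `FairnessEvaluator` does not exist in AGT yet, and the function names and the max-minus-min formulation across groups are assumptions chosen for this example.

```python
from collections import defaultdict


def _rate(flags: list) -> float:
    """Fraction of True values; 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0


def demographic_parity_difference(y_pred, groups) -> float:
    """Largest gap in positive-prediction rate between any two groups."""
    by_group = defaultdict(list)
    for pred, group in zip(y_pred, groups):
        by_group[group].append(pred == 1)
    rates = [_rate(v) for v in by_group.values()]
    return max(rates) - min(rates)


def equalized_odds_difference(y_true, y_pred, groups) -> float:
    """Larger of the true-positive-rate gap and false-positive-rate gap."""
    tpr, fpr = defaultdict(list), defaultdict(list)
    for truth, pred, group in zip(y_true, y_pred, groups):
        # Positives feed the TPR table, negatives feed the FPR table.
        (tpr if truth == 1 else fpr)[group].append(pred == 1)
    gaps = []
    for table in (tpr, fpr):
        # Only groups that have examples of that label are compared.
        rates = [_rate(v) for v in table.values()]
        gaps.append(max(rates) - min(rates) if rates else 0.0)
    return max(gaps)


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

dp_gap = demographic_parity_difference(y_pred, groups)
eo_gap = equalized_odds_difference(y_true, y_pred, groups)
```

A value of 0 for either metric means the groups are treated identically by that measure. In the toy data above, group `a` receives positive predictions far more often than group `b`, so both gaps are well above zero.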

Priority 2: MEDIUM

| Gap | Subcategory | Current State | Recommended Action |
|---|---|---|---|
| No compliance trend analysis | MEASURE 4 | Point-in-time SLO snapshots only | Add `ComplianceTrendAnalyzer` to aggregate shift-left and SLO data over time; expose via SRE dashboard API |
| No model evaluation pipeline | MEASURE 2 | Content/plugin quality only | Add `ModelEvaluator` module or LM Harness/HELM integration for accuracy/calibration benchmarks |
| No benefit-maximization framing | MANAGE 2 | Trust delegation framed as security | Document governance ROI; reframe trust scoring as benefit optimization with measurable utility metrics |
| In-memory behavioral baselines | MEASURE 3 | Baselines lost on session end | Add `BaselinePersistence` backend (SQLite or file-backed) to `agent-governance-python/agent-sre/anomaly/` |
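The SQLite-backed persistence recommended for MEASURE 3 could look roughly like the sketch below. Everything here is hypothetical: the class name `SqliteBaselineStore`, the single-table schema, and the JSON encoding of metrics are assumptions for illustration, not a proposed AGT design.

```python
import json
import sqlite3
from typing import Optional


class SqliteBaselineStore:
    """Hypothetical baseline store; name and schema are illustrative only."""

    def __init__(self, path: str = ":memory:") -> None:
        # Pass a file path to survive process restarts; ":memory:" is for demos.
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS baselines ("
            "agent_id TEXT PRIMARY KEY, metrics TEXT NOT NULL)"
        )
        self._conn.commit()

    def save(self, agent_id: str, metrics: dict) -> None:
        # INSERT OR REPLACE overwrites any earlier baseline for this agent.
        self._conn.execute(
            "INSERT OR REPLACE INTO baselines (agent_id, metrics) VALUES (?, ?)",
            (agent_id, json.dumps(metrics)),
        )
        self._conn.commit()

    def load(self, agent_id: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT metrics FROM baselines WHERE agent_id = ?", (agent_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def close(self) -> None:
        self._conn.close()


store = SqliteBaselineStore()
store.save("agent-1", {"calls_per_minute": 10.0})
store.save("agent-1", {"calls_per_minute": 12.5})  # overwrites the first baseline
```

Storing the metrics as a JSON blob keeps the schema stable while the set of tracked behaviors evolves. A file-backed path in place of `:memory:` is what would address the "baselines lost on session end" gap.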

Priority 3: LOW

| Gap | Subcategory | Current State | Recommended Action |
|---|---|---|---|
| No ROI/cost-benefit model | MAP 3 | Technical benchmarks only | Add "Governance ROI" analysis to BENCHMARKS.md framing overhead in business value terms |
| No workforce role enforcement | GOVERN 3 | Documentation only | Consider machine-readable contributor role definitions (organizational scope) |

9. Cross-References to Other Compliance Frameworks

This alignment assessment complements and cross-references the following AGT compliance documents. Subcategory mappings below show where NIST AI RMF requirements overlap with other frameworks.

| NIST AI RMF Subcategory | ATF Reference | OWASP Reference | EU AI Act Reference | SOC 2 Reference |
|---|---|---|---|---|
| GOVERN 1 (Policies) | A-1, A-2 (Policy definition & enforcement) | | Art. 9 (Risk management system) | CC6.1 (Logical access) |
| GOVERN 2 (Accountability) | A-5 (Audit trails) | | Art. 12 (Record-keeping) | CC4.1 (Monitoring) |
| GOVERN 3 (Workforce) | | | Art. 14 (Human oversight) | |
| GOVERN 4 (Third-party) | D-1 through D-5 (Supply chain) | A-05 (Insecure Plugin Design) | Art. 28 (Obligations of deployers) | CC9.2 (Vendor mgmt) |
| GOVERN 5 (Risk processes) | A-3 (Risk assessment) | | Art. 9 (Risk management system) | CC3.2 (Risk assessment) |
| GOVERN 6 (Requirements) | All sections | All risks | All articles | All criteria |
| MAP 1 (Context) | B-1 (Execution boundaries) | | Art. 9.2 (Intended purpose) | |
| MAP 2 (Categorization) | A-3 (Risk classification) | | Art. 6 (Classification rules) | |
| MAP 3 (Benefits/costs) | | | Art. 9.4 (Cost proportionality) | |
| MAP 4 (Risks identified) | B-2, B-3 (Threat analysis) | A-01 through A-10 (All risks) | Art. 9.2 (Risk identification) | CC3.2 (Risk assessment) |
| MAP 5 (Individual impacts) | C-1, C-2 (Data protection) | A-08 (Excessive Agency) | Art. 10 (Data governance) | P1–P8 (Privacy criteria) |
| MEASURE 1 (Metrics) | E-1 (SLI/SLO) | | Art. 9.7 (Testing/metrics) | CC4.1 (Monitoring) |
| MEASURE 2 (Evaluation) | E-2 (Quality assessment) | | Art. 9.5 (Testing) | CC7.1 (System monitoring) |
| MEASURE 3 (Risk tracking) | B-3 (Behavioral baseline) | A-03 (Excessive Agency) | Art. 9.8 (Risk monitoring) | CC7.2 (Change monitoring) |
| MEASURE 4 (Feedback) | E-3 (Continuous improvement) | | Art. 9.9 (Documentation updates) | CC4.2 (Deficiency mgmt) |
| MANAGE 1 (Risk response) | F-1, F-2 (Circuit breakers, kill switch) | A-06 (Excessive Agency) | Art. 14 (Human oversight) | CC7.3 (Change mgmt) |
| MANAGE 2 (Maximize benefits) | | | Recital 4 (Innovation balance) | |
| MANAGE 3 (Third-party risks) | D-1 through D-5 (Supply chain) | A-05 (Insecure Plugin Design) | Art. 28 (Deployer obligations) | CC9.2 (Vendor mgmt) |
| MANAGE 4 (Monitoring) | E-1, F-3 (Observability) | A-09 (Overreliance) | Art. 72 (Post-market monitoring) | CC7.1 (System monitoring) |

This document was prepared for submission to the National Institute of Standards and Technology (NIST) in response to the AI Risk Management Framework (AI RMF 1.0) alignment assessment process. It reflects the state of the Agent Governance Toolkit as of 2026-07-14. For questions or clarifications, please refer to the project's SUPPORT.md or open an issue on GitHub.