EU AI Act (Regulation 2024/1689) -- Compliance Checklist

May 26, 2026

How the Agent Governance Toolkit maps to the EU AI Act

Regulation: Regulation (EU) 2024/1689 -- Harmonised Rules on Artificial Intelligence

Applicability: Phased -- Art. 5 (prohibited practices) and Art. 4 (AI literacy) from 2 February 2025; GPAI obligations from 2 August 2025; high-risk system obligations from 2 August 2026

Prepared: 2026-04-03

Methodology: 4-wave multi-agent investigation -- parallel discovery, adversarial conformity testing, citation validation, and strategic review. All code citations verified against source at commit 35a7cd0.


Coverage Summary

| # | Article | Title | Coverage | Conformity Risk |
|---|---|---|---|---|
| 4 | Art. 4 | AI Literacy | Gap (out of scope) | N/A |
| 6 | Art. 6 | High-Risk Classification | Partial | High |
| 9 | Art. 9 | Risk Management System | Partial | High |
| 10 | Art. 10 | Data and Data Governance | Gap (out of scope) | N/A |
| 11 | Art. 11 | Technical Documentation | Partial | Medium |
| 12 | Art. 12 | Record-Keeping and Logging | Partial | Medium |
| 13 | Art. 13 | Transparency and Information | Partial | Medium |
| 14 | Art. 14 | Human Oversight | Partial | Medium |
| 15 | Art. 15 | Accuracy, Robustness, Cybersecurity | Partial | Medium |
| 26 | Art. 26 | Deployer Obligations | Partial | High |
| 50 | Art. 50 | Transparency for Certain AI | Partial | Medium |

2 of 11 articles fully out of scope. 0 fully covered. 9 partially addressed.

Important: This toolkit is a runtime governance framework for AI agents. It does not train models, manage training datasets, or provide workforce training programs. Articles 4 and 10 are organizational/ML-pipeline obligations outside the toolkit's architectural boundary. "Partial" does not mean "mostly compliant" -- every Partial-rated article would require additional work to pass a conformity assessment. Articles rated Partial with High conformity risk (Art. 6, 9, 26) are functionally non-compliant in their current state.


Is Your System High-Risk?

The EU AI Act classifies AI systems into four risk tiers. The toolkit's applicability depends on how the AI agents it governs are deployed:

| Risk Tier | Trigger | Toolkit Relevance |
|---|---|---|
| Unacceptable (Art. 5) | Social scoring, real-time biometric identification, manipulation | Toolkit can detect and block these via policy rules |
| High-Risk (Art. 6) | Annex III categories: biometrics, critical infrastructure, education, employment, law enforcement, migration, justice, democratic processes | Full Articles 9-15 and 26 compliance required |
| Limited Risk (Art. 50) | AI systems interacting directly with persons, generating synthetic content | Transparency obligations apply |
| Minimal Risk | All other AI systems | Voluntary codes of conduct |

The toolkit includes a risk classifier in agent-governance-python/agent-mesh/examples/06-eu-ai-act-compliance/compliance_checker.py that maps agent profiles to these tiers. See Article 6 details for limitations.


Article-by-Article Checklist

Article 4: AI Literacy

Providers and deployers shall ensure that their staff and other persons dealing with the operation and use of AI systems on their behalf are made AI literate. -- Art. 4(1)

Coverage: Gap (out of scope)

Article 4 is an organizational/HR obligation requiring training programs, competency assessments, and workforce readiness tracking. This is outside the scope of a runtime governance toolkit. No toolkit changes recommended.

Deployer action required: Implement AI literacy programs independently of the toolkit. Consider documenting completion in agent policy metadata (e.g., operator_certified: true).


Article 6: High-Risk Classification

An AI system shall be considered high-risk where... it falls within any of the areas referred to in Annex III. -- Art. 6(2)

Coverage: Partial | Conformity Risk: High

What exists:

| Component | Location | Mechanism |
|---|---|---|
| Risk level enum and domain constants | agent-governance-python/agent-mesh/examples/06-eu-ai-act-compliance/compliance_checker.py:30-84 | RiskLevel enum (4 tiers) and UNACCEPTABLE_DOMAINS, HIGH_RISK_DOMAINS, HIGH_RISK_CAPABILITIES sets |
| Risk classifier | agent-governance-python/agent-mesh/examples/06-eu-ai-act-compliance/compliance_checker.py:136-180 | RiskClassifier.classify() and .explain() methods with trigger explanations |
| Keyword-based classifier | agent-governance-python/agent-os/modules/control-plane/src/agent_control_plane/compliance.py:252-304 | assess_risk_category() checks system descriptions against indicator keywords |
| Runtime compliance check | agent-governance-python/agentmesh-integrations/langflow-agentmesh/src/langflow_agentmesh/compliance_checker.py:103-114 | _HIGH_RISK_DOMAINS and _UNACCEPTABLE_KEYWORDS sets |

Gaps:

  • Annex I path not implemented: Art. 6(1) requires classification when the AI system is a safety component of a product covered by Union harmonisation legislation. No classifier addresses this path.
  • Art. 6(3) exemptions absent: Narrow procedural tasks, human activity improvement, pattern detection, and preparatory tasks are exempt -- no classifier implements these exemptions.
  • Profiling override missing: Art. 6(3) states exemptions never apply when the system performs profiling (GDPR Art. 4(4)). Not checked.
  • Example-only code: The most complete classifier is in examples/, not library source code. Not importable, not tested in CI, not versioned as package API.
  • Static domain sets: Annex III is subject to amendment by delegated acts. Hardcoded Python sets cannot be updated without a code release.

Conformity assessment risk: A conformity assessor evaluates the product as delivered, not its examples. The library-level classifier uses keyword substring matching, which is insufficient for structured risk classification.

Recommendation: Promote the example classifier into library code with external configuration (YAML/JSON) for regulatory updates. Add Art. 6(3) exemption logic and profiling override check.
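The recommendation above can be sketched roughly as follows. This is a minimal illustration, not the toolkit's classifier: the SketchProfile class, the CONFIG keys, and the exemption labels are all hypothetical, and the sketch covers only the Annex III path (not Annex I, and not the unacceptable or limited tiers). It assumes the configuration would be loaded from an external YAML/JSON file in practice.

```python
from dataclasses import dataclass, field

# Hypothetical configuration. In practice this would be loaded from a
# YAML/JSON file so Annex III amendments need no code release.
CONFIG = {
    "high_risk_domains": ["employment", "education", "law_enforcement"],
    "art_6_3_exemptions": [
        "narrow_procedural_task",
        "improves_completed_human_activity",
        "pattern_detection",
        "preparatory_task",
    ],
}


@dataclass
class SketchProfile:
    """Hypothetical stand-in for an agent profile (not the toolkit's AgentProfile)."""
    domain: str
    claimed_exemptions: set = field(default_factory=set)
    performs_profiling: bool = False


def classify_annex_iii(profile: SketchProfile, config: dict = CONFIG) -> str:
    """Return "high" or "not_high" for the Annex III path only."""
    if profile.domain not in config["high_risk_domains"]:
        return "not_high"
    # Profiling override: exemptions never apply when the system profiles people.
    if profile.performs_profiling:
        return "high"
    recognised = profile.claimed_exemptions & set(config["art_6_3_exemptions"])
    return "not_high" if recognised else "high"
```

Checking the profiling flag before the exemptions keeps the override unconditional: a profile that claims an exemption but also performs profiling still classifies as high-risk.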


Article 9: Risk Management System

A risk management system shall be established, implemented, documented and maintained in relation to high-risk AI systems. -- Art. 9(1)

Coverage: Partial | Conformity Risk: High

What exists:

| Component | Location | Mechanism |
|---|---|---|
| Rogue agent detection | agent-governance-python/agent-sre/src/agent_sre/anomaly/rogue_detector.py:276-401 | Composite behavioral risk scoring: frequency z-scores, entropy deviation, capability profile violations |
| Risk category assessment | agent-governance-python/agent-os/modules/control-plane/src/agent_control_plane/compliance.py:252-304 | Keyword-based risk classification into EU AI Act tiers |
| EUAI-ART9 control | agent-governance-python/agent-mesh/src/agentmesh/governance/compliance.py:256-268 | Declares risk management control requirements |
| Policy compliance SLI | agent-governance-python/agent-sre/src/agent_sre/slo/indicators.py:243-266 | Continuous policy adherence tracking (100% target) |
| Chaos testing | agent-governance-python/agent-sre/src/agent_sre/chaos/engine.py:246 | Resilience testing framework for agent systems |

Gaps:

  • No lifecycle orchestration: Art. 9(2) requires a "continuous iterative process" throughout the system lifecycle. The toolkit provides risk scoring components but no process orchestrator binding identification, estimation, evaluation, and mitigation into a documented lifecycle.
  • No misuse analysis: No structured "reasonably foreseeable misuse" analysis mechanism. assess_risk_category() uses keyword matching, not structured misuse frameworks.
  • No post-market monitoring feedback loop: Risk detection is runtime-only with no persistent feedback for post-deployment risk reassessment.
  • Keyword matching is insufficient: assess_risk_category() matches substrings against 15 hardcoded keywords. A system described as "Rate citizens based on behavior for government rewards" classifies as MINIMAL_RISK because none of the keywords appears in the description, even though it describes social scoring.

Conformity assessment risk: A keyword substring match does not constitute a risk management system. The RogueAgentDetector is the strongest contributor (genuine continuous anomaly detection), but it is a behavioral monitoring tool, not a structured risk framework per ISO 31000.

Recommendation: Implement a structured risk assessment framework beyond keyword matching. Connect runtime anomaly detection to a risk register lifecycle. Add misuse scenario analysis tooling.
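The keyword-matching gap described in this section can be reproduced with a small stand-in. The keyword lists below are hypothetical (the toolkit's actual 15 keywords are not reproduced here), and assess_by_substring is an illustrative function, not the toolkit's assess_risk_category().

```python
# Hypothetical indicator lists, chosen only to illustrate the failure mode.
UNACCEPTABLE_KEYWORDS = ["social scoring", "subliminal manipulation"]
HIGH_RISK_KEYWORDS = ["biometric", "recruitment", "credit scoring"]


def assess_by_substring(description: str) -> str:
    """Classify a free-text description by case-insensitive substring match."""
    text = description.lower()
    if any(keyword in text for keyword in UNACCEPTABLE_KEYWORDS):
        return "UNACCEPTABLE"
    if any(keyword in text for keyword in HIGH_RISK_KEYWORDS):
        return "HIGH_RISK"
    return "MINIMAL_RISK"


# The literal phrase is caught; a paraphrase of the same practice is not.
literal = assess_by_substring("Social scoring of residents")
paraphrase = assess_by_substring("Rate citizens based on behavior for government rewards")
```

Here literal is "UNACCEPTABLE" while paraphrase falls through to "MINIMAL_RISK", which is why a structured assessment over declared attributes is a safer basis than free-text matching.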


Article 10: Data and Data Governance

High-risk AI systems which make use of techniques involving the training of AI models with data shall be developed on the basis of training, validation and testing data sets that meet the quality criteria referred to in paragraphs 2 to 5. -- Art. 10(1)

Coverage: Gap (out of scope)

The toolkit governs agent runtime behavior (policy enforcement, trust scoring, execution isolation), not model training data pipelines. A deployer using this toolkit would need separate tooling for Article 10 compliance.

Extension point: The ToolCallInterceptor chain could host a DataGovernanceInterceptor that validates runtime input data against declared schemas/constraints before tool execution. Policy rules could express data quality requirements (e.g., requires_consent: true, pii_classification: required).

Deployer action required: Use dedicated data quality/bias detection tooling for training pipelines. Consider the interceptor extension point for runtime input data governance.
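A minimal sketch of the DataGovernanceInterceptor idea follows. The intercept() signature, the InterceptResult type, and the consent argument name are assumptions made for illustration; the real ToolCallInterceptor interface may differ and should be checked against the toolkit source.

```python
from dataclasses import dataclass


@dataclass
class InterceptResult:
    """Hypothetical result type: whether the tool call may proceed, and why not."""
    allowed: bool
    reason: str = ""


class DataGovernanceInterceptor:
    """Hypothetical interceptor validating tool arguments before execution."""

    def __init__(self, required_fields: dict, requires_consent: bool = False):
        self.required_fields = required_fields  # field name -> expected type
        self.requires_consent = requires_consent

    def intercept(self, tool_name: str, arguments: dict) -> InterceptResult:
        for name, expected_type in self.required_fields.items():
            if name not in arguments:
                return InterceptResult(False, f"{tool_name}: missing field '{name}'")
            if not isinstance(arguments[name], expected_type):
                return InterceptResult(False, f"{tool_name}: field '{name}' has wrong type")
        # Mirrors a policy attribute such as requires_consent: true.
        if self.requires_consent and arguments.get("consent") is not True:
            return InterceptResult(False, f"{tool_name}: consent not recorded")
        return InterceptResult(True)
```

Such a check governs runtime inputs only; it does not address the training, validation, and testing data sets that Article 10 is about.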


Article 11: Technical Documentation

The technical documentation of a high-risk AI system shall be drawn up before that system is placed on the market or put into service. -- Art. 11(1)

Coverage: Partial | Conformity Risk: Medium

What exists:

| Component | Location | Mechanism |
|---|---|---|
| Compliance reports | agent-governance-python/agent-mesh/src/agentmesh/governance/compliance.py:121-168 | ComplianceReport model with framework, period, controls, scores, violations |
| Policy documents | agent-governance-python/agent-os/src/agent_os/policies/schema.py:70-115 | Serializable YAML/JSON PolicyDocument with version, name, rules, defaults |
| Compliance engine | agent-governance-python/agent-os/modules/control-plane/src/agent_control_plane/compliance.py:306-341 | Framework-scoped reports with requirement counts and pass rates |

Gaps:

  • No Annex IV assembly: Art. 11 and Annex IV require comprehensive static documentation: system description, design specifications, development methodology, risk management details, applied standards, conformity declaration. The toolkit generates runtime compliance reports, not static conformity dossiers.
  • No system description generation: No mechanism to produce the intended-purpose description Art. 11(1)(a) requires.
  • No development process documentation: No enforcement or generation of design decision records.
  • No performance metrics declaration: SLIs track runtime metrics but do not produce the static documentation Art. 11 mandates.

Extension point: A TechnicalDocumentationExporter could aggregate ComplianceReport, PolicyDocument, audit logs, and SLO reports into Annex IV structure, with placeholder sections for deployer-provided content (system description, development methodology).

Recommendation: Build an Annex IV template exporter that structures existing governance artifacts into the required format.
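One possible shape for such an exporter is sketched below. It assumes the policy and compliance report have already been serialized to plain dictionaries; the function names and section keys are hypothetical and only loosely follow the Annex IV outline.

```python
from typing import Optional

DEPLOYER = "[DEPLOYER]"  # placeholder for content the deployer must author


def export_annex_iv(policy: dict, report: dict, deployer: Optional[dict] = None) -> dict:
    """Arrange already-exported artifacts into an Annex IV-shaped outline."""
    supplied = deployer or {}
    return {
        "general_description": {
            "system_name_and_version": supplied.get("system_name_and_version", DEPLOYER),
            "intended_purpose": supplied.get("intended_purpose", DEPLOYER),
            "provider_identity": supplied.get("provider_identity", DEPLOYER),
        },
        "design_and_development": {
            "development_methodology": supplied.get("development_methodology", DEPLOYER),
            "design_specifications": policy,
        },
        "monitoring_and_functioning": {"compliance_report": report},
    }


def unfilled_fields(dossier: dict) -> list:
    """List "section.field" paths that still hold the deployer placeholder."""
    return [
        f"{section}.{name}"
        for section, fields in dossier.items()
        for name, value in fields.items()
        if value == DEPLOYER
    ]
```

Reporting the unfilled fields makes the manual-authoring burden explicit instead of letting a partly generated dossier look complete.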


Article 12: Record-Keeping and Logging

High-risk AI systems shall technically allow for the automatic recording of events (logs) over the lifetime of the system. -- Art. 12(1)

Coverage: Partial | Conformity Risk: Medium

What exists:

| Component | Location | Mechanism |
|---|---|---|
| Merkle audit chain | agent-governance-python/agent-mesh/src/agentmesh/governance/audit.py:23-344 | AuditEntry with SHA-256 hash chaining, MerkleAuditChain with inclusion proofs and full chain verification |
| Append-only audit log | agent-governance-python/agent-mesh/src/agentmesh/governance/audit.py:350-512 | AuditLog with agent/type indexes, time-range queries, CloudEvents v1.0 export |
| Signed audit entries | agent-governance-python/agent-mesh/src/agentmesh/governance/audit_backends.py:31-87 | AuditSink protocol, SignedAuditEntry with HMAC-SHA256 signatures, HashChainVerifier |
| Governance audit logger | agent-governance-python/agent-os/src/agent_os/audit_logger.py:19-136 | Pluggable backends (JSONL, in-memory, Python logging) capturing event type, agent ID, action, decision, reason, latency |
| Flight recorder | agent-governance-python/agent-os/modules/control-plane/src/agent_control_plane/flight_recorder.py:33-79 | SQLite with WAL mode, Merkle chain tamper detection, captures prompt, action, verdict, result |
| Delta audit engine | agent-governance-python/agent-hypervisor/src/hypervisor/audit/delta.py:59-110 | Append-only delta log per session with SHA-256 hashed entries |

Gaps:

  • No retention enforcement: Art. 26(6) requires deployers to keep the logs referred to in Art. 12(1) for at least six months. The toolkit provides append-only logs but no retention enforcement, expiration management, or archival lifecycle.
  • DeltaEngine chain verification is a stub: verify_chain() at delta.py:99 always returns True with comment "Public Preview: no chain verification." The hypervisor's audit trail has zero tamper evidence.
  • FlightRecorder hash covers INSERT, not final state: The hash is computed at insert time with policy_verdict='pending', but the verdict is later updated to 'allowed'/'blocked'. Tampering with the verdict field is therefore not detectable by integrity verification.
  • Anomaly detections not in tamper-evident chain: RogueAgentDetector stores assessments in an in-memory list, not in the integrity-protected audit chain.

Strengths: This is the toolkit's strongest area. The MerkleAuditChain and SignedAuditEntry implementations are genuine cryptographic integrity mechanisms. CloudEvents v1.0 export enables enterprise SIEM integration.

Recommendation: Fix FlightRecorder hash to cover final state. Replace DeltaEngine stub with real verification. Add mandatory retention floor of 180 days. Wire anomaly detections into the tamper-evident audit chain.
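The two integrity fixes share one idea: hash the final payload and re-derive every hash during verification. The sketch below illustrates that idea with a plain list and SHA-256. It is not the DeltaEngine or FlightRecorder code; the entry layout and function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # conventional previous-hash value for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous hash together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: list, payload: dict) -> None:
    """Append a payload whose verdict is already final, linking it to the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": dict(payload), "prev": prev, "hash": entry_hash(prev, payload)})


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited payload or broken link fails verification."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Under this scheme a record with a pending verdict would either be appended only once the verdict is known, or the verdict would be appended as its own chained entry, so that changing it later breaks verification.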


Article 13: Transparency and Provision of Information to Deployers

High-risk AI systems shall be designed and developed in such a way as to ensure that their operation is sufficiently transparent to enable deployers to interpret the system's output and use it appropriately. -- Art. 13(1)

Coverage: Partial | Conformity Risk: Medium

What exists:

| Component | Location | Mechanism |
|---|---|---|
| EUAI-ART13 control | agent-governance-python/agent-mesh/src/agentmesh/governance/compliance.py:270-282 | Defines explainability, documentation, and user notification requirements |
| Transparency check | agent-governance-python/agent-os/modules/control-plane/src/agent_control_plane/compliance.py:390-401 | Validates provides_transparency_info boolean in context |
| Decision explanations | agent-governance-python/agent-os/src/agent_os/policies/schema.py:52-58 | PolicyRule.message field for human-readable explanation of each governance decision |
| CloudEvents export | agent-governance-python/agent-mesh/src/agentmesh/governance/audit.py:90-128 | Serializes decisions to CloudEvents v1.0 with action, outcome, policy_decision, matched_rule |
| OpenTelemetry tracing | agent-governance-python/agent-mesh/src/agentmesh/observability/otel_governance.py:31 | GovernanceTracer for governance decision instrumentation |

Gaps:

  • No "instructions for use" generation: Art. 13(3) requires providing deployers with: provider identity, system capabilities/limitations, intended purpose, accuracy metrics, foreseeable misuse, human oversight measures, and log interpretation guidance. No structured mechanism produces this.
  • No AI disclosure injection: No feature inserts an AI disclosure notice to end users at interaction time.
  • Limited decision explainability: Explanations are limited to policy rule message fields and audit reason strings -- no structured explanation framework for complex multi-factor decisions.

Extension point: A TransparencyInterceptor in the CompositeInterceptor chain could inject AI disclosure metadata into tool call results. Policy rules with a transparency_level attribute could trigger different disclosure requirements by risk classification. Structured "instructions for use" could be exported from PolicyDocument + ComplianceReport data.

Recommendation: Add a TransparencyInterceptor and a transparency_required policy condition. Build an "instructions for use" template exporter alongside the Art. 11 documentation exporter.


Article 14: Human Oversight

High-risk AI systems shall be designed and developed in such a way, including with appropriate human-machine interface tools, as to allow for effective human oversight during the period in which the AI system is in use. -- Art. 14(1)

Coverage: Partial | Conformity Risk: Medium

What exists:

| Component | Location | Mechanism |
|---|---|---|
| Escalation system | agent-governance-python/agent-os/src/agent_os/integrations/escalation.py:48-583 | EscalationDecision enum, ApprovalBackend ABC, InMemoryApprovalQueue, WebhookApprovalBackend, EscalationHandler with timeout, quorum, fatigue detection |
| Kill switch | agent-governance-python/agent-hypervisor/src/hypervisor/security/kill_switch.py:64-136 | KillSwitch with KillReason enum (BEHAVIORAL_DRIFT, RATE_LIMIT, RING_BREACH, MANUAL, QUARANTINE_TIMEOUT, SESSION_TIMEOUT) |
| Ring breach detection | agent-governance-python/agent-hypervisor/src/hypervisor/rings/breach_detector.py:1-60 | Internal circuit breaker tripping on HIGH/CRITICAL privilege breaches |
| Base agent escalation | agent-governance-python/agent-os/src/agent_os/base_agent.py:51-81 | PolicyDecision.ESCALATE and EscalationRequest with approve/reject |

Strengths:

  • Timeout defaults to DENY (EscalationHandler at line 308) -- the safe default for conformity
  • M-of-N quorum approval (QuorumConfig) exceeds minimum Art. 14 requirements
  • Fatigue detection prevents approval-fatigue attacks (lines 324-340)

Gaps:

  • Kill switch has placeholder handoff logic: kill() at line 86 constructs and returns structured KillResult objects, but handoff/recovery is not implemented (handoff_success_count hardcoded to 0, all in-flight steps auto-marked COMPENSATED without actual compensation). A conformity assessor asking for an emergency shutdown demonstration would find the method returns data but does not terminate agent processes.
  • No decision reversal: Escalation gates pre-execution approval only. Art. 14(4)(d) requires the ability to "override or reverse a decision" -- reversal of already-executed actions is not implemented.
  • No capability/limitation disclosure: Art. 14(4)(a) requires humans to "understand the relevant capacities and limitations." No capability discovery interface or limitation disclosure mechanism exists.
  • No automation bias awareness: Art. 14(4)(b) requires awareness of "the possible tendency of automatically relying on or over-relying on output." No bias warning, confidence calibration, or disagreement indicator is surfaced.
  • InMemoryApprovalQueue is testing-only: Single-process, non-persistent. Not suitable for production human oversight.

Recommendation: Implement actual process termination in the KillSwitch. Add a decision reversal/compensation mechanism. Surface capability limitations and automation bias warnings in the escalation interface. Provide a production-grade persistent approval backend.
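As one illustration of what "actual process termination" could mean, the sketch below stops an agent that runs as an operating-system subprocess. That deployment model is an assumption (agents may equally run in threads, containers, or remote workers), and terminate_agent is a hypothetical helper, not part of KillSwitch.

```python
import subprocess


def terminate_agent(proc: subprocess.Popen, grace_seconds: float = 2.0) -> int:
    """Request a graceful stop, then force-kill if the process does not exit in time.

    Returns the exit code of the terminated process.
    """
    proc.terminate()  # polite request (SIGTERM on POSIX)
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # forceful stop (SIGKILL on POSIX)
        return proc.wait()
```

A kill switch built this way could record the returned exit code in its result object, giving an assessor evidence that the process actually ended.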


Article 15: Accuracy, Robustness and Cybersecurity

High-risk AI systems shall be designed and developed in such a way that they achieve an appropriate level of accuracy, robustness and cybersecurity, and that they perform consistently in those respects throughout their lifecycle. -- Art. 15(1)

Coverage: Partial | Conformity Risk: Medium

What exists:

Accuracy:

| Component | Location | Mechanism |
|---|---|---|
| Tool call accuracy SLI | agent-governance-python/agent-sre/src/agent_sre/slo/indicators.py:159-182 | Measures correct tool selection fraction (default target 99.9%) |
| Task success rate SLI | agent-governance-python/agent-sre/src/agent_sre/slo/indicators.py:133-156 | Tracks task completion success (default target 99.5%) |
| Hallucination rate SLI | agent-governance-python/agent-sre/src/agent_sre/slo/indicators.py:297-337 | Measures factual accuracy via LLM-as-judge (default target 5%) |
| Calibration delta SLI | agent-governance-python/agent-sre/src/agent_sre/slo/indicators.py:340-468 | Tracks predicted confidence vs. actual success rate drift |

Robustness:

| Component | Location | Mechanism |
|---|---|---|
| Chaos testing | agent-governance-python/agent-sre/src/agent_sre/chaos/engine.py:246 | ChaosExperiment for resilience testing |
| Circuit breakers | agent-governance-python/agent-sre/src/agent_sre/cascade/circuit_breaker.py:90 | Fault isolation for cascading failures |
| Replay engine | agent-governance-python/agent-sre/src/agent_sre/replay/engine.py:105 | Debugging and failure reproduction |
| Anomaly detection | agent-governance-python/agent-sre/src/agent_sre/anomaly/detector.py:123 | Rolling baselines with z-score detection |
| Execution rings | agent-governance-python/agent-hypervisor/src/hypervisor/models.py:46-69 | 4-tier privilege isolation by trust score |

Cybersecurity:

| Component | Location | Mechanism |
|---|---|---|
| Ed25519 trust handshake | agent-governance-python/agent-mesh/src/agentmesh/trust/handshake.py:158-456 | Challenge/response authentication with DoS protection and caching |
| SPIFFE certificate authority | agent-governance-python/agent-mesh/src/agentmesh/core/identity/ca.py:6-44 | Ed25519 sponsor verification for SVID certificates |
| MCP security threat model | agent-governance-python/agent-os/src/agent_os/mcp_security.py:1-78 | MCPThreatType and MCPSeverity enums defining the threat taxonomy |
| MCP security scanner | agent-governance-python/agent-os/src/agent_os/mcp_security.py:272+ | MCPSecurityScanner class detecting tool poisoning, rug pulls, description injection, schema abuse, cross-server attacks, confused deputy |
| Signed audit entries | agent-governance-python/agent-mesh/src/agentmesh/governance/audit_backends.py:61-87 | HMAC-SHA256 signatures on audit entries |
| Ring breach detection | agent-governance-python/agent-hypervisor/src/hypervisor/rings/breach_detector.py:1-60 | Privilege escalation detection with severity scoring |
| Input validation | agent-governance-python/agent-hypervisor/src/hypervisor/models.py:106-220 | Validation on agent_did, API paths, numeric bounds |

Gaps:

  • No accuracy level declaration: Art. 15(1) requires declaring accuracy metrics in instructions for use. SLIs measure runtime accuracy but no mechanism declares expected accuracy as part of the system specification.
  • Chaos engine is a framework, not a test runner: inject_fault() records that a fault was injected but does not modify system behavior. Callers must implement actual fault injection externally.
  • HMAC uses symmetric keys: Insiders with the HMAC key can forge audit entries. No external commitment (e.g., Merkle root anchoring to a timestamping service) prevents full chain rewrite.
  • MCP scanner acknowledges incompleteness: Line 287 warns it "uses built-in sample rules that may not cover all MCP tool poisoning techniques."
  • No network-level security: TLS enforcement and certificate pinning are deferred to deployment.

Strengths: Cybersecurity primitives are the most robust area. Ed25519 identity, HMAC audit integrity, MCP security scanning, and ring-based isolation provide genuine defense-in-depth.

Recommendation: Document recommended accuracy thresholds per risk category. Implement pluggable fault injection hooks in the chaos engine. Consider asymmetric signing for audit entries. Complete MCP security rules for production use.
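The symmetric-key concern can be shown with the standard library alone. The key and entries below are illustrative values, and the helper names are hypothetical; the point is only that any party able to verify an HMAC can also produce one.

```python
import hashlib
import hmac


def sign(key: bytes, entry: bytes) -> str:
    """Compute an HMAC-SHA256 tag over an audit entry."""
    return hmac.new(key, entry, hashlib.sha256).hexdigest()


def verify(key: bytes, entry: bytes, signature: str) -> bool:
    """Constant-time comparison of the expected and presented tags."""
    return hmac.compare_digest(sign(key, entry), signature)


key = b"shared-audit-key"  # illustrative; verifier and signer hold the same key
genuine = b'{"action":"delete","decision":"blocked"}'
forged = b'{"action":"delete","decision":"allowed"}'

genuine_sig = sign(key, genuine)
forged_sig = sign(key, forged)  # any key holder can mint a valid tag for altered content
```

Both verify(key, genuine, genuine_sig) and verify(key, forged, forged_sig) return True, so the tag proves possession of the key rather than authorship. An asymmetric signature, or anchoring a chain root with an external timestamping service, separates the ability to verify from the ability to sign.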


Article 26: Deployer Obligations

Deployers of high-risk AI systems shall... keep the logs referred to in Article 12(1)... for a period appropriate to the intended purpose of the high-risk AI system, of at least six months. -- Art. 26(6)

Coverage: Partial | Conformity Risk: High

What exists:

| Component | Location | Mechanism |
|---|---|---|
| Retention days schema | agent-governance-python/agent-os/src/agent_os/policies/policy_schema.json:215-218 | retention_days field with default 90, minimum 1 |
| Human oversight | agent-governance-python/agent-os/src/agent_os/integrations/escalation.py:120-583 | Full escalation system with approval backends |
| Kill switch | agent-governance-python/agent-hypervisor/src/hypervisor/security/kill_switch.py:64-136 | Emergency termination (see Art. 14 caveats) |
| SRE monitoring | agent-governance-python/agent-sre/src/agent_sre/slo/indicators.py | SLI/SLO framework for operational monitoring |
| Incident detection | agent-governance-python/agent-sre/src/agent_sre/incidents/detector.py | Signal and IncidentSeverity for risk signal generation |

Gaps:

  • Retention minimum violates Art. 26(6): Schema default is 90 days with minimum: 1. Article 26(6) requires at least 6 months (~180 days). A deployer can set retention_days: 1 without validation error. This is a must-fix.
  • No retention enforcement at runtime: Even if retention_days is set, no code actually preserves or deletes logs based on this value. The field is a schema declaration only.
  • No instructions-for-use tracking: No mechanism for deployers to load, parse, or validate provider instructions (Art. 26(1)).
  • No worker notification: Art. 26(7) requires informing workers and representatives when AI is used in employment contexts. No feature exists.
  • No affected-individual notification: Art. 26(11) requires informing natural persons that they are subject to the use of a high-risk AI system. No disclosure feature exists.
  • No authority cooperation workflow: Art. 26(12) requires cooperation with the relevant competent authorities. No data packaging for authority requests exists.
  • No input data validation: Art. 26(4) requires deployers to ensure input data relevance and representativeness. No data quality tooling exists.
  • No competency tracking for oversight persons: Art. 26(2) requires human oversight by persons with "necessary competence, training and authority." No authorization tracking.

Recommendation: Change retention_days default to 180 and minimum to 180 for high-risk systems. Implement actual log retention enforcement. Add provider_instructions metadata field. Build an AuthorityExporter for regulatory inquiries.
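A retention floor and a deletion guard could look roughly like the sketch below. The function names are hypothetical, and 180 days is this document's approximation of six months rather than a figure taken from the Regulation.

```python
from datetime import datetime, timedelta

MIN_RETENTION_DAYS_HIGH_RISK = 180  # this document's approximation of six months


def validate_retention(retention_days: int, high_risk: bool) -> int:
    """Reject retention periods below the floor for the system's risk tier."""
    floor = MIN_RETENTION_DAYS_HIGH_RISK if high_risk else 1
    if retention_days < floor:
        raise ValueError(f"retention_days={retention_days} is below the minimum of {floor}")
    return retention_days


def may_delete(logged_at: datetime, now: datetime, retention_days: int) -> bool:
    """A log entry may be deleted only after the retention period has elapsed."""
    return now - logged_at >= timedelta(days=retention_days)
```

Validating at policy load time closes the retention_days: 1 loophole, while may_delete gives a purge job a single place to enforce the configured period.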


Article 50: Transparency Obligations for Certain AI Systems

Providers shall ensure that AI systems intended to interact directly with natural persons are designed and developed in such a way that the natural person concerned is informed that they are interacting with an AI system, unless this is obvious. -- Art. 50(1), paraphrased

Coverage: Partial | Conformity Risk: Medium

What exists:

| Component | Location | Mechanism |
|---|---|---|
| Transparency checker | agent-governance-python/agent-mesh/examples/06-eu-ai-act-compliance/compliance_checker.py:186-231 | Validates transparency_disclosure on AgentProfile |
| Transparency requirement | agent-governance-python/agent-os/modules/control-plane/src/agent_control_plane/compliance.py:390-401 | Checks provides_transparency_info boolean in context |
| Risk indicators | agent-governance-python/agent-mesh/examples/06-eu-ai-act-compliance/compliance_checker.py:80-84 | LIMITED_RISK_INDICATORS includes deepfake_generation |

Gaps by sub-obligation:

| Sub-obligation | Status | Scope |
|---|---|---|
| Art. 50(1): AI interaction disclosure | Gap (extension point) | In-scope -- toolkit should enforce disclosure policy |
| Art. 50(2): Synthetic content marking (C2PA/watermarking) | Gap (out of scope) | Content-pipeline concern, not agent governance |
| Art. 50(3): Emotion recognition notification | Gap (extension point) | Enforceable via policy conditions |
| Art. 50(4): Deepfake disclosure labeling | Gap (out of scope) | Content-pipeline concern |

  • No runtime disclosure mechanism: Transparency checks validate configuration flags but do not deliver actual notices to end users. The toolkit checks whether you said you would disclose, not whether you actually do disclose.
  • Compliance checker is example code: TransparencyChecker lives in examples/, not library source.

Extension point: An ai_disclosure_required policy condition could block tool execution if disclosure hasn't been confirmed (via context flag). The CompositeInterceptor chain is the natural enforcement point.

Recommendation: Promote TransparencyChecker to library code. Add ai_disclosure_required and emotion_recognition_notice_required policy conditions that enforce disclosure before permitting operations.
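The proposed ai_disclosure_required condition might be evaluated along these lines. The rule and context are modelled as plain dictionaries, and the ai_disclosure_confirmed flag name is an assumption; neither reflects an existing toolkit API.

```python
from typing import Tuple

DEFAULT_MESSAGE = "AI disclosure has not been confirmed for this interaction"


def check_disclosure(rule: dict, context: dict) -> Tuple[bool, str]:
    """Allow the operation unless the rule demands a disclosure that is unconfirmed."""
    if not rule.get("ai_disclosure_required", False):
        return True, ""
    if context.get("ai_disclosure_confirmed") is True:
        return True, ""
    # Fail closed: a missing or falsy flag blocks the operation.
    return False, rule.get("message", DEFAULT_MESSAGE)
```

A check like this still depends on the caller setting the context flag truthfully, so it enforces a policy gate rather than proving that an end user saw a notice.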


Coverage Matrix

| Article | Obligation | Status | Evidence | Conformity |
|---|---|---|---|---|
| Art. 4 | AI literacy for staff | Gap | N/A | Out of scope |
| Art. 6(1) | Annex I safety component classification | Gap | N/A | Not implemented |
| Art. 6(2) | Annex III area classification | Partial | compliance_checker.py:30-84, compliance.py:252-304 | Example-only; keyword matching |
| Art. 6(3) | Exemptions and profiling override | Gap | N/A | Not implemented |
| Art. 9(1) | Continuous risk management lifecycle | Partial | rogue_detector.py:276-401, compliance.py:252-304 | Detection exists; lifecycle orchestration absent |
| Art. 10 | Data governance (training data) | Gap | N/A | Out of scope |
| Art. 11 | Technical documentation (Annex IV) | Partial | compliance.py:121-168, schema.py:70-115 | Runtime reports; no conformity dossier |
| Art. 12(1) | Automatic event logging | Partial | audit.py:23-512, audit_logger.py:19-136, flight_recorder.py:33-79 | Multiple layers, but 3 of 4 have integrity defects |
| Art. 12(4) | 6-month log retention | Gap | policy_schema.json:215-218 (default 90, min 1) | Violates minimum |
| Art. 13(1) | Output interpretability | Partial | audit.py:90-128 (CloudEvents), schema.py:52-58 (rule messages) | Basic; no structured explainability |
| Art. 13(3) | Instructions for use | Gap | N/A | Not implemented |
| Art. 14(1) | Effective human oversight | Partial | escalation.py:48-583 | Escalation system with quorum and fatigue detection |
| Art. 14(4)(d) | Decline/override/reverse | Partial | escalation.py:120-213 (approve/deny) | Pre-execution only; no reversal |
| Art. 14(4)(e) | Stop mechanism | Partial | kill_switch.py:64-136 | Returns structured results; placeholder handoff, no process termination |
| Art. 15(1) | Accuracy levels | Partial | indicators.py:159-468 | SLIs exist; no formal declaration mechanism |
| Art. 15(3) | Robustness | Partial | engine.py:246, circuit_breaker.py:90 | Framework exists; no actual fault injection |
| Art. 15(4) | Cybersecurity | Partial | handshake.py:158-456, mcp_security.py:272+, audit_backends.py:61-87 | Ed25519, HMAC-SHA256 (symmetric key risk), MCP scanning (incomplete rules) |
| Art. 26(2) | Human oversight by competent persons | Partial | escalation.py:120-583, kill_switch.py:64-136 | Mechanisms exist; no competency tracking |
| Art. 26(6) | 6-month log retention | Gap | policy_schema.json:218 (minimum: 1) | Must-fix: default 90, minimum 1 |
| Art. 50(1) | AI interaction disclosure | Gap | compliance_checker.py:186-231 (example) | Config check only; no runtime delivery |
| Art. 50(2) | Synthetic content marking | Gap | N/A | Out of scope |

Recommendations

Must-Fix (Conformity Blockers)

  1. Log retention minimum (Art. 12, 26): Change retention_days default to 180 and minimum to 180 in policy_schema.json. Implement actual retention enforcement at runtime. The current minimum: 1 directly contradicts Art. 26(6).

High Priority (Would Fail Conformity)

  1. Risk classification (Art. 6, 9): Promote the example classifier to library code. Replace keyword substring matching with structured risk assessment. Add Art. 6(3) exemptions and profiling override.

  2. Kill switch implementation (Art. 14): Implement actual process termination and handoff recovery. The current KillSwitch returns structured results but has placeholder handoff logic (hardcoded zero successes).

  3. Audit chain integrity (Art. 12, 15, 26): Fix both DeltaEngine (verify_chain() stub returning True) and FlightRecorder (hash covers INSERT-time state, not final verdict). These are the same class of defect affecting three articles simultaneously.

Medium Priority (Extension Points)

  1. Technical documentation exporter (Art. 11): Build an Annex IV template exporter aggregating existing governance artifacts.

  2. Transparency interceptor (Art. 13, 50): Add TransparencyInterceptor to enforce disclosure policies at the interceptor chain level.

  3. Accuracy declaration (Art. 15): Document recommended accuracy thresholds per risk category. Add a formal accuracy declaration mechanism alongside SLIs.

Low Priority (Deployer Responsibility)

  1. AI literacy (Art. 4): Organizational obligation. Consider adding an operator_certified metadata field.

  2. Data governance (Art. 10): Training data is out of scope. Consider a DataGovernanceInterceptor for runtime input validation.

  3. Worker notification (Art. 26): Employment-context notification is a deployer obligation outside the toolkit's scope.


Article 11: Documentation Templates

The following template maps the Annex IV technical documentation structure to toolkit-generated artifacts. Deployers should fill sections marked [DEPLOYER] with their own content. Note: The majority of Annex IV documentation requires manual authoring. The toolkit can auto-generate governance artifacts (policies, audit trails, SLO reports) but not system descriptions, design specifications, or development methodology records.

Annex IV Section 1: General Description

| Field | Source | Notes |
| --- | --- | --- |
| System name and version | [DEPLOYER] | |
| Intended purpose | [DEPLOYER] | |
| Provider identity | [DEPLOYER] | |
| Risk classification | ComplianceEngine.assess_risk_category() | Requires promotion from example code |
| Applicable regulations | ComplianceEngine.generate_report() | Lists applicable frameworks |

Annex IV Section 2: Design and Development

| Field | Source | Notes |
| --- | --- | --- |
| Development methodology | [DEPLOYER] | |
| Design specifications | PolicyDocument (YAML/JSON export) | Governance rules and constraints |
| System architecture | [DEPLOYER] | |
| Applied standards | [DEPLOYER] | |

Annex IV Section 3: Monitoring and Functioning

| Field | Source | Notes |
| --- | --- | --- |
| Governance policies | PolicyDocument.to_yaml() / PolicyDocument.to_json() | Exportable from schema.py |
| Audit trail | AuditLog CloudEvents export | audit.py:90-128 |
| SLO compliance | SLI framework reports | indicators.py |
| Incident history | Signal and IncidentSeverity logs | detector.py |

Annex IV Section 4: Risk Management

| Field | Source | Notes |
| --- | --- | --- |
| Risk register | [DEPLOYER] | Toolkit provides assess_risk_category() but not a register |
| Anomaly detections | RogueAgentDetector assessments | rogue_detector.py:276-401 |
| Mitigation measures | Policy rules + escalation configuration | Exportable |

Annex IV Section 5: Accuracy and Robustness

| Field | Source | Notes |
| --- | --- | --- |
| Accuracy metrics | ToolCallAccuracy, TaskSuccessRate, HallucinationRate SLIs | Runtime metrics, not static declarations |
| Robustness testing | ChaosExperiment results | Framework only; deployer must implement injection |
| Cybersecurity measures | Ed25519 identity, HMAC audit, MCP scanning | Exportable configurations |
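
The [DEPLOYER] convention in the tables above lends itself to a simple completeness check. The sketch below is hypothetical and not part of the toolkit: `DEPLOYER_FIELDS`, `build_template`, and `unfilled_fields` are illustrative names, and the field list is just the deployer-authored rows from Sections 1, 2, and 4. It shows how an exporter might flag Annex IV fields that still need manual authoring.

```python
from __future__ import annotations

DEPLOYER = "[DEPLOYER]"  # placeholder marker used in the tables above

# Deployer-authored fields, grouped by Annex IV section heading.
DEPLOYER_FIELDS = {
    "General Description": [
        "System name and version",
        "Intended purpose",
        "Provider identity",
    ],
    "Design and Development": [
        "Development methodology",
        "System architecture",
        "Applied standards",
    ],
    "Risk Management": ["Risk register"],
}


def build_template(deployer_input: dict[str, str]) -> dict[str, dict[str, str]]:
    """Fill known fields from deployer input; leave the rest as placeholders."""
    return {
        section: {field: deployer_input.get(field, DEPLOYER) for field in fields}
        for section, fields in DEPLOYER_FIELDS.items()
    }


def unfilled_fields(template: dict[str, dict[str, str]]) -> list[str]:
    """List 'section / field' entries that still carry the placeholder."""
    return [
        f"{section} / {field}"
        for section, fields in template.items()
        for field, value in fields.items()
        if value == DEPLOYER
    ]
```

An exporter built along these lines could refuse to emit a final document while `unfilled_fields` is non-empty, which keeps auto-generated artifacts from being mistaken for complete technical documentation.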

Scope Limitations

This checklist covers Articles 4, 6, 9-15, 26, and 50 based on primary-source research. The following articles are not covered but may be relevant to deployers:

| Article | Title | Why It Matters |
| --- | --- | --- |
| Art. 17 | Quality Management System | Provider obligation: documented QMS covering risk management, post-market monitoring, resource management, and supplier controls |
| Art. 25 | Responsibilities Along the AI Value Chain | Defines provider/deployer boundary and responsibility allocation -- critical for a toolkit used by downstream deployers |
| Art. 27 | Fundamental Rights Impact Assessment | Certain deployers of high-risk systems (bodies governed by public law, private entities providing public services, and deployers of specific Annex III systems) must perform a FRIA before deployment |
| Art. 43 | Conformity Assessment Procedures | Defines assessment procedures for high-risk systems -- notified body involvement requirements |
| Art. 49 | EU Database Registration | High-risk systems must be registered before market placement |
| Art. 72 | Post-Market Monitoring | Providers must establish post-market monitoring systems proportionate to the risk |
| Art. 73 | Reporting of Serious Incidents | Hard legal obligation: general 15-day reporting deadline to market surveillance authorities for serious incidents, with shorter deadlines for the most severe cases |

Additionally, this checklist does not address GDPR interplay. Art. 26(9) requires deployers, where applicable, to use the information provided under Art. 13 when carrying out Data Protection Impact Assessments under GDPR Art. 35. A DPO reviewing this checklist should cross-reference against DPIA obligations.

Cross-Article Dependencies

Fixing certain gaps yields improvements across multiple articles simultaneously:

| Fix | Articles Improved | Leverage |
| --- | --- | --- |
| Retention enforcement (minimum 180 days + runtime) | Art. 12(4), Art. 26(6) | Highest -- single fix resolves two regulatory contradictions |
| Promote example classifier to library code | Art. 6, Art. 9, Art. 50 | Risk tier drives classification, management, and transparency triggers |
| Instructions-for-use exporter | Art. 11, Art. 13(3) | Both require structured system description artifacts |
| KillSwitch actual termination | Art. 14(4)(e), Art. 26(2) | Stop mechanism and deployer oversight both depend on it |
| Audit chain integrity (DeltaEngine, FlightRecorder hash) | Art. 12, Art. 15(4), Art. 26(6) | Tamper evidence underpins logging, cybersecurity, and retention |
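
The retention row can be sketched as a floor applied to whatever retention period is configured. This is an illustration under stated assumptions: `MIN_RETENTION`, `effective_retention`, and `purgeable` are hypothetical names, not toolkit APIs, and the 180-day value is the figure from the table above.

```python
from datetime import datetime, timedelta, timezone

MIN_RETENTION = timedelta(days=180)  # floor from the table above


def effective_retention(configured: timedelta) -> timedelta:
    """Never retain for less than the floor, whatever the configuration says."""
    return max(configured, MIN_RETENTION)


def purgeable(record_times, now, configured):
    """Return timestamps old enough to purge under the effective retention."""
    cutoff = now - effective_retention(configured)
    return [t for t in record_times if t < cutoff]


if __name__ == "__main__":
    now = datetime(2026, 8, 2, tzinfo=timezone.utc)
    records = [now - timedelta(days=200), now - timedelta(days=100)]
    # A 30-day setting is raised to 180 days, so only the 200-day-old
    # record is eligible for purging.
    print(purgeable(records, now, timedelta(days=30)))
```

Enforcing the floor in one place means a misconfigured retention setting cannot shorten log retention below the minimum, which is why a single fix can address both cited provisions.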

Defense-in-Depth Warnings

Several "Partial" ratings rely on a single mechanism with no fallback:

  • Art. 14 (Stop mechanism): Entire emergency shutdown capability rests on KillSwitch, which has placeholder handoff logic. No secondary kill path exists.
  • Art. 12 (Audit integrity): Three of four audit implementations have integrity defects (DeltaEngine stub, FlightRecorder hash gap, anomaly detections outside chain). Only MerkleAuditChain in agent-mesh is fully sound.
  • Art. 15 (Audit signing): HMAC-SHA256 uses symmetric keys, so any insider with the key can forge the entire chain. There is no external anchoring or asymmetric signing as a second layer.
  • Art. 9 (Risk classification): Single keyword substring match with no structured fallback. One evasive system description bypasses it entirely.
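
The Art. 9 warning can be made concrete with a small stand-in. The code below is not the toolkit's classifier: `naive_is_high_risk`, `structured_is_high_risk`, and the keyword and domain values are hypothetical and purely illustrative. It shows how a keyword substring match is bypassed by rewording, and how classification from explicitly declared deployment domains avoids depending on free text.

```python
from __future__ import annotations

# Illustrative keywords only; not the toolkit's actual list.
HIGH_RISK_KEYWORDS = ("recruitment", "credit scoring", "biometric")


def naive_is_high_risk(description: str) -> bool:
    """Keyword substring match over a free-text system description."""
    text = description.lower()
    return any(keyword in text for keyword in HIGH_RISK_KEYWORDS)


def structured_is_high_risk(
    declared_domains: set[str], annex_iii_domains: set[str]
) -> bool:
    """Classify from explicitly declared domains instead of free text."""
    return bool(declared_domains & annex_iii_domains)


if __name__ == "__main__":
    # Caught: the description happens to contain a listed keyword.
    print(naive_is_high_risk("Automated recruitment screening"))
    # Missed: same use case, reworded to avoid every keyword.
    print(naive_is_high_risk("Ranks job applicants for hiring managers"))
    # A declared domain is matched regardless of how the system is described.
    print(structured_is_high_risk({"employment"}, {"employment", "education"}))
```

A structured declaration still depends on the deployer answering honestly, so it works best as a second layer alongside review rather than as a sole control.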

Maintenance: This checklist should be reviewed when: (a) the toolkit releases a new version, (b) the EU Commission adopts delegated acts amending Annex III risk categories, or (c) implementing acts on conformity procedures are published. The Annex III domain sets in the risk classifier are hardcoded and cannot track regulatory amendments without a code release.

Disclaimer: This checklist is an automated mapping of toolkit capabilities against EU AI Act requirements. It is not legal advice and does not constitute a conformity assessment. Partial coverage does not equal partial compliance -- a conformity assessor evaluates pass/fail per obligation, not percentage coverage. Organizations should engage qualified legal counsel and notified bodies for formal compliance evaluation.