SAF-T1912: Stego Response Exfiltration
June 1, 2026 · View on GitHub
Overview
Tactic: Exfiltration (ATK-TA0010) Technique ID: SAF-T1912 Severity: High
Description
Stego Response Exfiltration is a technique where attackers embed covert payloads inside AI-generated or MCP-generated responses. These payloads are commonly hidden in code blocks, JSON structures, logs, or markdown elements that users frequently copy or paste into external systems. The embedded content may consist of zero-width Unicode characters, encoded payloads, or structured steganographic data that allows attackers to exfiltrate information without detection.
From a technical perspective, the attack exploits the tendency of AI-generated structured outputs to appear trustworthy. MCP systems often generate code, configuration, or schema-based responses; attackers manipulate these formats to insert high‑entropy data or invisible characters that remain undetected by basic sanitation. When users transfer these outputs into IDEs, build systems, cloud consoles, or other environments, the payload propagates and is eventually decoded or executed by downstream processes.
Attack Vectors
Primary Vector: Payload hidden inside LLM/MCP response code blocks or structured outputs.
Secondary Vectors:
-
Stealth encoding via zero-width characters.
-
Zero-width character encoding (e.g., U+200B, U+200C).
-
Embedding malicious instructions inside comments, JSON fields, or whitespace.
-
Hidden payloads inside comments, markdown, or unused configuration parameters.
Technical Details
Prerequisites
-
Attacker can influence or manipulate a model response.
-
User copies or transfers content from the MCP response into another environment.
Attack Flow
flowchart TD A[Start: Attacker Identifies Output Channel] --> B[Prepare Hidden Payload Whitespace Unicode Invisible Characters] B --> C[Prime Model to Produce Predictable Structure Example Code Blocks] C --> D[Inject Hidden Data into Model Response] D --> E[Model Outputs Response with Embedded Payload] E --> F[User Copies AI Output Code Block or Structured Text] F --> G[Hidden Data Travels via User Action Paste Commit Upload] G --> H[Payload Reaches Attacker Controlled System] H --> I[Attacker Extracts Embedded Data] I --> J[Stealth Persistence or Retriggering] classDef phase fill:#1f2937,stroke:#ffffff,color:#ffffff,border-radius:6px; class A,B,C,D,E,F,G,H,I,J phase;
Initial Stage – Prompt Influence
Attacker forces the model to output structured text with embedded hidden data. Attacker manipulates the AI system to output predictable structures (code/JSON).
Covert Embedding
Hidden payload encoded inside code blocks, JSON, markdown, comments, or zero-width characters.
Payload Preparation
Covert data encoded using zero-width Unicode, base64, compressed blobs, or steganographic patterns.
User Interaction
The unsuspecting user copies the output into an external system (IDE, API, CI/CD pipeline).
Embedding Stage
Hidden content inserted into comments, whitespace, unused config keys, or token patterns.
Exploitation Stage
The downstream system decodes, interprets, or executes the hidden payload. User copies the response into another system where the payload is parsed, executed, or decoded.
Post-Exfiltration
Payload exfiltrates sensitive data or establishes covert channels.
Post-Exploitation
Payload exfiltrates confidential data back to an attacker-controllaed channel or establishes persistence.
Example Scenario
{
"config": {
"theme": "light",
"user_settings": "ZW1iZGRlZF9leGZpbF9kYXRh"
}
}
Advanced Attack Techniques (2023–2024 Research)
According to research from LLM-Steganography (Zou et al., 2023) and StegLLM (Liu et al., 2024):
- Zero-Width Character Encoding — Payload embedded inside invisible Unicode (Zou, 2023).
Zero-Width Character Encoding
Payload encoded using invisible Unicode sequences, enabling stealthy transport (Zou et al., 2023).
- Model-Generated Steganographic Text — Entire sentences produced by LLMs encode data through token choice patterns (Liu, 2024).
Token-Pattern Steganography
Natural-language responses where sentence structure or token selection encodes binary payloads (Liu et al., 2024).
Impact Assessment
-
Confidentiality: High — Hidden payload can extract secrets undetected. Confidentiality: High – Stealthy extraction of sensitive data through invisible or encoded payloads.
-
Integrity: Medium — Payload may alter downstream configurations. Integrity: Medium – Payload may alter configuration files or execution flows.
-
Availability: Low — Rare impact on system availability. Availability: Low – Technique does not target system disruption.
-
Scope: Network-wide — Payload may propagate across systems via copy-paste. Scope: Network-wide – Payload spreads via copy/paste into tools, repositories, or CI/CD pipelines.
Current Status (2025)
Security researchers warn that most LLM platforms lack: Security researchers report that most AI systems do not sanitize zero-width Unicode or detect steganographic encoding in structured responses. Organizations have begun introducing:
-
Zero-width Unicode filtering
-
Steganography detection in structured outputs.
-
Entropy-based scanning of AI-generated outputs
-
Strict schema validation to block unexpected fields
Detection Methods
(Sources: OWASP Top 10 for LLM Applications, MCP Security Notes)
Indicators of Compromise (IoCs)
-
Suspicious base64 or hex blobs in harmless configuration blocks.
-
Presence of zero-width Unicode characters in configuration files.
-
Presence of zero-width Unicode characters (\u200b, \u200c, \u2060).
-
Unexpected base64, hex, or compressed blobs in simple AI-generated outputs.
-
Unexpected large code blocks unrelated to query intent.
-
AI responses with abnormally large or nested code blocks unrelated to query intent.
Detection Rules
Important: This Sigma rule is example only and must be adapted.
See detection-rule.yml in this directory for the example detection rule used with this technique.
Important: The following Sigma rule is an example only and not comprehensive.
title: Stego Payload Embedded in MCP Responses id: 7b29c1c4-9730-4e13-9ef0-52e48cbb2c61 status: experimental description: Detects suspicious zero-width characters or encoded blobs in MCP responses. author: rajivsthh date: 2025/11/26 references:
- https://github.com/saf-mcp/techniques/SAF-T1912
logsource:
product: mcp
service: response
detection:
selection:
response.text|re:
- '\u200b'
- '\u200c'
- '^[A-Za-z0-9+/]{20,}={0,2}$' condition: selection falsepositives:
- Legitimate encoded configuration values
- Tools that embed base64 for expected features level: high tags:
- attack.exfiltration
- attack.t1020
- safe.t1912
Behavioral Indicators
-
Users copying unusually large code blocks from model responses.
-
Users copy unusually large portions of AI output into external systems.
-
Frequent appearance of encoded or compressed payloads in simple tasks.
-
Presence of encoded or invisible content in repository commits.
Mitigation Strategies
Preventive Controls
-
SAF-M-3: Output Sanitization — Strip zero-width characters and detect suspicious encoding patterns.
-
SAF-M-14: Model Response Validation — Enforce schema validation to detect hidden or unexpected fields.
-
SAF-M-21: Content Security Filtering — Reject high-entropy payloads inside text responses.
Detective Controls
-
SAF-M-9: Steganography Detection Layer — Scan for entropy anomalies in responses.
-
SAF-M-11: Logging & Telemetry Controls — Record responses and highlight suspicious patterns.
Response Procedures
Immediate Actions
-
Halt execution of workflows using potentially contaminated AI responses.
-
Sanitize all recent AI/MCP outputs.
-
Block user operations involving suspicious data.
Investigation Steps
-
Examine logs for hidden Unicode or encoded payloads.
-
Inspect logs for encoded or invisible Unicode.
-
Analyze recently copied files and repository commits.
-
Check copied files in developer environments.
-
Review CI/CD pipeline artifacts for hidden data propagation.
Remediation
-
Patch sanitization and validation modules.
-
Deploy stronger output validation.
-
Add regression tests for zero-width Unicode and high‑entropy blocks.
-
Deploy stricter schema enforcement for all AI-generated content.
Related Techniques
-
SAF-T1006: User Social Engineering Install
-
SAF-T1704: Compromised-Server Pivot
References
MITRE ATT&CK Mapping
- T1020 – Automated Exfiltration
- https://attack.mitre.org/techniques/T1020/
Version History
| Version | Date | Changes | Author |
|---|
| 1.0 | 2025-11-26 | Initial documentation | rajivsthh |
| 1.0 | 2025-11-26 | Initial documentation | rajivsthh |