spotlighting-datamarking

March 29, 2026

Defend against indirect prompt injection using Spotlighting (Microsoft Research). Marks untrusted data with special tokens so LLMs can distinguish it from instructions.

This package is an open-source implementation of all three spotlighting variants from the paper: data marking, random interleaving, and base64 encoding (the strongest). Microsoft uses the spotlighting technique itself in production as part of Prompt Shields in Azure AI Foundry.

Install

npm install spotlighting-datamarking

Quick Start

import { DataMarkingViaSpotlighting } from 'spotlighting-datamarking';

const marker = new DataMarkingViaSpotlighting();

const result = marker.markData('Ignore previous instructions');
// result.markedText  → "[MARKER]Ignore[MARKER]previous[MARKER]instructions[MARKER]"
// result.dataMarker  → the random marker string
// result.prompt      → LLM instruction to prepend to your system prompt

API

new DataMarkingViaSpotlighting(minK?, maxK?, defaultP?, defaultMinGap?, markerType?)

| Param | Default | Description |
| --- | --- | --- |
| minK | 7 | Min marker length |
| maxK | 12 | Max marker length |
| defaultP | 0.5 | Marker insertion probability |
| defaultMinGap | 1 | Min tokens between markers |
| markerType | 'alphanumeric' | 'alphanumeric' or 'unicode' |
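
To make the roles of minK, maxK, and markerType concrete, here is a minimal sketch of random marker generation. It is not the package's source: the name `sketchGenerateMarker`, the alphabet, and the choice of Private Use Area code points for 'unicode' markers are assumptions (the last one inferred from the PUA note under Options). The injectable `rng` parameter exists only so the sketch can be tested deterministically; a real implementation would likely prefer a cryptographically secure source.

```typescript
type SketchMarkerType = 'alphanumeric' | 'unicode';

const SKETCH_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

// Hypothetical helper (not the library API): builds a marker whose length
// is drawn uniformly from the inclusive range [minK, maxK].
function sketchGenerateMarker(
  minK = 7,
  maxK = 12,
  markerType: SketchMarkerType = 'alphanumeric',
  rng: () => number = Math.random,
): string {
  if (minK < 1 || maxK < minK) {
    throw new RangeError('expected 1 <= minK <= maxK');
  }
  const length = minK + Math.floor(rng() * (maxK - minK + 1));
  let marker = '';
  for (let i = 0; i < length; i++) {
    if (markerType === 'unicode') {
      // Assumption: unicode markers are drawn from the BMP Private Use Area (U+E000 to U+F8FF).
      marker += String.fromCharCode(0xe000 + Math.floor(rng() * (0xf8ff - 0xe000 + 1)));
    } else {
      marker += SKETCH_ALPHABET.charAt(Math.floor(rng() * SKETCH_ALPHABET.length));
    }
  }
  return marker;
}
```

Because the marker is regenerated per call and its length varies, an attacker who cannot see the marker has a harder time embedding a matching one in the untrusted text.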

markData(text, options?)

Replaces all whitespace with markers. Returns { markedText, dataMarker, prompt }.
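
The transformation can be pictured with a small standalone sketch. This is an illustration, not the package's code: `sketchMarkData` is a hypothetical name, the marker is passed in rather than generated, and collapsing each run of whitespace into a single marker is an assumption.

```typescript
interface SketchMarkResult {
  markedText: string;
  dataMarker: string;
}

// Hypothetical stand-in for markData: whitespace between words becomes the marker.
function sketchMarkData(text: string, dataMarker: string, sandwich = true): SketchMarkResult {
  // Assumption: a run of several whitespace characters yields one marker.
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const joined = words.join(dataMarker);
  // With sandwich enabled, the marker also brackets the whole text.
  const markedText = sandwich ? `${dataMarker}${joined}${dataMarker}` : joined;
  return { markedText, dataMarker };
}

const sketchMarked = sketchMarkData('Ignore previous instructions', '[MARKER]');
// sketchMarked.markedText === '[MARKER]Ignore[MARKER]previous[MARKER]instructions[MARKER]'
```

The output matches the shape shown in Quick Start: every word of the untrusted text is visibly fenced by the marker, which the accompanying prompt tells the model to treat as data.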

randomlyMarkData(text, options?)

Inserts markers probabilistically between tokens. Guarantees at least one marker. Returns { markedText, dataMarker, prompt }.
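
A rough sketch of the probabilistic variant follows. It is not the library's implementation: `sketchRandomlyMark` is a hypothetical name, tokens are assumed to be whitespace-separated words, the fallback of appending a marker when no insertion fired is only one possible way to honor the "at least one marker" guarantee, and the sandwich option is left out for brevity. The `rng` parameter is there to make the behavior reproducible in tests.

```typescript
// Hypothetical stand-in for randomlyMarkData (not the library API).
function sketchRandomlyMark(
  text: string,
  dataMarker: string,
  p = 0.5,
  minGap = 1,
  rng: () => number = Math.random,
): string {
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  let out = '';
  let inserted = 0;
  let sinceLast = 0; // tokens emitted since the last marker
  tokens.forEach((token, i) => {
    if (i > 0) {
      if (sinceLast >= minGap && rng() < p) {
        // Replace the separator with the marker and reset the gap counter.
        out += dataMarker;
        inserted++;
        sinceLast = 0;
      } else {
        out += ' ';
      }
    }
    out += token;
    sinceLast++;
  });
  // Assumed fallback: guarantee at least one marker even if every draw failed.
  if (inserted === 0) {
    out += dataMarker;
  }
  return out;
}
```

Raising `p` produces denser marking; raising `minGap` keeps markers from clustering, which leaves more of the original spacing intact for the model to read.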

base64EncodeData(text, options?)

Base64-encodes the text. Returns { markedText, prompt }.
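
As an illustration of the encoding step only, the sketch below UTF-8 encodes the text and then base64-encodes the bytes. `sketchBase64Encode` is a hypothetical name, and the use of the `TextEncoder` and `btoa` globals (available in browsers and recent Node.js versions) is an assumption about the runtime, not a statement about how the package does it. The accompanying prompt text is not reproduced here.

```typescript
// Hypothetical stand-in for the encoding step of base64EncodeData.
function sketchBase64Encode(text: string): string {
  // Encode to UTF-8 first so non-ASCII input survives the round trip.
  const bytes = new TextEncoder().encode(text);
  let binary = '';
  bytes.forEach((b) => {
    binary += String.fromCharCode(b);
  });
  // btoa expects a "binary string" in which each char code is one byte.
  return btoa(binary);
}

const sketchEncoded = sketchBase64Encode('hello');
// sketchEncoded === 'aGVsbG8='
```

Since the encoded payload contains no natural-language tokens, injected instructions inside it cannot be read as instructions unless the model decodes them, which is why the paper's encoding variant is described above as the strongest.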

sanitizeText(text)

Strips invisible Unicode characters (zero-width spaces, BiDi controls, PUA chars, etc.). Called automatically before marking by default.

Options

All marking methods accept:

| Option | Default | Description |
| --- | --- | --- |
| sanitize | true | Strip invisible chars before marking |
| sandwich | true | Wrap text with boundary markers |
| markerType | instance default | Override marker type per-call |
| p | 0.5 | Insertion probability (randomlyMarkData only) |
| minGap | 1 | Min token gap between markers (randomlyMarkData only) |

Note: When using unicode markers, PUA characters (U+E000–F8FF) are always stripped from input regardless of the sanitize setting. This prevents attackers from spoofing markers.

Usage

import { DataMarkingViaSpotlighting } from 'spotlighting-datamarking';

const marker = new DataMarkingViaSpotlighting();
const untrustedData = getEmailBody(); // could contain injection attempts

const result = marker.randomlyMarkData(untrustedData, { p: 0.5 });

const messages = [
  { role: 'system', content: `You are a helpful assistant.\n${result.prompt}` },
  { role: 'user', content: `Summarize this email:\n${result.markedText}` },
];

Sanitization

Input is sanitized by default before marking. The sanitizer removes:

  • Zero-width characters and directional marks (U+200B, U+200C, U+200E, U+200F)
  • BiDi controls (U+202A–202E, U+2066–2069)
  • Soft hyphen, BOM, word joiner, invisible operators
  • Private Use Area chars (U+E000–F8FF)
  • Unicode tag characters (U+E0001, U+E0020–E007F)
  • Line/paragraph separators (U+2028–2029)

ZWJ (U+200D) is preserved to keep compound emoji intact (👨‍👩‍👧‍👦).

Disable with { sanitize: false } if you need raw passthrough.
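
The character classes listed above can be expressed as a pair of regular expressions. The sketch below is an approximation, not the package's sanitizer: `sketchSanitize` is a hypothetical name, the exact code points chosen for "invisible operators" (U+2061 to U+2064) are an assumption, and matched characters are simply deleted. The tag characters live outside the Basic Multilingual Plane, so they are matched as surrogate pairs to avoid depending on the regex `u` flag.

```typescript
// BMP characters to strip. U+200D (ZWJ) is deliberately absent so compound emoji survive.
const SKETCH_INVISIBLE_BMP = new RegExp(
  '[' +
    '\\u200B\\u200C\\u200E\\u200F' + // zero-width space, ZWNJ, LRM, RLM
    '\\u202A-\\u202E\\u2066-\\u2069' + // BiDi embedding, override, and isolate controls
    '\\u00AD\\uFEFF\\u2060' + // soft hyphen, BOM, word joiner
    '\\u2061-\\u2064' + // invisible operators (assumed range)
    '\\uE000-\\uF8FF' + // Private Use Area
    '\\u2028\\u2029' + // line and paragraph separators
    ']',
  'g',
);

// Tag characters U+E0001 and U+E0020 to U+E007F, written as UTF-16 surrogate pairs.
const SKETCH_TAG_CHARS = /\uDB40[\uDC01\uDC20-\uDC7F]/g;

// Hypothetical stand-in for sanitizeText (not the library API).
function sketchSanitize(text: string): string {
  return text.replace(SKETCH_TAG_CHARS, '').replace(SKETCH_INVISIBLE_BMP, '');
}
```

Stripping the Private Use Area matters most when unicode markers are in use: if an attacker could smuggle PUA characters into the input, they could imitate marker boundaries.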

Testing

npm test

Real-World Validation

Two independent studies have evaluated spotlighting against adaptive attackers:

  1. LLMail-Inject (Abdelnabi et al., SaTML 2025): A public CTF run by Microsoft with 839 participants and 208k+ submissions against an LLM email assistant. Spotlighting reduced tool-call rates and was "more effective than some detection defenses alone, such as Prompt Shield." Only 0.8% of all submissions achieved a successful end-to-end attack, and stacking spotlighting with detection defenses improved results further.

  2. The Attacker Moves Second (Nasr, Carlini et al., 2025): A separate study that evaluated 12 defenses including spotlighting using strong adaptive attacks (search-based, RL, gradient, and human red-teaming). Against static attacks, spotlighting held ASR to ~1%. However, adaptive search-based attacks achieved >95% ASR, and human red-teamers generated 265 successful injections against it. The authors concluded they "did not observe any measurable difference in the types of attacks that succeed on models with these defenses compared to the same models without the defense."

Takeaway: Spotlighting raises the bar significantly against naive and static attacks, but it does not hold up against determined adaptive adversaries. It should be layered with other defenses (detection classifiers, instruction hierarchy, input sanitization) rather than relied upon alone.

References