Claude Data Annotation Plugin
April 27, 2026
End-to-end toolkit for turning raw data into an annotated dataset and publishing it to Hugging Face. Covers the full lifecycle: ingest, profile, clean (PII, columns, format), design an annotation schema, annotate (interactively with Claude or via Gemini batch inference), review, and publish.
The plugin is built around a small set of orchestrator skills that delegate to specialized subagents. Instead of exposing one micro-skill per micro-operation, the orchestrator inspects the data, talks to the user, and decides which steps are needed.
Skills
- shape-dataset: top-level prep workflow. Ingests a source, profiles it, proposes a prep plan (PII, columns, format, splits, schema), and executes approved steps. Hands off to annotation or publication.
- annotate-with-claude: Claude annotates a small dataset interactively in-session against a locked schema. For runs where Label Studio or batch inference would be overkill (typically tens to low hundreds of records).
- scaffold-annotation-env: generates a workspace for large-scale annotation via the Gemini batch inference API, with Python boilerplate (run, poll, validate).
- hf-setup: creates a Hugging Face dataset repo (asks public/private), copies the prepared data over, generates the dataset card, and pushes via huggingface-cli.
- ingest-source: stages raw data from GitHub, a local path, or a remote URL into a known working directory.
- init-dataset-repo: initializes a local git repo with the conventional HF dataset layout, LFS rules, license, and card stub.
Subagents
The orchestrator skills delegate to these; the user does not invoke them directly.
- data-profiler: schema inference, stats, encoding, samples; flags concerns.
- pii-scanner: detects PII (direct and quasi), proposes a redaction strategy per column; can apply it on approval.
- column-curator: recommends keep/drop/rename/recast/derive per column for the target task.
- schema-designer: proposes an annotation schema and guidelines from data + task; iterates with the user before locking.
- format-normalizer: executes format conversions, encoding fixes, JSON flattening, reshape-for-annotation, and splits.
- review-annotations: audits finished annotations: schema validation, label distribution, sampled spot-checks, agreement analysis.
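To make the review-annotations audit more concrete, here is a minimal sketch of three of its four checks (schema validation, label distribution, sampled spot-checks) using only the standard library. The schema shape, the field names, and the `audit` function are all hypothetical; the plugin's actual schema format and implementation are not described in this README, and agreement analysis is left out.

```python
import random
from collections import Counter

# Hypothetical locked schema; the plugin's real schema format may differ.
SCHEMA = {
    "required": ["id", "text", "label"],
    "labels": {"positive", "negative", "neutral"},
}


def audit(records, schema, sample_size=2, seed=0):
    """Return schema errors, label counts, and a reproducible spot-check sample."""
    errors = []  # (record index, message) pairs
    valid = []   # records that passed both checks
    counts = Counter()
    for index, record in enumerate(records):
        missing = [f for f in schema["required"] if f not in record]
        if missing:
            errors.append((index, f"missing fields: {missing}"))
            continue
        if record["label"] not in schema["labels"]:
            errors.append((index, f"unknown label: {record['label']!r}"))
            continue
        counts[record["label"]] += 1
        valid.append(record)
    # A seeded RNG keeps the spot-check sample stable between runs.
    rng = random.Random(seed)
    spot_check = rng.sample(valid, min(sample_size, len(valid)))
    return {"errors": errors, "label_counts": dict(counts), "spot_check": spot_check}
```

Calling `audit(records, SCHEMA)` on a list of dicts returns a report whose `errors` entries point at the offending record indices, so a reviewer can fix or drop those rows before publishing.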
Typical workflow
ingest-source
      ↓
shape-dataset ──→ data-profiler, pii-scanner, column-curator, format-normalizer, schema-designer
      ↓
annotate-with-claude (small)   OR   scaffold-annotation-env (large)
      ↓                                   ↓
review-annotations                  review-annotations
      ↓                                   ↓
hf-setup                            hf-setup
Installation
claude plugins marketplace update danielrosehill
claude plugins install data-annotation@danielrosehill
After installing, restart Claude Code.
Requirements
- huggingface-cli (logged in) for hf-setup.
- GEMINI_API_KEY in the annotation workspace .env for scaffold-annotation-env.
- Python with pandas, pyarrow, presidio-analyzer, jsonschema (skill scripts install via uv on first run).
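A quick preflight check for the requirements above might look like the following sketch. The `missing_requirements` helper is hypothetical and not part of the plugin. It assumes the key has already been loaded from the workspace .env into the mapping passed as `env`, it only checks that huggingface-cli is on the PATH (not that it is logged in), and it relies on presidio-analyzer being importable as `presidio_analyzer`.

```python
import importlib.util
import shutil

# Distribution name -> import name (presidio-analyzer imports as presidio_analyzer).
PACKAGES = {
    "pandas": "pandas",
    "pyarrow": "pyarrow",
    "presidio-analyzer": "presidio_analyzer",
    "jsonschema": "jsonschema",
}


def missing_requirements(env, which=shutil.which, find_spec=importlib.util.find_spec):
    """Return the names of requirements that appear to be missing.

    `which` and `find_spec` are injectable so the check can be tested
    without touching the real PATH or installed packages.
    """
    missing = []
    if which("huggingface-cli") is None:
        missing.append("huggingface-cli")
    # An empty value counts as missing, same as an unset one.
    if not env.get("GEMINI_API_KEY"):
        missing.append("GEMINI_API_KEY")
    for dist_name, import_name in PACKAGES.items():
        if find_spec(import_name) is None:
            missing.append(dist_name)
    return missing
```

Passing `os.environ` as `env` checks the current process environment. An empty list means nothing obvious is missing, although missing Python packages are not fatal in practice because the skill scripts install them via uv on first run.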
Data storage
- Plugin-managed working state: ${CLAUDE_USER_DATA:-${XDG_DATA_HOME:-$HOME/.local/share}/claude-plugins}/data-annotation/.
- User-owned dataset repos: a path the user picks during onboarding (typically under ~/repos/ or ~/Documents/); only a pointer is stored under $CLAUDE_USER_DATA.
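The nested shell expansion for the working-state path can be hard to read at a glance. The sketch below resolves the same fallback chain in Python; `plugin_state_dir` and its `env` and `home` parameters are illustrative names, not part of the plugin.

```python
import os
from pathlib import Path


def plugin_state_dir(env=None, home=None):
    """Resolve the plugin's working-state directory from the fallback chain."""
    env = os.environ if env is None else env
    home = Path(home) if home is not None else Path.home()
    # Shell `${VAR:-default}` falls back when VAR is unset OR empty,
    # which is the same behavior as `or` on the looked-up value.
    xdg_data = env.get("XDG_DATA_HOME") or str(home / ".local" / "share")
    user_data = env.get("CLAUDE_USER_DATA") or str(Path(xdg_data) / "claude-plugins")
    return Path(user_data) / "data-annotation"
```

With neither variable set and a home directory of /home/u, this resolves to /home/u/.local/share/claude-plugins/data-annotation. Setting CLAUDE_USER_DATA overrides the whole prefix, including any XDG_DATA_HOME value.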
License
MIT.