Claude-Data-Wrangler

April 23, 2026

Data cleaning, enrichment, restructuring, and packaging skills for tabular and JSON datasets. Data visualisation is out of scope (handled by a separate plugin).

Skills

Cleaning & cleanliness

| Skill | Purpose |
|---|---|
| data-cleanliness-scan | Scan flat files (CSV/Parquet/JSON/Excel) and flag columns likely to fail SQL ingestion or analysis |
| standardise-country-names | Normalise inconsistent country names ("USA" vs "United States of America") |
| text-to-numeric | Parse formatted strings like $4.27, 1,234.56, €1.2M, (500) into numeric columns |
| unicode-consistency | Detect and fix mixed Unicode normalisation, mojibake, invisible chars, confusables |
| date-wrangling | Convert dates/times between ISO 8601, epoch (s/ms/µs/ns), with/without timezone, fiscal, week-date |
| iso-review | Audit the dataset for fields that could be standardised to an ISO standard (3166, 4217, 639, 8601, LEI, ISIN, …) and optionally refactor |
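As an illustration of the kind of parsing text-to-numeric describes, here is a minimal Python sketch. It is not the skill's implementation: it assumes "." is the decimal separator and "," the thousands separator, recognises only the symbols $ € £ ¥ and the hypothetical K/M/B magnitude suffixes, and treats parentheses as accounting negatives.

```python
import re
from decimal import Decimal
from typing import Optional

CURRENCY_SYMBOLS = "$€£¥"
# Hypothetical magnitude suffixes (K = thousand, M = million, B = billion)
SUFFIXES = {"K": Decimal(10) ** 3, "M": Decimal(10) ** 6, "B": Decimal(10) ** 9}
PLAIN_NUMBER = re.compile(r"\d+(?:\.\d+)?|\.\d+")


def parse_formatted_number(text: str) -> Optional[Decimal]:
    """Parse '$4.27', '1,234.56', '€1.2M' or '(500)'; return None if unparseable."""
    s = text.strip()
    negative = False
    # Accounting notation: "(500)" means -500
    if s.startswith("(") and s.endswith(")"):
        negative, s = True, s[1:-1].strip()
    if s.startswith("-"):
        negative, s = not negative, s[1:].strip()
    # Drop currency symbols at either end, then thousands separators
    s = s.strip(CURRENCY_SYMBOLS).strip().replace(",", "")
    multiplier = Decimal(1)
    if s and s[-1].upper() in SUFFIXES:
        multiplier, s = SUFFIXES[s[-1].upper()], s[:-1].strip()
    if not PLAIN_NUMBER.fullmatch(s):
        return None
    value = Decimal(s) * multiplier
    return -value if negative else value
```

Returning Decimal rather than float keeps values like 4.27 exact; European-style input such as 1.234,56 would need a locale-aware variant.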

Enrichment

| Skill | Purpose |
|---|---|
| add-iso3166 | Add ISO 3166 country codes (alpha-2/3, numeric) to datasets referencing countries |
| enrich-with-currency | Map ISO 3166 codes to ISO 4217 currency codes (plus name / symbol) |
| data-enrichment | Brainstorm and rank enrichment opportunities (temporal, geo, entity, FX, embeddings, holidays …) |
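A minimal sketch of the add-iso3166 → enrich-with-currency chain, assuming a hand-written reference table of four countries and a hypothetical alias map standing in for standardise-country-names. The output column names (iso3166_alpha2 and so on) are illustrative, not the skills' actual naming; a real run would draw on a complete ISO 3166 / ISO 4217 reference.

```python
from typing import Dict, Optional, Tuple

# Illustrative subset only: (alpha-2, alpha-3, numeric, ISO 4217 currency)
REFERENCE: Dict[str, Tuple[str, str, str, str]] = {
    "united states of america": ("US", "USA", "840", "USD"),
    "united kingdom": ("GB", "GBR", "826", "GBP"),
    "germany": ("DE", "DEU", "276", "EUR"),
    "japan": ("JP", "JPN", "392", "JPY"),
}

# Hypothetical alias table standing in for standardise-country-names
ALIASES: Dict[str, str] = {
    "usa": "united states of america",
    "us": "united states of america",
    "united states": "united states of america",
    "uk": "united kingdom",
    "great britain": "united kingdom",
    "deutschland": "germany",
}

OUTPUT_COLUMNS = ("iso3166_alpha2", "iso3166_alpha3", "iso3166_numeric", "iso4217_currency")


def enrich_row(row: Dict[str, Optional[str]], field: str = "country") -> Dict[str, Optional[str]]:
    """Return a copy of row with ISO 3166 codes and an ISO 4217 currency added."""
    raw = (row.get(field) or "").strip().lower()
    name = ALIASES.get(raw, raw)
    codes = REFERENCE.get(name)
    enriched = dict(row)  # never mutate the caller's row
    for idx, col in enumerate(OUTPUT_COLUMNS):
        # Unmatched names get None so they can be reviewed rather than guessed
        enriched[col] = codes[idx] if codes else None
    return enriched
```

Numeric codes are kept as strings so that codes with leading zeros (for example 004, Afghanistan) survive a round trip through CSV.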

Documentation & provenance

| Skill | Purpose |
|---|---|
| add-data-dictionary | Generate a data dictionary (Markdown / YAML / JSON / CSV) for a dataset |
| update-data-dictionary | Keep an existing data dictionary in sync after schema changes |
| data-dictionary-export | Export a data dictionary to a polished PDF via Typst |
| data-to-document | Render a dataset (or a filtered slice) to PDF via Typst, with layout auto-chosen from data shape, selectable fields, and custom column labels |
| add-changelog | Maintain a dataset-focused CHANGELOG.md (Keep-a-Changelog, SemVer-adapted) |
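A rough sketch of what a generated Markdown data dictionary might contain, using only the standard library. The column layout (Field, Type, Nulls, Example) and the simple integer/float/string inference are assumptions for illustration; the skill's actual output format may differ.

```python
import csv
import io
from typing import Callable, List


def infer_type(values: List[str]) -> str:
    """Classify a column as integer, float, string, or unknown (all empty)."""
    present = [v for v in values if v.strip()]
    if not present:
        return "unknown"

    def all_parse(cast: Callable[[str], object]) -> bool:
        try:
            for v in present:
                cast(v)
        except ValueError:
            return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    return "string"


def data_dictionary_markdown(csv_text: str) -> str:
    """Build a Markdown table with one row per column of the CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    lines = ["| Field | Type | Nulls | Example |", "|---|---|---|---|"]
    for name in reader.fieldnames or []:
        column = [row.get(name) or "" for row in rows]
        nulls = sum(1 for v in column if not v.strip())
        example = next((v for v in column if v.strip()), "")
        lines.append(f"| {name} | {infer_type(column)} | {nulls} | {example} |")
    return "\n".join(lines)
```

A human-written description column would normally be added on top of these inferred fields.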

Reshape & format

| Skill | Purpose |
|---|---|
| csv-to-json | Bidirectional CSV ↔ JSON / JSONL conversion |
| json-restructure | Reshape JSON: flatten, nest, group-by, explode arrays, promote/demote fields |
| data-shape | Propose a normalised SQL schema (tables, keys, relationships) from a flat source |
| data-comparability | Align multiple datasets (reconcile headers, types, vocabularies, units) for merge/union |
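Flattening and nesting, two of the json-restructure operations listed above, can be sketched as a pair of inverse functions. This assumes dotted keys and that no original key already contains the separator; lists are left as values rather than exploded.

```python
from typing import Any, Dict


def flatten(obj: Dict[str, Any], sep: str = ".", prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys; lists and empty dicts stay as values."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            flat.update(flatten(value, sep, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat: Dict[str, Any], sep: str = ".") -> Dict[str, Any]:
    """Inverse of flatten: rebuild nesting from dotted keys."""
    nested: Dict[str, Any] = {}
    for path, value in flat.items():
        *parents, leaf = path.split(sep)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested
```

The round trip only holds while keys are separator-free, which is why a real restructuring step would first check for collisions.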

Privacy

| Skill | Purpose |
|---|---|
| pii-flag | Detect PII (names, emails, IDs, cards, coords, …) at cell-level with confidence scores |
| synthetic-data-overlay | Replace PII with realistic synthetic substitutes preserving shape and referential integrity |
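A minimal cell-level sketch in the spirit of pii-flag, covering only email addresses and payment-card numbers. The regexes and the confidence values (0.95, 0.9, 0.3) are hypothetical; the Luhn checksum is a standard way to tell a plausible card number from an arbitrary run of digits.

```python
import re
from typing import List, Tuple

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13-19 digits, optionally separated by single spaces or hyphens
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def flag_pii(cell: str) -> List[Tuple[str, str, float]]:
    """Return (kind, matched_text, confidence) tuples for one cell."""
    findings: List[Tuple[str, str, float]] = []
    for m in EMAIL_RE.finditer(cell):
        findings.append(("email", m.group(), 0.95))
    for m in CARD_RE.finditer(cell):
        digits = re.sub(r"\D", "", m.group())
        # Luhn-valid runs are far more likely to be real card numbers
        findings.append(("card", m.group(), 0.9 if luhn_valid(digits) else 0.3))
    return findings
```

Names, addresses and national IDs need dictionaries, NER models or country-specific checksums, which is where most of a real detector's complexity lives.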

Packaging & targets

| Skill | Purpose |
|---|---|
| database-guide | Recommend a database backend (relational / analytical / document / graph / vector / time-series) |
| parquet-jsonl-package | Package a dataset as Parquet and/or JSONL with compression and partitioning |
| sql-load | Load a flat dataset into SQL (Postgres / MySQL / SQLite / MSSQL / DuckDB) with schema validation |
| graph-database | Reshape tabular/JSON data into nodes + edges, emit Cypher / GraphML / CSV bulk loads |
| vector-upsert | Embed text fields and upsert into a vector DB (Pinecone / Qdrant / Weaviate / pgvector / Chroma / Milvus) |
| hf-dataset-push | Publish a packaged dataset to Hugging Face Hub with dataset card |
| api-loader | Prepare and push data into a REST API or MCP server, from an OpenAPI spec or well-known SDK |
| geodata-formatter | Convert CSV / tabular geodata into GeoJSON (or NDGeoJSON) with CRS reprojection and geometry inference |
| divergent-data-pipe | Build an incremental sync from a canonical upstream into a downstream project that has diverged (renames / enrichments), preserving the divergence |
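A sketch of schema-validated loading into SQLite with Python's built-in sqlite3 module, standing in for the sql-load targets listed above. The SCHEMA mapping, its column names and the load_rows helper are hypothetical: every row is checked against the declared schema before anything is written, and the inserts run in a single transaction.

```python
import sqlite3
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical declared schema: column -> (SQLite column definition, Python caster)
SCHEMA: Dict[str, Tuple[str, Callable[[Any], Any]]] = {
    "id": ("INTEGER PRIMARY KEY", int),
    "country_alpha2": ("TEXT NOT NULL", str),
    "price": ("REAL", float),
}


def load_rows(conn: sqlite3.Connection, table: str, rows: List[Dict[str, Any]]) -> int:
    """Validate every row against SCHEMA, then insert all rows in one transaction.

    `table` is interpolated into SQL, so it must be a trusted identifier.
    """
    cols = list(SCHEMA)
    validated = []
    for i, row in enumerate(rows):
        unexpected = set(row) - set(cols)
        if unexpected:
            raise ValueError(f"row {i}: unexpected columns {sorted(unexpected)}")
        # Empty strings become NULL; a failed cast raises ValueError before any write
        validated.append(tuple(
            None if row.get(c) in (None, "") else SCHEMA[c][1](row[c]) for c in cols
        ))
    ddl = ", ".join(f"{c} {SCHEMA[c][0]}" for c in cols)
    placeholders = ", ".join("?" for _ in cols)
    with conn:  # commits the inserts on success, rolls them back on error
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({ddl})")
        conn.executemany(
            f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})",
            validated,
        )
    return len(validated)
```

Validating up front means a bad row is reported with its index instead of surfacing as a half-finished load.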

Conventions

Every skill follows the safety and data-layout rules in CONVENTIONS.md. Highlights:

  • Backup-before-destruction — any destructive edit (overwrite, mutate-in-place, remote load) must confirm an existing backup or create one first (file copy, Parquet/JSONL snapshot, or git commit).
  • New-file-by-default — outputs get a suffix (_iso3166, _numeric, _synthetic); overwrite only on explicit user request.
  • Data dictionary is the provenance log — every schema-changing operation writes a dated entry.
  • No plaintext secrets — connection passwords and API keys are referenced via env vars, 1Password, or prompt-at-runtime.
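The backup-before-destruction and new-file-by-default rules can be sketched for a single local file as follows. The .bak-<timestamp> naming and the write_output helper are assumptions for illustration; the rules above also accept a Parquet/JSONL snapshot or a git commit as the backup.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def backup_copy(src: Path) -> Path:
    """Copy src to a timestamped sibling, e.g. data.bak-20260423T101500123456Z.csv."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    dest = src.with_name(f"{src.stem}.bak-{stamp}{src.suffix}")
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    return dest


def write_output(src: Path, data: str, suffix: str, overwrite: bool = False) -> Path:
    """New-file-by-default: write <stem><suffix><ext> unless overwrite is requested."""
    if overwrite:
        backup_copy(src)  # backup-before-destruction
        target = src
    else:
        target = src.with_name(f"{src.stem}{suffix}{src.suffix}")
    target.write_text(data, encoding="utf-8")
    return target
```

For example, write_output(Path("sales.csv"), text, "_numeric") leaves sales.csv untouched and writes sales_numeric.csv next to it.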

Typical pipeline

raw CSV
  → data-cleanliness-scan        (audit)
  → iso-review                   (flag standards opportunities)
  → unicode-consistency          (clean text)
  → standardise-country-names
  → add-iso3166
  → enrich-with-currency
  → text-to-numeric              ($4.27 → 4.27)
  → date-wrangling               (normalise to ISO 8601 / epoch)
  → pii-flag                     (before any external publication)
  → synthetic-data-overlay       (if needed)
  → add-data-dictionary          (or update)
  → add-changelog
  → data-shape                   (plan SQL schema)
  → parquet-jsonl-package
  → sql-load / hf-dataset-push / vector-upsert / graph-database / api-loader
  → data-dictionary-export       (share the PDF)

Each skill updates the data dictionary so provenance is preserved end-to-end.
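The chaining idea can be sketched as plain Python functions over rows, with each step appending a dated provenance entry. The two steps are toy stand-ins named after text-to-numeric and add-iso3166, and the entry fields (date, step, added_columns, removed_columns) are a hypothetical shape for a data-dictionary log, not the plugin's format.

```python
from datetime import date
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def run_pipeline(rows: List[Row], steps: List[Tuple[str, Step]],
                 provenance: List[Dict[str, object]]) -> List[Row]:
    """Apply named steps in order, appending one dated entry per step."""
    for name, step in steps:
        # Schema changes are detected from the first row only (sketch simplification)
        before = set(rows[0]) if rows else set()
        rows = step(rows)
        after = set(rows[0]) if rows else set()
        provenance.append({
            "date": date.today().isoformat(),
            "step": name,
            "added_columns": sorted(after - before),
            "removed_columns": sorted(before - after),
        })
    return rows


# Toy stand-ins for two skills; the real skills handle far more cases
def strip_dollar(rows: List[Row]) -> List[Row]:
    return [{**r, "price": float(str(r["price"]).lstrip("$"))} for r in rows]


def add_alpha2(rows: List[Row]) -> List[Row]:
    lookup = {"USA": "US", "Germany": "DE"}
    return [{**r, "country_alpha2": lookup.get(str(r["country"]))} for r in rows]
```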

Installation

Plugin

claude plugins install Claude-Data-Wrangler@danielrosehill

Python dependencies (via uv)

The plugin's skills rely on a broad but optional dependency set. A uv-backed installer is provided:

# install uv once
curl -LsSf https://astral.sh/uv/install.sh | sh

# install all dependencies into a local .venv
./scripts/install-deps.sh

# or just the core tabular stack
./scripts/install-deps.sh --minimal

# or a specific group (core, iso, dates, text, pii, enrichment, sql, vector, graph, api, hf)
./scripts/install-deps.sh --group vector

# activate the venv for subsequent skill runs
source .venv/bin/activate

Aggregated requirements live in requirements.txt. Each SKILL.md lists its own minimum dependencies so you can install per-skill if preferred.

License

MIT — see LICENSE.