Claude Data Analyst
April 23, 2026
First-pass data analysis toolkit for Claude Code. Point it at a CSV, Parquet, or Excel file and get an initial impression — correlations, PII audit, anomalies, hypothesis checks, a data dictionary, or a trend narrative.
Skills
| Skill | What it does |
|---|---|
| `correlation-analysis` | Compute Pearson/Spearman/Kendall correlations and rank the strongest variable pairs. |
| `pii-flag` | Scan columns and values for likely PII; mask samples; recommend remediation. |
| `anomaly-analysis` | Three-layer anomaly sweep: value sanity, distribution outliers, multivariate/temporal. |
| `hypothesis-testing` | Formalise a user-stated hypothesis, pick the right test, and return supports/refutes/inconclusive. |
| `data-dictionary-creator` | Merge auto-profiled schema with the user's description into a full data dictionary. |
| `trend-analysis` | Identify and narrate the major trends: directional, seasonal, compositional, per-segment. |
| `setup-data-workspace` | Discover data files in the current repo, load them into a DuckDB database, and update CLAUDE.md with query instructions. |
| `data-enrichment` | Diagnose gaps between the user's analytical goal and the dataset, propose external sources, plan and implement enrichment. |
| `multivariate-analysis` | Partial correlations, VIF, regression with interactions, Lasso, and PCA to tell which variables actually drive the target and which are redundant. |
| `forensic-sweep` | Flag data that looks suspiciously clean, imputed, smoothed, or pre-normalised, so the user knows what was done upstream before they got it. |
| `type-consistency-sweep` | Detect within- and cross-file type inconsistencies that block analysis or DB loading; fix trivial cases or delegate to a Claude-Data-Wrangler skill. |
| `standard-deviation` | Compute SD (plus variance, IQR, MAD, CV) for numeric columns with trustworthiness flags for skew, heavy tails, and small n. |
| `sample-size` | Characterise the effective sample size per analytical question, flag underpowered segments, and give a go/no-go verdict. |
| `data-reporting` | Generate a parametric PDF report (Typst) describing the dataset: schema, distributions, quality, findings from prior skills. |
Recommended CLI tooling
The skills assume (and will suggest) these are available on PATH:
- `duckdb`: SQL over CSV/Parquet/Excel at speed.
- `csvkit`: `csvstat`, `csvcut`, `csvlook`.
- `miller` (`mlr`): pivots and tallies on CSV.
- `uv`: run pandas/scipy/statsmodels/scikit-learn one-liners without a persistent venv.
Optional:
- `presidio-analyzer`: ML-backed PII entity detection (via `uv run --with presidio-analyzer`).
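
For contrast with ML-backed detection, the sketch below shows roughly what a pattern-only PII pass with masked samples could look like. It is an illustration under stated assumptions, not how `pii-flag` or Presidio work: the `PII_PATTERNS` table, `flag_pii`, and `mask` are hypothetical names, and the two regexes are deliberately simplistic.

```python
import re

# Hypothetical patterns for illustration only; real PII detection needs
# far broader coverage (names, addresses, national IDs, and so on).
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def flag_pii(values: list[str]) -> set[str]:
    """Return the names of the patterns that match at least one value."""
    return {
        name
        for name, pattern in PII_PATTERNS.items()
        if any(pattern.search(v) for v in values)
    }


def mask(value: str, keep: int = 2) -> str:
    """Keep the first `keep` characters and replace the rest with '*'."""
    return value[:keep] + "*" * max(len(value) - keep, 0)


# Hypothetical column sample, invented for illustration.
sample = ["alice@example.com", "n/a", "bob@example.org"]
print(flag_pii(sample))
print([mask(v) for v in sample])
```

Regexes like these only catch well-formed values, which is one reason an entity-detection library is worth having as an optional extra.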
Installation
claude plugins install claude-data-analyst@danielrosehill
License
MIT.