Claude Data Analyst

April 23, 2026

First-pass data analysis toolkit for Claude Code. Point it at a CSV, Parquet, or Excel file and get an initial impression — correlations, PII audit, anomalies, hypothesis checks, a data dictionary, or a trend narrative.

Skills

| Skill | What it does |
| --- | --- |
| correlation-analysis | Compute Pearson/Spearman/Kendall correlations and rank the strongest variable pairs. |
| pii-flag | Scan columns and values for likely PII; mask samples; recommend remediation. |
| anomaly-analysis | Three-layer anomaly sweep: value sanity, distribution outliers, multivariate/temporal. |
| hypothesis-testing | Formalise a user-stated hypothesis, pick the right test, and return supports/refutes/inconclusive. |
| data-dictionary-creator | Merge auto-profiled schema with the user's description into a full data dictionary. |
| trend-analysis | Identify and narrate the major trends: directional, seasonal, compositional, per-segment. |
| setup-data-workspace | Discover data files in the current repo, load them into a DuckDB database, and update CLAUDE.md with query instructions. |
| data-enrichment | Diagnose gaps between the user's analytical goal and the dataset, propose external sources, plan and implement enrichment. |
| multivariate-analysis | Partial correlations, VIF, regression with interactions, Lasso, and PCA to tell which variables actually drive the target and which are redundant. |
| forensic-sweep | Flag data that looks suspiciously clean, imputed, smoothed, or pre-normalised, so the user knows what was done upstream before they got it. |
| type-consistency-sweep | Detect within- and cross-file type inconsistencies that block analysis or DB loading; fix trivial cases or delegate to a Claude-Data-Wrangler skill. |
| standard-deviation | Compute SD (plus variance, IQR, MAD, CV) for numeric columns with trustworthiness flags for skew, heavy tails, and small n. |
| sample-size | Characterise the effective sample size per analytical question, flag underpowered segments, and give a go/no-go verdict. |
| data-reporting | Generate a parametric PDF report (Typst) describing the dataset: schema, distributions, quality, findings from prior skills. |

The skills assume (and will suggest) these are available on PATH:

  • duckdb — SQL over CSV/Parquet/Excel at speed.
  • csvkit (csvstat, csvcut, csvlook).
  • miller (mlr) — pivots and tallies on CSV.
  • uv — run pandas/scipy/statsmodels/scikit-learn one-liners without a persistent venv.
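
Before running the skills, it can help to confirm that these tools actually resolve on PATH. The sketch below is one way to do that and is not part of the plugin. The `missing_tools` helper is hypothetical, and the executable names assume the usual binaries (`duckdb`, `mlr`, `uv`, and the three csvkit commands listed above).

```python
import shutil

# Executable names taken from the list above; csvkit ships several commands.
# Assumption: each tool is installed under its usual binary name.
REQUIRED = ["duckdb", "csvstat", "csvcut", "csvlook", "mlr", "uv"]


def missing_tools(names, which=shutil.which):
    """Return the names that `which` cannot resolve on PATH."""
    return [name for name in names if which(name) is None]


missing = missing_tools(REQUIRED)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools are on PATH.")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is enough.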

Optional:

  • presidio-analyzer — ML-backed PII entity detection (via uv run --with presidio-analyzer).
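
This README does not say how pii-flag behaves when presidio-analyzer is absent. Purely as an illustration of the "scan, then mask samples" idea, a pattern-based pass might look like the sketch below. The patterns, the `mask` and `flag_columns` helpers, and the sample columns are all hypothetical and far simpler than real PII detection requires.

```python
import re

# Deliberately simple, illustrative patterns; real PII detection needs more care.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask(value):
    """Keep the first and last character and star out the rest."""
    if len(value) <= 2:
        return "*" * len(value)
    return value[0] + "*" * (len(value) - 2) + value[-1]


def flag_columns(columns):
    """Map column name -> (pattern label, masked sample) for the first hit."""
    flagged = {}
    for name, values in columns.items():
        for value in values:
            hit = next(
                (label for label, rx in PATTERNS.items() if rx.search(value)),
                None,
            )
            if hit:
                flagged[name] = (hit, mask(value))
                break  # one masked sample per column is enough for a report
    return flagged


# Hypothetical columns standing in for a loaded dataset.
columns = {
    "contact": ["ann@example.com", "bob@example.org"],
    "city": ["Leeds", "Porto"],
}
flagged = flag_columns(columns)
print(flagged)
```

Masking the sample before it is printed keeps the raw value out of logs and chat transcripts, which is the point of the "mask samples" step.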

Installation

claude plugins install claude-data-analyst@danielrosehill

License

MIT.