explainX: LLM-native Explainable AI

June 14, 2026

explainX 3.0 is a modern, LLM-native rewrite of the explainability engine. Train any machine-learning model, then let a human or an LLM agent inspect it: understand why a prediction was made, surface bias, find the minimal change that flips a decision, and feed those insights back into training.

Where the original explainX rendered a human-only Plotly dashboard on a 2020 dependency stack, this rewrite returns structured, machine-readable results (typed objects that serialize to JSON) plus a natural-language summary — usable from plain Python and over the Model Context Protocol (MCP) so agents like Claude can call it as tools while they build models.

The goal: bring state-of-the-art explainability research into one place, with an interface designed for the era where LLMs train and debug models.


Why

  1. Explain predictions — global feature importance + per-prediction reasoning.
  2. Debug models — counterfactuals and partial-dependence curves show what the model actually learned.
  3. Detect bias — group-fairness metrics (disparate impact, demographic parity, equal opportunity) answer "is my model rejecting one group regardless of profile?"
  4. Build trust — a plain-language summary for stakeholders and agents.
  5. Close the loop — results are structured so an LLM can read them and decide how to fix the training data or model.

What's inside

A unified API over the methods the XAI literature identifies as the most deployed — SHAP, LIME, surrogate trees and counterfactuals — plus modern additions (ALE, anchors) and a 2024–2025 research frontier most tools skip: quantifying whether an explanation can be trusted.

| Capability | Method | Notes |
|---|---|---|
| Global importance | SHAP (auto) → permutation → intrinsic | SHAP used automatically when installed |
| Local explanation | SHAP (auto) → model-agnostic ablation | per-prediction signed contributions |
| Local surrogate | LIME (from scratch) | local weighted linear approximation |
| Sufficient rules | Anchors | high-precision IF-THEN rule per prediction |
| Counterfactuals | greedy model-agnostic search | smallest change that flips a decision |
| Global surrogate | decision tree + fidelity | inspectable glassbox rules + how faithful they are |
| Feature effects | PDP and ALE | ALE stays unbiased under feature correlation |
| Explanation quality | faithfulness + stability | does the explanation reflect the model, and is it robust? |
| Counterfactuals & recourse | greedy search with immutable / monotonic constraints | actionable "what to change" |
| Uncertainty | conformal prediction | distribution-free prediction sets / intervals with coverage guarantee |
| Fairness / bias | demographic parity, disparate impact (4/5 rule), equal opportunity | per sensitive attribute |
| Bias mitigation | post-processing per-group thresholds | detect → fix |
| Interactions | Friedman's H-statistic | which features matter together |
| Example-based | prototypes & criticisms (MMD) | representative vs. atypical cases |
| Metrics | classification + regression | accuracy/precision/recall/F1/AUC, R²/MAE/RMSE |
| Monitoring | data drift (PSI + KS) | reference vs. current dataset |
| LLM narration | Claude (claude-opus-4-8) | plain-language briefings / Q&A grounded in the report |
| Reporting | HTML export + CLI + dashboard | shareable artifact; no-code usage |
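Permutation importance, the fallback in the global-importance row when SHAP is not installed, measures how much a score drops when one feature's column is shuffled. Below is a minimal, dependency-free sketch, assuming the model is exposed as a plain `predict(rows)` callable and accuracy is the score; `permutation_importance` and the toy model are hypothetical and not explainX's implementation:

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[Row]], List[int]],
    X: List[Row],
    y: Sequence[int],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Hypothetical helper: mean accuracy drop when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Build new rows so the caller's data is never mutated.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_predict(rows: List[Row]) -> List[int]:
    """Toy classifier that only looks at feature 0; feature 1 is pure noise."""
    return [int(r[0] > 0.5) for r in rows]


data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = toy_predict(X)
imp = permutation_importance(toy_predict, X, y)
print(imp)  # feature 1 scores exactly 0.0: shuffling it never changes a prediction
```

The method needs only predictions, which is why it works as a model-agnostic fallback; the price is many extra model calls (features × repeats).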

Improve accuracy (data-centric diagnostics)

Beyond explaining a model, explainX helps you make it more accurate — the data-centric-AI playbook, returned as actionable reports:

| Diagnostic | What it finds | Why it improves accuracy |
|---|---|---|
| Error analysis | data slices with the highest error (slice discovery) | tells you where to add data / features or split the model |
| Label issues | likely-mislabeled rows (confident learning) | cleaning labels is often the highest-ROI accuracy lever |
| Target leakage | features that alone predict the target | catches inflated offline accuracy that collapses in production |
| Calibration | ECE / Brier + reliability | flags untrustworthy probabilities and recommends a fix |

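from explainx import ModelExplainer

ex = ModelExplainer(model, X_test, y_test)   # same stateful explainer as in the Python API section
ex.error_analysis()   # ErrorAnalysis: worst slices + recommendation
ex.label_issues()     # LabelIssues: rows to relabel (cross-validated)
ex.leakage()          # LeakageReport: suspected leaky features
ex.calibration()      # CalibrationReport: ECE, Brier, fix recommendation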
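The calibration check rests on two standard quantities: the Brier score (mean squared error of the predicted probability) and expected calibration error (ECE), the bin-weighted gap between predicted probability and observed frequency. A minimal sketch for binary classification, assuming equal-width bins over P(y=1); `brier_score` and `expected_calibration_error` are hypothetical helpers, and explainX's binning may differ:

```python
from typing import List, Sequence


def brier_score(y_true: Sequence[int], p_pos: Sequence[float]) -> float:
    """Mean squared difference between predicted P(y=1) and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pos)) / len(y_true)


def expected_calibration_error(
    y_true: Sequence[int], p_pos: Sequence[float], n_bins: int = 10
) -> float:
    """Hypothetical helper: weighted mean |observed rate - mean predicted P(y=1)| per bin."""
    n = len(y_true)
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, p in enumerate(p_pos):
        # p == 1.0 goes into the last bin instead of overflowing.
        bins[min(int(p * n_bins), n_bins - 1)].append(i)
    ece = 0.0
    for idx in bins:
        if not idx:
            continue
        observed = sum(y_true[i] for i in idx) / len(idx)
        predicted = sum(p_pos[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(observed - predicted)
    return ece


y = [1, 0, 1, 1, 0, 0, 1, 0]
p = [0.9, 0.1, 0.8, 0.7, 0.3, 0.2, 0.6, 0.4]
print(brier_score(y, p), expected_calibration_error(y, p))  # ~0.075 and ~0.25
```

Lower is better for both; a perfectly calibrated model has ECE near 0, which is why a high ECE is a reason to distrust raw probabilities even when accuracy looks fine.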

Works with any ML framework

explainX follows the scikit-learn predict / predict_proba convention, so scikit-learn models and the sklearn-API estimators of XGBoost, LightGBM and CatBoost work with no wrapping. For anything else, wrap_model() adapts it: native XGBoost/LightGBM Boosters, Keras/TensorFlow, PyTorch, statsmodels, or any custom prediction function:

from explainx import explain_model, wrap_model

explain_model(sklearn_or_xgb_or_lgbm_or_catboost_model, X, y)      # direct
explain_model(wrap_model(keras_or_torch_model, task="classification"), X, y)
explain_model(wrap_model(predict_proba_fn=my_api_call, classes=[0, 1]), X, y)
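The idea behind wrapping is an adapter that exposes any probability function through the predict / predict_proba convention. Below is a minimal sketch; `ProbaFnAdapter` and the stand-in `my_api_call` are hypothetical and only illustrate the convention, not the signature or internals of explainX's wrap_model():

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


class ProbaFnAdapter:
    """Hypothetical adapter: raw probability function -> sklearn-style estimator."""

    def __init__(self, predict_proba_fn: Callable[[Matrix], Matrix], classes: Sequence):
        self._fn = predict_proba_fn
        self.classes_ = list(classes)  # sklearn estimators expose labels as classes_

    def predict_proba(self, X: Matrix) -> Matrix:
        return self._fn(X)  # one row of class probabilities per input row

    def predict(self, X: Matrix) -> list:
        # Return the label whose probability is highest in each row.
        return [self.classes_[max(range(len(row)), key=row.__getitem__)]
                for row in self.predict_proba(X)]


def my_api_call(X: Matrix) -> Matrix:
    """Hypothetical stand-in for a remote scorer: P(class 1) = first feature, clipped."""
    out = []
    for r in X:
        p1 = min(max(r[0], 0.0), 1.0)
        out.append([1 - p1, p1])
    return out


model = ProbaFnAdapter(my_api_call, classes=[0, 1])
print(model.predict([[0.2], [0.9]]))  # [0, 1]
```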

Runnable examples for every framework live in examples/ (one file per framework).

Install

⚠️ pip install explainx does not give you 3.0 yet. This LLM-native rewrite (v3.0.0) has not been published to PyPI, so pip install explainx currently still installs the legacy 2.x package. Until the 3.0 release is on PyPI, use one of the methods below.

From PyPI (works once 3.0.0 is published):

pip install "explainx[all]"   # core + SHAP + MCP + drift + LLM narration + dashboard
# or minimal:
pip install explainx          # core only (extras optional)

From GitHub (installs the current 3.0 code on master):

pip install "git+https://github.com/explainX/explainx.git"
# with all optional extras:
pip install "explainx[all] @ git+https://github.com/explainX/explainx.git"

From source (for development / running the examples & tests):

git clone https://github.com/explainX/explainx.git
cd explainx
pip install -e ".[all]"       # editable install with every extra
pytest                        # run the test suite

Extras can be combined or used individually: shap, mcp, drift, llm, dashboard, or all (e.g. pip install "explainx[shap,dashboard]").

Python API

from explainx import explain_model

report = explain_model(
    model, X_test, y_test,
    sensitive_features=["gender"],   # run bias analysis on these columns
    n_local=3,                       # explain a few individual predictions
)

print(report.summary)      # natural-language briefing for a human/LLM
report.to_dict()           # full structured result (JSON-ready)
report.to_json()

Need finer control? Use the stateful explainer:

from explainx import ModelExplainer

ex = ModelExplainer(model, X_test, y_test)
ex.metrics()                       # ModelMetrics
ex.importance()                    # GlobalImportance (SHAP when available)
ex.explain(index=0, top_k=5)       # LocalExplanation (SHAP/ablation)
ex.lime(index=0)                   # LocalExplanation (LIME)
ex.anchor(index=0)                 # Anchor: high-precision sufficient rule
ex.fairness("gender")              # FairnessReport
ex.counterfactual(index=0)         # Counterfactual: minimal flip
ex.recourse(index=0, immutable_features=["age", "gender"])  # actionable recourse
ex.surrogate()                     # SurrogateExplanation: glassbox tree + fidelity
ex.partial_dependence("income")    # PartialDependence curve
ex.ale("income")                   # ALEResult: correlation-robust effect
ex.explanation_quality(index=0)    # ExplanationQuality: faithfulness + stability
ex.conformal(X_cal, y_cal, X_test) # ConformalResult: guaranteed-coverage sets/intervals
ex.mitigate_bias("gender")         # MitigationResult: per-group thresholds that fix parity
ex.interactions(top_k=5)           # InteractionResult: Friedman H-statistic
ex.prototypes()                    # PrototypesResult: representative + atypical rows
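ex.conformal(X_cal, y_cal, X_test) takes a separate calibration set. For regression, the textbook split-conformal recipe scores absolute residuals on the calibration rows, takes the ⌈(n+1)(1−α)⌉-th smallest as q, and reports prediction ± q for each test row; under exchangeable data this covers the true value with probability at least 1−α. A minimal sketch of that recipe with a toy model; `split_conformal_interval` is hypothetical, and explainX's handling of classification sets and interval details may differ:

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def split_conformal_interval(
    predict: Callable[[float], float],
    x_cal: Sequence[float],
    y_cal: Sequence[float],
    x_test: Sequence[float],
    alpha: float = 0.1,
) -> List[Tuple[float, float]]:
    """Hypothetical helper: intervals with >= 1 - alpha coverage under exchangeability."""
    scores = sorted(abs(y - predict(x)) for x, y in zip(x_cal, y_cal))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        raise ValueError("calibration set too small for this alpha")
    q = scores[k - 1]
    return [(predict(x) - q, predict(x) + q) for x in x_test]


rng = random.Random(0)


def model(x: float) -> float:
    return 2 * x  # toy fitted model (here it happens to equal the true mean)


def sample_y(x: float) -> float:
    return 2 * x + rng.gauss(0, 1)  # noisy ground truth


x_cal = [rng.uniform(0, 10) for _ in range(500)]
y_cal = [sample_y(x) for x in x_cal]
x_test = [rng.uniform(0, 10) for _ in range(2000)]
y_test = [sample_y(x) for x in x_test]

intervals = split_conformal_interval(model, x_cal, y_cal, x_test, alpha=0.1)
coverage = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, y_test)) / len(y_test)
print(f"empirical coverage: {coverage:.3f}")
```

The guarantee is marginal (on average over test rows), and it holds whatever the model is; a worse model simply gets wider intervals.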

LLM narration (optional)

from explainx.narrate import narrate_report   # needs: pip install "explainx[llm]"

report = explain_model(model, X_test, y_test, sensitive_features=["gender"])
print(narrate_report(report, question="Why was applicant 5 rejected, and what would change it?"))

The engine computes the evidence (SHAP, fairness, counterfactuals, conformal sets) and Claude narrates it. The numbers stay in the engine and only the prose comes from the LLM, which keeps the narration anchored to computed results rather than to figures the model might invent.

Monitoring & reporting

from explainx import detect_drift, save_html

detect_drift(reference_df, current_df)   # DriftReport (PSI + KS per feature)
save_html(report, "report.html")         # shareable page; embeds the full JSON
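Both drift statistics are simple to compute per feature. PSI bins the feature on the reference data and sums (cur − ref)·ln(cur/ref) over the bins; the two-sample KS statistic is the largest gap between the two empirical CDFs. Minimal sketches, assuming equal-width bins over the reference range and a small epsilon for empty bins; `psi` and `ks_statistic` are hypothetical helpers, and explainX's binning may differ:

```python
import bisect
import math
import random
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float],
        n_bins: int = 10, eps: float = 1e-6) -> float:
    """Hypothetical helper: population stability index over equal-width reference bins."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            # Clip out-of-range values into the edge bins.
            b = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[b] += 1
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical helper: max absolute gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:  # the supremum is attained at one of the sample points
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(fa - fb))
    return gap


rng = random.Random(1)
train = [rng.gauss(0, 1) for _ in range(5000)]
same = [rng.gauss(0, 1) for _ in range(5000)]
shifted = [rng.gauss(1, 1) for _ in range(5000)]
print(psi(train, same), psi(train, shifted))
print(ks_statistic(train, same), ks_statistic(train, shifted))
```

PSI summarizes the whole histogram shift in one number, while KS is sensitive to the single largest CDF gap, so reporting both catches different kinds of drift.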

Interactive dashboard

pip install "explainx[dashboard]"
explainx-dashboard

Opens a Streamlit app: upload a fitted model + dataset, then run any module (importance, local/LIME/anchor, counterfactual & recourse, PDP/ALE, interactions, fairness, mitigation, conformal, prototypes, quality, drift) or the full report, see live tables and charts, and download the HTML/JSON.

Dashboard screenshots: global importance, local explanation, fairness (bias detected), full report.

No-code CLI

explainx report --model m.joblib --data d.csv --target y --sensitive gender --html out.html
explainx bias   --model m.joblib --data d.csv --target y --sensitive gender
explainx drift  --reference train.csv --current prod.csv

Try the demo

python -m explainx.examples.demo

It trains a deliberately gender-biased loan model and shows the fairness check firing, plus a counterfactual that flips a rejection to an approval.

Use it from an LLM agent (MCP)

Start the server (stdio transport):

explainx-mcp              # installed console script
# or:  python -m explainx.mcp_server

Register it with an MCP client (e.g. Claude Desktop / Claude Code):

{
  "mcpServers": {
    "explainx": { "command": "explainx-mcp" }
  }
}

The agent saves a fitted model and dataset to disk, then calls tools by path:

| Tool | Purpose |
|---|---|
| explain_model | Full report (metrics, importance, local, fairness, surrogate, quality) |
| feature_importance | Global importance ranking |
| explain_prediction | Why one row was predicted as it was (SHAP/ablation) |
| lime_explain_prediction | Local LIME explanation for one row |
| anchor_rule | High-precision sufficient rule for one row |
| counterfactual | Minimal change that flips a row's class |
| surrogate_rules | Glassbox decision-tree rules + fidelity |
| check_bias | Group-fairness analysis on a sensitive feature |
| model_metrics | Performance metrics |
| partial_dependence | Marginal effect curve for a feature |
| accumulated_local_effects | Correlation-robust effect curve (ALE) |
| explanation_quality | Faithfulness + stability of an explanation |
| conformal_prediction | Guaranteed-coverage prediction sets / intervals |
| actionable_recourse | Minimal flip respecting immutable features |
| mitigate_bias | Per-group thresholds that equalize selection rate |
| feature_interactions_tool | Strongest pairwise interactions (H-statistic) |
| prototypes_and_criticisms_tool | Representative + atypical rows |
| detect_data_drift | Distribution drift between two datasets |
| error_analysis | Worst-performing data slices (slice discovery) |
| label_issues | Likely-mislabeled rows (confident learning) |
| detect_target_leakage | Features that leak the target |
| assess_calibration | Probability calibration (ECE / Brier) |
| html_report | Write a shareable HTML report |

Each returns a JSON-ready dict the agent can reason over — e.g. read a disparate_impact_ratio below 0.8, conclude the model is biased, and rebalance the training data.

# what the agent does first:
import joblib
joblib.dump(model, "model.joblib")
df.to_csv("data.csv", index=False)   # features + target column
# then it calls:  check_bias(model_path="model.joblib", data_path="data.csv",
#                            sensitive_feature="gender", target_column="approved")

Example outputs

All outputs below come from the bundled demo — a deliberately gender-biased loan-approval model. Reproduce them with python docs/generate_examples.py.

Natural-language summary (explain_model(...).summary):

Model `RandomForestClassifier` is a classification model evaluated on 800 samples across 4 features.
Performance: accuracy=1.000, precision=1.000, recall=1.000, f1=1.000, roc_auc=1.000.
The most influential features (via shap_mean_abs) are: credit_score (0.257), gender (0.168), debt_ratio (0.128), income (0.062).
A depth-4 decision-tree surrogate reproduces the model with accuracy=0.896 fidelity, giving an inspectable rule set.
Explanation quality (shap): faithfulness=1.00, stability=0.97 (higher is more trustworthy; ~1.0 is excellent).
Fairness on `gender`: BIAS DETECTED. Disparate impact ratio 0.37 is below the 0.8 four-fifths threshold:
group '0' receives the positive outcome (1) at 22.6% vs '1' at 61.3%. Demographic parity gap of 38.7%.
Recommended next steps: rebalance/reweight the training data across the sensitive groups, consider
removing or decorrelating proxy features, or apply a fairness constraint, then re-evaluate.
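The fairness line in this summary follows directly from the two selection rates. Here the disparate-impact ratio is computed as the lower group's positive rate divided by the higher group's, the demographic-parity gap is their difference, and the four-fifths rule flags a ratio below 0.8. A quick check of the reported figures, using the 22.6% and 61.3% rates from the summary; `group_fairness` is a hypothetical helper, not an explainX function:

```python
def group_fairness(rate_a: float, rate_b: float, threshold: float = 0.8) -> dict:
    """Hypothetical helper: disparate impact, parity gap and four-fifths verdict."""
    low, high = sorted((rate_a, rate_b))
    ratio = low / high
    return {
        "disparate_impact_ratio": ratio,
        "demographic_parity_gap": high - low,
        "bias_detected": ratio < threshold,
    }


result = group_fairness(0.226, 0.613)  # group '0' vs group '1' from the demo summary
print(f"{result['disparate_impact_ratio']:.2f}")  # 0.37
print(f"{result['demographic_parity_gap']:.1%}")  # 38.7%
print(result["bias_detected"])                    # True
```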

Charts from the demo:

Global importance: ex.importance()  |  Local explanation: ex.explain(0)
Feature effects: ex.partial_dependence(...) / ex.ale(...)  |  Fairness: ex.fairness("gender")
Interactions: ex.interactions()  |  Conformal coverage: ex.conformal(...)  |  Drift: detect_drift(...)
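Friedman's H-statistic, which the interactions module reports, measures how much of two features' joint partial dependence is not explained by their separate partial dependences: 0 means the pair acts purely additively. A brute-force sketch for one feature pair, using O(n²) model calls, so in practice it runs on a subsample; `h_statistic` and its helpers are hypothetical and not explainX's implementation:

```python
import random
from typing import Callable, Dict, List, Sequence

Row = List[float]


def _pd(f: Callable[[Row], float], X: Sequence[Row], fixed: Dict[int, float]) -> float:
    """Partial dependence: mean prediction with some features pinned to given values."""
    total = 0.0
    for row in X:
        r = list(row)
        for j, v in fixed.items():
            r[j] = v
        total += f(r)
    return total / len(X)


def _center(values: List[float]) -> List[float]:
    m = sum(values) / len(values)
    return [v - m for v in values]


def h_statistic(f: Callable[[Row], float], X: Sequence[Row], j: int, k: int) -> float:
    """Hypothetical helper: Friedman's pairwise H (square root of H^2) for features j, k."""
    pd_jk = _center([_pd(f, X, {j: x[j], k: x[k]}) for x in X])
    pd_j = _center([_pd(f, X, {j: x[j]}) for x in X])
    pd_k = _center([_pd(f, X, {k: x[k]}) for x in X])
    num = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    den = sum(a ** 2 for a in pd_jk)
    return (num / den) ** 0.5 if den > 0 else 0.0


def additive(r: Row) -> float:
    return r[0] + r[1]


def product(r: Row) -> float:
    return r[0] * r[1]


rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(60)]
print(h_statistic(additive, X, 0, 1))  # ~0: no interaction
print(h_statistic(product, X, 0, 1))   # near 1: almost all effect is interaction
```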

Counterfactual / recourse (gender held immutable):

credit_score: 530.3 -> 739.3   =>  prediction flips 0 (rejected) -> 1 (approved)
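The capability table describes counterfactuals as a greedy, model-agnostic search that can hold features immutable. One plausible version of such a search is sketched below: each step tries nudging every mutable feature up or down by a fixed step, keeps the move that most raises the target-class probability, and stops once the decision flips. The toy loan model, step sizes and `greedy_counterfactual` name are hypothetical; explainX's step schedule, distance measure and monotonic constraints may differ:

```python
import math
from typing import Callable, Dict, Optional, Sequence

Row = Dict[str, float]


def greedy_counterfactual(
    proba: Callable[[Row], float],  # P(target class | row)
    row: Row,
    steps: Dict[str, float],        # per-feature step size
    immutable: Sequence[str] = (),
    max_iter: int = 200,
) -> Optional[Row]:
    """Hypothetical helper: apply the best single +/- step until the prediction flips."""
    current = dict(row)
    for _ in range(max_iter):
        if proba(current) >= 0.5:
            return current  # decision flipped
        best, best_p = None, proba(current)
        for feat, step in steps.items():
            if feat in immutable:
                continue  # never touch protected / immutable features
            for delta in (step, -step):
                cand = dict(current, **{feat: current[feat] + delta})
                p = proba(cand)
                if p > best_p:
                    best, best_p = cand, p
        if best is None:
            return None  # stuck: no single step improves the probability
        current = best
    return None


def loan_proba(r: Row) -> float:
    """Hypothetical toy loan model that, like the demo, also uses gender."""
    z = 0.03 * (r["credit_score"] - 650) + 1.5 * r["gender"] - 2.0 * r["debt_ratio"]
    return 1 / (1 + math.exp(-z))


applicant = {"credit_score": 530.0, "gender": 0.0, "debt_ratio": 0.4}
cf = greedy_counterfactual(
    loan_proba, applicant,
    steps={"credit_score": 10.0, "gender": 1.0, "debt_ratio": 0.05},
    immutable=["gender"],
)
print(cf)  # credit_score raised to 680.0; gender and debt_ratio untouched
```

A production search would also enforce value bounds (for example, a debt ratio cannot go negative) and prefer the smallest total change, not just the first flip.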

Anchor (sufficient rule): IF 410 <= credit_score <= 584 THEN rejected (precision 0.96, coverage 0.21)
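An anchor's precision is the share of rows matching the rule that the model assigns the anchored class; its coverage is the share of all rows the rule applies to. The Anchors method estimates precision by sampling perturbations; the sketch below simply evaluates both numbers on a fixed set of rows. The stand-in model, the rows and `rule_stats` are hypothetical:

```python
from typing import Callable, Dict, Sequence, Tuple

Row = Dict[str, float]


def rule_stats(rule: Callable[[Row], bool], predict: Callable[[Row], int],
               rows: Sequence[Row], anchored_class: int) -> Tuple[float, float]:
    """Hypothetical helper: (precision, coverage) of an IF-THEN rule vs. model output."""
    covered = [r for r in rows if rule(r)]
    coverage = len(covered) / len(rows)
    if not covered:
        return 0.0, coverage
    precision = sum(predict(r) == anchored_class for r in covered) / len(covered)
    return precision, coverage


def rule(r: Row) -> bool:
    # Rule from the demo anchor: IF 410 <= credit_score <= 584 THEN rejected (class 0).
    return 410 <= r["credit_score"] <= 584


def predict(r: Row) -> int:
    return int(r["credit_score"] > 600)  # hypothetical stand-in model


rows = [{"credit_score": s} for s in (400, 450, 500, 560, 590, 620, 700, 800)]
print(rule_stats(rule, predict, rows, anchored_class=0))  # (1.0, 0.375)
```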

Glassbox surrogate (fidelity: agrees with the model's predictions 85.9% of the time):

|--- credit_score <= 672.83
|   |--- gender <= 0.50
|   |   |--- income <= 62.72  -> rejected
|   |   |--- income >  62.72  -> rejected
|   |--- gender >  0.50
|   |   |--- debt_ratio <= 0.44 -> approved
|   |   |--- debt_ratio >  0.44 -> rejected
|--- credit_score >  672.83 ...
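Fidelity is measured against the black-box model's predictions, not the ground-truth labels: the surrogate is trained to imitate the model. A dependency-free sketch using a depth-1 surrogate (a single split) instead of the depth-4 tree the summary mentions; the toy model and the `fit_stump` / `fidelity` helpers are hypothetical:

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Row = List[float]


def fit_stump(X: Sequence[Row], labels: Sequence[int]) -> Tuple[int, float, int, int]:
    """Hypothetical helper: best single split (feature, threshold, left, right class)."""
    best, best_agree = None, -1
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(X, labels) if row[j] <= t]
            right = [y for row, y in zip(X, labels) if row[j] > t]
            lc = Counter(left).most_common(1)[0]   # (majority class, count)
            rc = Counter(right).most_common(1)[0]
            if lc[1] + rc[1] > best_agree:
                best, best_agree = (j, t, lc[0], rc[0]), lc[1] + rc[1]
    return best


def fidelity(surrogate: Callable[[Row], int], model_preds: Sequence[int],
             X: Sequence[Row]) -> float:
    """Share of rows where the surrogate reproduces the black-box prediction."""
    return sum(surrogate(r) == p for r, p in zip(X, model_preds)) / len(X)


def black_box(r: Row) -> int:
    # Hypothetical model: approve if credit is high, or moderate with low debt.
    return int(r[0] > 670 or (r[0] > 600 and r[1] < 0.3))


X = [[s, d] for s in range(500, 800, 20) for d in (0.1, 0.2, 0.4, 0.5)]
preds = [black_box(r) for r in X]          # fit to the model, not to true labels
j, t, left_cls, right_cls = fit_stump(X, preds)


def surrogate(r: Row) -> int:
    return left_cls if r[j] <= t else right_cls


print(j, t, fidelity(surrogate, preds, X))
```

A single split cannot express the black box's "moderate credit and low debt" branch, so fidelity stays below 1.0; deeper trees trade readability for higher fidelity.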

Bias mitigation (ex.mitigate_bias("gender")): demographic-parity gap 38.7% → 0.2% via per-group thresholds.
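Post-processing with per-group thresholds leaves the model untouched and picks a separate score cutoff for each group so that every group's selection rate lands on the same target. A minimal sketch, assuming the model outputs a score per row and demographic parity is the goal; the target rate, the scores and the `per_group_thresholds` helper are hypothetical, and explainX's procedure may differ:

```python
from typing import Dict, Sequence


def per_group_thresholds(scores: Sequence[float], groups: Sequence[str],
                         target_rate: float) -> Dict[str, float]:
    """Hypothetical helper: cutoff per group so about target_rate of it is selected."""
    thresholds = {}
    for g in set(groups):
        s = sorted((sc for sc, gg in zip(scores, groups) if gg == g), reverse=True)
        k = max(1, round(target_rate * len(s)))  # rows to select in this group
        thresholds[g] = s[k - 1]  # k-th highest score (ties may select a few more)
    return thresholds


def selection_rates(scores: Sequence[float], groups: Sequence[str],
                    thresholds: Dict[str, float]) -> Dict[str, float]:
    rates = {}
    for g in set(groups):
        picked = [sc >= thresholds[g] for sc, gg in zip(scores, groups) if gg == g]
        rates[g] = sum(picked) / len(picked)
    return rates


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.35, 0.3, 0.2, 0.1]
groups = ["1", "1", "1", "0", "1", "0", "0", "1", "0", "0"]
single = {"0": 0.5, "1": 0.5}
print(selection_rates(scores, groups, single))  # one shared cutoff: unequal rates
fair = per_group_thresholds(scores, groups, target_rate=0.4)
print(fair, selection_rates(scores, groups, fair))
```

Equalizing selection rates this way can lower accuracy for some groups, which is why the fix is reported alongside the metrics rather than applied silently.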

Tests

pytest          # or: python -m pytest explainx/tests

Migrating from legacy explainX

The 2020 Dash dashboard (explain.py, main.py, lib/) and its pinned, no-longer-installable stack have been removed in favour of this engine. The new import is explainx; explanations are returned as data rather than rendered as a web app, which is what makes them consumable by both humans and LLMs.

License

MIT