June 24, 2026
A linting library and tools for machine learning, statistical modelling, data, and code.

Table of Contents
- Introduction
- Quick Start
- Installation
- Configuration
- Integrations
- Lint Catalog
- License
- Contributing
- References
- Acknowledgements
Introduction
DataLinter is a library for contextual linting of data and code. Development began as a Julia rewrite of Google's data linter. The redesign aimed to provide a richer and faster experience while retaining the baseline benefits outlined in the original paper. On top of these, DataLinter adds support for data contexts, such as code snippets or information about the type of analysis, which can lead to the detection of more complex, conceptual issues in data and code quality.
Key Features
- 27 data+code linters (including the Google linters)
- Production-ready Docker image with compiled binaries
- Zero-config CLI and HTTP server modes
- CSV/Parquet/Arrow dataset support
- Text / JSON / HTML output support
- Flexible code querying through ParSitter.jl
- First-class R language support through tree-sitter-based code parsing
- Fully customizable rule engine (see configuration docs)
Integrations
Quick Start
Try it in seconds with Docker (no installation required):
# Lint a dataset (from the root directory of the repository)
./datalinter ./test/data/imbalanced_data.csv \
    --code-path ./test/code/r_snippet_imbalanced.r \
    --config-path ./config/r_modelling_config.toml \
    --log-level error

# Or run the server for HTTP API use
./datalinterserver \
    -p 10000 \
    --config-path ./config/r_modelling_config.toml \
    --log-level debug
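When the CLI call above is repeated for several datasets, a small wrapper can save retyping the flags. The following is a minimal sketch, not part of DataLinter: `lint_dataset` and the `DATALINTER_BIN` variable are hypothetical names introduced here, and the function only uses the flags shown in the Quick Start command.

```shell
#!/bin/sh
# Path to the CLI; DATALINTER_BIN is a hypothetical override, not a DataLinter setting.
DATALINTER_BIN="${DATALINTER_BIN:-./datalinter}"

# lint_dataset DATA CODE CONFIG [LOG_LEVEL]
# Hypothetical helper: verifies that the three input files exist, then runs
# the same command as in the Quick Start. LOG_LEVEL defaults to "error".
lint_dataset() {
    data=$1 code=$2 config=$3 level=${4:-error}
    for f in "$data" "$code" "$config"; do
        if [ ! -f "$f" ]; then
            echo "missing input file: $f" >&2
            return 2
        fi
    done
    "$DATALINTER_BIN" "$data" \
        --code-path "$code" \
        --config-path "$config" \
        --log-level "$level"
}
```

With this sourced from the repository root, `lint_dataset ./test/data/imbalanced_data.csv ./test/code/r_snippet_imbalanced.r ./config/r_modelling_config.toml` should be equivalent to the first Quick Start command, while a mistyped path fails early with exit status 2 instead of reaching the linter.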
Installation
Docker image
The latest Docker image can be downloaded with:
docker pull ghcr.io/zgornel/datalinter-compiled:latest
Specific versions are also tagged and can be pulled with (example for v0.1.4):
docker pull ghcr.io/zgornel/datalinter-compiled:v0.1.4
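For scripts that need to pin a version, the tag can be kept in one place. This is a small sketch under that assumption; the `image_ref` and `pull_datalinter` helpers are hypothetical names, and only the registry path and tags come from the commands above.

```shell
#!/bin/sh
# Registry path as given in the pull commands above.
IMAGE_REPO="ghcr.io/zgornel/datalinter-compiled"

# image_ref [TAG]
# Hypothetical helper: prints the full image reference, using "latest"
# when no tag is passed.
image_ref() {
    printf '%s:%s\n' "$IMAGE_REPO" "${1:-latest}"
}

# pull_datalinter [TAG]
# Pulls the requested tag; requires the docker CLI to be installed.
pull_datalinter() {
    docker pull "$(image_ref "$1")"
}
```

Calling `pull_datalinter v0.1.4` then runs the same pull as the pinned example, and `pull_datalinter` with no argument fetches `latest`.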
Pre-compiled binaries (Linux x86-64)
Download the latest datalinter-compiled-latest-linux-x86-64.zip from the Releases page. The archive contains both the CLI and server binaries.
Note: Windows and macOS users should use Docker or install via Julia.
Julia
DataLinter can also be installed from the Julia REPL with:
using Pkg; Pkg.add(url="https://github.com/zgornel/DataLinter")
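The same install can be run without opening the REPL, for example in a provisioning script. The sketch below assumes `julia` is on the PATH and uses Julia's `-e` option to evaluate the expression; `julia_install_expr` and `install_datalinter` are hypothetical helper names.

```shell
#!/bin/sh
# Repository URL as given in the REPL command above.
DATALINTER_URL="https://github.com/zgornel/DataLinter"

# Prints the Julia expression used for installation (same as the REPL command).
julia_install_expr() {
    printf 'using Pkg; Pkg.add(url="%s")\n' "$DATALINTER_URL"
}

# Runs the install non-interactively; requires `julia` on PATH.
install_datalinter() {
    julia -e "$(julia_install_expr)"
}
```

Running `install_datalinter` adds the package to the active Julia environment, just as the REPL command does.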
Configuration
Check out the documentation for information on configuring, running and integrating the linters.
Lint Catalog
DataLinter ships with 27 built-in linters. Descriptions are available here.
License
This code is released under the MIT license.
Contributing
Please see CONTRIBUTING.md for how to contribute.
To report a bug or request a feature, please file an issue.
Recent changes can be found in CHANGELOG.md.
References
[1] https://en.wikipedia.org/wiki/Lint_(software)
[2] N. Hynes, D. Sculley, M. Terry, "The data linter: Lightweight, automated sanity checking for ML data sets", NIPS MLSys Workshop, 2017; paper
[3] The data-linter code repository
Acknowledgements
The initial version of DataLinter was inspired by this work from Google Brain.