DataLinter GitLab CI
June 25, 2026
A GitLab CI configuration that runs DataLinter, a contextual data and code linter for machine-learning and statistical-modelling workflows, inside a reproducible Docker container. Results are automatically posted as a comment on merge requests.
Table of Contents
- What is DataLinter?
- Features
- Quick Start
- Inputs
- Outputs
- Workflow Examples
- How It Works
- DataLinter Configuration
- Troubleshooting
- Reporting Bugs
- License
What is DataLinter?
DataLinter detects potential issues in data files and associated code (R, Python, etc.) used in ML/statistical pipelines.
Features
- Reproducible execution via a pre-built Docker image (`ghcr.io/zgornel/datalinter-compiled:latest`).
- Automatic mounting of the data, code, and configuration directories.
- Full output capture (stdout + stderr), wrapped in a Markdown code block.
- Automatic merge request (MR) commenting: the previous comment is edited if it exists.
- Works on any runner that supports Docker and is configured for Docker-in-Docker (dind).
- Configurable logging level.
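The README does not say how the previous comment is found. One possible approach, sketched here under that assumption, tags the comment with a hidden marker and uses the GitLab Notes API (`GET`/`POST /projects/:id/merge_requests/:merge_request_iid/notes`, `PUT .../notes/:note_id`). The `MARKER` string and the `plan_comment`/`send` helpers are hypothetical and not part of this template; `api_url`, `project_id` and `mr_iid` would normally come from GitLab's predefined `CI_API_V4_URL`, `CI_PROJECT_ID` and `CI_MERGE_REQUEST_IID` variables.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical marker used to recognise the DataLinter comment; not taken from the real template.
MARKER = "<!-- datalinter-report -->"


def plan_comment(notes: list[dict], report: str, api_url: str,
                 project_id: str, mr_iid: str) -> tuple[str, str, dict]:
    """Decide between creating a new MR note and editing the previous DataLinter note.

    Returns (HTTP method, URL, JSON payload) for the GitLab Notes API.
    """
    base = f"{api_url}/projects/{project_id}/merge_requests/{mr_iid}/notes"
    body = f"{MARKER}\n{report}"
    for note in notes:
        # Edit the first note carrying the marker instead of posting a duplicate
        if note.get("body", "").startswith(MARKER):
            return "PUT", f"{base}/{note['id']}", {"body": body}
    return "POST", base, {"body": body}


def send(method: str, url: str, token: str, payload: Optional[dict] = None):
    """Call the GitLab API with the token in the PRIVATE-TOKEN header; return parsed JSON."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example flow (needs network access and a valid GITLAB_TOKEN):
# notes = send("GET", f"{api}/projects/{pid}/merge_requests/{iid}/notes", token)
# method, url, payload = plan_comment(notes, report, api, pid, iid)
# send(method, url, token, payload)
```

Keeping the create-or-edit decision separate from the HTTP call makes it testable offline. The notes listing is paginated, so a real implementation would need to read every page before concluding that no previous comment exists.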
Quick Start
- Add `.gitlab-ci.yml` to your main branch before opening a merge request.
- Create a `GITLAB_TOKEN` CI/CD variable containing a Project or Personal Access Token with the `api` scope.
- Ensure the CI/CD runner is configured for Docker-in-Docker (dind).
Inputs
`datalinter-gitlab-ci` has the following inputs:
- `DATA_PATH` (required): path to the data, relative to `.gitlab-ci.yml`.
- `CODE_PATH` (required): path to the code, relative to `.gitlab-ci.yml`.
- `CONFIG_PATH` (required): path to the `datalinter.toml` configuration file, relative to `.gitlab-ci.yml`.
- `LOG_LEVEL` (optional, default `'debug'`): logging level; one of `'debug'`, `'info'`, `'warning'` or `'error'`.
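A minimal sketch of how these inputs could be resolved, assuming they reach the job as environment variables (CI/CD variables). `resolve_inputs` is a hypothetical helper: it only enforces the three required inputs, applies the documented `'debug'` default, and rejects unknown levels.

```python
VALID_LOG_LEVELS = ("debug", "info", "warning", "error")
REQUIRED_INPUTS = ("DATA_PATH", "CODE_PATH", "CONFIG_PATH")


def resolve_inputs(env: dict[str, str]) -> dict[str, str]:
    """Return the four inputs, filling in the documented LOG_LEVEL default."""
    missing = [name for name in REQUIRED_INPUTS if not env.get(name)]
    if missing:
        raise ValueError(f"missing required input(s): {', '.join(missing)}")
    # An unset or empty LOG_LEVEL falls back to the documented default
    log_level = env.get("LOG_LEVEL") or "debug"
    if log_level not in VALID_LOG_LEVELS:
        raise ValueError(f"LOG_LEVEL must be one of {VALID_LOG_LEVELS}, got {log_level!r}")
    resolved = {name: env[name] for name in REQUIRED_INPUTS}
    resolved["LOG_LEVEL"] = log_level
    return resolved


# In a job this would typically be called as resolve_inputs(dict(os.environ)).
example = resolve_inputs({
    "DATA_PATH": "data/train.csv",    # hypothetical paths
    "CODE_PATH": "src/model.py",
    "CONFIG_PATH": "datalinter.toml",
})
print(example["LOG_LEVEL"])  # debug
```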
Outputs
Complete DataLinter output (stdout + stderr) formatted as a Markdown code block.
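As an illustration only (the template's own capture mechanism is not shown in this README), the merged stdout + stderr stream can be collected by redirecting stderr into stdout, then wrapped in a fence long enough that backticks in the linter output cannot close it early. `run_and_capture` and `as_markdown_block` are hypothetical helpers.

```python
import subprocess


def run_and_capture(cmd: list[str]) -> str:
    """Run cmd and return its stdout and stderr merged into one string."""
    # stderr=STDOUT sends both streams through the same pipe
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    return result.stdout


def as_markdown_block(output: str) -> str:
    """Wrap text in a fenced Markdown code block suitable for an MR comment."""
    # The fence must be longer than any backtick run in the output, or it would close early
    longest = run = 0
    for ch in output:
        run = run + 1 if ch == "`" else 0
        longest = max(longest, run)
    fence = "`" * max(3, longest + 1)
    return f"{fence}\n{output.rstrip()}\n{fence}"
```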
How It Works
- The supplied paths are split into directory and filename components.
- The image `ghcr.io/zgornel/datalinter-compiled:latest` is pulled.
- Three Docker volumes are mounted:
  - Code directory → `/tmp`
  - Data directory → `/_data`
  - Config directory → `/_config`
- The `datalinter` binary is executed inside the container.
- All output is captured and stored in `datalinter_output.txt`.
- On `merge_request_event` pipelines (which include pushes to the MR source branch), the output is automatically posted as an MR comment, or the existing comment is edited.
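The path splitting and volume mounts above can be sketched as follows. This is an assumed reconstruction, not the template's code: `docker_run_prefix` is a hypothetical helper, and DataLinter's own command-line arguments are deliberately omitted because this README does not document them. The helper also returns where each file ends up inside the container so a caller could pass those paths on.

```python
import os
import shlex

IMAGE = "ghcr.io/zgornel/datalinter-compiled:latest"

# Container mount points listed in "How It Works"
CODE_MOUNT = "/tmp"
DATA_MOUNT = "/_data"
CONFIG_MOUNT = "/_config"


def split_path(path: str) -> tuple[str, str]:
    """Split a repository-relative path into (absolute host directory, filename)."""
    directory, filename = os.path.split(path)
    # abspath("") is the current directory, so files at the repository root still mount
    return os.path.abspath(directory), filename


def docker_run_prefix(data_path: str, code_path: str,
                      config_path: str) -> tuple[list[str], dict[str, str]]:
    """Build the `docker run` part of the command and the in-container file paths."""
    data_dir, data_file = split_path(data_path)
    code_dir, code_file = split_path(code_path)
    config_dir, config_file = split_path(config_path)
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{code_dir}:{CODE_MOUNT}",
        "-v", f"{data_dir}:{DATA_MOUNT}",
        "-v", f"{config_dir}:{CONFIG_MOUNT}",
        IMAGE,
        # DataLinter arguments would follow here; they are not documented in this README
    ]
    container_paths = {
        "code": f"{CODE_MOUNT}/{code_file}",
        "data": f"{DATA_MOUNT}/{data_file}",
        "config": f"{CONFIG_MOUNT}/{config_file}",
    }
    return cmd, container_paths


if __name__ == "__main__":
    # Hypothetical input paths
    cmd, paths = docker_run_prefix("data/train.csv", "src/model.py", "datalinter.toml")
    print(shlex.join(cmd))
    print(paths)
```

Converting to absolute paths matters: `docker run -v` needs an absolute host path for a bind mount, and a bare name such as `data` is interpreted as a named volume instead.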
DataLinter Configuration
Create a `datalinter.toml` file in your repository and point `CONFIG_PATH` to it. Full documentation and example configurations are available in the DataLinter repository.
Troubleshooting
- “No such file or directory” → Verify that the three paths exist relative to the checkout directory.
- Output too verbose → Set `LOG_LEVEL` to `'warning'` or `'error'`.
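For the first issue, a quick pre-flight check can be sketched like this, assuming the job runs from the checkout directory that GitLab exposes as `CI_PROJECT_DIR`. `missing_inputs` is a hypothetical helper, not part of the template, and the example paths are placeholders.

```python
import os
from pathlib import Path


def missing_inputs(root: str, data_path: str, code_path: str, config_path: str) -> list[str]:
    """Return the names of the inputs whose paths do not exist under root."""
    base = Path(root)
    checks = {"DATA_PATH": data_path, "CODE_PATH": code_path, "CONFIG_PATH": config_path}
    return [name for name, rel in checks.items() if not (base / rel).exists()]


if __name__ == "__main__":
    # CI_PROJECT_DIR is GitLab's predefined checkout directory; fall back to "." locally
    root = os.environ.get("CI_PROJECT_DIR", ".")
    problems = missing_inputs(root, "data/train.csv", "src/model.py", "datalinter.toml")
    print("missing:", problems or "none")
```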
Reporting Bugs
Please file an issue to report a bug or request a feature.
License
This project is licensed under the MIT License. See LICENSE for details.