Contributing to knowledge-rag
May 10, 2026
Thank you for considering contributing! This project is used daily by 70+ companies and teams. Quality is non-negotiable, but the process should still be welcoming and clear.
This guide explains how to propose changes, what we check, and what to expect from review.
Table of contents
- Quick start
- The 7 Pillars
- Local development
- Pre-commit hooks
- Running tests
- Pull request lifecycle
- Commit message convention
- Reporting bugs
- Reporting security issues
- Code of Conduct
Quick start
git clone https://github.com/lyonzin/knowledge-rag.git
cd knowledge-rag
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -e .
pip install -r requirements-dev.txt # ruff, pytest, mypy, pre-commit
pre-commit install # one-time
Run the full local check before pushing:
pre-commit run --all-files
pytest tests/ -v
The 7 Pillars (Quality Gate)
Every PR is automatically evaluated against seven pillars. All must pass before manual review by @lyonzin.
| # | Pillar | What it covers | Tools |
|---|---|---|---|
| 1 | Security | SAST, secrets, dependency CVEs, supply chain | bandit, semgrep, gitleaks, pip-audit, Snyk, CodeQL, Socket |
| 2 | Stability | Test determinism, coverage trend, flake detection | pytest-rerunfailures, codecov trend gate, test count guard |
| 3 | Memory Leak | RSS growth under load, leak isolation | pytest-memray, baseline tests, nightly soak (1h continuous) |
| 4 | Versatility | OS × Python matrix, format matrix, locale matrix, preset matrix | matrix CI on Linux/Windows/macOS × 3.11/3.12/3.13 |
| 5 | Scalability | Performance regression, concurrent load | bench suite (MRR@5, p50/p95/p99 latency, throughput), perf gate at 10% regression |
| 6 | Versioning | Atomic version sync, API surface stability, CHANGELOG enforcement | pre-commit hook, griffe, conventional commits, backwards-compat tests |
| 7 | Quality | Style, types, docstrings, complexity, dead code | ruff, mypy strict, interrogate, radon, vulture |
Your PR will show 7 status checks at the bottom — one per pillar. If any is ❌, fix it before requesting review. If you genuinely cannot fix one (e.g., a regression that is intentional), explain why in the PR description and tag @lyonzin for an override discussion.
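To make the Scalability pillar more concrete, here is a minimal sketch of the arithmetic such a gate might perform: nearest-rank latency percentiles, MRR@5, and a 10% regression check. The function names and the nearest-rank method are assumptions for illustration; the project's actual bench suite may compute these differently.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of the data at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def mrr_at_k(ranked_results: list[list[str]], relevant: list[str], k: int = 5) -> float:
    """Mean reciprocal rank of the first relevant hit within the top k results."""
    total = 0.0
    for results, target in zip(ranked_results, relevant):
        for position, doc_id in enumerate(results[:k], start=1):
            if doc_id == target:
                total += 1.0 / position
                break  # only the first relevant hit counts
    return total / len(relevant)


def regressed(baseline: float, current: float, threshold: float = 0.10) -> bool:
    """True when a lower-is-better metric (e.g. p95 latency) worsened by more than threshold."""
    return current > baseline * (1 + threshold)
```

With a gate like this, a p95 latency that moves from 100 ms to 109 ms would pass, while 111 ms would fail.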
Local development
Recommended tooling
# Required (Python >= 3.11)
pip install -e .
# Development
pip install ruff pytest pytest-cov pytest-rerunfailures pytest-memray
pip install mypy interrogate radon vulture
pip install pre-commit hypothesis
# Optional (only if you touch benchmarks)
pip install pytest-benchmark
Project layout
mcp_server/ # Production code
tests/ # All tests (unit + property + integration + memory baseline)
bench/ # Performance benchmarks (run on PRs + nightly)
docs/ # User-facing documentation
examples/ # Sample MCP client configs
.github/workflows/ # CI definitions
Pre-commit hooks
pre-commit runs lightweight checks before each commit. The hooks are required: PRs that bypass pre-commit will fail CI on the same checks.
pre-commit install # install once
pre-commit run --all-files # run on full repo (occasional manual run)
Hooks enforced:
- ruff check and ruff format --check
- version-sync: pyproject.toml, mcp_server/__init__.py, and npm/package.json must agree
- gitleaks: blocks secrets in commits
Running tests
# Fast: unit + property tests, no model downloads
pytest tests/ -v
# With coverage
pytest tests/ -v --cov=mcp_server --cov-report=term-missing
# Memory baseline (slow, requires pytest-memray)
pytest tests/test_memory_baseline.py -v --memray
# Property-based fuzzing (Hypothesis)
pytest tests/test_ingestion_property.py -v --hypothesis-show-statistics
# Benchmarks (slow, generates artifacts in bench/results/)
pytest bench/ -v --benchmark-only
HuggingFace offline mode is enforced in CI, so a test that triggers a real model download fails fast. Mock the model classes instead with unittest.mock.patch("mcp_server.server.TextEmbedding") and unittest.mock.patch("mcp_server.server.TextCrossEncoder").
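A minimal sketch of the patching pattern follows. So that it runs outside the repository, it patches a stand-in module named `demo_server` rather than `mcp_server.server`; the `embed_query` helper, the `"example-model"` name, and the shape of the `embed` return value are all hypothetical.

```python
import sys
import types
from unittest.mock import patch

# Stand-in module so the sketch runs outside the repository. In a real test,
# drop this block and patch "mcp_server.server.TextEmbedding" instead.
demo_server = types.ModuleType("demo_server")


class _DownloadingEmbedding:
    """Placeholder for a model class whose constructor would hit the network."""

    def __init__(self, model_name: str) -> None:
        raise RuntimeError("model download attempted")


demo_server.TextEmbedding = _DownloadingEmbedding
sys.modules["demo_server"] = demo_server


def embed_query(text: str) -> list[float]:
    """Hypothetical code under test: build the model, then embed one string."""
    model = demo_server.TextEmbedding("example-model")
    return list(model.embed([text]))[0]


def test_embed_query_without_download() -> None:
    # Replace the class where it is looked up, so no real model is constructed.
    with patch("demo_server.TextEmbedding") as fake_cls:
        fake_cls.return_value.embed.return_value = iter([[0.1, 0.2, 0.3]])
        assert embed_query("hello") == [0.1, 0.2, 0.3]
        fake_cls.assert_called_once_with("example-model")
```

The key design point is to patch the name in the module that uses it, not in the library that defines it; otherwise the code under test still sees the real class.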
Pull request lifecycle
- Fork + branch: feat/<short-desc> for features, fix/<issue>-<desc> for bugs, docs/<area> for docs, chore/<topic> for chores.
- Open PR early as draft: visibility helps avoid duplicate work.
- Fill the PR template: every checkbox in the template is enforced. Mark N/A and explain if a section truly does not apply.
- Pass the 7 Pillars: green checks across the board.
- Wait for review: @lyonzin reviews every PR line by line, even after green CI. Response time target: 48h on weekdays.
- Address feedback: push more commits; do not force-push during review (this preserves the discussion thread).
- Squash merge: the reviewer (or you, with permission) squashes with a clean conventional commit message.
What gets a fast track
- Pure docs / typo fixes
- New tests for existing behavior
- Backwards-compatible bug fixes with regression test
What needs more discussion
- New MCP tools (public API surface)
- Changes to embedding model, reranker model, or chunking strategy
- Anything touching ChromaDB schema or index_metadata.json format
- Changes to the 7 Pillars themselves
What is generally rejected
- Style-only refactors with no behavior change (use ruff)
- Breaking API changes without strong justification + migration path
- New dependencies without explaining why existing options were ruled out
- PRs that disable failing tests instead of fixing them
Commit message convention
We use Conventional Commits. Enforced via commitlint in CI on PR titles and squash messages.
<type>(<scope>): <short summary>
<longer body explaining the why>
<footer with refs / breaking-change marker>
Allowed types: feat, fix, docs, chore, refactor, test, perf, build, ci, style.
Examples:
- feat(server): add semantic_alpha config option
- fix(embeddings): raise loud on load failures
- docs(readme): bump What's New to v3.9.0
- perf(search): use BM25 inverted index for 3x speedup
Breaking changes go in the footer: BREAKING CHANGE: explanation.
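To check a message locally before CI does, the convention above can be approximated with a regular expression. This is a rough sketch, not the project's commitlint configuration: it assumes the scope is optional and lowercase, and it only inspects the header line and the BREAKING CHANGE footer.

```python
import re

ALLOWED_TYPES = ("feat", "fix", "docs", "chore", "refactor", "test", "perf", "build", "ci", "style")

# <type>(<scope>): <short summary>, with the scope treated as optional (an assumption).
HEADER_RE = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(?:\((?P<scope>[a-z0-9_-]+)\))?"
    r": (?P<summary>\S.*)$"
)


def is_valid_header(header: str) -> bool:
    """True when the first line of a commit message matches the convention."""
    return HEADER_RE.match(header) is not None


def is_breaking(message: str) -> bool:
    """True when any line of the message is a BREAKING CHANGE footer."""
    return any(line.startswith("BREAKING CHANGE: ") for line in message.splitlines())
```

For example, `is_valid_header("feat(server): add semantic_alpha config option")` returns True, while a header such as "Fix stuff" is rejected because it lacks an allowed type prefix.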
Reporting bugs
Use the Bug Report template. Required fields:
- Version (pip show knowledge-rag output)
- Reproduction steps
- Expected vs actual behavior
- Logs / stack traces (sanitize secrets)
Reporting security issues
Do not open a public issue for security vulnerabilities. See SECURITY.md for the responsible disclosure process.
Code of Conduct
This project follows the Contributor Covenant Code of Conduct. By participating you agree to uphold it.
Questions?
- General usage: GitHub Discussions
- Bugs: GitHub Issues with the bug template
- Security: SECURITY.md
- Direct: @lyonzin — but prefer public channels so the community benefits
Welcome aboard, and thank you for helping make knowledge-rag better.