Contributing
December 24, 2025
We welcome contributions.
Adding an Agent
Want to evaluate a new coding agent? The framework is designed for this.
Requirements
- Implement the Agent interface from src/slop_code/agent_runner/agent.py
- Define lifecycle methods: setup(), run(), reset(), cleanup()
- Create an agent configuration YAML in configs/agents/
- Add setup documentation in docs/agents/
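As a rough illustration of the lifecycle listed above, the sketch below defines a stand-in Agent base class and a toy implementation. Everything here is an assumption made for illustration: the real interface lives in src/slop_code/agent_runner/agent.py and may use different signatures, arguments, and return types, and EchoAgent is a hypothetical name. Check the actual module before copying this shape.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class Agent(ABC):
    """Hypothetical stand-in for the real Agent interface.

    The four method names come from the requirements list; the
    parameters and return types are assumptions.
    """

    @abstractmethod
    def setup(self, workspace: Path) -> None:
        """Prepare the agent before its first run."""

    @abstractmethod
    def run(self, task: str) -> str:
        """Work on one task and return a result summary."""

    @abstractmethod
    def reset(self) -> None:
        """Clear per-run state so the agent can be reused."""

    @abstractmethod
    def cleanup(self) -> None:
        """Release resources when the agent is no longer needed."""


class EchoAgent(Agent):
    """Toy agent that records tasks instead of writing code."""

    def __init__(self) -> None:
        self.workspace: Path | None = None
        self.history: list[str] = []

    def setup(self, workspace: Path) -> None:
        self.workspace = workspace

    def run(self, task: str) -> str:
        if self.workspace is None:
            raise RuntimeError("setup() must be called before run()")
        self.history.append(task)
        return f"echo: {task}"

    def reset(self) -> None:
        # Keep the workspace, drop only per-run state.
        self.history.clear()

    def cleanup(self) -> None:
        self.workspace = None
        self.history.clear()
```

Separating reset() from cleanup() in this sketch lets one agent instance be reused across runs without repeating setup; whether the framework relies on that distinction is not stated here.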
Resources
Adding a Problem
Contribute new evaluation problems to expand the benchmark. The process has three steps: design your problem, implement it, then validate and submit.
Step 1: Design Your Problem
Start with a full problem concept, then break it into checkpoints. Good problems test whether agents can write flexible, maintainable code that handles progressive requirements.
- Problem Design Philosophy - What makes a good problem, checkpoint patterns, common pitfalls
- Example: Designing a Calculator Problem - See the design thinking process in action
Step 2: Implement Your Problem
Turn your design into code with test cases, loader, and verifier.
- Step-by-Step Tutorial - Create your first problem (30 min hands-on)
- Problem Structure Reference - Directory layout and file roles
- Test Case Authoring - Writing effective test cases
- Creating Loaders & Verifiers - Implementation patterns
Step 3: Validate & Submit
Ensure your problem works and submit a PR.
- Validation Checklist - Pre-submission checklist with PR guidance
- Troubleshooting - Common issues and solutions
Quick Reference
| Component | Description |
|---|---|
| config.yaml | Problem configuration with inline checkpoint definitions |
| checkpoint_N.md | Specification for each checkpoint |
| tests/test_checkpoint_N.py | Pytest tests for each checkpoint |
| tests/conftest.py | Shared pytest fixtures |
| tests/data/checkpoint_N/ | Test case data (core, hidden, errors) |
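Before opening a PR, it can help to confirm that a problem directory contains the components in the table. The helper below is a minimal sketch, not part of the framework: the function name missing_problem_files is hypothetical, and it assumes checkpoints are numbered from 1 and that all paths sit directly under the problem directory.

```python
import tempfile
from pathlib import Path


def missing_problem_files(
    problem_dir: Path, num_checkpoints: int
) -> list[str]:
    """Return expected relative paths that do not exist.

    Assumes checkpoints are numbered 1..num_checkpoints.
    """
    expected = ["config.yaml", "tests/conftest.py"]
    for n in range(1, num_checkpoints + 1):
        expected.append(f"checkpoint_{n}.md")
        expected.append(f"tests/test_checkpoint_{n}.py")
        expected.append(f"tests/data/checkpoint_{n}")
    return [
        rel for rel in expected if not (problem_dir / rel).exists()
    ]


if __name__ == "__main__":
    # Demo against a throwaway directory with only two files present.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "config.yaml").touch()
        (root / "checkpoint_1.md").touch()
        for rel in missing_problem_files(root, 1):
            print(f"missing: {rel}")
```

This only checks that the paths exist. It does not validate the contents of config.yaml or the test data, which the validation checklist covers.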
Development
Setup
uv sync
Testing
uv run pytest -q # Run all tests
uv run pytest tests/path/to/test_file.py # Run specific test
Linting
uv run ruff check . # Lint
uv run isort . # Format imports
Code Standards
- Line length: 80 characters
- Use from __future__ import annotations in all modules
- Use pathlib (Path) instead of os.path
- Type all function signatures
- Use Pydantic models for configuration
- Use structlog.get_logger(__name__) for logging
Getting Help
uv run slop-code --help
Or look at the documentation.