Known Issues

January 6, 2026

This document tracks current limitations and known issues in SlopCodeBench. We're actively working on improvements and welcome feedback.

Note: This is an initial release. Some rough edges are expected as we iterate toward production-quality software.

Documentation

Dashboard Documentation Missing

Severity: Medium
Issue: Dashboard functionality is undocumented
Impact: Users cannot use the advanced visualization features without reverse-engineering the code
Status: Partially documented (see viz command); full dashboard documentation is still needed
Workaround: Use slop-code viz diff, which provides an interactive diff viewer. See the viz command documentation for details.
Tracking: [Issue #TBD]

Configuration

Agent/Model/Provider Configuration Complexity

Severity: Medium
Issue: Configuring models, providers, and agents is complex and unintuitive
Impact: Steep learning curve for new users
Status: Open to redesign suggestions. We acknowledge that the current setup is hard to use and welcome feedback.

Current workarounds:

- See the Agent Guide for detailed setup
- Use the provided config files in configs/ as templates
- Check the FAQ for common configurations

Reference Solutions

Because the specifications evolved across checkpoints, some reference solutions do not pass all of their tests. The test cases themselves have been verified as correct: we prioritized test case accuracy over fixing every reference solution.

File Merger

Severity: Medium
Checkpoints affected: 2, 3
Issue: The reference solution does not solve all cases
Impact: The tests are correct; the reference solution needs updating
Workaround: Use tests as ground truth

Details:

- Checkpoint 2: The current solution is incomplete, but the tests are verified
- Checkpoint 3: Same as checkpoint 2

EVE Market Tools

Severity: Low
Checkpoints affected: 1, 2, 3
Issue: The reference solution fails some cases
Impact: The expected answers are verified against actual market data
Workaround: Tests are authoritative

Details:

- Checkpoints 1 and 2: The solution is wrong for a few cases, but the expected outputs are verified against market data
- Checkpoint 3: The solution's yields are off by 1-2 units, but the verifier calculates reprocessing yields correctly

EVE Industry

Severity: Low
Checkpoints affected: 5
Issue: The reference solution rounds incorrectly for recursive jobs
Impact: Tests are correct
Status: Will be fixed in a future release
Workaround: None needed; tests are accurate
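
To illustrate why rounding in recursive jobs is easy to get wrong, here is a minimal sketch. It does not reproduce the benchmark's specification or the reference solution: the function names, the two-level recipe, and the 9/10 modifier are all hypothetical, and which rounding rule the tests expect is defined by the checkpoint specification, not by this example. It only shows that rounding at each level of a recursive build gives a different total than rounding once at the end.

```python
import math
from fractions import Fraction


def round_each_level(runs: int, quantities: list[int], modifier: Fraction) -> int:
    """Round up after every level of the (hypothetical) job chain."""
    needed = runs
    for per_run in quantities:
        # Each level consumes a whole number of units from the level below.
        needed = math.ceil(needed * per_run * modifier)
    return needed


def round_once(runs: int, quantities: list[int], modifier: Fraction) -> int:
    """Carry exact fractions through the chain and round up only at the end."""
    needed = Fraction(runs)
    for per_run in quantities:
        needed = needed * per_run * modifier
    return math.ceil(needed)


# Hypothetical recipe: 3 components per product run, 7 raw units per component run.
chain = [3, 7]
modifier = Fraction(9, 10)  # assumed efficiency modifier, exact to avoid float error

print(round_each_level(5, chain, modifier))  # 89
print(round_once(5, chain, modifier))        # 86
```

The two strategies differ by three units on this small input, which is the kind of discrepancy a test suite will catch even when each individual step looks correct.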

Dynamic Buffer

Severity: Medium
Checkpoints affected: 4
Issue: The reference solution generates incorrect code for C++ and Rust
Impact: The tests are identical across all languages and have been verified
Status: Will be fixed
Workaround: Use test cases as ground truth

Execution Server

Severity: Low
Checkpoints affected: 6
Issue: The reference solution fails some cases
Impact: The test cases are verified to match the specification exactly
Workaround: Tests represent correct behavior per spec

Planned Improvements

The following are planned for future releases:

- Fix all reference solutions to pass tests
- Document dashboard functionality
- Simplify agent/model/provider configuration
- Add parallel execution support
- Improve error messages and debugging

Reporting Issues

Found a bug or limitation not listed here?

- Check GitHub Issues to see whether it has already been reported
- If not, open a new issue with:
  - Description of the issue
  - Steps to reproduce
  - Expected vs actual behavior
  - Your environment (OS, Python version, Docker version)
  - Relevant logs or error messages
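
The environment details requested above can be gathered with a short script. This is a minimal sketch using only the Python standard library; it is not part of SlopCodeBench, and the function names `docker_version` and `environment_report` are illustrative. It assumes Docker, if installed, is reachable as `docker` on your PATH.

```python
import platform
import shutil
import subprocess
import sys


def docker_version() -> str:
    """Return the first line of `docker --version`, or a placeholder if unavailable."""
    if shutil.which("docker") is None:
        return "docker not found on PATH"
    try:
        result = subprocess.run(
            ["docker", "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.SubprocessError) as exc:
        return f"docker check failed ({type(exc).__name__})"
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else "unknown"


def environment_report() -> str:
    """Build a three-line summary suitable for pasting into an issue."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
        f"Docker: {docker_version()}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Paste the printed lines into the issue alongside the steps to reproduce and any relevant logs.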

We appreciate your patience and feedback as we improve SlopCodeBench!