Problem Authoring Troubleshooting Guide
December 22, 2025 · View on GitHub
This guide provides solutions to common issues encountered when creating and debugging evaluation problems with pytest.
Table of Contents
- Test Discovery Issues
- Fixture Issues
- Configuration Issues
- Test Execution Issues
- Assertion Issues
- Static Assets Issues
- Marker Issues
- Debugging Tools
Test Discovery Issues
Problem: "No tests collected"
Symptoms:
collected 0 items
no tests ran in 0.12s
Causes and fixes:
Cause 1: Test file naming
# Wrong:
tests/checkpoint_1.py # Missing 'test_' prefix
# Right:
tests/test_checkpoint_1.py # Must start with 'test_'
Cause 2: Test function naming
# Wrong:
def core_passthrough(entrypoint_argv): # Missing 'test_' prefix
...
# Right:
def test_core_passthrough(entrypoint_argv):
...
Cause 3: Wrong tests directory
# Wrong:
problem/test/conftest.py # 'test' not 'tests'
# Right:
problem/tests/conftest.py # 'tests' directory
Cause 4: Missing conftest.py
tests/
├── test_checkpoint_1.py # Exists
└── (no conftest.py!) # Tests may not discover
# Right:
tests/
├── conftest.py # Required
└── test_checkpoint_1.py
Problem: "test not found" with parametrize
Symptoms:
ERRORS
E fixture 'case' not found
Causes and fixes:
Cause: Empty cases list
CORE_CASES = load_cases(CASES_DIR / "core") # Returns []
@pytest.mark.parametrize("case", CORE_CASES, ids=[c["id"] for c in CORE_CASES])
def test_core(case): # No tests generated!
...
Fix: Handle empty lists:
CORE_CASES = load_cases(CASES_DIR / "core")
if CORE_CASES:
@pytest.mark.parametrize("case", CORE_CASES, ids=[c["id"] for c in CORE_CASES])
def test_core(case):
...
Or use pytest.skip:
@pytest.mark.parametrize("case", CORE_CASES or [{"id": "skip"}], ids=...)
def test_core(case):
if case.get("id") == "skip":
pytest.skip("No test cases found")
...
Fixture Issues
Problem: "fixture 'entrypoint_argv' not found"
Symptoms:
E fixture 'entrypoint_argv' not found
> available fixtures: ...
Causes and fixes:
Cause 1: Missing conftest.py
# tests/conftest.py - MUST EXIST
import shlex
import pytest
def pytest_addoption(parser):
parser.addoption("--entrypoint", required=True)
parser.addoption("--checkpoint", required=True)
@pytest.fixture(scope="session")
def entrypoint_argv(request):
return shlex.split(request.config.getoption("--entrypoint"))
@pytest.fixture(scope="session")
def checkpoint_name(request):
return request.config.getoption("--checkpoint")
Cause 2: Typo in fixture name
# Wrong:
def test_basic(entrypoint_arg): # Missing 'v'
...
# Right:
def test_basic(entrypoint_argv):
...
Cause 3: Missing CLI arguments
# Wrong:
pytest tests/
# Right:
pytest tests/ --entrypoint="python main.py" --checkpoint=checkpoint_1
Problem: "scope mismatch"
Symptoms:
ScopeMismatch: You tried to access a function scoped fixture from a session scoped one
Causes and fixes:
# Wrong: session fixture using function-scoped tmp_path
@pytest.fixture(scope="session")
def shared_data(tmp_path): # tmp_path is function-scoped!
return tmp_path / "data"
# Right: Match scopes
@pytest.fixture(scope="function")
def test_data(tmp_path): # Both function-scoped
return tmp_path / "data"
# Or use tmp_path_factory for session scope
@pytest.fixture(scope="session")
def shared_data(tmp_path_factory):
return tmp_path_factory.mktemp("data")
Problem: "Error in setup of test"
Symptoms:
ERROR at setup of test_basic
E FileNotFoundError: [Errno 2] No such file or directory
Causes and fixes:
Cause: Fixture fails during setup
@pytest.fixture
def assets_dir():
path = Path(__file__).parent / "assets"
assert path.exists() # Fails if directory missing!
return path
Fix: Create missing directories or handle gracefully:
@pytest.fixture
def assets_dir():
path = Path(__file__).parent / "assets"
if not path.exists():
pytest.skip("Assets directory not found")
return path
Configuration Issues
Problem: "Problem not found"
Symptoms:
Error: Problem 'my_problem' not found
Causes and fixes:
Cause: Directory name doesn't match config
# config.yaml
name: my-problem # Hyphen
# But directory is:
problems/my_problem/ # Underscore
Fix: Names must match exactly:
name: my_problem
Problem: "Checkpoint not found"
Symptoms:
Error: Checkpoint 'checkpoint_1' not found for problem 'my_problem'
Causes and fixes:
Cause 1: Missing in config.yaml
# config.yaml
checkpoints:
checkpoint_2: # No checkpoint_1!
version: 1
order: 1
Fix: Add all checkpoints:
checkpoints:
checkpoint_1:
version: 1
order: 1
checkpoint_2:
version: 1
order: 2
Cause 2: Missing test file
tests/
├── conftest.py
└── test_checkpoint_2.py # No test_checkpoint_1.py!
Fix: Create test file for each checkpoint.
Problem: "Invalid marker"
Symptoms:
PytestUnknownMarkWarning: Unknown pytest.mark.custom_marker
Causes and fixes:
# config.yaml - Register custom markers
markers:
custom_marker:
group_type: FUNCTIONALITY
Or in conftest.py:
def pytest_configure(config):
config.addinivalue_line("markers", "custom_marker: description")
Test Execution Issues
Problem: "Timeout on every test"
Symptoms:
- All tests timeout
- No output captured
Causes and fixes:
Cause 1: Infinite loop in submission
# Agent's code:
while True:
process_data() # Never exits!
Fix: Add timeout to subprocess calls:
result = subprocess.run(
entrypoint_argv,
timeout=30, # 30 second timeout
capture_output=True
)
Cause 2: Waiting for input
# Agent's code:
name = input("Enter name: ") # Hangs waiting for stdin
Fix: Provide stdin or use --no-input flag in spec.
Cause 3: subprocess hangs
# Wrong: No timeout
result = subprocess.run(cmd, capture_output=True)
# Right: With timeout
try:
result = subprocess.run(cmd, capture_output=True, timeout=30)
except subprocess.TimeoutExpired:
pytest.fail("Command timed out")
Problem: "ImportError" during execution
Symptoms:
ImportError: No module named 'yaml'
Causes and fixes:
Cause: Missing dependencies
Fix 1: Document in spec:
## Requirements
Create a `requirements.txt` or `pyproject.toml`:
pyyaml>=6.0
**Fix 2**: Add to test_dependencies in config.yaml:
```yaml
test_dependencies:
- pyyaml>=6.0
Problem: "FileNotFoundError" in tests
Symptoms:
FileNotFoundError: [Errno 2] No such file or directory: 'input.yaml'
Causes and fixes:
Cause: Working directory issues
# Wrong: Assumes current directory
(Path("input.yaml")).write_text(content)
subprocess.run(cmd) # May run in different directory
# Right: Use tmp_path
def test_example(entrypoint_argv, tmp_path):
(tmp_path / "input.yaml").write_text(content)
subprocess.run(cmd, cwd=tmp_path) # Explicit cwd
Assertion Issues
Problem: "AssertionError" with matching output
Symptoms:
AssertionError: assert '{"key": "value"}' == {'key': 'value'}
Causes and fixes:
Cause: Comparing string to dict
# Wrong: String vs dict
actual = result.stdout # '{"key": "value"}'
expected = {"key": "value"}
assert actual == expected # Fails!
# Right: Parse JSON first
import json
actual = json.loads(result.stdout)
assert actual == expected
Problem: "Whitespace differences"
Symptoms:
AssertionError:
Expected: 'Hello World'
Actual: 'Hello World\n'
Causes and fixes:
# Fix: Strip whitespace
actual = result.stdout.strip()
expected = "Hello World"
assert actual == expected
# Or: Be explicit about expectations
expected = "Hello World\n"
assert result.stdout == expected
Problem: "Floating point differences"
Symptoms:
AssertionError: assert 1.2300000000000002 == 1.23
Causes and fixes:
# Fix: Use pytest.approx
import pytest
assert result == pytest.approx(1.23, rel=1e-9)
# Or: Use math.isclose
import math
assert math.isclose(result, 1.23, rel_tol=1e-9)
Problem: "Dict comparison fails on order"
Symptoms:
AssertionError: assert {'b': 2, 'a': 1} == {'a': 1, 'b': 2}
This shouldn't happen in Python 3.7+ (dicts are ordered), but if comparing JSON output:
# Fix: Parse and compare as dicts
actual = json.loads(result.stdout)
expected = {"a": 1, "b": 2}
assert actual == expected # Dict comparison ignores JSON key order
Problem: "Timestamp/ID differences"
Symptoms:
AssertionError:
Expected: {"id": "abc123", "created_at": "2025-01-01T00:00:00Z"}
Actual: {"id": "xyz789", "created_at": "2025-01-01T00:00:05Z"}
Causes and fixes:
# Fix: Strip dynamic fields before comparison
def strip_dynamic(data):
data = data.copy()
data.pop("id", None)
data.pop("created_at", None)
return data
assert strip_dynamic(actual) == strip_dynamic(expected)
# Or: Validate field presence/types instead
assert "id" in actual
assert isinstance(actual["id"], str)
Static Assets Issues
Problem: "Static assets not found"
Symptoms:
FileNotFoundError: static_assets/files not found
Causes and fixes:
Cause 1: Environment variable not set
# In conftest.py
@pytest.fixture(scope="session")
def files_dir():
env_path = os.environ.get("SCBENCH_ASSET_FILES")
if env_path:
return Path(env_path)
# Fallback for local testing
return Path(__file__).parent.parent / "static_assets" / "files"
Fix: Set environment variable or use fallback path.
Cause 2: Asset path doesn't exist
# config.yaml
static_assets:
files:
path: datasets # But no problems/my_problem/datasets/!
Fix: Create directory or fix path:
static_assets:
files:
path: static_assets/files
Cause 3: Wrong placeholder syntax
# Wrong:
arguments: --files {static:files} # Single braces
# Right:
arguments: --files {{static:files}} # Double braces
Marker Issues
Problem: "Tests not categorized correctly"
Symptoms:
- Error tests counted as Core
- Functionality tests missing from report
Causes and fixes:
Cause: Missing markers
# Wrong: No marker = Core
def test_invalid_input(entrypoint_argv):
... # This will be counted as CORE!
# Right: Mark as error
@pytest.mark.error
def test_invalid_input(entrypoint_argv):
...
Problem: "Multiple markers conflict"
Symptoms:
- Test appears in wrong category
- Unexpected GroupType assignment
Causes and fixes:
Priority order (highest to lowest):
regressionerrorfunctionality- (unmarked = Core)
# This will be REGRESSION (higher priority)
@pytest.mark.error
@pytest.mark.regression
def test_legacy_error():
...
# To make it ERROR, remove regression marker
@pytest.mark.error
def test_legacy_error():
...
Debugging Tools
Tool 1: Run pytest directly
cd problems/my_problem
# Run with verbose output
pytest tests/ \
--entrypoint="python solution.py" \
--checkpoint=checkpoint_1 \
-v
# Run single test
pytest tests/test_checkpoint_1.py::test_basic \
--entrypoint="python solution.py" \
--checkpoint=checkpoint_1 \
-v
# Show print statements
pytest tests/ ... -s
# Stop on first failure
pytest tests/ ... -x
# Show full assertion diffs
pytest tests/ ... --tb=long
Tool 2: Debug test case manually
# Create test workspace
mkdir -p /tmp/test_debug
cd /tmp/test_debug
# Copy solution
cp -r /path/to/solution/* .
# Create input files manually
echo "key: value" > input.yaml
# Run command
python solution.py --input input.yaml
# Compare output
Tool 3: Print debugging
def test_example(entrypoint_argv, tmp_path):
# Setup
input_file = tmp_path / "input.yaml"
input_file.write_text("key: value")
# Debug: Print command
cmd = entrypoint_argv + ["--input", str(input_file)]
print(f"Running: {' '.join(cmd)}")
result = subprocess.run(cmd, capture_output=True, text=True, cwd=tmp_path)
# Debug: Print output
print(f"Return code: {result.returncode}")
print(f"Stdout: {result.stdout}")
print(f"Stderr: {result.stderr}")
assert result.returncode == 0
Run with -s to see print output:
pytest tests/ ... -s
Tool 4: Validate case files
# Check all YAML files are valid
find problems/my_problem/tests -name "*.yaml" -exec \
python -c "import yaml; yaml.safe_load(open('{}'))" \; -print
Tool 5: Check test collection
# See what tests will be collected
pytest tests/ --collect-only \
--entrypoint="python main.py" \
--checkpoint=checkpoint_1
# Output shows all test IDs and markers
Tool 6: Run with PDB
# Drop into debugger on failure
pytest tests/ ... --pdb
# Or set breakpoint in code
def test_example():
import pdb; pdb.set_trace()
...
Checklist for Debugging
When a problem isn't working:
Configuration:
-
namein config.yaml matches directory name - All checkpoints defined in config.yaml
- Each checkpoint has a test file
- static_assets paths exist
conftest.py:
- Defines
pytest_addoptionwith --entrypoint and --checkpoint - Defines
entrypoint_argvfixture - Defines
checkpoint_namefixture - All fixtures have correct scope
Test files:
- Named
test_checkpoint_N.py - Functions named
test_* - Correct markers applied (error, functionality, regression)
- Fixtures requested in function signature
Test execution:
- subprocess.run has timeout
- Using tmp_path for file isolation
- Parsing output correctly (JSON, JSONL, text)
- Stripping whitespace where needed
Assertions:
- Comparing same types (both dicts, both strings)
- Handling dynamic fields (IDs, timestamps)
- Using pytest.approx for floats
Next Steps
- Pytest Markers - Marker reference
- Conftest Patterns - Fixture patterns
- CLI Testing - subprocess patterns
- Examples - Working examples to reference