Problem Authoring Troubleshooting Guide

December 22, 2025 · View on GitHub

This guide provides solutions to common issues encountered when creating and debugging evaluation problems with pytest.

Test Discovery Issues
Fixture Issues
Configuration Issues
Test Execution Issues
Assertion Issues
Static Assets Issues
Marker Issues
Debugging Tools

Test Discovery Issues

Problem: "No tests collected"

Symptoms:

collected 0 items
no tests ran in 0.12s

Causes and fixes:

Cause 1: Test file naming

# Wrong:
tests/checkpoint_1.py          # Missing 'test_' prefix

# Right:
tests/test_checkpoint_1.py     # Must start with 'test_'

Cause 2: Test function naming

# Wrong:
def core_passthrough(entrypoint_argv):  # Missing 'test_' prefix
    ...

# Right:
def test_core_passthrough(entrypoint_argv):
    ...

Cause 3: Wrong tests directory

# Wrong:
problem/test/conftest.py       # 'test' not 'tests'

# Right:
problem/tests/conftest.py      # 'tests' directory

Cause 4: Missing conftest.py

tests/
├── test_checkpoint_1.py       # Exists
└── (no conftest.py!)          # Tests may not discover

# Right:
tests/
├── conftest.py                # Required
└── test_checkpoint_1.py

Problem: "test not found" with parametrize

Symptoms:

ERRORS
E   fixture 'case' not found

Causes and fixes:

Cause: Empty cases list

CORE_CASES = load_cases(CASES_DIR / "core")  # Returns []

@pytest.mark.parametrize("case", CORE_CASES, ids=[c["id"] for c in CORE_CASES])
def test_core(case):  # No tests generated!
    ...

Fix: Handle empty lists:

CORE_CASES = load_cases(CASES_DIR / "core")

if CORE_CASES:
    @pytest.mark.parametrize("case", CORE_CASES, ids=[c["id"] for c in CORE_CASES])
    def test_core(case):
        ...

Or use pytest.skip:

@pytest.mark.parametrize("case", CORE_CASES or [{"id": "skip"}], ids=...)
def test_core(case):
    if case.get("id") == "skip":
        pytest.skip("No test cases found")
    ...

Fixture Issues

Problem: "fixture 'entrypoint_argv' not found"

Symptoms:

E       fixture 'entrypoint_argv' not found
>       available fixtures: ...

Causes and fixes:

Cause 1: Missing conftest.py

# tests/conftest.py - MUST EXIST
import shlex
import pytest

def pytest_addoption(parser):
    parser.addoption("--entrypoint", required=True)
    parser.addoption("--checkpoint", required=True)

@pytest.fixture(scope="session")
def entrypoint_argv(request):
    return shlex.split(request.config.getoption("--entrypoint"))

@pytest.fixture(scope="session")
def checkpoint_name(request):
    return request.config.getoption("--checkpoint")

Cause 2: Typo in fixture name

# Wrong:
def test_basic(entrypoint_arg):  # Missing 'v'
    ...

# Right:
def test_basic(entrypoint_argv):
    ...

Cause 3: Missing CLI arguments

# Wrong:
pytest tests/

# Right:
pytest tests/ --entrypoint="python main.py" --checkpoint=checkpoint_1

Problem: "scope mismatch"

Symptoms:

ScopeMismatch: You tried to access a function scoped fixture from a session scoped one

Causes and fixes:

# Wrong: session fixture using function-scoped tmp_path
@pytest.fixture(scope="session")
def shared_data(tmp_path):  # tmp_path is function-scoped!
    return tmp_path / "data"

# Right: Match scopes
@pytest.fixture(scope="function")
def test_data(tmp_path):  # Both function-scoped
    return tmp_path / "data"

# Or use tmp_path_factory for session scope
@pytest.fixture(scope="session")
def shared_data(tmp_path_factory):
    return tmp_path_factory.mktemp("data")

Problem: "Error in setup of test"

Symptoms:

ERROR at setup of test_basic
E   FileNotFoundError: [Errno 2] No such file or directory

Causes and fixes:

Cause: Fixture fails during setup

@pytest.fixture
def assets_dir():
    path = Path(__file__).parent / "assets"
    assert path.exists()  # Fails if directory missing!
    return path

Fix: Create missing directories or handle gracefully:

@pytest.fixture
def assets_dir():
    path = Path(__file__).parent / "assets"
    if not path.exists():
        pytest.skip("Assets directory not found")
    return path

Configuration Issues

Problem: "Problem not found"

Symptoms:

Error: Problem 'my_problem' not found

Causes and fixes:

Cause: Directory name doesn't match config

# config.yaml
name: my-problem        # Hyphen

# But directory is:
problems/my_problem/    # Underscore

Fix: Names must match exactly:

name: my_problem

Problem: "Checkpoint not found"

Symptoms:

Error: Checkpoint 'checkpoint_1' not found for problem 'my_problem'

Causes and fixes:

Cause 1: Missing in config.yaml

# config.yaml
checkpoints:
  checkpoint_2:    # No checkpoint_1!
    version: 1
    order: 1

Fix: Add all checkpoints:

checkpoints:
  checkpoint_1:
    version: 1
    order: 1
  checkpoint_2:
    version: 1
    order: 2

Cause 2: Missing test file

tests/
├── conftest.py
└── test_checkpoint_2.py   # No test_checkpoint_1.py!

Fix: Create test file for each checkpoint.

Problem: "Invalid marker"

Symptoms:

PytestUnknownMarkWarning: Unknown pytest.mark.custom_marker

Causes and fixes:

# config.yaml - Register custom markers
markers:
  custom_marker:
    group_type: FUNCTIONALITY

Or in conftest.py:

def pytest_configure(config):
    config.addinivalue_line("markers", "custom_marker: description")

Test Execution Issues

Problem: "Timeout on every test"

Symptoms:

All tests timeout
No output captured

Causes and fixes:

Cause 1: Infinite loop in submission

# Agent's code:
while True:
    process_data()  # Never exits!

Fix: Add timeout to subprocess calls:

result = subprocess.run(
    entrypoint_argv,
    timeout=30,  # 30 second timeout
    capture_output=True
)

Cause 2: Waiting for input

# Agent's code:
name = input("Enter name: ")  # Hangs waiting for stdin

Fix: Provide stdin or use --no-input flag in spec.

Cause 3: subprocess hangs

# Wrong: No timeout
result = subprocess.run(cmd, capture_output=True)

# Right: With timeout
try:
    result = subprocess.run(cmd, capture_output=True, timeout=30)
except subprocess.TimeoutExpired:
    pytest.fail("Command timed out")

Problem: "ImportError" during execution

Symptoms:

ImportError: No module named 'yaml'

Causes and fixes:

Cause: Missing dependencies

Fix 1: Document in spec:

## Requirements

Create a `requirements.txt` or `pyproject.toml`:

pyyaml>=6.0


**Fix 2**: Add to test_dependencies in config.yaml:
```yaml
test_dependencies:
  - pyyaml>=6.0

Problem: "FileNotFoundError" in tests

Symptoms:

FileNotFoundError: [Errno 2] No such file or directory: 'input.yaml'

Causes and fixes:

Cause: Working directory issues

# Wrong: Assumes current directory
(Path("input.yaml")).write_text(content)
subprocess.run(cmd)  # May run in different directory

# Right: Use tmp_path
def test_example(entrypoint_argv, tmp_path):
    (tmp_path / "input.yaml").write_text(content)
    subprocess.run(cmd, cwd=tmp_path)  # Explicit cwd

Assertion Issues

Problem: "AssertionError" with matching output

Symptoms:

AssertionError: assert '{"key": "value"}' == {'key': 'value'}

Causes and fixes:

Cause: Comparing string to dict

# Wrong: String vs dict
actual = result.stdout  # '{"key": "value"}'
expected = {"key": "value"}
assert actual == expected  # Fails!

# Right: Parse JSON first
import json
actual = json.loads(result.stdout)
assert actual == expected

Problem: "Whitespace differences"

Symptoms:

AssertionError:
  Expected: 'Hello World'
  Actual:   'Hello World\n'

Causes and fixes:

# Fix: Strip whitespace
actual = result.stdout.strip()
expected = "Hello World"
assert actual == expected

# Or: Be explicit about expectations
expected = "Hello World\n"
assert result.stdout == expected

Problem: "Floating point differences"

Symptoms:

AssertionError: assert 1.2300000000000002 == 1.23

Causes and fixes:

# Fix: Use pytest.approx
import pytest
assert result == pytest.approx(1.23, rel=1e-9)

# Or: Use math.isclose
import math
assert math.isclose(result, 1.23, rel_tol=1e-9)

Problem: "Dict comparison fails on order"

Symptoms:

AssertionError: assert {'b': 2, 'a': 1} == {'a': 1, 'b': 2}

This shouldn't happen in Python 3.7+ (dicts are ordered), but if comparing JSON output:

# Fix: Parse and compare as dicts
actual = json.loads(result.stdout)
expected = {"a": 1, "b": 2}
assert actual == expected  # Dict comparison ignores JSON key order

Problem: "Timestamp/ID differences"

Symptoms:

AssertionError:
  Expected: {"id": "abc123", "created_at": "2025-01-01T00:00:00Z"}
  Actual:   {"id": "xyz789", "created_at": "2025-01-01T00:00:05Z"}

Causes and fixes:

# Fix: Strip dynamic fields before comparison
def strip_dynamic(data):
    data = data.copy()
    data.pop("id", None)
    data.pop("created_at", None)
    return data

assert strip_dynamic(actual) == strip_dynamic(expected)

# Or: Validate field presence/types instead
assert "id" in actual
assert isinstance(actual["id"], str)

Static Assets Issues

Problem: "Static assets not found"

Symptoms:

FileNotFoundError: static_assets/files not found

Causes and fixes:

Cause 1: Environment variable not set

# In conftest.py
@pytest.fixture(scope="session")
def files_dir():
    env_path = os.environ.get("SCBENCH_ASSET_FILES")
    if env_path:
        return Path(env_path)
    # Fallback for local testing
    return Path(__file__).parent.parent / "static_assets" / "files"

Fix: Set environment variable or use fallback path.

Cause 2: Asset path doesn't exist

# config.yaml
static_assets:
  files:
    path: datasets           # But no problems/my_problem/datasets/!

Fix: Create directory or fix path:

static_assets:
  files:
    path: static_assets/files

Cause 3: Wrong placeholder syntax

# Wrong:
arguments: --files {static:files}      # Single braces

# Right:
arguments: --files {{static:files}}    # Double braces

Marker Issues

Problem: "Tests not categorized correctly"

Symptoms:

Error tests counted as Core
Functionality tests missing from report

Causes and fixes:

Cause: Missing markers

# Wrong: No marker = Core
def test_invalid_input(entrypoint_argv):
    ...  # This will be counted as CORE!

# Right: Mark as error
@pytest.mark.error
def test_invalid_input(entrypoint_argv):
    ...

Problem: "Multiple markers conflict"

Symptoms:

Test appears in wrong category
Unexpected GroupType assignment

Causes and fixes:

Priority order (highest to lowest):

regression
error
functionality
(unmarked = Core)

# This will be REGRESSION (higher priority)
@pytest.mark.error
@pytest.mark.regression
def test_legacy_error():
    ...

# To make it ERROR, remove regression marker
@pytest.mark.error
def test_legacy_error():
    ...

Debugging Tools

Tool 1: Run pytest directly

cd problems/my_problem

# Run with verbose output
pytest tests/ \
  --entrypoint="python solution.py" \
  --checkpoint=checkpoint_1 \
  -v

# Run single test
pytest tests/test_checkpoint_1.py::test_basic \
  --entrypoint="python solution.py" \
  --checkpoint=checkpoint_1 \
  -v

# Show print statements
pytest tests/ ... -s

# Stop on first failure
pytest tests/ ... -x

# Show full assertion diffs
pytest tests/ ... --tb=long

Tool 2: Debug test case manually

# Create test workspace
mkdir -p /tmp/test_debug
cd /tmp/test_debug

# Copy solution
cp -r /path/to/solution/* .

# Create input files manually
echo "key: value" > input.yaml

# Run command
python solution.py --input input.yaml

# Compare output

Tool 3: Print debugging

def test_example(entrypoint_argv, tmp_path):
    # Setup
    input_file = tmp_path / "input.yaml"
    input_file.write_text("key: value")

    # Debug: Print command
    cmd = entrypoint_argv + ["--input", str(input_file)]
    print(f"Running: {' '.join(cmd)}")

    result = subprocess.run(cmd, capture_output=True, text=True, cwd=tmp_path)

    # Debug: Print output
    print(f"Return code: {result.returncode}")
    print(f"Stdout: {result.stdout}")
    print(f"Stderr: {result.stderr}")

    assert result.returncode == 0

Run with -s to see print output:

pytest tests/ ... -s

Tool 4: Validate case files

# Check all YAML files are valid
find problems/my_problem/tests -name "*.yaml" -exec \
  python -c "import yaml; yaml.safe_load(open('{}'))" \; -print

Tool 5: Check test collection

# See what tests will be collected
pytest tests/ --collect-only \
  --entrypoint="python main.py" \
  --checkpoint=checkpoint_1

# Output shows all test IDs and markers

Tool 6: Run with PDB

# Drop into debugger on failure
pytest tests/ ... --pdb

# Or set breakpoint in code
def test_example():
    import pdb; pdb.set_trace()
    ...

Checklist for Debugging

When a problem isn't working:

Configuration:

name in config.yaml matches directory name
All checkpoints defined in config.yaml
Each checkpoint has a test file
static_assets paths exist

conftest.py:

Defines pytest_addoption with --entrypoint and --checkpoint
Defines entrypoint_argv fixture
Defines checkpoint_name fixture
All fixtures have correct scope

Test files:

Named test_checkpoint_N.py
Functions named test_*
Correct markers applied (error, functionality, regression)
Fixtures requested in function signature

Test execution:

subprocess.run has timeout
Using tmp_path for file isolation
Parsing output correctly (JSON, JSONL, text)
Stripping whitespace where needed

Assertions:

Comparing same types (both dicts, both strings)
Handling dynamic fields (IDs, timestamps)
Using pytest.approx for floats

Next Steps

Pytest Markers - Marker reference
Conftest Patterns - Fixture patterns
CLI Testing - subprocess patterns
Examples - Working examples to reference