TDD Workflow - krazyuniks/guitar-tone-shootout GitHub Wiki

TDD Workflow

Test-Driven Development workflow with enforced test immutability for GTS (Python/pytest). Validation is deterministic — controlled by Python (run_epic.py), not AI agents.

Philosophy

Traditional TDD relies on developer discipline. This workflow enforces TDD mechanically:

  1. Separate agents: Test author and implementer are different agents with different tool permissions
  2. Tool enforcement: run_epic.py passes --allowedTools to Claude CLI based on agent YAML
  3. Test locking: Tests are snapshotted before implementation, modifications are detected
  4. Deterministic validation: Python runs all validation gates — agents cannot lie about results
  5. Stop on failure: Any validation failure halts the epic immediately
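
As an illustration of point 2, a minimal sketch of how a tool list from an agent definition might be turned into a CLI invocation. The `tools` key and the `build_claude_command` helper are hypothetical names, not taken from run_epic.py, and the exact flag syntax should be checked against the CLI's own help output.

```python
from typing import Mapping, Sequence


def build_claude_command(agent: Mapping[str, Sequence[str]], prompt: str) -> list[str]:
    """Build an argv list for a non-interactive CLI run.

    `agent` stands in for a parsed agent YAML file; the "tools" key is an
    assumed field name used only for this sketch.
    """
    tools = agent.get("tools", ())
    if not tools:
        # Refuse to launch an agent whose tool set is not explicitly listed.
        raise ValueError("agent config lists no tools")
    return ["claude", "-p", prompt, "--allowedTools", ",".join(tools)]


command = build_claude_command({"tools": ["Read", "Grep"]}, "Write tests for T43")
print(command)
```

Building the argument list in Python, rather than letting the agent choose its own tools, is what keeps the permission boundary outside the agent's control.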

The Phases

Phase 1: Test Specification

Agent: test-author
Tools: Read, Write, Edit, Bash, Glob, Grep (enforced by --allowedTools)
Path restrictions: tests/**/*.py only (prompt-enforced)

The test author writes tests from acceptance criteria only. They have no knowledge of how the feature will be implemented.

Rules:

  • Tests MUST fail until implemented
  • No trivial assertions (assert True)
  • Test behaviour, not implementation details
  • Each acceptance criterion needs at least one test

Phase 2: Red Verification

Purpose: Verify all tests fail
Controlled by: Python (deterministic)

just tdd-red T43

This catches:

  • Tests that pass without implementation (trivial/broken)
  • Tests that error instead of fail (syntax issues)

If verification fails, test-author is retried once. If it fails again, the epic halts.
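
The red check can be pictured as a small decision over test outcome counts. This is a sketch under assumptions: `RunCounts` and `check_red` are hypothetical names, and how the real gate obtains the counts is not shown in this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunCounts:
    """Outcome counts from one test run (hypothetical container)."""
    passed: int
    failed: int
    errors: int


def check_red(counts: RunCounts) -> tuple[bool, str]:
    """Return (ok, reason); ok only when every collected test fails cleanly."""
    if counts.errors:
        # Errors usually mean broken tests (syntax or import problems).
        return False, f"{counts.errors} test(s) errored instead of failing"
    if counts.passed:
        # A test that passes before any implementation exists proves nothing.
        return False, f"{counts.passed} test(s) passed without an implementation"
    if counts.failed == 0:
        return False, "no tests ran"
    return True, f"{counts.failed} test(s) fail as expected"


print(check_red(RunCounts(passed=0, failed=4, errors=0)))
```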

Phase 3: Test Lock

Purpose: Snapshot test files
Controlled by: Python (deterministic)

just tdd-lock T43

This:

  1. Creates SHA-256 hashes of all test files
  2. Saves to .tasks/.../snapshots/{task_id}.json
  3. Commits with message: test-lock: T43 tests ready for implementation

The test-lock: commit message is used by CI to detect the lock point.

Phase 4: Implementation

Agent: implementer
Tools: Read, Write, Edit, Bash, Glob, Grep (enforced by --allowedTools)
Path restrictions: libs/, apps/, sources/ only (prompt-enforced)

The implementer:

  1. Reads test files to understand expected behaviour
  2. Writes implementation to make tests pass
  3. Runs tests continuously: just tdd tests/unit/path/to/test.py
  4. Must not modify test files (the prompt restricts writes to implementation paths, and snapshot verification detects any change)

Phase 5: Green Verification

Purpose: Verify tests pass
Controlled by: Python (deterministic)

just tdd-green T43

If verification fails, implementer is retried up to 2 times. Each retry includes the previous failure output. If all retries fail, the epic halts.

Phase 6: Full Validation

Purpose: Complete TDD validation
Controlled by: Python (deterministic)

just tdd-complete T43

Checks:

  1. All tests pass
  2. Test files unchanged since lock (snapshot verification)
  3. Test quality checks pass (no trivial assertions)
  4. Regression tests pass
  5. E2E tests pass

If any check fails, the epic halts with an error report.
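
The halt-on-first-failure behaviour can be sketched as an ordered list of gates. The `run_gates` helper and the gate names below are hypothetical; the real checks are whatever tdd-complete runs.

```python
from typing import Callable, Iterable, Optional

Gate = tuple[str, Callable[[], bool]]


def run_gates(gates: Iterable[Gate]) -> Optional[str]:
    """Run gates in order; return the name of the first failing gate, or None."""
    for name, check in gates:
        if not check():
            # Stop immediately so later gates never run on a broken state.
            return name
    return None


failed = run_gates([
    ("tests pass", lambda: True),
    ("tests unchanged since lock", lambda: False),
    ("regression tests pass", lambda: True),
])
print(failed)
```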

Retry Policy

| Agent | Retries | Context |
|---|---|---|
| test-author | 1 | Previous failure output included in retry prompt |
| implementer | 2 | Previous failure output included in retry prompt |

After retries are exhausted, run_epic.py writes a detailed error report and exits non-zero.
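
A sketch of the retry loop, assuming the agent dispatch and the validation gate are both plain callables. The `run_with_retries` name and its signature are illustrative and are not the run_epic.py API.

```python
from typing import Callable, Optional


def run_with_retries(
    dispatch: Callable[[Optional[str]], None],
    verify: Callable[[], tuple[bool, str]],
    max_retries: int,
) -> bool:
    """Dispatch an agent, verify, and retry with the previous failure output."""
    previous_failure: Optional[str] = None
    for _attempt in range(1 + max_retries):  # first attempt plus retries
        dispatch(previous_failure)
        ok, output = verify()
        if ok:
            return True
        previous_failure = output  # fed into the next retry prompt
    return False
```

With `max_retries=2` (the implementer row above) the agent is dispatched at most three times; with `max_retries=1` (test-author) at most twice.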

Test Quality Enforcement

Antipatterns Detected (Python/pytest)

| Pattern | Issue |
|---|---|
| assert True | Trivial assertion |
| assert x (truthy only) | Weak assertion |
| mock.assert_called() alone | Mock-only, no effect verification |
| Empty test body (pass only) | No assertions |
| @pytest.mark.skip | Skipped test |
| time.sleep() | Flaky indicator |

Good vs Bad Tests

Bad (trivial):

def test_validate_email_works():
    result = validate_email('[email protected]')
    assert result  # Weak - truthy check

Good (specific):

def test_validate_email_rejects_invalid_format():
    result = validate_email('not-an-email')
    assert result.valid is False
    assert result.error == 'Invalid email format'

Test Immutability Enforcement

Local Enforcement

Snapshot verification runs during tdd-complete:

python scripts/snapshot_tests.py verify T43

Detects:

  • MODIFIED: Test file content changed
  • DELETED: Test file removed
  • ADDED: New test file (must be added in test phase)
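
The three categories follow from comparing the locked hash mapping with a freshly computed one. A minimal sketch, assuming both snapshots are dictionaries of path to digest; `compare_snapshots` is a hypothetical helper, not the verify command itself.

```python
def compare_snapshots(locked: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Classify differences between the locked and current test file hashes."""
    return {
        # Present in both, but the content hash differs.
        "MODIFIED": sorted(p for p in locked.keys() & current.keys() if locked[p] != current[p]),
        # Locked at snapshot time, missing now.
        "DELETED": sorted(locked.keys() - current.keys()),
        # Not part of the locked snapshot.
        "ADDED": sorted(current.keys() - locked.keys()),
    }


report = compare_snapshots(
    {"unit/test_a.py": "aaa", "unit/test_b.py": "bbb"},
    {"unit/test_a.py": "changed", "unit/test_c.py": "ccc"},
)
print(report)
```

Verification would pass only when all three lists are empty.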

CI Enforcement

A GitHub Action checks for test file changes since the most recent test-lock: commit:

- name: Check test file immutability
  run: |
    # Resolve the newest test-lock commit to a bare hash (--oneline would also emit the subject line)
    LOCK_COMMIT=$(git log -1 --format=%H --grep="test-lock:")
    CHANGED_TESTS=$(git diff --name-only "$LOCK_COMMIT" HEAD -- 'tests/')
    if [ -n "$CHANGED_TESTS" ]; then
      echo "::error::Test files modified after test-lock"
      exit 1
    fi

Workflow Commands

| Command | Description |
|---|---|
| just tdd-test-phase T43 | Start test phase (prints instructions) |
| just tdd-red T43 | Verify tests fail |
| just tdd-lock T43 | Snapshot and commit |
| just tdd-impl-phase T43 | Implementation hints |
| just tdd-green T43 | Verify tests pass |
| just tdd-complete T43 | Full validation |
| just snapshot-verify T43 | Check test immutability |
| just snapshot-diff T43 | Show changes since lock |
| just snapshot-list | List all test files |

GTS Test Patterns

tests/
├── unit/           # Pure logic, no I/O
├── integration/    # Real DB/Redis
├── regression/     # Stack connectivity
└── e2e/
    └── python/     # Playwright E2E tests

All tests run in Docker:

docker compose exec -T webapp pytest tests/unit/ -v

Failure Recovery

Tests Modified During Implementation

# See what changed
just snapshot-diff T43

# Reset implementation, keep tests
git checkout -- libs/ apps/

# Or reset to lock point
git checkout $(git log -1 --grep="test-lock: T43" --format="%H") -- libs/ apps/

Implementation Doesn't Pass Tests

# Check test output
just tdd tests/unit/path/to/test.py

# Run with verbose output
docker compose exec webapp pytest tests/unit/path/to/test.py -v --tb=long

Epic Halted by Validation

# Check error report
ls .tasks/projects/guitar-tone-shootout/epics/E42/logs/errors/

# Fix the issue, then re-run (picks up from current state)
python scripts/run_epic.py run 42

Bounce-Back Recovery

When tdd-green fails but evidence suggests the test itself has a bug (not the implementation), the system can automatically re-dispatch the test-author in FIX mode.

Detection Heuristics

Two heuristics determine if a green failure is likely a test bug:

| Heuristic | Conditions |
|---|---|
| A: Few failures | >= 75% scope files exist, <= 3 failures, passed > failures |
| B: Localised failures | All failures in a single test file, passed > 0 |
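
The two heuristics can be written as a single predicate. This is a sketch only: `looks_like_test_bug` is a stand-in for is_likely_test_bug(), whose real signature is not shown here, and reading "scope files" as the implementation files listed in the task scope is an assumption.

```python
from typing import Sequence


def looks_like_test_bug(
    scope_total: int,
    scope_existing: int,
    failed_test_ids: Sequence[str],
    passed: int,
) -> bool:
    """Return True when a green failure matches heuristic A or B."""
    failures = len(failed_test_ids)
    if failures == 0:
        return False
    scope_ratio = scope_existing / scope_total if scope_total else 0.0
    # A: implementation mostly present, only a few failures, most tests pass.
    heuristic_a = scope_ratio >= 0.75 and failures <= 3 and passed > failures
    # B: every failure sits in one test file while at least one test passes.
    # pytest node IDs look like "tests/unit/test_x.py::test_name".
    failing_files = {test_id.split("::", 1)[0] for test_id in failed_test_ids}
    heuristic_b = len(failing_files) == 1 and passed > 0
    return heuristic_a or heuristic_b
```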

Bounce-Back Flow

  1. tdd-green fails
  2. is_likely_test_bug() evaluates heuristics
  3. If matched (and first bounce for this task): test-author re-dispatched in FIX mode
  4. Fixed tests are re-locked (re_lock_after_bounce())
  5. tdd-green retried
  6. If still fails: epic halts

Limit: Maximum 1 bounce per task. If the fix doesn't resolve it, the epic halts.

Skill Injection

The test-author agent receives GTS testing patterns automatically. build_test_author_prompt() appends the content of .claude/skills/gts-testing/SKILL.md to every test-author prompt.

This gives the agent access to:

  • Fixture patterns and conftest setup
  • Banned patterns (e.g., importlib.util, AsyncClient(app=...))
  • Database session management patterns
  • Test scaffolding conventions
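
The injection step amounts to reading the skill file and appending it to the prompt. A minimal sketch, assuming plain concatenation with a blank line as separator; `append_skill` is a hypothetical stand-in for build_test_author_prompt(), whose actual signature is not documented here.

```python
from pathlib import Path


def append_skill(base_prompt: str, skill_path: Path) -> str:
    """Return base_prompt followed by the full text of the skill file."""
    skill_text = skill_path.read_text(encoding="utf-8")
    # A blank line keeps the skill content visually separate from the task prompt.
    return f"{base_prompt.rstrip()}\n\n{skill_text}"
```

In the workflow described above, `skill_path` would point at .claude/skills/gts-testing/SKILL.md.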

Session Logging

Each run_epic.py execution creates a session log:

.tasks/projects/guitar-tone-shootout/epics/E42/logs/session_20260204_103000.log

The log records:

  • All commands executed and their exit codes
  • State transitions for each task
  • Agent dispatch details (prompt length, tools, model)
  • Validation gate results
  • Git sync operations
  • Bounce-back events
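
The file name in the example path is consistent with a `session_<date>_<time>.log` pattern. A small sketch of generating such a name, assuming the timestamp is the session start time; `session_log_path` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path


def session_log_path(logs_dir: Path, started_at: datetime) -> Path:
    """Build a session log path of the form session_YYYYMMDD_HHMMSS.log."""
    return logs_dir / f"session_{started_at:%Y%m%d_%H%M%S}.log"


path = session_log_path(Path("logs"), datetime(2026, 2, 4, 10, 30, 0))
print(path.name)  # session_20260204_103000.log
```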

Git Sync

After every commit (test lock, implementation, validation), run_epic.py syncs with origin:

git pull --rebase --autostash origin main
git push --force-with-lease

This ensures:

  • Parallel worktrees stay up to date
  • Work is pushed incrementally (not only at epic completion)
  • Rebase conflicts surface early

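Note: --autostash handles unstaged session log changes during the sync. --force-with-lease is required because a rebase can rewrite commits that were already pushed to the branch; unlike a plain --force, it refuses to overwrite remote commits that have not been fetched locally.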
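
The sync step could be driven from Python roughly as follows. This is a sketch under assumptions: `sync_with_origin` is a hypothetical name, and the injectable `run` parameter exists only so the example can be exercised without touching a real repository.

```python
import subprocess
from pathlib import Path

# The two commands shown above, as argv lists.
SYNC_COMMANDS = [
    ["git", "pull", "--rebase", "--autostash", "origin", "main"],
    ["git", "push", "--force-with-lease"],
]


def sync_with_origin(repo: Path, run=subprocess.run) -> None:
    """Rebase onto origin/main, then push; raise on the first failing command."""
    for cmd in SYNC_COMMANDS:
        # check=True raises CalledProcessError, so a rebase conflict stops
        # the sync before any push is attempted.
        run(cmd, cwd=repo, check=True)
```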