TDD Workflow - krazyuniks/guitar-tone-shootout GitHub Wiki
TDD Workflow
Test-Driven Development workflow with enforced test immutability for GTS (Python/pytest). Validation is deterministic — controlled by Python (run_epic.py), not AI agents.
Philosophy
Traditional TDD relies on developer discipline. This workflow enforces TDD mechanically:
- Separate agents: Test author and implementer are different agents with different tool permissions
- Tool enforcement: run_epic.py passes --allowedTools to Claude CLI based on agent YAML
- Test locking: Tests are snapshotted before implementation, and modifications are detected
- Deterministic validation: Python runs all validation gates — agents cannot lie about results
- Stop on failure: Any validation failure halts the epic immediately
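The tool-enforcement point can be illustrated with a minimal sketch. This is not the project's run_epic.py: the agent dictionary shape and the build_claude_command helper are hypothetical, and the comma-separated argument format is an assumption. Only the --allowedTools flag itself comes from the text above.

```python
def build_claude_command(agent: dict, prompt: str) -> list[str]:
    """Build a Claude CLI argv restricted to the agent's configured tools.

    `agent` stands in for a parsed agent YAML file; its shape is hypothetical.
    """
    tools = agent.get("tools", [])
    if not tools:
        # Refuse to dispatch an agent without an explicit tool list.
        raise ValueError(f"agent {agent.get('name')!r} declares no tools")
    return ["claude", "-p", prompt, "--allowedTools", ",".join(tools)]


test_author = {
    "name": "test-author",
    "tools": ["Read", "Write", "Edit", "Bash", "Glob", "Grep"],
}
command = build_claude_command(test_author, "Write tests for T43")
```

Because the tool list is assembled by Python from configuration, the agent has no way to widen its own permissions from inside a session.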
The Phases
Phase 1: Test Specification
Agent: test-author
Tools: Read, Write, Edit, Bash, Glob, Grep (enforced by --allowedTools)
Path restrictions: tests/**/*.py only (prompt-enforced)
The test author writes tests from acceptance criteria only. They have no knowledge of how the feature will be implemented.
Rules:
- Tests MUST fail until implemented
- No trivial assertions (assert True)
- Test behaviour, not implementation details
- Each acceptance criterion needs at least one test
Phase 2: Red Verification
Purpose: Verify all tests fail
Controlled by: Python (deterministic)
just tdd-red T43
This catches:
- Tests that pass without implementation (trivial/broken)
- Tests that error instead of fail (syntax issues)
If verification fails, test-author is retried once. If it fails again, the epic halts.
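A red check along these lines might look like the sketch below. It assumes the pass, fail, and error counts have already been extracted from the pytest run; RunCounts and red_phase_problems are hypothetical names, not part of run_epic.py.

```python
from dataclasses import dataclass


@dataclass
class RunCounts:
    """Outcome counts from one pytest run (hypothetical container)."""
    passed: int
    failed: int
    errors: int


def red_phase_problems(counts: RunCounts) -> list[str]:
    """Return the reasons the red phase is invalid; an empty list means valid."""
    problems = []
    if counts.passed:
        problems.append(f"{counts.passed} test(s) pass without implementation")
    if counts.errors:
        problems.append(f"{counts.errors} test(s) error instead of failing")
    if counts.failed == 0 and not problems:
        problems.append("no tests ran")
    return problems
```

An empty result means every test failed cleanly, which is the only state in which the lock phase should proceed.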
Phase 3: Test Lock
Purpose: Snapshot test files
Controlled by: Python (deterministic)
just tdd-lock T43
This:
- Creates SHA-256 hashes of all test files
- Saves to .tasks/.../snapshots/{task_id}.json
- Commits with message: test-lock: T43 tests ready for implementation
The test-lock: commit message is used by CI to detect the lock point.
Phase 4: Implementation
Agent: implementer
Tools: Read, Write, Edit, Bash, Glob, Grep (enforced by --allowedTools)
Path restrictions: libs/, apps/, sources/ only (prompt-enforced)
The implementer:
- Reads test files to understand expected behaviour
- Writes implementation to make tests pass
- Runs tests continuously: just tdd tests/unit/path/to/test.py
- Cannot modify test files (no write access + snapshot verification)
Phase 5: Green Verification
Purpose: Verify tests pass
Controlled by: Python (deterministic)
just tdd-green T43
If verification fails, implementer is retried up to 2 times. Each retry includes the previous failure output. If all retries fail, the epic halts.
Phase 6: Full Validation
Purpose: Complete TDD validation
Controlled by: Python (deterministic)
just tdd-complete T43
Checks:
- All tests pass
- Test files unchanged since lock (snapshot verification)
- Test quality checks pass (no trivial assertions)
- Regression tests pass
- E2E tests pass
If any check fails, the epic halts with an error report.
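The halt-on-first-failure behaviour can be pictured with a small sketch. The gate names and the run_gates helper are hypothetical; the real checks live behind just tdd-complete.

```python
from typing import Callable, Optional

# A gate is a (name, check) pair; check returns True when the gate passes.
Gate = tuple[str, Callable[[], bool]]


def run_gates(gates: list[Gate]) -> Optional[str]:
    """Run gates in order and return the name of the first failure, or None."""
    for name, check in gates:
        if not check():
            return name  # Later gates are never run.
    return None
```

Returning the failing gate's name gives the caller enough to write an error report before halting the epic.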
Retry Policy
| Agent | Retries | Context |
|---|---|---|
| test-author | 1 | Previous failure output included in retry prompt |
| implementer | 2 | Previous failure output included in retry prompt |
After retries are exhausted, run_epic.py writes a detailed error report and exits non-zero.
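The retry policy amounts to a bounded loop that feeds each failure back into the next attempt. The sketch below assumes two injected callables, dispatch (runs the agent, optionally with previous failure output) and verify (runs the validation gate). Both names and the function signature are hypothetical.

```python
from typing import Callable, Optional

# Retry budgets taken from the table above.
MAX_RETRIES = {"test-author": 1, "implementer": 2}


def run_with_retries(
    agent: str,
    dispatch: Callable[[Optional[str]], None],
    verify: Callable[[], tuple[bool, str]],
) -> bool:
    """Dispatch the agent, verify, and retry with the failure output on failure."""
    failure: Optional[str] = None
    for _attempt in range(1 + MAX_RETRIES[agent]):
        dispatch(failure)  # None on the first attempt
        ok, output = verify()
        if ok:
            return True
        failure = output
    return False  # Caller writes the error report and exits non-zero.
```

The agent never reports its own success here: only the return value of verify decides whether the loop ends.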
Test Quality Enforcement
Antipatterns Detected (Python/pytest)
| Pattern | Issue |
|---|---|
| assert True | Trivial assertion |
| assert x (truthy only) | Weak assertion |
| mock.assert_called() alone | Mock-only, no effect verification |
| Empty test body (pass only) | No assertions |
| @pytest.mark.skip | Skipped test |
| time.sleep() | Flaky indicator |
Good vs Bad Tests
Bad (trivial):
def test_validate_email_works():
    result = validate_email('user@example.com')
    assert result  # Weak - truthy check
Good (specific):
def test_validate_email_rejects_invalid_format():
    result = validate_email('not-an-email')
    assert result.valid is False
    assert result.error == 'Invalid email format'
Test Immutability Enforcement
Local Enforcement
Snapshot verification runs during tdd-complete:
python scripts/snapshot_tests.py verify T43
Detects:
- MODIFIED: Test file content changed
- DELETED: Test file removed
- ADDED: New test file (must be added in test phase)
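The three statuses fall out of a comparison between the locked hashes and freshly computed ones. A minimal sketch, assuming both snapshots are dictionaries mapping file path to SHA-256 digest (compare_snapshots is a hypothetical name):

```python
def compare_snapshots(
    locked: dict[str, str], current: dict[str, str]
) -> dict[str, list[str]]:
    """Classify test files as MODIFIED, DELETED, or ADDED relative to the lock."""
    return {
        "MODIFIED": sorted(
            path for path in locked
            if path in current and locked[path] != current[path]
        ),
        "DELETED": sorted(path for path in locked if path not in current),
        "ADDED": sorted(path for path in current if path not in locked),
    }
```

Verification passes only when all three lists are empty.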
CI Enforcement
A GitHub Action checks for test modifications since the most recent test-lock: commit:
- name: Check test file immutability
  run: |
    # Full hash of the most recent test-lock commit
    LOCK_COMMIT=$(git log --format=%H --grep="test-lock:" -n 1)
    CHANGED_TESTS=$(git diff --name-only "$LOCK_COMMIT" HEAD -- 'tests/')
    if [ -n "$CHANGED_TESTS" ]; then
      echo "::error::Test files modified after test-lock"
      exit 1
    fi
Workflow Commands
| Command | Description |
|---|---|
| just tdd-test-phase T43 | Start test phase (prints instructions) |
| just tdd-red T43 | Verify tests fail |
| just tdd-lock T43 | Snapshot and commit |
| just tdd-impl-phase T43 | Implementation hints |
| just tdd-green T43 | Verify tests pass |
| just tdd-complete T43 | Full validation |
| just snapshot-verify T43 | Check test immutability |
| just snapshot-diff T43 | Show changes since lock |
| just snapshot-list | List all test files |
GTS Test Patterns
tests/
├── unit/ # Pure logic, no I/O
├── integration/ # Real DB/Redis
├── regression/ # Stack connectivity
└── e2e/
    └── python/ # Playwright E2E tests
All tests run in Docker:
docker compose exec -T webapp pytest tests/unit/ -v
Failure Recovery
Tests Modified During Implementation
# See what changed
just snapshot-diff T43
# Reset implementation, keep tests
git checkout -- libs/ apps/
# Or reset to lock point
git checkout "$(git log -n 1 --grep="test-lock: T43" --format="%H")" -- libs/ apps/
Implementation Doesn't Pass Tests
# Check test output
just tdd tests/unit/path/to/test.py
# Run with verbose output
docker compose exec webapp pytest tests/unit/path/to/test.py -v --tb=long
Epic Halted by Validation
# Check error report
ls .tasks/projects/guitar-tone-shootout/epics/E42/logs/errors/
# Fix the issue, then re-run (picks up from current state)
python scripts/run_epic.py run 42
Bounce-Back Recovery
When tdd-green fails but evidence suggests the test itself has a bug (not the implementation), the system can automatically re-dispatch the test-author in FIX mode.
Detection Heuristics
Two heuristics determine if a green failure is likely a test bug:
| Heuristic | Conditions |
|---|---|
| A: Few failures | >= 75% scope files exist, <= 3 failures, passed > failures |
| B: Localised failures | All failures in a single test file, passed > 0 |
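The two heuristics translate directly into boolean conditions. The sketch below is one way to express them. The GreenResult container and its field names are hypothetical, and the real is_likely_test_bug() may take different inputs.

```python
from dataclasses import dataclass


@dataclass
class GreenResult:
    """Inputs to the heuristics (hypothetical shape)."""
    passed: int
    failed_tests: list[str]  # pytest node IDs: "path/to/test_x.py::test_name"
    scope_files_total: int
    scope_files_existing: int


def is_likely_test_bug(result: GreenResult) -> bool:
    failures = len(result.failed_tests)
    if failures == 0:
        return False
    total = result.scope_files_total
    scope_ratio = result.scope_files_existing / total if total else 0.0
    # Heuristic A: implementation mostly exists and only a few tests fail.
    few_failures = scope_ratio >= 0.75 and failures <= 3 and result.passed > failures
    # Heuristic B: every failure sits in one test file and something passed.
    failing_files = {node_id.split("::", 1)[0] for node_id in result.failed_tests}
    localised = len(failing_files) == 1 and result.passed > 0
    return few_failures or localised
```

Either heuristic alone is sufficient in this sketch; both require at least one passing test, so a completely red run never triggers a bounce.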
Bounce-Back Flow
- tdd-green fails
- is_likely_test_bug() evaluates heuristics
- If matched (and first bounce for this task): test-author re-dispatched in FIX mode
- Fixed tests are re-locked (re_lock_after_bounce())
- tdd-green retried
- If still fails: epic halts
Limit: Maximum 1 bounce per task. If the fix doesn't resolve it, the epic halts.
Skill Injection
The test-author agent receives GTS testing patterns automatically. build_test_author_prompt() appends the content of .claude/skills/gts-testing/SKILL.md to every test-author prompt.
This gives the agent access to:
- Fixture patterns and conftest setup
- Banned patterns (e.g., importlib.util, AsyncClient(app=...))
- Database session management patterns
- Test scaffolding conventions
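The injection step is essentially string concatenation. A minimal sketch, assuming a signature for build_test_author_prompt() that the page does not actually document (the parameters and the blank-line separator are assumptions):

```python
from pathlib import Path

SKILL_PATH = Path(".claude/skills/gts-testing/SKILL.md")


def build_test_author_prompt(base_prompt: str, skill_path: Path = SKILL_PATH) -> str:
    """Append the skill file's content to the task-specific prompt."""
    skill = skill_path.read_text(encoding="utf-8")
    return f"{base_prompt.rstrip()}\n\n{skill}"
```

Reading the file on every call means edits to SKILL.md take effect on the next dispatch without changing any Python.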
Session Logging
Each run_epic.py execution creates a session log:
.tasks/projects/guitar-tone-shootout/epics/E42/logs/session_20260204_103000.log
The log records:
- All commands executed and their exit codes
- State transitions for each task
- Agent dispatch details (prompt length, tools, model)
- Validation gate results
- Git sync operations
- Bounce-back events
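The file name in the example path follows a session_YYYYMMDD_HHMMSS.log pattern. A sketch of how such a path could be derived (session_log_path is a hypothetical helper; the timestamp format is inferred from the example path only):

```python
from datetime import datetime
from pathlib import Path


def session_log_path(logs_dir: Path, now: datetime) -> Path:
    """Build a per-session log path from the session start time."""
    return logs_dir / f"session_{now:%Y%m%d_%H%M%S}.log"
```

Passing the timestamp in, rather than calling datetime.now() inside the function, keeps the helper deterministic and easy to test.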
Git Sync
After every commit (test lock, implementation, validation), run_epic.py syncs with origin:
git pull --rebase --autostash origin main
git push --force-with-lease
This ensures:
- Parallel worktrees stay up to date
- Work is pushed incrementally (not only at epic completion)
- Rebase conflicts surface early
Note: --autostash handles unstaged session log changes during the sync. --force-with-lease is required after rebase.
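From Python, the sync can be run as two subprocess calls that raise on a non-zero exit. This is a sketch, not the code in run_epic.py: sync_with_origin and the injectable run parameter are assumptions made for testability.

```python
import subprocess

SYNC_COMMANDS = [
    ["git", "pull", "--rebase", "--autostash", "origin", "main"],
    ["git", "push", "--force-with-lease"],
]


def sync_with_origin(run=subprocess.run) -> None:
    """Rebase onto origin/main, then push; stop at the first failing command."""
    for command in SYNC_COMMANDS:
        # check=True raises CalledProcessError on a non-zero exit status.
        run(command, check=True)
```

With check=True, a rebase conflict raises before the push is attempted, which matches the goal of surfacing conflicts early.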