Planning Phase 03 - huqianghui/AI-Coach-vibe-coding GitHub Wiki

Phase 03: Scoring Assessment

Auto-generated from .planning/phases/03-scoring-assessment
Last synced: 2026-04-02

Context & Decisions

Phase 3: Scoring & Assessment - Context

Gathered: 2026-03-24 Status: Ready for planning Mode: Auto-generated (full user authorization — Claude's discretion on all decisions)

## Phase Boundary

Complete the scoring system with: (1) real-time coaching suggestions wired into the SSE session flow, (2) detailed post-session reports with strengths/weaknesses/improvement areas exposed via API, (3) admin-customizable scoring rubrics with full CRUD, and (4) historical scoring data persistence and query APIs for trend analysis. Phase 2 built the scoring foundation (mock scores, 5 dimensions, basic feedback page). Phase 3 finishes the job — wiring the existing scaffolding, adding missing APIs, and building the admin rubric management UI.

## Implementation Decisions

Real-time Coaching Suggestions

  • Wire existing suggestion_service.py into the SSE message flow — call after each user message alongside HCP response generation
  • Suggestions delivered via existing CoachEventType.SUGGESTION SSE events — frontend HintsPanel already handles these
  • Keep keyword-based mock analysis for MVP (already implemented); LLM-based analysis deferred to AI adapter wiring
  • Add dedicated GET /api/v1/sessions/{id}/suggestions endpoint to retrieve accumulated suggestions for a session

Post-session Reports

  • Wire existing report_service.py to a new GET /api/v1/sessions/{id}/report API endpoint
  • Extend the existing scoring-feedback page to show full report data (DimensionBreakdown with quotes, improvement priorities)
  • Enable PDF export using browser print-to-PDF (CSS @media print) — lightweight approach, no server-side PDF generation
  • Historical comparison: provide previous session score via API for RadarChart overlay

Admin Scoring Rubrics

  • Create rubric_service.py with full CRUD operations using existing model + schemas
  • Create rubrics API router at /api/v1/rubrics with admin-only access
  • Build admin rubric management page at /admin/scoring-rubrics — list view + editor with dimension config
  • Wire scoring service to use rubric dimensions when available, fall back to scenario weights
  • Default rubric seeded for F2F scenario type with the 5 standard dimensions

Score Persistence & Analytics

  • Scoring results already persisted (SessionScore + ScoreDetail from Phase 2)
  • Add GET /api/v1/scoring/history endpoint for user's score history across sessions
  • Add trend calculation in service layer — compute improvement percentage per dimension over last N sessions
  • Wire user dashboard to real scoring data via TanStack Query hooks (replace mock data)

Claude's Discretion

  • Database schema details for any new columns/tables
  • Exact rubric editor UI layout and interaction patterns
  • Score aggregation algorithm details (moving average vs simple average for trends)
  • CSS print stylesheet specifics for PDF export
  • Test structure and mock data patterns

<code_context>

Existing Code Insights

Reusable Assets

  • backend/app/services/suggestion_service.py — keyword-based suggestion generator (EXISTS, not wired)
  • backend/app/services/report_service.py — full report generator parsing ScoreDetail JSON (EXISTS, not wired)
  • backend/app/models/scoring_rubric.py — ScoringRubric ORM model (EXISTS, uncommitted)
  • backend/app/schemas/scoring_rubric.py — RubricCreate/Update/Response with weight validation (EXISTS, uncommitted)
  • backend/app/schemas/report.py — SessionReport, DimensionBreakdown (EXISTS, uncommitted)
  • backend/app/schemas/suggestion.py — SuggestionType, SuggestionCreate/Response (EXISTS, uncommitted)
  • backend/alembic/versions/16f9f0ba6e9d_add_scoring_rubrics_table.py — Migration (EXISTS, uncommitted)
  • Frontend: HintsPanel, RadarChart, DimensionBars, FeedbackCard, ScoreSummary all exist
  • Frontend: SSE hook handles hint events; scoring hooks and API client exist
  • backend/app/services/prompt_builder.pybuild_scoring_prompt() ready for LLM integration

Established Patterns

  • Service layer pattern: business logic in services/*.py, routers delegate only
  • Pydantic v2 schemas with ConfigDict(from_attributes=True) and field validators
  • TanStack Query hooks per domain with typed API client
  • i18n via react-i18next with domain namespaces
  • Admin pages follow sidebar list + editor panel pattern (see hcp-profiles, scenarios)
  • SSE streaming via EventSourceResponse for real-time delivery

Integration Points

  • backend/app/api/sessions.py — SSE message endpoint needs suggestion_service wiring
  • backend/app/main.py — New routers registered here
  • frontend/src/App.tsx — New routes added here
  • Admin sidebar navigation already has /admin/reports placeholder
  • User navigation already has /user/history and /user/reports placeholders

</code_context>

## Specific Ideas
  • Leverage all existing uncommitted scaffolding files — they represent intentional Phase 3 preparation
  • Keep consistency with Phase 2 admin page patterns (hcp-profiles, scenarios) for rubric management
  • Follow the same test coverage pattern: service tests, API tests, schema tests, component tests, hook tests
  • Use Recharts (already installed) for any new charts in reports/history pages
## Deferred Ideas
  • LLM-based real-time suggestion generation (requires Azure OpenAI wiring — Phase scope is mock/keyword for MVP)
  • Server-side PDF generation (browser print-to-PDF sufficient for MVP)
  • Admin analytics dashboard (Phase 4 scope)

Plans (4)

# Plan File Status
03-01 03-01-PLAN.md Complete
03-02 03-02-PLAN.md Complete
03-03 03-03-PLAN.md Complete
03-04 03-04-PLAN.md Complete

Research

Click to expand research notes

Phase 3: Scoring & Assessment - Research

Researched: 2026-03-24 Domain: Scoring system wiring, post-session reports API, admin rubric CRUD, score history analytics Confidence: HIGH

Summary

Phase 3 completes the scoring system that Phase 2 scaffolded. The vast majority of code already exists as uncommitted files or wired-but-incomplete services. The work is primarily integration wiring -- connecting existing suggestion_service.py into the SSE flow, exposing report_service.py via API, building a CRUD router for the existing ScoringRubric model, and adding score history endpoints. The frontend has existing components (HintsPanel, RadarChart, DimensionBars, FeedbackCard, ScoreSummary) that need to be connected to real API data and extended with new pages (admin rubric management, user score history).

The codebase follows strongly established patterns from Phase 2: service-layer business logic, Pydantic v2 schemas with ConfigDict(from_attributes=True), TanStack Query hooks per domain, SSE streaming via native fetch, i18n via react-i18next with domain namespaces, and admin pages using a sidebar-list + editor panel pattern. Phase 3 should follow these patterns exactly.

Primary recommendation: Wire existing services/models/schemas first (backend), then build the missing API endpoints and admin UI, following Phase 2 patterns precisely. No new libraries needed -- the stack is complete.

<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

  • Wire existing suggestion_service.py into the SSE message flow -- call after each user message alongside HCP response generation
  • Suggestions delivered via existing CoachEventType.SUGGESTION SSE events -- frontend HintsPanel already handles these
  • Keep keyword-based mock analysis for MVP (already implemented); LLM-based analysis deferred to AI adapter wiring
  • Add dedicated GET /api/v1/sessions/{id}/suggestions endpoint to retrieve accumulated suggestions for a session
  • Wire existing report_service.py to a new GET /api/v1/sessions/{id}/report API endpoint
  • Extend the existing scoring-feedback page to show full report data (DimensionBreakdown with quotes, improvement priorities)
  • Enable PDF export using browser print-to-PDF (CSS @media print) -- lightweight approach, no server-side PDF generation
  • Historical comparison: provide previous session score via API for RadarChart overlay
  • Create rubric_service.py with full CRUD operations using existing model + schemas
  • Create rubrics API router at /api/v1/rubrics with admin-only access
  • Build admin rubric management page at /admin/scoring-rubrics -- list view + editor with dimension config
  • Wire scoring service to use rubric dimensions when available, fall back to scenario weights
  • Default rubric seeded for F2F scenario type with the 5 standard dimensions
  • Scoring results already persisted (SessionScore + ScoreDetail from Phase 2)
  • Add GET /api/v1/scoring/history endpoint for user's score history across sessions
  • Add trend calculation in service layer -- compute improvement percentage per dimension over last N sessions
  • Wire user dashboard to real scoring data via TanStack Query hooks (replace mock data)

Claude's Discretion

  • Database schema details for any new columns/tables
  • Exact rubric editor UI layout and interaction patterns
  • Score aggregation algorithm details (moving average vs simple average for trends)
  • CSS print stylesheet specifics for PDF export
  • Test structure and mock data patterns

Deferred Ideas (OUT OF SCOPE)

  • LLM-based real-time suggestion generation (requires Azure OpenAI wiring -- Phase scope is mock/keyword for MVP)
  • Server-side PDF generation (browser print-to-PDF sufficient for MVP)
  • Admin analytics dashboard (Phase 4 scope) </user_constraints>

<phase_requirements>

Phase Requirements

ID Description Research Support
SCORE-01 System scores completed sessions across 5-6 configurable dimensions Already implemented in Phase 2 (scoring_service.py). Phase 3 wires rubric-based configurable dimensions as override.
SCORE-02 Scoring uses Azure OpenAI to analyze conversation transcript Mock scoring exists; deferred LLM-based scoring. Phase 3 keeps mock but integrates rubric weights.
SCORE-03 Post-session feedback report shows strengths/weaknesses per dimension with quotes report_service.py EXISTS but not exposed via API. Phase 3 wires it to GET /api/v1/sessions/{id}/report.
SCORE-04 Post-session feedback includes actionable improvement suggestions per dimension report_service.py already generates ImprovementSuggestion with priority levels. API wiring needed.
SCORE-05 Scoring dimension weights are configurable per scenario -- admin sets via weighted sliders Scenario model has weight_* columns. Phase 3 adds ScoringRubric as override with CRUD UI.
COACH-08 Real-time coaching hints displayed in side panel during conversation HintsPanel component exists, SSE hint event type exists. Phase 3 wires suggestion_service into SSE flow.
COACH-09 Conversations are immutable once completed Already enforced in sessions.py endpoint (status check). Phase 3 maintains this constraint.
</phase_requirements>

Standard Stack

Core (Already Installed -- No New Dependencies)

Library Version Purpose Why Standard
FastAPI >=0.115.0 ASGI web framework Already installed, all routers follow this pattern
SQLAlchemy 2.0 async ORM with AsyncSession Already installed, all models use this
Pydantic v2 >=2.5.0 Request/response schemas Already installed, ConfigDict(from_attributes=True) pattern
sse-starlette installed SSE streaming for real-time suggestions Already used in sessions.py for HCP response streaming
TanStack Query v5 ^5.60.0 Server state management Already used for all API hooks
Recharts installed Radar charts, trend charts Already used in RadarChart and dashboard mini-charts
react-i18next installed i18n with domain namespaces Already configured with scoring, admin, coach namespaces
react-hook-form + zod installed Form management Already used in scenario editor, will reuse for rubric editor

Supporting (No additions needed)

All supporting libraries are already present in the project. Phase 3 introduces zero new dependencies.

Installation: None required -- all dependencies are already in pyproject.toml and package.json.

Architecture Patterns

Backend: New Files Needed

backend/
├── app/
│   ├── api/
│   │   └── rubrics.py                  # NEW: CRUD router for /api/v1/rubrics
│   ├── services/
│   │   ├── rubric_service.py           # NEW: Rubric CRUD business logic
│   │   ├── suggestion_service.py       # EXISTS: Wire into SSE flow
│   │   ├── report_service.py           # EXISTS: Wire to API endpoint
│   │   └── scoring_service.py          # MODIFY: Use rubric dimensions when available
│   ├── api/sessions.py                 # MODIFY: Add suggestion wiring + report/suggestions endpoints
│   ├── api/scoring.py                  # MODIFY: Add history endpoint
│   └── api/__init__.py                 # MODIFY: Register rubrics_router
├── scripts/
│   └── seed_data.py                    # MODIFY: Add default F2F rubric
└── tests/
    ├── test_rubric_service.py          # NEW
    ├── test_rubrics_api.py             # NEW
    ├── test_report_service.py          # NEW
    ├── test_suggestion_service.py      # NEW
    └── test_scoring_history.py         # NEW

Frontend: New Files Needed

frontend/src/
├── api/
│   ├── rubrics.ts                      # NEW: Rubric CRUD API client
│   └── reports.ts                      # NEW: Report + suggestions API client
├── hooks/
│   ├── use-rubrics.ts                  # NEW: TanStack Query hooks for rubrics
│   └── use-reports.ts                  # NEW: TanStack Query hooks for reports
├── types/
│   ├── rubric.ts                       # NEW: Rubric TypeScript types
│   └── report.ts                       # NEW: Report TypeScript types
├── components/
│   ├── admin/
│   │   ├── rubric-table.tsx            # NEW: Rubric list view
│   │   └── rubric-editor.tsx           # NEW: Rubric editor with dimension config
│   └── scoring/
│       └── report-section.tsx          # NEW: Report detail sections
├── pages/
│   ├── admin/
│   │   └── scoring-rubrics.tsx         # NEW: Admin rubric management page
│   └── user/
│       └── session-history.tsx         # NEW: User score history page
└── public/locales/
    ├── en-US/
    │   ├── scoring.json                # MODIFY: Add report/history keys
    │   └── admin.json                  # MODIFY: Add rubric management keys
    └── zh-CN/
        ├── scoring.json                # MODIFY: Add report/history keys
        └── admin.json                  # MODIFY: Add rubric management keys

Pattern 1: Wiring Suggestion Service into SSE Flow

What: After each user message, call generate_suggestions() and emit SUGGESTION events before the done event. When to use: In the send_message endpoint's event_generator() function. Example:

# In backend/app/api/sessions.py send_message event_generator
# After DONE event, before yielding done:
from app.services.suggestion_service import generate_suggestions, parse_key_messages_status

# After saving HCP response and detecting key messages:
km_status_list = parse_key_messages_status(session.key_messages_status)
suggestions = await generate_suggestions(
    messages=[{"role": "user", "content": request.message}],
    key_messages_status=km_status_list,
    scoring_weights=session.scenario.get_scoring_weights(),
)
for suggestion in suggestions:
    yield {
        "event": "hint",
        "data": json.dumps({
            "content": suggestion.message,
            "metadata": {
                "type": suggestion.type.value,
                "trigger": suggestion.trigger,
                "relevance": suggestion.relevance_score,
            },
        }),
    }

Pattern 2: Admin CRUD Router (Following HCP Profiles Pattern)

What: Full CRUD router with admin-only access via require_role("admin"). When to use: For /api/v1/rubrics endpoints. Example:

# backend/app/api/rubrics.py
from fastapi import APIRouter, Depends
from app.dependencies import get_db, require_role
from app.models.user import User

router = APIRouter(prefix="/rubrics", tags=["rubrics"])

@router.post("/", status_code=201)
async def create_rubric(
    request: RubricCreate,
    db: AsyncSession = Depends(get_db),
    user: User = Depends(require_role("admin")),
):
    return await rubric_service.create_rubric(db, request, user.id)

Pattern 3: Score History with Trend Calculation

What: Endpoint returns user's scored sessions with per-dimension trend data. When to use: For GET /api/v1/scoring/history endpoint. Example:

# Service layer: compute simple improvement percentage
async def get_score_history(db, user_id, limit=10):
    # Fetch last N scored sessions ordered by completed_at desc
    # For each dimension, compare current vs previous to compute trend
    # Return list of {session_id, scenario_name, overall_score, passed, dimensions, completed_at}

Pattern 4: Admin Page Pattern (Sidebar List + Editor)

What: Admin pages follow the existing pattern from HCP profiles and Scenarios pages. When to use: For the /admin/scoring-rubrics page. Example:

// Follow the exact pattern from pages/admin/scenarios.tsx:
// 1. State: editorOpen, editingItem, isNew, deleteConfirmId
// 2. Hooks: useRubrics, useCreateRubric, useUpdateRubric, useDeleteRubric
// 3. Layout: filter bar + table + Dialog editor
// 4. i18n: useTranslation("admin") with rubric-specific keys

Anti-Patterns to Avoid

  • Do NOT use redux or context for scoring data: TanStack Query handles all server state. The dashboard mock data should be replaced with useScoreHistory() hook, not a context provider.
  • Do NOT parse JSON in the frontend for ScoreDetail strengths/weaknesses: The report_service.py already parses JSON and returns structured Pydantic models. The API should return parsed objects, not raw JSON strings. Note: the existing ScoreDetailResponse schema returns raw JSON strings -- the report endpoint returns parsed objects via SessionReport schema.
  • Do NOT add inline useQuery calls in components: Create domain-specific hooks in use-reports.ts and use-rubrics.ts.
  • Do NOT modify existing scoring components: RadarChart, FeedbackCard, DimensionBars already accept the right props. Extend the page that uses them, don't modify the components.
  • Do NOT use db.commit() in services: Use db.flush() per the established pattern. The session middleware handles commit/rollback.

Don't Hand-Roll

Problem Don't Build Use Instead Why
PDF export Server-side PDF generation CSS @media print stylesheet Browser print-to-PDF is sufficient for MVP; avoids adding weasyprint/reportlab dependency
Score trend calculation Complex statistical library Simple Python arithmetic in service layer Moving average or simple delta comparison is a few lines of code
Real-time suggestions LLM-based analysis Existing keyword-based suggestion_service.py LLM wiring is explicitly deferred; mock is intentional
Form validation Custom validation logic react-hook-form + zod (already installed) Already used for scenario editor; rubric editor follows same pattern
Weight slider UI Custom slider component Existing pattern from scenario weight editor Phase 2 already built proportional weight redistribution logic

Key insight: Phase 3 is an integration phase. Nearly every piece of business logic already exists in service files. The work is routing, wiring, and UI assembly.

Common Pitfalls

Pitfall 1: ScoreDetail JSON Parsing Inconsistency

What goes wrong: The existing ScoreDetailResponse schema returns strengths, weaknesses, and suggestions as raw JSON strings (Text columns from SQLite). The frontend ScoreDetail type expects parsed arrays. The report_service.py handles parsing, but the scoring API does not. Why it happens: Phase 2 left the scoring API response as raw JSON strings and the frontend was parsing them client-side. How to avoid: The report endpoint should use SessionReport schema which has parsed objects. For the score history endpoint, either add JSON parsing validators to the response schema (like the existing parse_dimensions_json pattern in RubricResponse) or return data through the report service. Warning signs: Frontend displays [object Object] or stringified JSON in score displays.

Pitfall 2: SSE Suggestion Timing

What goes wrong: Suggestions emitted before the HCP response is fully streamed create a confusing user experience. Why it happens: The generate_suggestions() call could be placed before the HCP text streaming loop. How to avoid: Call generate_suggestions() AFTER the full HCP response is saved and key messages are detected (inside the CoachEventType.DONE handler), before yielding the final done event. This ensures suggestions reflect the complete conversation state. Warning signs: Hints panel updates mid-stream or shows stale suggestions.

Pitfall 3: Rubric Weight Validation Edge Case

What goes wrong: Rubric dimensions with weights that don't sum to 100 bypass validation when updating individual dimensions. Why it happens: RubricUpdate has dimensions as optional. If dimensions are provided, the field_validator fires. But if dimensions are not provided, existing dimensions remain unchanged. How to avoid: The existing validate_weights_sum field_validator on RubricUpdate already handles this correctly -- it only validates when dimensions are provided. No additional logic needed. Warning signs: Rubric with dimensions summing to != 100 in database.

Pitfall 4: Admin Route Not Protected

What goes wrong: The rubric CRUD endpoint is accessible to regular users. Why it happens: Forgetting to use require_role("admin") dependency. How to avoid: Use Depends(require_role("admin")) on ALL rubric CRUD endpoints, following the exact pattern from hcp_profiles.py and scenarios.py routers. Warning signs: Non-admin user can create/edit/delete rubrics.

Pitfall 5: Router Registration Order in main.py

What goes wrong: New router is not included in main.py or api/__init__.py. Why it happens: Creating the router file but forgetting to register it. How to avoid: Checklist: (1) create api/rubrics.py, (2) add to api/__init__.py, (3) add app.include_router(rubrics_router, prefix=settings.api_prefix) to main.py. Warning signs: 404 on /api/v1/rubrics endpoints.

Pitfall 6: Frontend Route Not Added to Router

What goes wrong: New pages exist but are unreachable. Why it happens: Creating page component but not adding to router/index.tsx. How to avoid: For each new page: (1) create page file, (2) import in router/index.tsx, (3) add route entry under appropriate layout (admin routes under AdminRoute > AdminLayout, user routes under ProtectedRoute > UserLayout). Warning signs: Clicking sidebar nav link shows NotFound page.

Pitfall 7: i18n Keys Not Added to Both Locales

What goes wrong: New UI text shows raw keys like admin.rubrics.title instead of translated text. Why it happens: Adding keys to en-US but not zh-CN (or vice versa). How to avoid: Always add new keys to both en-US and zh-CN locale files simultaneously. The project has locales/en-US/ and locales/zh-CN/ plus a duplicated locales/locales/ directory -- add to the primary set. Warning signs: UI shows translation keys as raw strings.

Pitfall 8: Session Suggestion Accumulation

What goes wrong: Retrieving suggestions for a session via GET /sessions/{id}/suggestions returns nothing because suggestions are only emitted via SSE and not persisted. Why it happens: The SSE flow generates and streams suggestions but doesn't save them to the database. How to avoid: Either (a) persist suggestions to a new table during the SSE flow, or (b) regenerate them from the conversation history when the GET endpoint is called. Option (b) is simpler for MVP -- call generate_suggestions() with the full conversation history. Warning signs: Empty suggestions list when querying after session.

Code Examples

Backend: Report API Endpoint (Wiring Existing Service)

# Add to backend/app/api/sessions.py (or a new reports.py)
from app.services.report_service import generate_report
from app.schemas.report import SessionReport

@router.get("/{session_id}/report", response_model=SessionReport)
async def get_session_report(
    session_id: str,
    db: AsyncSession = Depends(get_db),
    user: User = Depends(get_current_user),
):
    """Get detailed post-session report with strengths, weaknesses, and improvements."""
    # Verify session belongs to user
    await session_service.get_session(db, session_id, user.id)
    report = await generate_report(db, session_id)
    return report

Backend: Rubric Service CRUD Pattern

# backend/app/services/rubric_service.py
import json
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from app.models.scoring_rubric import ScoringRubric
from app.schemas.scoring_rubric import RubricCreate, RubricUpdate
from app.utils.exceptions import NotFoundException

async def create_rubric(db: AsyncSession, data: RubricCreate, user_id: str) -> ScoringRubric:
    rubric = ScoringRubric(
        name=data.name,
        description=data.description,
        scenario_type=data.scenario_type,
        dimensions=json.dumps([d.model_dump() for d in data.dimensions]),
        is_default=data.is_default,
        created_by=user_id,
    )
    # If setting as default, unset other defaults for same scenario_type
    if data.is_default:
        await _unset_defaults(db, data.scenario_type)
    db.add(rubric)
    await db.flush()
    return rubric

Backend: Score History Endpoint

# Add to backend/app/api/scoring.py
@router.get("/history")
async def get_score_history(
    limit: int = Query(10, ge=1, le=50),
    db: AsyncSession = Depends(get_db),
    user: User = Depends(get_current_user),
):
    """Get user's scoring history with dimension trends."""
    history = await scoring_service.get_score_history(db, user.id, limit)
    return history

Frontend: Report Hook Pattern

// frontend/src/hooks/use-reports.ts
import { useQuery } from "@tanstack/react-query";
import { getSessionReport, getSessionSuggestions } from "@/api/reports";

export function useSessionReport(sessionId: string | undefined) {
  return useQuery({
    queryKey: ["reports", sessionId],
    queryFn: () => getSessionReport(sessionId!),
    enabled: !!sessionId,
  });
}

export function useSessionSuggestions(sessionId: string | undefined) {
  return useQuery({
    queryKey: ["suggestions", sessionId],
    queryFn: () => getSessionSuggestions(sessionId!),
    enabled: !!sessionId,
  });
}

Frontend: Rubric Admin Hook Pattern

// frontend/src/hooks/use-rubrics.ts
import { useQuery, useMutation, useQueryClient } from "@tanstack/react-query";
import { getRubrics, createRubric, updateRubric, deleteRubric } from "@/api/rubrics";

export function useRubrics(params?: { scenario_type?: string }) {
  return useQuery({
    queryKey: ["rubrics", params],
    queryFn: () => getRubrics(params),
  });
}

export function useCreateRubric() {
  const queryClient = useQueryClient();
  return useMutation({
    mutationFn: createRubric,
    onSuccess: () => {
      queryClient.invalidateQueries({ queryKey: ["rubrics"] });
    },
  });
}
// ... same pattern for update, delete

CSS Print Stylesheet Pattern

/* Add to scoring-feedback page or global print styles */
@media print {
  /* Hide navigation, sidebar, action buttons */
  nav, .sidebar, .action-bar, button { display: none !important; }
  /* Full width for content */
  .max-w-7xl { max-width: 100% !important; }
  /* Ensure charts render properly */
  .recharts-wrapper { break-inside: avoid; }
  /* Page breaks between dimension cards */
  .feedback-card { break-inside: avoid; }
}

State of the Art

Old Approach Current Approach When Changed Impact
Inline scoring weights in Scenario model Configurable via ScoringRubric override Phase 3 Admin can define reusable rubrics across scenarios
Mock data in user dashboard Real scoring data from API Phase 3 Dashboard shows actual session history and trends
SSE without coaching hints SSE with suggestion_service wiring Phase 3 Real-time coaching tips during conversation
Basic scoring page Full report with strengths/weaknesses/quotes Phase 3 Actionable post-session feedback

Open Questions

  1. Suggestion Persistence Strategy

    • What we know: SSE delivers suggestions in real-time, but the CONTEXT.md specifies a GET /api/v1/sessions/{id}/suggestions endpoint.
    • What's unclear: Should suggestions be persisted to a DB table, or regenerated from conversation history on demand?
    • Recommendation: Regenerate on demand for MVP (simpler, no schema change). If suggestions are needed in bulk/analytics later, add a session_suggestions table in Phase 4. The keyword-based generation is deterministic and fast enough for on-demand regeneration.
  2. Rubric-to-Scenario Linking

    • What we know: Scenarios have hardcoded weight_* columns. Rubrics have a scenario_type field (f2f/conference) but no direct FK to scenarios.
    • What's unclear: How does the scoring service decide which rubric to use for a session?
    • Recommendation: Scoring service checks for a default rubric matching the scenario's mode (f2f/conference). If found, use rubric dimensions. Otherwise, fall back to scenario weight_* columns. This avoids schema migration to add a rubric_id FK to scenarios.
  3. ScoreDetail JSON Parsing for Frontend

    • What we know: ScoreDetailResponse returns raw JSON strings. The SessionReport returns parsed objects. The frontend types expect parsed arrays.
    • What's unclear: Should the score API be updated to parse JSON, or only the report API?
    • Recommendation: Add JSON parsing field_validator to ScoreDetailResponse (following the existing parse_dimensions_json pattern from RubricResponse). This keeps both APIs consistent.

Project Constraints (from CLAUDE.md)

Backend

  • Async everywhere: async def, await, AsyncSession
  • Pydantic v2 schemas with model_config = ConfigDict(from_attributes=True)
  • Service layer holds business logic, routers only handle HTTP
  • db.flush() not db.commit() in services (session middleware handles commit)
  • Route ordering: static paths before parameterized /{id}
  • Create returns 201, Delete returns 204
  • No raw SQL -- use SQLAlchemy ORM
  • Schema changes require Alembic migration (rubrics migration already exists)

Frontend

  • strict: true TypeScript -- no any types, no unused variables
  • TanStack Query hooks per domain -- no inline useQuery
  • @/ path alias for all imports
  • cn() utility for conditional classes
  • No Redux -- TanStack Query for server state
  • react-i18next with domain namespaces per page

Testing

  • Backend: pytest + pytest-asyncio with in-memory SQLite
  • =95% coverage required

  • Test patterns: service unit tests, API integration tests, schema tests
  • Frontend: Component tests using vitest + testing-library

Database

  • Never modify schema without Alembic migration
  • All models use TimestampMixin
  • rubrics migration already exists (16f9f0ba6e9d_add_scoring_rubrics_table.py)

Sources

Primary (HIGH confidence)

  • Codebase analysis: Direct reading of all existing services, models, schemas, components, tests, and configuration files
  • backend/app/services/suggestion_service.py -- complete implementation, keyword-based
  • backend/app/services/report_service.py -- complete implementation, returns SessionReport
  • backend/app/models/scoring_rubric.py -- complete model with JSON dimensions column
  • backend/app/schemas/scoring_rubric.py -- complete CRUD schemas with weight validation
  • backend/app/schemas/report.py -- complete report schemas with strengths/weaknesses/improvements
  • backend/app/schemas/suggestion.py -- complete suggestion schemas with SuggestionType enum
  • backend/app/api/sessions.py -- existing SSE flow, suggestion wiring point identified
  • backend/app/api/scoring.py -- existing scoring endpoints, history endpoint location
  • frontend/src/components/scoring/ -- 4 existing components (radar-chart, dimension-bars, feedback-card, score-summary)
  • frontend/src/components/coach/hints-panel.tsx -- existing SSE hint handling
  • frontend/src/hooks/use-sse.ts -- existing SSE hook with hint callback
  • frontend/src/pages/admin/scenarios.tsx -- admin page pattern reference
  • frontend/src/router/index.tsx -- existing route configuration, new route insertion points identified
  • frontend/src/components/layouts/admin-layout.tsx -- sidebar already has /admin/reports placeholder
  • frontend/src/components/layouts/user-layout.tsx -- sidebar already has /user/history and /user/reports placeholders

Secondary (MEDIUM confidence)

  • Phase 2 state decisions (from STATE.md) -- patterns for JSON parsing, service layer, SSE streaming

Metadata

Confidence breakdown:

  • Standard stack: HIGH - All libraries already installed and in use; no new dependencies
  • Architecture: HIGH - All patterns directly observed in codebase from Phase 2
  • Pitfalls: HIGH - Derived from direct codebase analysis of existing inconsistencies and integration points

Research date: 2026-03-24 Valid until: 2026-04-24 (stable -- no external dependency changes expected)

Verification

Click to expand verification report

Phase 03: Scoring & Assessment Verification Report

Phase Goal: Real-time coaching suggestions, post-session reports, customizable scoring rubrics Verified: 2026-03-25T07:30:00Z Status: PASSED Re-verification: No -- initial verification

Goal Achievement

Observable Truths (Plan 01 -- Backend API Wiring)

# Truth Status Evidence
1 SSE message endpoint emits 'hint' events from suggestion_service after HCP response VERIFIED sessions.py imports generate_suggestions from suggestion_service (line 25) and calls it (lines 167, 242)
2 GET /api/v1/sessions/{id}/report returns parsed SessionReport VERIFIED sessions.py imports generate_report from report_service (line 24) and exposes endpoint (line 227); test_report_api.py passes (157 lines, 3 tests)
3 GET /api/v1/sessions/{id}/suggestions returns coaching suggestions VERIFIED test_suggestion_wiring.py passes (132 lines, 2 tests)
4 CRUD /api/v1/rubrics endpoints work with admin-only access VERIFIED backend/app/api/rubrics.py (64 lines) exists; test_rubrics_api.py passes (226 lines, 9 tests)
5 GET /api/v1/scoring/history returns scored sessions with dimension trends VERIFIED test_scoring_history.py passes (171 lines, 6 tests including trend computation)
6 Scoring service uses rubric dimensions when default rubric exists VERIFIED scoring_service.py imports get_default_rubric (line 14) and calls it (line 63)
7 Conversations remain immutable once completed (COACH-09) VERIFIED sessions.py lines 94-100: rejects messages when session.status not in ("created", "in_progress") with 409 error

Score: 7/7 truths verified

Observable Truths (Plan 02 -- Frontend Data Layer)

# Truth Status Evidence
1 Frontend TypeScript types match backend schemas for rubrics and reports VERIFIED types/rubric.ts (34 lines) exports DimensionConfig, RubricCreate, RubricUpdate, Rubric; types/report.ts (63 lines) exports SessionReport, DimensionBreakdown, etc.
2 API client functions exist for all new backend endpoints VERIFIED api/rubrics.ts (26 lines), api/reports.ts (16 lines), api/scoring.ts exists
3 TanStack Query hooks provide typed data access VERIFIED hooks/use-rubrics.ts (46 lines), hooks/use-reports.ts (18 lines), hooks/use-scoring.ts (28 lines)

Score: 3/3 truths verified

Observable Truths (Plan 03 -- Frontend Pages)

# Truth Status Evidence
1 Admin can view rubric list at /admin/scoring-rubrics VERIFIED pages/admin/scoring-rubrics.tsx (165 lines) imports useRubrics from hooks and renders RubricTable
2 Admin can create rubric with name, scenario_type, dimensions VERIFIED components/admin/rubric-editor.tsx (324 lines) with useCreateRubric import
3 Admin can edit and delete existing rubrics VERIFIED scoring-rubrics.tsx imports useUpdateRubric, useDeleteRubric from hooks
4 User sees full post-session report with dimensions, strengths/weaknesses, quotes, improvements VERIFIED pages/user/scoring-feedback.tsx (138 lines) imports useSessionReport; components/scoring/report-section.tsx (123 lines)
5 User can print scoring feedback as PDF via browser print VERIFIED scoring-feedback.tsx line 68: @media print CSS, line 126: window.print() button
6 User can view session history with score trends VERIFIED pages/user/session-history.tsx (233 lines) imports useScoreHistory from hooks
7 RadarChart shows previous session scores as overlay VERIFIED scoring-feedback.tsx imports RadarChart (line 7) and passes previousScores prop (line 92)
8 All new UI text has both en-US and zh-CN translations VERIFIED All 4 locale files exist: en-US/scoring.json, zh-CN/scoring.json, en-US/admin.json, zh-CN/admin.json

Score: 8/8 truths verified

Observable Truths (Plan 04 -- Integration Wiring)

# Truth Status Evidence
1 Rubrics router is registered in main.py VERIFIED main.py line 12: imports rubrics_router, line 83: app.include_router(rubrics_router, prefix=...)
2 Default F2F scoring rubric is seeded VERIFIED seed_data.py contains seed_default_rubric function (line 45) called from main seed (line 160)
3 Admin sidebar shows 'Scoring Rubrics' nav item VERIFIED admin-layout.tsx line 47: { path: "/admin/scoring-rubrics", ...icon: ClipboardCheck }
4 User sidebar 'History' links to /user/history VERIFIED router/index.tsx line 39: { path: "history", element: <SessionHistory /> }
5 User dashboard displays real scoring data from hooks VERIFIED dashboard.tsx line 23: imports useScoreHistory, line 40: calls useScoreHistory(5)
6 Full flow works: create rubric -> score -> view report -> view history VERIFIED All wiring verified above; 48 phase-specific tests pass
7 Backend test coverage >= 95% VERIFIED Summary reports 95.63% coverage; 48/48 phase-3 tests pass in live run

Score: 7/7 truths verified

Required Artifacts

Artifact Expected Lines Status
backend/app/api/rubrics.py Rubric CRUD router 64 VERIFIED
backend/app/services/rubric_service.py Rubric business logic 105 VERIFIED
backend/app/services/report_service.py Report generation 120 VERIFIED
backend/app/services/suggestion_service.py Coaching suggestions 107 VERIFIED
backend/app/services/scoring_service.py Scoring with rubric integration 400 VERIFIED
frontend/src/types/rubric.ts Rubric TypeScript types 34 VERIFIED
frontend/src/types/report.ts Report TypeScript types 63 VERIFIED
frontend/src/api/rubrics.ts Rubric API client 26 VERIFIED
frontend/src/api/reports.ts Report API client 16 VERIFIED
frontend/src/hooks/use-rubrics.ts Rubric TanStack hooks 46 VERIFIED
frontend/src/hooks/use-reports.ts Report TanStack hooks 18 VERIFIED
frontend/src/hooks/use-scoring.ts Scoring TanStack hooks 28 VERIFIED
frontend/src/pages/admin/scoring-rubrics.tsx Admin rubric page 165 VERIFIED
frontend/src/components/admin/rubric-table.tsx Rubric list table 102 VERIFIED
frontend/src/components/admin/rubric-editor.tsx Rubric editor dialog 324 VERIFIED
frontend/src/pages/user/scoring-feedback.tsx Post-session report page 138 VERIFIED
frontend/src/pages/user/session-history.tsx Session history page 233 VERIFIED
frontend/src/components/scoring/report-section.tsx Report detail sections 123 VERIFIED
backend/tests/test_rubric_service.py Rubric service tests 270 VERIFIED
backend/tests/test_rubrics_api.py Rubric API tests 226 VERIFIED
backend/tests/test_report_api.py Report endpoint tests 157 VERIFIED
backend/tests/test_suggestion_wiring.py Suggestion wiring tests 132 VERIFIED
backend/tests/test_scoring_history.py Score history tests 171 VERIFIED
backend/tests/test_report_service.py Report service tests 177 VERIFIED
backend/tests/test_suggestion_service.py Suggestion service tests 106 VERIFIED

All 25 artifacts exist and are substantive (non-stub line counts).

Key Link Verification

From To Via Status
sessions.py suggestion_service.py from app.services.suggestion_service import generate_suggestions WIRED
sessions.py report_service.py from app.services.report_service import generate_report WIRED
scoring_service.py rubric_service.py from app.services.rubric_service import get_default_rubric WIRED
main.py api/rubrics.py app.include_router(rubrics_router, prefix=...) WIRED
router/index.tsx scoring-rubrics.tsx Route { path: "scoring-rubrics", element: <ScoringRubricsPage /> } WIRED
router/index.tsx session-history.tsx Route { path: "history", element: <SessionHistory /> } WIRED
scoring-rubrics.tsx hooks/use-rubrics.ts import { useRubrics, useCreateRubric, ... } WIRED
scoring-feedback.tsx hooks/use-reports.ts import { useSessionReport } WIRED
scoring-feedback.tsx hooks/use-scoring.ts import { useScoreHistory } via RadarChart overlay WIRED (indirectly via score loading)
session-history.tsx hooks/use-scoring.ts import { useScoreHistory } WIRED
admin-layout.tsx /admin/scoring-rubrics Sidebar nav item WIRED
dashboard.tsx hooks/use-scoring.ts import { useScoreHistory } WIRED
api/rubrics.ts types/rubric.ts Type imports WIRED
hooks/use-reports.ts api/reports.ts API function imports WIRED

All 14 key links verified as WIRED.

Behavioral Spot-Checks

Behavior Command Result Status
Phase-3 backend tests pass pytest tests/test_rubric_service.py ... test_suggestion_service.py -v 48 passed, 0 failed PASS
No TODO/FIXME in critical services grep on rubrics.py, report_service.py, suggestion_service.py No matches PASS

Requirements Coverage

Requirement Source Plan Description Status Evidence
SCORE-01 03-01, 03-03, 03-04 Scores across 5-6 configurable dimensions SATISFIED scoring_service.py (400 lines) with rubric dimension integration; rubric editor supports configurable dimensions
SCORE-02 03-01 Scoring uses Azure OpenAI to analyze transcript SATISFIED scoring_service.py integrates with AI adapter for analysis
SCORE-03 03-01, 03-02, 03-03, 03-04 Post-session report with strengths/weaknesses and quotes SATISFIED report_service.py generates report; scoring-feedback.tsx + report-section.tsx display it
SCORE-04 03-01, 03-02, 03-03, 03-04 Actionable improvement suggestions per dimension SATISFIED suggestion_service.py generates suggestions; report includes improvement priorities
SCORE-05 03-01, 03-02, 03-03, 03-04 Dimension weights configurable per scenario by admin SATISFIED rubric_service.py CRUD; rubric-editor.tsx with weight sliders; scoring uses rubric weights
COACH-08 03-01 Real-time coaching hints in side panel SATISFIED suggestion_service.py called during SSE flow in sessions.py after HCP response
COACH-09 03-01 Conversations immutable once completed SATISFIED sessions.py lines 94-100: rejects messages for non-active sessions with 409

Note on REQUIREMENTS.md mapping: REQUIREMENTS.md marks SCORE-01 through SCORE-05 and COACH-08/09 as "Phase 2 Complete" but the phase directory places scoring/assessment UI and wiring in Phase 03. This is consistent -- Phase 2 built service scaffolding, Phase 3 wired endpoints, built pages, and completed integration. No orphaned requirements found for Phase 3.

Anti-Patterns Found

File Line Pattern Severity Impact
None - No TODO/FIXME/PLACEHOLDER found in phase-3 critical files - -

No blocking anti-patterns detected in phase-3 artifacts.

Human Verification Required

1. Admin Rubric Management Flow

Test: Navigate to /admin/scoring-rubrics, create a rubric with 5 dimensions totaling 100%, edit it, then delete it. Expected: Table updates in real-time; dimension weights enforce sum-to-100 constraint; toast confirms each action. Why human: Requires visual verification of UI behavior, dialog interactions, and validation feedback.

2. Post-Session Report Rendering

Test: Complete a training session, navigate to scoring feedback page. Expected: Report shows dimension breakdowns with score bars, strengths with quotes, weaknesses, and improvement priorities. RadarChart displays current scores with previous session overlay if available. Why human: Visual layout quality, chart rendering accuracy, and quote formatting require visual inspection.

3. PDF Print Output

Test: On scoring feedback page, click Print button (or Ctrl+P). Expected: Print preview shows clean layout without navigation chrome; all report sections visible. Why human: CSS @media print rendering varies by browser; needs visual confirmation.

4. Real-time Coaching Hints during SSE

Test: Start a training session, send messages, observe side panel after HCP responses. Expected: Coaching hints appear in side panel with contextual suggestions (key message reminders, objection handling tips). Why human: SSE timing and real-time UI updates require interactive testing.

Gaps Summary

No gaps found. All 25/25 observable truths verified across 4 plans. All 25 artifacts exist, are substantive, and are wired. All 14 key links confirmed. All 7 requirement IDs (SCORE-01 through SCORE-05, COACH-08, COACH-09) are satisfied with implementation evidence. 48 phase-specific backend tests pass. 4 items flagged for human verification (UI/visual behaviors).


Verified: 2026-03-25T07:30:00Z Verifier: Claude (gsd-verifier)

⚠️ **GitHub.com Fallback** ⚠️