Phase 03: Scoring Assessment

Auto-generated from .planning/phases/03-scoring-assessment
Last synced: 2026-04-02

Context & Decisions

Phase 3: Scoring & Assessment - Context

Gathered: 2026-03-24 Status: Ready for planning Mode: Auto-generated (full user authorization — Claude's discretion on all decisions)

## Phase Boundary

Complete the scoring system with: (1) real-time coaching suggestions wired into the SSE session flow, (2) detailed post-session reports with strengths/weaknesses/improvement areas exposed via API, (3) admin-customizable scoring rubrics with full CRUD, and (4) historical scoring data persistence and query APIs for trend analysis. Phase 2 built the scoring foundation (mock scores, 5 dimensions, basic feedback page). Phase 3 finishes the job — wiring the existing scaffolding, adding missing APIs, and building the admin rubric management UI.

## Implementation Decisions

Real-time Coaching Suggestions

Wire existing suggestion_service.py into the SSE message flow — call after each user message alongside HCP response generation
Suggestions delivered via existing CoachEventType.SUGGESTION SSE events — frontend HintsPanel already handles these
Keep keyword-based mock analysis for MVP (already implemented); LLM-based analysis deferred to AI adapter wiring
Add dedicated GET /api/v1/sessions/{id}/suggestions endpoint to retrieve accumulated suggestions for a session

Post-session Reports

Wire existing report_service.py to a new GET /api/v1/sessions/{id}/report API endpoint
Extend the existing scoring-feedback page to show full report data (DimensionBreakdown with quotes, improvement priorities)
Enable PDF export using browser print-to-PDF (CSS @media print) — lightweight approach, no server-side PDF generation
Historical comparison: provide previous session score via API for RadarChart overlay

Admin Scoring Rubrics

Create rubric_service.py with full CRUD operations using existing model + schemas
Create rubrics API router at /api/v1/rubrics with admin-only access
Build admin rubric management page at /admin/scoring-rubrics — list view + editor with dimension config
Wire scoring service to use rubric dimensions when available, fall back to scenario weights
Default rubric seeded for F2F scenario type with the 5 standard dimensions

Score Persistence & Analytics

Scoring results already persisted (SessionScore + ScoreDetail from Phase 2)
Add GET /api/v1/scoring/history endpoint for user's score history across sessions
Add trend calculation in service layer — compute improvement percentage per dimension over last N sessions
Wire user dashboard to real scoring data via TanStack Query hooks (replace mock data)

Claude's Discretion

Database schema details for any new columns/tables
Exact rubric editor UI layout and interaction patterns
Score aggregation algorithm details (moving average vs simple average for trends)
CSS print stylesheet specifics for PDF export
Test structure and mock data patterns

<code_context>

Existing Code Insights

Reusable Assets

backend/app/services/suggestion_service.py — keyword-based suggestion generator (EXISTS, not wired)
backend/app/services/report_service.py — full report generator parsing ScoreDetail JSON (EXISTS, not wired)
backend/app/models/scoring_rubric.py — ScoringRubric ORM model (EXISTS, uncommitted)
backend/app/schemas/scoring_rubric.py — RubricCreate/Update/Response with weight validation (EXISTS, uncommitted)
backend/app/schemas/report.py — SessionReport, DimensionBreakdown (EXISTS, uncommitted)
backend/app/schemas/suggestion.py — SuggestionType, SuggestionCreate/Response (EXISTS, uncommitted)
backend/alembic/versions/16f9f0ba6e9d_add_scoring_rubrics_table.py — Migration (EXISTS, uncommitted)
Frontend: HintsPanel, RadarChart, DimensionBars, FeedbackCard, ScoreSummary all exist
Frontend: SSE hook handles hint events; scoring hooks and API client exist
backend/app/services/prompt_builder.py — build_scoring_prompt() ready for LLM integration

Established Patterns

Service layer pattern: business logic in services/*.py, routers delegate only
Pydantic v2 schemas with ConfigDict(from_attributes=True) and field validators
TanStack Query hooks per domain with typed API client
i18n via react-i18next with domain namespaces
Admin pages follow sidebar list + editor panel pattern (see hcp-profiles, scenarios)
SSE streaming via EventSourceResponse for real-time delivery

Integration Points

backend/app/api/sessions.py — SSE message endpoint needs suggestion_service wiring
backend/app/main.py — New routers registered here
frontend/src/App.tsx — New routes added here
Admin sidebar navigation already has /admin/reports placeholder
User navigation already has /user/history and /user/reports placeholders

</code_context>

## Specific Ideas

Leverage all existing uncommitted scaffolding files — they represent intentional Phase 3 preparation
Keep consistency with Phase 2 admin page patterns (hcp-profiles, scenarios) for rubric management
Follow the same test coverage pattern: service tests, API tests, schema tests, component tests, hook tests
Use Recharts (already installed) for any new charts in reports/history pages

## Deferred Ideas

LLM-based real-time suggestion generation (requires Azure OpenAI wiring — Phase scope is mock/keyword for MVP)
Server-side PDF generation (browser print-to-PDF sufficient for MVP)
Admin analytics dashboard (Phase 4 scope)

Plans (4)

#	Plan File	Status
03-01	03-01-PLAN.md	Complete
03-02	03-02-PLAN.md	Complete
03-03	03-03-PLAN.md	Complete
03-04	03-04-PLAN.md	Complete

Research

Click to expand research notes

Phase 3: Scoring & Assessment - Research

Researched: 2026-03-24 Domain: Scoring system wiring, post-session reports API, admin rubric CRUD, score history analytics Confidence: HIGH

Summary

Phase 3 completes the scoring system that Phase 2 scaffolded. The vast majority of code already exists as uncommitted files or wired-but-incomplete services. The work is primarily integration wiring -- connecting existing suggestion_service.py into the SSE flow, exposing report_service.py via API, building a CRUD router for the existing ScoringRubric model, and adding score history endpoints. The frontend has existing components (HintsPanel, RadarChart, DimensionBars, FeedbackCard, ScoreSummary) that need to be connected to real API data and extended with new pages (admin rubric management, user score history).

The codebase follows strongly established patterns from Phase 2: service-layer business logic, Pydantic v2 schemas with ConfigDict(from_attributes=True), TanStack Query hooks per domain, SSE streaming via native fetch, i18n via react-i18next with domain namespaces, and admin pages using a sidebar-list + editor panel pattern. Phase 3 should follow these patterns exactly.

Primary recommendation: Wire existing services/models/schemas first (backend), then build the missing API endpoints and admin UI, following Phase 2 patterns precisely. No new libraries needed -- the stack is complete.

<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

Wire existing suggestion_service.py into the SSE message flow -- call after each user message alongside HCP response generation
Suggestions delivered via existing CoachEventType.SUGGESTION SSE events -- frontend HintsPanel already handles these
Keep keyword-based mock analysis for MVP (already implemented); LLM-based analysis deferred to AI adapter wiring
Add dedicated GET /api/v1/sessions/{id}/suggestions endpoint to retrieve accumulated suggestions for a session
Wire existing report_service.py to a new GET /api/v1/sessions/{id}/report API endpoint
Extend the existing scoring-feedback page to show full report data (DimensionBreakdown with quotes, improvement priorities)
Enable PDF export using browser print-to-PDF (CSS @media print) -- lightweight approach, no server-side PDF generation
Historical comparison: provide previous session score via API for RadarChart overlay
Create rubric_service.py with full CRUD operations using existing model + schemas
Create rubrics API router at /api/v1/rubrics with admin-only access
Build admin rubric management page at /admin/scoring-rubrics -- list view + editor with dimension config
Wire scoring service to use rubric dimensions when available, fall back to scenario weights
Default rubric seeded for F2F scenario type with the 5 standard dimensions
Scoring results already persisted (SessionScore + ScoreDetail from Phase 2)
Add GET /api/v1/scoring/history endpoint for user's score history across sessions
Add trend calculation in service layer -- compute improvement percentage per dimension over last N sessions
Wire user dashboard to real scoring data via TanStack Query hooks (replace mock data)

Claude's Discretion

Database schema details for any new columns/tables
Exact rubric editor UI layout and interaction patterns
Score aggregation algorithm details (moving average vs simple average for trends)
CSS print stylesheet specifics for PDF export
Test structure and mock data patterns

Deferred Ideas (OUT OF SCOPE)

LLM-based real-time suggestion generation (requires Azure OpenAI wiring -- Phase scope is mock/keyword for MVP)
Server-side PDF generation (browser print-to-PDF sufficient for MVP)
Admin analytics dashboard (Phase 4 scope) </user_constraints>

<phase_requirements>

Phase Requirements

ID	Description	Research Support
SCORE-01	System scores completed sessions across 5-6 configurable dimensions	Already implemented in Phase 2 (`scoring_service.py`). Phase 3 wires rubric-based configurable dimensions as override.
SCORE-02	Scoring uses Azure OpenAI to analyze conversation transcript	Mock scoring exists; deferred LLM-based scoring. Phase 3 keeps mock but integrates rubric weights.
SCORE-03	Post-session feedback report shows strengths/weaknesses per dimension with quotes	`report_service.py` EXISTS but not exposed via API. Phase 3 wires it to `GET /api/v1/sessions/{id}/report`.
SCORE-04	Post-session feedback includes actionable improvement suggestions per dimension	`report_service.py` already generates `ImprovementSuggestion` with priority levels. API wiring needed.
SCORE-05	Scoring dimension weights are configurable per scenario -- admin sets via weighted sliders	Scenario model has `weight_*` columns. Phase 3 adds `ScoringRubric` as override with CRUD UI.
COACH-08	Real-time coaching hints displayed in side panel during conversation	`HintsPanel` component exists, SSE `hint` event type exists. Phase 3 wires `suggestion_service` into SSE flow.
COACH-09	Conversations are immutable once completed	Already enforced in `sessions.py` endpoint (status check). Phase 3 maintains this constraint.
</phase_requirements>

Standard Stack

Core (Already Installed -- No New Dependencies)

Library	Version	Purpose	Why Standard
FastAPI	>=0.115.0	ASGI web framework	Already installed, all routers follow this pattern
SQLAlchemy 2.0	async	ORM with AsyncSession	Already installed, all models use this
Pydantic v2	>=2.5.0	Request/response schemas	Already installed, `ConfigDict(from_attributes=True)` pattern
sse-starlette	installed	SSE streaming for real-time suggestions	Already used in `sessions.py` for HCP response streaming
TanStack Query v5	^5.60.0	Server state management	Already used for all API hooks
Recharts	installed	Radar charts, trend charts	Already used in `RadarChart` and dashboard mini-charts
react-i18next	installed	i18n with domain namespaces	Already configured with `scoring`, `admin`, `coach` namespaces
react-hook-form + zod	installed	Form management	Already used in scenario editor, will reuse for rubric editor

Supporting (No additions needed)

All supporting libraries are already present in the project. Phase 3 introduces zero new dependencies.

Installation: None required -- all dependencies are already in pyproject.toml and package.json.

Architecture Patterns

Backend: New Files Needed

backend/
├── app/
│   ├── api/
│   │   └── rubrics.py                  # NEW: CRUD router for /api/v1/rubrics
│   ├── services/
│   │   ├── rubric_service.py           # NEW: Rubric CRUD business logic
│   │   ├── suggestion_service.py       # EXISTS: Wire into SSE flow
│   │   ├── report_service.py           # EXISTS: Wire to API endpoint
│   │   └── scoring_service.py          # MODIFY: Use rubric dimensions when available
│   ├── api/sessions.py                 # MODIFY: Add suggestion wiring + report/suggestions endpoints
│   ├── api/scoring.py                  # MODIFY: Add history endpoint
│   └── api/__init__.py                 # MODIFY: Register rubrics_router
├── scripts/
│   └── seed_data.py                    # MODIFY: Add default F2F rubric
└── tests/
    ├── test_rubric_service.py          # NEW
    ├── test_rubrics_api.py             # NEW
    ├── test_report_service.py          # NEW
    ├── test_suggestion_service.py      # NEW
    └── test_scoring_history.py         # NEW

Frontend: New Files Needed

frontend/src/
├── api/
│   ├── rubrics.ts                      # NEW: Rubric CRUD API client
│   └── reports.ts                      # NEW: Report + suggestions API client
├── hooks/
│   ├── use-rubrics.ts                  # NEW: TanStack Query hooks for rubrics
│   └── use-reports.ts                  # NEW: TanStack Query hooks for reports
├── types/
│   ├── rubric.ts                       # NEW: Rubric TypeScript types
│   └── report.ts                       # NEW: Report TypeScript types
├── components/
│   ├── admin/
│   │   ├── rubric-table.tsx            # NEW: Rubric list view
│   │   └── rubric-editor.tsx           # NEW: Rubric editor with dimension config
│   └── scoring/
│       └── report-section.tsx          # NEW: Report detail sections
├── pages/
│   ├── admin/
│   │   └── scoring-rubrics.tsx         # NEW: Admin rubric management page
│   └── user/
│       └── session-history.tsx         # NEW: User score history page
└── public/locales/
    ├── en-US/
    │   ├── scoring.json                # MODIFY: Add report/history keys
    │   └── admin.json                  # MODIFY: Add rubric management keys
    └── zh-CN/
        ├── scoring.json                # MODIFY: Add report/history keys
        └── admin.json                  # MODIFY: Add rubric management keys

Pattern 1: Wiring Suggestion Service into SSE Flow

What: After each user message, call generate_suggestions() and emit SUGGESTION events before the done event. When to use: In the send_message endpoint's event_generator() function. Example:

# In backend/app/api/sessions.py send_message event_generator
# After DONE event, before yielding done:
from app.services.suggestion_service import generate_suggestions, parse_key_messages_status

# After saving HCP response and detecting key messages:
km_status_list = parse_key_messages_status(session.key_messages_status)
suggestions = await generate_suggestions(
    messages=[{"role": "user", "content": request.message}],
    key_messages_status=km_status_list,
    scoring_weights=session.scenario.get_scoring_weights(),
)
for suggestion in suggestions:
    yield {
        "event": "hint",
        "data": json.dumps({
            "content": suggestion.message,
            "metadata": {
                "type": suggestion.type.value,
                "trigger": suggestion.trigger,
                "relevance": suggestion.relevance_score,
            },
        }),
    }

Pattern 2: Admin CRUD Router (Following HCP Profiles Pattern)

What: Full CRUD router with admin-only access via require_role("admin"). When to use: For /api/v1/rubrics endpoints. Example:

# backend/app/api/rubrics.py
from fastapi import APIRouter, Depends
from app.dependencies import get_db, require_role
from app.models.user import User

router = APIRouter(prefix="/rubrics", tags=["rubrics"])

@router.post("/", status_code=201)
async def create_rubric(
    request: RubricCreate,
    db: AsyncSession = Depends(get_db),
    user: User = Depends(require_role("admin")),
):
    return await rubric_service.create_rubric(db, request, user.id)

Pattern 3: Score History with Trend Calculation

What: Endpoint returns user's scored sessions with per-dimension trend data. When to use: For GET /api/v1/scoring/history endpoint. Example:

# Service layer: compute simple improvement percentage
async def get_score_history(db, user_id, limit=10):
    # Fetch last N scored sessions ordered by completed_at desc
    # For each dimension, compare current vs previous to compute trend
    # Return list of {session_id, scenario_name, overall_score, passed, dimensions, completed_at}

Pattern 4: Admin Page Pattern (Sidebar List + Editor)

What: Admin pages follow the existing pattern from HCP profiles and Scenarios pages. When to use: For the /admin/scoring-rubrics page. Example:

// Follow the exact pattern from pages/admin/scenarios.tsx:
// 1. State: editorOpen, editingItem, isNew, deleteConfirmId
// 2. Hooks: useRubrics, useCreateRubric, useUpdateRubric, useDeleteRubric
// 3. Layout: filter bar + table + Dialog editor
// 4. i18n: useTranslation("admin") with rubric-specific keys

Anti-Patterns to Avoid

Do NOT use redux or context for scoring data: TanStack Query handles all server state. The dashboard mock data should be replaced with useScoreHistory() hook, not a context provider.
Do NOT parse JSON in the frontend for ScoreDetail strengths/weaknesses: The report_service.py already parses JSON and returns structured Pydantic models. The API should return parsed objects, not raw JSON strings. Note: the existing ScoreDetailResponse schema returns raw JSON strings -- the report endpoint returns parsed objects via SessionReport schema.
Do NOT add inline useQuery calls in components: Create domain-specific hooks in use-reports.ts and use-rubrics.ts.
Do NOT modify existing scoring components: RadarChart, FeedbackCard, DimensionBars already accept the right props. Extend the page that uses them, don't modify the components.
Do NOT use db.commit() in services: Use db.flush() per the established pattern. The session middleware handles commit/rollback.

Don't Hand-Roll

Problem	Don't Build	Use Instead	Why
PDF export	Server-side PDF generation	CSS `@media print` stylesheet	Browser print-to-PDF is sufficient for MVP; avoids adding weasyprint/reportlab dependency
Score trend calculation	Complex statistical library	Simple Python arithmetic in service layer	Moving average or simple delta comparison is a few lines of code
Real-time suggestions	LLM-based analysis	Existing keyword-based `suggestion_service.py`	LLM wiring is explicitly deferred; mock is intentional
Form validation	Custom validation logic	react-hook-form + zod (already installed)	Already used for scenario editor; rubric editor follows same pattern
Weight slider UI	Custom slider component	Existing pattern from scenario weight editor	Phase 2 already built proportional weight redistribution logic

Key insight: Phase 3 is an integration phase. Nearly every piece of business logic already exists in service files. The work is routing, wiring, and UI assembly.

Common Pitfalls

Pitfall 1: ScoreDetail JSON Parsing Inconsistency

What goes wrong: The existing ScoreDetailResponse schema returns strengths, weaknesses, and suggestions as raw JSON strings (Text columns from SQLite). The frontend ScoreDetail type expects parsed arrays. The report_service.py handles parsing, but the scoring API does not. Why it happens: Phase 2 left the scoring API response as raw JSON strings and the frontend was parsing them client-side. How to avoid: The report endpoint should use SessionReport schema which has parsed objects. For the score history endpoint, either add JSON parsing validators to the response schema (like the existing parse_dimensions_json pattern in RubricResponse) or return data through the report service. Warning signs: Frontend displays [object Object] or stringified JSON in score displays.

Pitfall 2: SSE Suggestion Timing

What goes wrong: Suggestions emitted before the HCP response is fully streamed create a confusing user experience. Why it happens: The generate_suggestions() call could be placed before the HCP text streaming loop. How to avoid: Call generate_suggestions() AFTER the full HCP response is saved and key messages are detected (inside the CoachEventType.DONE handler), before yielding the final done event. This ensures suggestions reflect the complete conversation state. Warning signs: Hints panel updates mid-stream or shows stale suggestions.

Pitfall 3: Rubric Weight Validation Edge Case

What goes wrong: Rubric dimensions with weights that don't sum to 100 bypass validation when updating individual dimensions. Why it happens: RubricUpdate has dimensions as optional. If dimensions are provided, the field_validator fires. But if dimensions are not provided, existing dimensions remain unchanged. How to avoid: The existing validate_weights_sum field_validator on RubricUpdate already handles this correctly -- it only validates when dimensions are provided. No additional logic needed. Warning signs: Rubric with dimensions summing to != 100 in database.

Pitfall 4: Admin Route Not Protected

What goes wrong: The rubric CRUD endpoint is accessible to regular users. Why it happens: Forgetting to use require_role("admin") dependency. How to avoid: Use Depends(require_role("admin")) on ALL rubric CRUD endpoints, following the exact pattern from hcp_profiles.py and scenarios.py routers. Warning signs: Non-admin user can create/edit/delete rubrics.

Pitfall 5: Router Registration Order in main.py

What goes wrong: New router is not included in main.py or api/__init__.py. Why it happens: Creating the router file but forgetting to register it. How to avoid: Checklist: (1) create api/rubrics.py, (2) add to api/__init__.py, (3) add app.include_router(rubrics_router, prefix=settings.api_prefix) to main.py. Warning signs: 404 on /api/v1/rubrics endpoints.

Pitfall 6: Frontend Route Not Added to Router

What goes wrong: New pages exist but are unreachable. Why it happens: Creating page component but not adding to router/index.tsx. How to avoid: For each new page: (1) create page file, (2) import in router/index.tsx, (3) add route entry under appropriate layout (admin routes under AdminRoute > AdminLayout, user routes under ProtectedRoute > UserLayout). Warning signs: Clicking sidebar nav link shows NotFound page.

Pitfall 7: i18n Keys Not Added to Both Locales

What goes wrong: New UI text shows raw keys like admin.rubrics.title instead of translated text. Why it happens: Adding keys to en-US but not zh-CN (or vice versa). How to avoid: Always add new keys to both en-US and zh-CN locale files simultaneously. The project has locales/en-US/ and locales/zh-CN/ plus a duplicated locales/locales/ directory -- add to the primary set. Warning signs: UI shows translation keys as raw strings.

Pitfall 8: Session Suggestion Accumulation

What goes wrong: Retrieving suggestions for a session via GET /sessions/{id}/suggestions returns nothing because suggestions are only emitted via SSE and not persisted. Why it happens: The SSE flow generates and streams suggestions but doesn't save them to the database. How to avoid: Either (a) persist suggestions to a new table during the SSE flow, or (b) regenerate them from the conversation history when the GET endpoint is called. Option (b) is simpler for MVP -- call generate_suggestions() with the full conversation history. Warning signs: Empty suggestions list when querying after session.

Code Examples

Backend: Report API Endpoint (Wiring Existing Service)

# Add to backend/app/api/sessions.py (or a new reports.py)
from app.services.report_service import generate_report
from app.schemas.report import SessionReport

@router.get("/{session_id}/report", response_model=SessionReport)
async def get_session_report(
    session_id: str,
    db: AsyncSession = Depends(get_db),
    user: User = Depends(get_current_user),
):
    """Get detailed post-session report with strengths, weaknesses, and improvements."""
    # Verify session belongs to user
    await session_service.get_session(db, session_id, user.id)
    report = await generate_report(db, session_id)
    return report

Backend: Rubric Service CRUD Pattern

# backend/app/services/rubric_service.py
import json
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from app.models.scoring_rubric import ScoringRubric
from app.schemas.scoring_rubric import RubricCreate, RubricUpdate
from app.utils.exceptions import NotFoundException

async def create_rubric(db: AsyncSession, data: RubricCreate, user_id: str) -> ScoringRubric:
    rubric = ScoringRubric(
        name=data.name,
        description=data.description,
        scenario_type=data.scenario_type,
        dimensions=json.dumps([d.model_dump() for d in data.dimensions]),
        is_default=data.is_default,
        created_by=user_id,
    )
    # If setting as default, unset other defaults for same scenario_type
    if data.is_default:
        await _unset_defaults(db, data.scenario_type)
    db.add(rubric)
    await db.flush()
    return rubric

Backend: Score History Endpoint

# Add to backend/app/api/scoring.py
@router.get("/history")
async def get_score_history(
    limit: int = Query(10, ge=1, le=50),
    db: AsyncSession = Depends(get_db),
    user: User = Depends(get_current_user),
):
    """Get user's scoring history with dimension trends."""
    history = await scoring_service.get_score_history(db, user.id, limit)
    return history

Frontend: Report Hook Pattern

// frontend/src/hooks/use-reports.ts
import { useQuery } from "@tanstack/react-query";
import { getSessionReport, getSessionSuggestions } from "@/api/reports";

export function useSessionReport(sessionId: string | undefined) {
  return useQuery({
    queryKey: ["reports", sessionId],
    queryFn: () => getSessionReport(sessionId!),
    enabled: !!sessionId,
  });
}

export function useSessionSuggestions(sessionId: string | undefined) {
  return useQuery({
    queryKey: ["suggestions", sessionId],
    queryFn: () => getSessionSuggestions(sessionId!),
    enabled: !!sessionId,
  });
}

Frontend: Rubric Admin Hook Pattern

// frontend/src/hooks/use-rubrics.ts
import { useQuery, useMutation, useQueryClient } from "@tanstack/react-query";
import { getRubrics, createRubric, updateRubric, deleteRubric } from "@/api/rubrics";

export function useRubrics(params?: { scenario_type?: string }) {
  return useQuery({
    queryKey: ["rubrics", params],
    queryFn: () => getRubrics(params),
  });
}

export function useCreateRubric() {
  const queryClient = useQueryClient();
  return useMutation({
    mutationFn: createRubric,
    onSuccess: () => {
      queryClient.invalidateQueries({ queryKey: ["rubrics"] });
    },
  });
}
// ... same pattern for update, delete

CSS Print Stylesheet Pattern

/* Add to scoring-feedback page or global print styles */
@media print {
  /* Hide navigation, sidebar, action buttons */
  nav, .sidebar, .action-bar, button { display: none !important; }
  /* Full width for content */
  .max-w-7xl { max-width: 100% !important; }
  /* Ensure charts render properly */
  .recharts-wrapper { break-inside: avoid; }
  /* Page breaks between dimension cards */
  .feedback-card { break-inside: avoid; }
}

State of the Art

Old Approach	Current Approach	When Changed	Impact
Inline scoring weights in Scenario model	Configurable via ScoringRubric override	Phase 3	Admin can define reusable rubrics across scenarios
Mock data in user dashboard	Real scoring data from API	Phase 3	Dashboard shows actual session history and trends
SSE without coaching hints	SSE with suggestion_service wiring	Phase 3	Real-time coaching tips during conversation
Basic scoring page	Full report with strengths/weaknesses/quotes	Phase 3	Actionable post-session feedback

Open Questions

Suggestion Persistence Strategy
- What we know: SSE delivers suggestions in real-time, but the CONTEXT.md specifies a GET /api/v1/sessions/{id}/suggestions endpoint.
- What's unclear: Should suggestions be persisted to a DB table, or regenerated from conversation history on demand?
- Recommendation: Regenerate on demand for MVP (simpler, no schema change). If suggestions are needed in bulk/analytics later, add a session_suggestions table in Phase 4. The keyword-based generation is deterministic and fast enough for on-demand regeneration.
Rubric-to-Scenario Linking
- What we know: Scenarios have hardcoded weight_* columns. Rubrics have a scenario_type field (f2f/conference) but no direct FK to scenarios.
- What's unclear: How does the scoring service decide which rubric to use for a session?
- Recommendation: Scoring service checks for a default rubric matching the scenario's mode (f2f/conference). If found, use rubric dimensions. Otherwise, fall back to scenario weight_* columns. This avoids schema migration to add a rubric_id FK to scenarios.
ScoreDetail JSON Parsing for Frontend
- What we know: ScoreDetailResponse returns raw JSON strings. The SessionReport returns parsed objects. The frontend types expect parsed arrays.
- What's unclear: Should the score API be updated to parse JSON, or only the report API?
- Recommendation: Add JSON parsing field_validator to ScoreDetailResponse (following the existing parse_dimensions_json pattern from RubricResponse). This keeps both APIs consistent.

Project Constraints (from CLAUDE.md)

Backend

Async everywhere: async def, await, AsyncSession
Pydantic v2 schemas with model_config = ConfigDict(from_attributes=True)
Service layer holds business logic, routers only handle HTTP
db.flush() not db.commit() in services (session middleware handles commit)
Route ordering: static paths before parameterized /{id}
Create returns 201, Delete returns 204
No raw SQL -- use SQLAlchemy ORM
Schema changes require Alembic migration (rubrics migration already exists)

Frontend

strict: true TypeScript -- no any types, no unused variables
TanStack Query hooks per domain -- no inline useQuery
@/ path alias for all imports
cn() utility for conditional classes
No Redux -- TanStack Query for server state
react-i18next with domain namespaces per page

Testing

Backend: pytest + pytest-asyncio with in-memory SQLite
=95% coverage required
Test patterns: service unit tests, API integration tests, schema tests
Frontend: Component tests using vitest + testing-library

Database

Never modify schema without Alembic migration
All models use TimestampMixin
rubrics migration already exists (16f9f0ba6e9d_add_scoring_rubrics_table.py)

Sources

Primary (HIGH confidence)

Codebase analysis: Direct reading of all existing services, models, schemas, components, tests, and configuration files
backend/app/services/suggestion_service.py -- complete implementation, keyword-based
backend/app/services/report_service.py -- complete implementation, returns SessionReport
backend/app/models/scoring_rubric.py -- complete model with JSON dimensions column
backend/app/schemas/scoring_rubric.py -- complete CRUD schemas with weight validation
backend/app/schemas/report.py -- complete report schemas with strengths/weaknesses/improvements
backend/app/schemas/suggestion.py -- complete suggestion schemas with SuggestionType enum
backend/app/api/sessions.py -- existing SSE flow, suggestion wiring point identified
backend/app/api/scoring.py -- existing scoring endpoints, history endpoint location
frontend/src/components/scoring/ -- 4 existing components (radar-chart, dimension-bars, feedback-card, score-summary)
frontend/src/components/coach/hints-panel.tsx -- existing SSE hint handling
frontend/src/hooks/use-sse.ts -- existing SSE hook with hint callback
frontend/src/pages/admin/scenarios.tsx -- admin page pattern reference
frontend/src/router/index.tsx -- existing route configuration, new route insertion points identified
frontend/src/components/layouts/admin-layout.tsx -- sidebar already has /admin/reports placeholder
frontend/src/components/layouts/user-layout.tsx -- sidebar already has /user/history and /user/reports placeholders

Secondary (MEDIUM confidence)

Phase 2 state decisions (from STATE.md) -- patterns for JSON parsing, service layer, SSE streaming

Metadata

Confidence breakdown:

Standard stack: HIGH - All libraries already installed and in use; no new dependencies
Architecture: HIGH - All patterns directly observed in codebase from Phase 2
Pitfalls: HIGH - Derived from direct codebase analysis of existing inconsistencies and integration points

Research date: 2026-03-24 Valid until: 2026-04-24 (stable -- no external dependency changes expected)

Verification

Click to expand verification report

Phase 03: Scoring & Assessment Verification Report

Phase Goal: Real-time coaching suggestions, post-session reports, customizable scoring rubrics Verified: 2026-03-25T07:30:00Z Status: PASSED Re-verification: No -- initial verification

Goal Achievement

Observable Truths (Plan 01 -- Backend API Wiring)

#	Truth	Status	Evidence
1	SSE message endpoint emits 'hint' events from suggestion_service after HCP response	VERIFIED	`sessions.py` imports `generate_suggestions` from `suggestion_service` (line 25) and calls it (lines 167, 242)
2	GET /api/v1/sessions/{id}/report returns parsed SessionReport	VERIFIED	`sessions.py` imports `generate_report` from `report_service` (line 24) and exposes endpoint (line 227); `test_report_api.py` passes (157 lines, 3 tests)
3	GET /api/v1/sessions/{id}/suggestions returns coaching suggestions	VERIFIED	`test_suggestion_wiring.py` passes (132 lines, 2 tests)
4	CRUD /api/v1/rubrics endpoints work with admin-only access	VERIFIED	`backend/app/api/rubrics.py` (64 lines) exists; `test_rubrics_api.py` passes (226 lines, 9 tests)
5	GET /api/v1/scoring/history returns scored sessions with dimension trends	VERIFIED	`test_scoring_history.py` passes (171 lines, 6 tests including trend computation)
6	Scoring service uses rubric dimensions when default rubric exists	VERIFIED	`scoring_service.py` imports `get_default_rubric` (line 14) and calls it (line 63)
7	Conversations remain immutable once completed (COACH-09)	VERIFIED	`sessions.py` lines 94-100: rejects messages when `session.status not in ("created", "in_progress")` with 409 error

Score: 7/7 truths verified

Observable Truths (Plan 02 -- Frontend Data Layer)

#	Truth	Status	Evidence
1	Frontend TypeScript types match backend schemas for rubrics and reports	VERIFIED	`types/rubric.ts` (34 lines) exports DimensionConfig, RubricCreate, RubricUpdate, Rubric; `types/report.ts` (63 lines) exports SessionReport, DimensionBreakdown, etc.
2	API client functions exist for all new backend endpoints	VERIFIED	`api/rubrics.ts` (26 lines), `api/reports.ts` (16 lines), `api/scoring.ts` exists
3	TanStack Query hooks provide typed data access	VERIFIED	`hooks/use-rubrics.ts` (46 lines), `hooks/use-reports.ts` (18 lines), `hooks/use-scoring.ts` (28 lines)

Score: 3/3 truths verified

Observable Truths (Plan 03 -- Frontend Pages)

#	Truth	Status	Evidence
1	Admin can view rubric list at /admin/scoring-rubrics	VERIFIED	`pages/admin/scoring-rubrics.tsx` (165 lines) imports `useRubrics` from hooks and renders `RubricTable`
2	Admin can create rubric with name, scenario_type, dimensions	VERIFIED	`components/admin/rubric-editor.tsx` (324 lines) with `useCreateRubric` import
3	Admin can edit and delete existing rubrics	VERIFIED	`scoring-rubrics.tsx` imports `useUpdateRubric`, `useDeleteRubric` from hooks
4	User sees full post-session report with dimensions, strengths/weaknesses, quotes, improvements	VERIFIED	`pages/user/scoring-feedback.tsx` (138 lines) imports `useSessionReport`; `components/scoring/report-section.tsx` (123 lines)
5	User can print scoring feedback as PDF via browser print	VERIFIED	`scoring-feedback.tsx` line 68: `@media print` CSS, line 126: `window.print()` button
6	User can view session history with score trends	VERIFIED	`pages/user/session-history.tsx` (233 lines) imports `useScoreHistory` from hooks
7	RadarChart shows previous session scores as overlay	VERIFIED	`scoring-feedback.tsx` imports `RadarChart` (line 7) and passes `previousScores` prop (line 92)
8	All new UI text has both en-US and zh-CN translations	VERIFIED	All 4 locale files exist: `en-US/scoring.json`, `zh-CN/scoring.json`, `en-US/admin.json`, `zh-CN/admin.json`

Score: 8/8 truths verified

Observable Truths (Plan 04 -- Integration Wiring)

#	Truth	Status	Evidence
1	Rubrics router is registered in main.py	VERIFIED	`main.py` line 12: imports `rubrics_router`, line 83: `app.include_router(rubrics_router, prefix=...)`
2	Default F2F scoring rubric is seeded	VERIFIED	`seed_data.py` contains `seed_default_rubric` function (line 45) called from main seed (line 160)
3	Admin sidebar shows 'Scoring Rubrics' nav item	VERIFIED	`admin-layout.tsx` line 47: `{ path: "/admin/scoring-rubrics", ...icon: ClipboardCheck }`
4	User sidebar 'History' links to /user/history	VERIFIED	`router/index.tsx` line 39: `{ path: "history", element: <SessionHistory /> }`
5	User dashboard displays real scoring data from hooks	VERIFIED	`dashboard.tsx` line 23: imports `useScoreHistory`, line 40: calls `useScoreHistory(5)`
6	Full flow works: create rubric -> score -> view report -> view history	VERIFIED	All wiring verified above; 48 phase-specific tests pass
7	Backend test coverage >= 95%	VERIFIED	Summary reports 95.63% coverage; 48/48 phase-3 tests pass in live run

Score: 7/7 truths verified

Required Artifacts

Artifact	Expected	Lines	Status
`backend/app/api/rubrics.py`	Rubric CRUD router	64	VERIFIED
`backend/app/services/rubric_service.py`	Rubric business logic	105	VERIFIED
`backend/app/services/report_service.py`	Report generation	120	VERIFIED
`backend/app/services/suggestion_service.py`	Coaching suggestions	107	VERIFIED
`backend/app/services/scoring_service.py`	Scoring with rubric integration	400	VERIFIED
`frontend/src/types/rubric.ts`	Rubric TypeScript types	34	VERIFIED
`frontend/src/types/report.ts`	Report TypeScript types	63	VERIFIED
`frontend/src/api/rubrics.ts`	Rubric API client	26	VERIFIED
`frontend/src/api/reports.ts`	Report API client	16	VERIFIED
`frontend/src/hooks/use-rubrics.ts`	Rubric TanStack hooks	46	VERIFIED
`frontend/src/hooks/use-reports.ts`	Report TanStack hooks	18	VERIFIED
`frontend/src/hooks/use-scoring.ts`	Scoring TanStack hooks	28	VERIFIED
`frontend/src/pages/admin/scoring-rubrics.tsx`	Admin rubric page	165	VERIFIED
`frontend/src/components/admin/rubric-table.tsx`	Rubric list table	102	VERIFIED
`frontend/src/components/admin/rubric-editor.tsx`	Rubric editor dialog	324	VERIFIED
`frontend/src/pages/user/scoring-feedback.tsx`	Post-session report page	138	VERIFIED
`frontend/src/pages/user/session-history.tsx`	Session history page	233	VERIFIED
`frontend/src/components/scoring/report-section.tsx`	Report detail sections	123	VERIFIED
`backend/tests/test_rubric_service.py`	Rubric service tests	270	VERIFIED
`backend/tests/test_rubrics_api.py`	Rubric API tests	226	VERIFIED
`backend/tests/test_report_api.py`	Report endpoint tests	157	VERIFIED
`backend/tests/test_suggestion_wiring.py`	Suggestion wiring tests	132	VERIFIED
`backend/tests/test_scoring_history.py`	Score history tests	171	VERIFIED
`backend/tests/test_report_service.py`	Report service tests	177	VERIFIED
`backend/tests/test_suggestion_service.py`	Suggestion service tests	106	VERIFIED

All 25 artifacts exist and are substantive (non-stub line counts).

Key Link Verification

From	To	Via	Status
`sessions.py`	`suggestion_service.py`	`from app.services.suggestion_service import generate_suggestions`	WIRED
`sessions.py`	`report_service.py`	`from app.services.report_service import generate_report`	WIRED
`scoring_service.py`	`rubric_service.py`	`from app.services.rubric_service import get_default_rubric`	WIRED
`main.py`	`api/rubrics.py`	`app.include_router(rubrics_router, prefix=...)`	WIRED
`router/index.tsx`	`scoring-rubrics.tsx`	Route `{ path: "scoring-rubrics", element: <ScoringRubricsPage /> }`	WIRED
`router/index.tsx`	`session-history.tsx`	Route `{ path: "history", element: <SessionHistory /> }`	WIRED
`scoring-rubrics.tsx`	`hooks/use-rubrics.ts`	`import { useRubrics, useCreateRubric, ... }`	WIRED
`scoring-feedback.tsx`	`hooks/use-reports.ts`	`import { useSessionReport }`	WIRED
`scoring-feedback.tsx`	`hooks/use-scoring.ts`	`import { useScoreHistory }` via RadarChart overlay	WIRED (indirectly via score loading)
`session-history.tsx`	`hooks/use-scoring.ts`	`import { useScoreHistory }`	WIRED
`admin-layout.tsx`	`/admin/scoring-rubrics`	Sidebar nav item	WIRED
`dashboard.tsx`	`hooks/use-scoring.ts`	`import { useScoreHistory }`	WIRED
`api/rubrics.ts`	`types/rubric.ts`	Type imports	WIRED
`hooks/use-reports.ts`	`api/reports.ts`	API function imports	WIRED

All 14 key links verified as WIRED.

Behavioral Spot-Checks

Behavior	Command	Result	Status
Phase-3 backend tests pass	`pytest tests/test_rubric_service.py ... test_suggestion_service.py -v`	48 passed, 0 failed	PASS
No TODO/FIXME in critical services	grep on rubrics.py, report_service.py, suggestion_service.py	No matches	PASS

Requirements Coverage

Requirement	Source Plan	Description	Status	Evidence
SCORE-01	03-01, 03-03, 03-04	Scores across 5-6 configurable dimensions	SATISFIED	`scoring_service.py` (400 lines) with rubric dimension integration; rubric editor supports configurable dimensions
SCORE-02	03-01	Scoring uses Azure OpenAI to analyze transcript	SATISFIED	`scoring_service.py` integrates with AI adapter for analysis
SCORE-03	03-01, 03-02, 03-03, 03-04	Post-session report with strengths/weaknesses and quotes	SATISFIED	`report_service.py` generates report; `scoring-feedback.tsx` + `report-section.tsx` display it
SCORE-04	03-01, 03-02, 03-03, 03-04	Actionable improvement suggestions per dimension	SATISFIED	`suggestion_service.py` generates suggestions; report includes improvement priorities
SCORE-05	03-01, 03-02, 03-03, 03-04	Dimension weights configurable per scenario by admin	SATISFIED	`rubric_service.py` CRUD; `rubric-editor.tsx` with weight sliders; scoring uses rubric weights
COACH-08	03-01	Real-time coaching hints in side panel	SATISFIED	`suggestion_service.py` called during SSE flow in `sessions.py` after HCP response
COACH-09	03-01	Conversations immutable once completed	SATISFIED	`sessions.py` lines 94-100: rejects messages for non-active sessions with 409

Note on REQUIREMENTS.md mapping: REQUIREMENTS.md marks SCORE-01 through SCORE-05 and COACH-08/09 as "Phase 2 Complete" but the phase directory places scoring/assessment UI and wiring in Phase 03. This is consistent -- Phase 2 built service scaffolding, Phase 3 wired endpoints, built pages, and completed integration. No orphaned requirements found for Phase 3.

Anti-Patterns Found

File	Line	Pattern	Severity	Impact
None	-	No TODO/FIXME/PLACEHOLDER found in phase-3 critical files	-	-

No blocking anti-patterns detected in phase-3 artifacts.

Human Verification Required

1. Admin Rubric Management Flow

Test: Navigate to /admin/scoring-rubrics, create a rubric with 5 dimensions totaling 100%, edit it, then delete it. Expected: Table updates in real-time; dimension weights enforce sum-to-100 constraint; toast confirms each action. Why human: Requires visual verification of UI behavior, dialog interactions, and validation feedback.

2. Post-Session Report Rendering

Test: Complete a training session, navigate to scoring feedback page. Expected: Report shows dimension breakdowns with score bars, strengths with quotes, weaknesses, and improvement priorities. RadarChart displays current scores with previous session overlay if available. Why human: Visual layout quality, chart rendering accuracy, and quote formatting require visual inspection.

3. PDF Print Output

Test: On scoring feedback page, click Print button (or Ctrl+P). Expected: Print preview shows clean layout without navigation chrome; all report sections visible. Why human: CSS @media print rendering varies by browser; needs visual confirmation.

4. Real-time Coaching Hints during SSE

Test: Start a training session, send messages, observe side panel after HCP responses. Expected: Coaching hints appear in side panel with contextual suggestions (key message reminders, objection handling tips). Why human: SSE timing and real-time UI updates require interactive testing.

Gaps Summary

No gaps found. All 25/25 observable truths verified across 4 plans. All 25 artifacts exist, are substantive, and are wired. All 14 key links confirmed. All 7 requirement IDs (SCORE-01 through SCORE-05, COACH-08, COACH-09) are satisfied with implementation evidence. 48 phase-specific backend tests pass. 4 items flagged for human verification (UI/visual behaviors).

Verified: 2026-03-25T07:30:00Z Verifier: Claude (gsd-verifier)

Planning Phase 03 - huqianghui/AI-Coach-vibe-coding GitHub Wiki

Phase 03: Scoring Assessment

Context & Decisions

Phase 3: Scoring & Assessment - Context

Real-time Coaching Suggestions

Post-session Reports

Admin Scoring Rubrics

Score Persistence & Analytics

Claude's Discretion

Existing Code Insights

Reusable Assets

Established Patterns

Integration Points

Plans (4)

Research

Phase 3: Scoring & Assessment - Research

Summary

User Constraints (from CONTEXT.md)

Locked Decisions

Claude's Discretion

Deferred Ideas (OUT OF SCOPE)

Phase Requirements

Standard Stack

Core (Already Installed -- No New Dependencies)

Supporting (No additions needed)

Architecture Patterns

Backend: New Files Needed

Frontend: New Files Needed

Pattern 1: Wiring Suggestion Service into SSE Flow

Pattern 2: Admin CRUD Router (Following HCP Profiles Pattern)

Pattern 3: Score History with Trend Calculation

Pattern 4: Admin Page Pattern (Sidebar List + Editor)

Anti-Patterns to Avoid

Don't Hand-Roll

Common Pitfalls

Pitfall 1: ScoreDetail JSON Parsing Inconsistency

Pitfall 2: SSE Suggestion Timing

Pitfall 3: Rubric Weight Validation Edge Case

Pitfall 4: Admin Route Not Protected

Pitfall 5: Router Registration Order in main.py

Pitfall 6: Frontend Route Not Added to Router

Pitfall 7: i18n Keys Not Added to Both Locales

Pitfall 8: Session Suggestion Accumulation

Code Examples

Backend: Report API Endpoint (Wiring Existing Service)

Backend: Rubric Service CRUD Pattern

Backend: Score History Endpoint

Frontend: Report Hook Pattern

Frontend: Rubric Admin Hook Pattern

CSS Print Stylesheet Pattern

State of the Art

Open Questions

Project Constraints (from CLAUDE.md)

Backend

Frontend

Testing

Database

Sources

Primary (HIGH confidence)

Secondary (MEDIUM confidence)

Metadata

Verification

Phase 03: Scoring & Assessment Verification Report

Goal Achievement

Observable Truths (Plan 01 -- Backend API Wiring)

Observable Truths (Plan 02 -- Frontend Data Layer)

Observable Truths (Plan 03 -- Frontend Pages)

Observable Truths (Plan 04 -- Integration Wiring)

Required Artifacts

Key Link Verification

Behavioral Spot-Checks

Requirements Coverage

Anti-Patterns Found

Human Verification Required

1. Admin Rubric Management Flow

2. Post-Session Report Rendering

3. PDF Print Output

4. Real-time Coaching Hints during SSE

Gaps Summary

⚠️ **GitHub.com Fallback** ⚠️

⚠️ GitHub.com Fallback ⚠️