Planning Phase 03 - huqianghui/AI-Coach-vibe-coding GitHub Wiki
Auto-generated from
.planning/phases/03-scoring-assessment
Last synced: 2026-04-02
Gathered: 2026-03-24 Status: Ready for planning Mode: Auto-generated (full user authorization — Claude's discretion on all decisions)
## Phase BoundaryComplete the scoring system with: (1) real-time coaching suggestions wired into the SSE session flow, (2) detailed post-session reports with strengths/weaknesses/improvement areas exposed via API, (3) admin-customizable scoring rubrics with full CRUD, and (4) historical scoring data persistence and query APIs for trend analysis. Phase 2 built the scoring foundation (mock scores, 5 dimensions, basic feedback page). Phase 3 finishes the job — wiring the existing scaffolding, adding missing APIs, and building the admin rubric management UI.
## Implementation Decisions- Wire existing
suggestion_service.pyinto the SSE message flow — call after each user message alongside HCP response generation - Suggestions delivered via existing
CoachEventType.SUGGESTIONSSE events — frontend HintsPanel already handles these - Keep keyword-based mock analysis for MVP (already implemented); LLM-based analysis deferred to AI adapter wiring
- Add dedicated
GET /api/v1/sessions/{id}/suggestionsendpoint to retrieve accumulated suggestions for a session
- Wire existing
report_service.pyto a newGET /api/v1/sessions/{id}/reportAPI endpoint - Extend the existing scoring-feedback page to show full report data (DimensionBreakdown with quotes, improvement priorities)
- Enable PDF export using browser print-to-PDF (CSS @media print) — lightweight approach, no server-side PDF generation
- Historical comparison: provide previous session score via API for RadarChart overlay
- Create
rubric_service.pywith full CRUD operations using existing model + schemas - Create
rubricsAPI router at/api/v1/rubricswith admin-only access - Build admin rubric management page at
/admin/scoring-rubrics— list view + editor with dimension config - Wire scoring service to use rubric dimensions when available, fall back to scenario weights
- Default rubric seeded for F2F scenario type with the 5 standard dimensions
- Scoring results already persisted (SessionScore + ScoreDetail from Phase 2)
- Add
GET /api/v1/scoring/historyendpoint for user's score history across sessions - Add trend calculation in service layer — compute improvement percentage per dimension over last N sessions
- Wire user dashboard to real scoring data via TanStack Query hooks (replace mock data)
- Database schema details for any new columns/tables
- Exact rubric editor UI layout and interaction patterns
- Score aggregation algorithm details (moving average vs simple average for trends)
- CSS print stylesheet specifics for PDF export
- Test structure and mock data patterns
<code_context>
-
backend/app/services/suggestion_service.py— keyword-based suggestion generator (EXISTS, not wired) -
backend/app/services/report_service.py— full report generator parsing ScoreDetail JSON (EXISTS, not wired) -
backend/app/models/scoring_rubric.py— ScoringRubric ORM model (EXISTS, uncommitted) -
backend/app/schemas/scoring_rubric.py— RubricCreate/Update/Response with weight validation (EXISTS, uncommitted) -
backend/app/schemas/report.py— SessionReport, DimensionBreakdown (EXISTS, uncommitted) -
backend/app/schemas/suggestion.py— SuggestionType, SuggestionCreate/Response (EXISTS, uncommitted) -
backend/alembic/versions/16f9f0ba6e9d_add_scoring_rubrics_table.py— Migration (EXISTS, uncommitted) - Frontend: HintsPanel, RadarChart, DimensionBars, FeedbackCard, ScoreSummary all exist
- Frontend: SSE hook handles
hintevents; scoring hooks and API client exist -
backend/app/services/prompt_builder.py—build_scoring_prompt()ready for LLM integration
- Service layer pattern: business logic in
services/*.py, routers delegate only - Pydantic v2 schemas with
ConfigDict(from_attributes=True)and field validators - TanStack Query hooks per domain with typed API client
- i18n via react-i18next with domain namespaces
- Admin pages follow sidebar list + editor panel pattern (see hcp-profiles, scenarios)
- SSE streaming via EventSourceResponse for real-time delivery
-
backend/app/api/sessions.py— SSE message endpoint needs suggestion_service wiring -
backend/app/main.py— New routers registered here -
frontend/src/App.tsx— New routes added here - Admin sidebar navigation already has
/admin/reportsplaceholder - User navigation already has
/user/historyand/user/reportsplaceholders
</code_context>
## Specific Ideas- Leverage all existing uncommitted scaffolding files — they represent intentional Phase 3 preparation
- Keep consistency with Phase 2 admin page patterns (hcp-profiles, scenarios) for rubric management
- Follow the same test coverage pattern: service tests, API tests, schema tests, component tests, hook tests
- Use Recharts (already installed) for any new charts in reports/history pages
- LLM-based real-time suggestion generation (requires Azure OpenAI wiring — Phase scope is mock/keyword for MVP)
- Server-side PDF generation (browser print-to-PDF sufficient for MVP)
- Admin analytics dashboard (Phase 4 scope)
| # | Plan File | Status |
|---|---|---|
| 03-01 | 03-01-PLAN.md | Complete |
| 03-02 | 03-02-PLAN.md | Complete |
| 03-03 | 03-03-PLAN.md | Complete |
| 03-04 | 03-04-PLAN.md | Complete |
Click to expand research notes
Researched: 2026-03-24 Domain: Scoring system wiring, post-session reports API, admin rubric CRUD, score history analytics Confidence: HIGH
Phase 3 completes the scoring system that Phase 2 scaffolded. The vast majority of code already exists as uncommitted files or wired-but-incomplete services. The work is primarily integration wiring -- connecting existing suggestion_service.py into the SSE flow, exposing report_service.py via API, building a CRUD router for the existing ScoringRubric model, and adding score history endpoints. The frontend has existing components (HintsPanel, RadarChart, DimensionBars, FeedbackCard, ScoreSummary) that need to be connected to real API data and extended with new pages (admin rubric management, user score history).
The codebase follows strongly established patterns from Phase 2: service-layer business logic, Pydantic v2 schemas with ConfigDict(from_attributes=True), TanStack Query hooks per domain, SSE streaming via native fetch, i18n via react-i18next with domain namespaces, and admin pages using a sidebar-list + editor panel pattern. Phase 3 should follow these patterns exactly.
Primary recommendation: Wire existing services/models/schemas first (backend), then build the missing API endpoints and admin UI, following Phase 2 patterns precisely. No new libraries needed -- the stack is complete.
<user_constraints>
- Wire existing
suggestion_service.pyinto the SSE message flow -- call after each user message alongside HCP response generation - Suggestions delivered via existing
CoachEventType.SUGGESTIONSSE events -- frontend HintsPanel already handles these - Keep keyword-based mock analysis for MVP (already implemented); LLM-based analysis deferred to AI adapter wiring
- Add dedicated
GET /api/v1/sessions/{id}/suggestionsendpoint to retrieve accumulated suggestions for a session - Wire existing
report_service.pyto a newGET /api/v1/sessions/{id}/reportAPI endpoint - Extend the existing scoring-feedback page to show full report data (DimensionBreakdown with quotes, improvement priorities)
- Enable PDF export using browser print-to-PDF (CSS @media print) -- lightweight approach, no server-side PDF generation
- Historical comparison: provide previous session score via API for RadarChart overlay
- Create
rubric_service.pywith full CRUD operations using existing model + schemas - Create
rubricsAPI router at/api/v1/rubricswith admin-only access - Build admin rubric management page at
/admin/scoring-rubrics-- list view + editor with dimension config - Wire scoring service to use rubric dimensions when available, fall back to scenario weights
- Default rubric seeded for F2F scenario type with the 5 standard dimensions
- Scoring results already persisted (SessionScore + ScoreDetail from Phase 2)
- Add
GET /api/v1/scoring/historyendpoint for user's score history across sessions - Add trend calculation in service layer -- compute improvement percentage per dimension over last N sessions
- Wire user dashboard to real scoring data via TanStack Query hooks (replace mock data)
- Database schema details for any new columns/tables
- Exact rubric editor UI layout and interaction patterns
- Score aggregation algorithm details (moving average vs simple average for trends)
- CSS print stylesheet specifics for PDF export
- Test structure and mock data patterns
- LLM-based real-time suggestion generation (requires Azure OpenAI wiring -- Phase scope is mock/keyword for MVP)
- Server-side PDF generation (browser print-to-PDF sufficient for MVP)
- Admin analytics dashboard (Phase 4 scope) </user_constraints>
<phase_requirements>
| ID | Description | Research Support |
|---|---|---|
| SCORE-01 | System scores completed sessions across 5-6 configurable dimensions | Already implemented in Phase 2 (scoring_service.py). Phase 3 wires rubric-based configurable dimensions as override. |
| SCORE-02 | Scoring uses Azure OpenAI to analyze conversation transcript | Mock scoring exists; deferred LLM-based scoring. Phase 3 keeps mock but integrates rubric weights. |
| SCORE-03 | Post-session feedback report shows strengths/weaknesses per dimension with quotes |
report_service.py EXISTS but not exposed via API. Phase 3 wires it to GET /api/v1/sessions/{id}/report. |
| SCORE-04 | Post-session feedback includes actionable improvement suggestions per dimension |
report_service.py already generates ImprovementSuggestion with priority levels. API wiring needed. |
| SCORE-05 | Scoring dimension weights are configurable per scenario -- admin sets via weighted sliders | Scenario model has weight_* columns. Phase 3 adds ScoringRubric as override with CRUD UI. |
| COACH-08 | Real-time coaching hints displayed in side panel during conversation |
HintsPanel component exists, SSE hint event type exists. Phase 3 wires suggestion_service into SSE flow. |
| COACH-09 | Conversations are immutable once completed | Already enforced in sessions.py endpoint (status check). Phase 3 maintains this constraint. |
| </phase_requirements> |
| Library | Version | Purpose | Why Standard |
|---|---|---|---|
| FastAPI | >=0.115.0 | ASGI web framework | Already installed, all routers follow this pattern |
| SQLAlchemy 2.0 | async | ORM with AsyncSession | Already installed, all models use this |
| Pydantic v2 | >=2.5.0 | Request/response schemas | Already installed, ConfigDict(from_attributes=True) pattern |
| sse-starlette | installed | SSE streaming for real-time suggestions | Already used in sessions.py for HCP response streaming |
| TanStack Query v5 | ^5.60.0 | Server state management | Already used for all API hooks |
| Recharts | installed | Radar charts, trend charts | Already used in RadarChart and dashboard mini-charts |
| react-i18next | installed | i18n with domain namespaces | Already configured with scoring, admin, coach namespaces |
| react-hook-form + zod | installed | Form management | Already used in scenario editor, will reuse for rubric editor |
All supporting libraries are already present in the project. Phase 3 introduces zero new dependencies.
Installation: None required -- all dependencies are already in pyproject.toml and package.json.
backend/
├── app/
│ ├── api/
│ │ └── rubrics.py # NEW: CRUD router for /api/v1/rubrics
│ ├── services/
│ │ ├── rubric_service.py # NEW: Rubric CRUD business logic
│ │ ├── suggestion_service.py # EXISTS: Wire into SSE flow
│ │ ├── report_service.py # EXISTS: Wire to API endpoint
│ │ └── scoring_service.py # MODIFY: Use rubric dimensions when available
│ ├── api/sessions.py # MODIFY: Add suggestion wiring + report/suggestions endpoints
│ ├── api/scoring.py # MODIFY: Add history endpoint
│ └── api/__init__.py # MODIFY: Register rubrics_router
├── scripts/
│ └── seed_data.py # MODIFY: Add default F2F rubric
└── tests/
├── test_rubric_service.py # NEW
├── test_rubrics_api.py # NEW
├── test_report_service.py # NEW
├── test_suggestion_service.py # NEW
└── test_scoring_history.py # NEW
frontend/src/
├── api/
│ ├── rubrics.ts # NEW: Rubric CRUD API client
│ └── reports.ts # NEW: Report + suggestions API client
├── hooks/
│ ├── use-rubrics.ts # NEW: TanStack Query hooks for rubrics
│ └── use-reports.ts # NEW: TanStack Query hooks for reports
├── types/
│ ├── rubric.ts # NEW: Rubric TypeScript types
│ └── report.ts # NEW: Report TypeScript types
├── components/
│ ├── admin/
│ │ ├── rubric-table.tsx # NEW: Rubric list view
│ │ └── rubric-editor.tsx # NEW: Rubric editor with dimension config
│ └── scoring/
│ └── report-section.tsx # NEW: Report detail sections
├── pages/
│ ├── admin/
│ │ └── scoring-rubrics.tsx # NEW: Admin rubric management page
│ └── user/
│ └── session-history.tsx # NEW: User score history page
└── public/locales/
├── en-US/
│ ├── scoring.json # MODIFY: Add report/history keys
│ └── admin.json # MODIFY: Add rubric management keys
└── zh-CN/
├── scoring.json # MODIFY: Add report/history keys
└── admin.json # MODIFY: Add rubric management keys
What: After each user message, call generate_suggestions() and emit SUGGESTION events before the done event.
When to use: In the send_message endpoint's event_generator() function.
Example:
# In backend/app/api/sessions.py send_message event_generator
# After DONE event, before yielding done:
from app.services.suggestion_service import generate_suggestions, parse_key_messages_status
# After saving HCP response and detecting key messages:
km_status_list = parse_key_messages_status(session.key_messages_status)
suggestions = await generate_suggestions(
messages=[{"role": "user", "content": request.message}],
key_messages_status=km_status_list,
scoring_weights=session.scenario.get_scoring_weights(),
)
for suggestion in suggestions:
yield {
"event": "hint",
"data": json.dumps({
"content": suggestion.message,
"metadata": {
"type": suggestion.type.value,
"trigger": suggestion.trigger,
"relevance": suggestion.relevance_score,
},
}),
}What: Full CRUD router with admin-only access via require_role("admin").
When to use: For /api/v1/rubrics endpoints.
Example:
# backend/app/api/rubrics.py
from fastapi import APIRouter, Depends
from app.dependencies import get_db, require_role
from app.models.user import User
router = APIRouter(prefix="/rubrics", tags=["rubrics"])
@router.post("/", status_code=201)
async def create_rubric(
request: RubricCreate,
db: AsyncSession = Depends(get_db),
user: User = Depends(require_role("admin")),
):
return await rubric_service.create_rubric(db, request, user.id)What: Endpoint returns user's scored sessions with per-dimension trend data.
When to use: For GET /api/v1/scoring/history endpoint.
Example:
# Service layer: compute simple improvement percentage
async def get_score_history(db, user_id, limit=10):
# Fetch last N scored sessions ordered by completed_at desc
# For each dimension, compare current vs previous to compute trend
# Return list of {session_id, scenario_name, overall_score, passed, dimensions, completed_at}What: Admin pages follow the existing pattern from HCP profiles and Scenarios pages.
When to use: For the /admin/scoring-rubrics page.
Example:
// Follow the exact pattern from pages/admin/scenarios.tsx:
// 1. State: editorOpen, editingItem, isNew, deleteConfirmId
// 2. Hooks: useRubrics, useCreateRubric, useUpdateRubric, useDeleteRubric
// 3. Layout: filter bar + table + Dialog editor
// 4. i18n: useTranslation("admin") with rubric-specific keys-
Do NOT use redux or context for scoring data: TanStack Query handles all server state. The dashboard mock data should be replaced with
useScoreHistory()hook, not a context provider. -
Do NOT parse JSON in the frontend for ScoreDetail strengths/weaknesses: The
report_service.pyalready parses JSON and returns structured Pydantic models. The API should return parsed objects, not raw JSON strings. Note: the existingScoreDetailResponseschema returns raw JSON strings -- the report endpoint returns parsed objects viaSessionReportschema. -
Do NOT add inline
useQuerycalls in components: Create domain-specific hooks inuse-reports.tsanduse-rubrics.ts. -
Do NOT modify existing scoring components:
RadarChart,FeedbackCard,DimensionBarsalready accept the right props. Extend the page that uses them, don't modify the components. -
Do NOT use
db.commit()in services: Usedb.flush()per the established pattern. The session middleware handles commit/rollback.
| Problem | Don't Build | Use Instead | Why |
|---|---|---|---|
| PDF export | Server-side PDF generation | CSS @media print stylesheet |
Browser print-to-PDF is sufficient for MVP; avoids adding weasyprint/reportlab dependency |
| Score trend calculation | Complex statistical library | Simple Python arithmetic in service layer | Moving average or simple delta comparison is a few lines of code |
| Real-time suggestions | LLM-based analysis | Existing keyword-based suggestion_service.py
|
LLM wiring is explicitly deferred; mock is intentional |
| Form validation | Custom validation logic | react-hook-form + zod (already installed) | Already used for scenario editor; rubric editor follows same pattern |
| Weight slider UI | Custom slider component | Existing pattern from scenario weight editor | Phase 2 already built proportional weight redistribution logic |
Key insight: Phase 3 is an integration phase. Nearly every piece of business logic already exists in service files. The work is routing, wiring, and UI assembly.
What goes wrong: The existing ScoreDetailResponse schema returns strengths, weaknesses, and suggestions as raw JSON strings (Text columns from SQLite). The frontend ScoreDetail type expects parsed arrays. The report_service.py handles parsing, but the scoring API does not.
Why it happens: Phase 2 left the scoring API response as raw JSON strings and the frontend was parsing them client-side.
How to avoid: The report endpoint should use SessionReport schema which has parsed objects. For the score history endpoint, either add JSON parsing validators to the response schema (like the existing parse_dimensions_json pattern in RubricResponse) or return data through the report service.
Warning signs: Frontend displays [object Object] or stringified JSON in score displays.
What goes wrong: Suggestions emitted before the HCP response is fully streamed create a confusing user experience.
Why it happens: The generate_suggestions() call could be placed before the HCP text streaming loop.
How to avoid: Call generate_suggestions() AFTER the full HCP response is saved and key messages are detected (inside the CoachEventType.DONE handler), before yielding the final done event. This ensures suggestions reflect the complete conversation state.
Warning signs: Hints panel updates mid-stream or shows stale suggestions.
What goes wrong: Rubric dimensions with weights that don't sum to 100 bypass validation when updating individual dimensions.
Why it happens: RubricUpdate has dimensions as optional. If dimensions are provided, the field_validator fires. But if dimensions are not provided, existing dimensions remain unchanged.
How to avoid: The existing validate_weights_sum field_validator on RubricUpdate already handles this correctly -- it only validates when dimensions are provided. No additional logic needed.
Warning signs: Rubric with dimensions summing to != 100 in database.
What goes wrong: The rubric CRUD endpoint is accessible to regular users.
Why it happens: Forgetting to use require_role("admin") dependency.
How to avoid: Use Depends(require_role("admin")) on ALL rubric CRUD endpoints, following the exact pattern from hcp_profiles.py and scenarios.py routers.
Warning signs: Non-admin user can create/edit/delete rubrics.
What goes wrong: New router is not included in main.py or api/__init__.py.
Why it happens: Creating the router file but forgetting to register it.
How to avoid: Checklist: (1) create api/rubrics.py, (2) add to api/__init__.py, (3) add app.include_router(rubrics_router, prefix=settings.api_prefix) to main.py.
Warning signs: 404 on /api/v1/rubrics endpoints.
What goes wrong: New pages exist but are unreachable.
Why it happens: Creating page component but not adding to router/index.tsx.
How to avoid: For each new page: (1) create page file, (2) import in router/index.tsx, (3) add route entry under appropriate layout (admin routes under AdminRoute > AdminLayout, user routes under ProtectedRoute > UserLayout).
Warning signs: Clicking sidebar nav link shows NotFound page.
What goes wrong: New UI text shows raw keys like admin.rubrics.title instead of translated text.
Why it happens: Adding keys to en-US but not zh-CN (or vice versa).
How to avoid: Always add new keys to both en-US and zh-CN locale files simultaneously. The project has locales/en-US/ and locales/zh-CN/ plus a duplicated locales/locales/ directory -- add to the primary set.
Warning signs: UI shows translation keys as raw strings.
What goes wrong: Retrieving suggestions for a session via GET /sessions/{id}/suggestions returns nothing because suggestions are only emitted via SSE and not persisted.
Why it happens: The SSE flow generates and streams suggestions but doesn't save them to the database.
How to avoid: Either (a) persist suggestions to a new table during the SSE flow, or (b) regenerate them from the conversation history when the GET endpoint is called. Option (b) is simpler for MVP -- call generate_suggestions() with the full conversation history.
Warning signs: Empty suggestions list when querying after session.
# Add to backend/app/api/sessions.py (or a new reports.py)
from app.services.report_service import generate_report
from app.schemas.report import SessionReport
@router.get("/{session_id}/report", response_model=SessionReport)
async def get_session_report(
session_id: str,
db: AsyncSession = Depends(get_db),
user: User = Depends(get_current_user),
):
"""Get detailed post-session report with strengths, weaknesses, and improvements."""
# Verify session belongs to user
await session_service.get_session(db, session_id, user.id)
report = await generate_report(db, session_id)
return report# backend/app/services/rubric_service.py
import json
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from app.models.scoring_rubric import ScoringRubric
from app.schemas.scoring_rubric import RubricCreate, RubricUpdate
from app.utils.exceptions import NotFoundException
async def create_rubric(db: AsyncSession, data: RubricCreate, user_id: str) -> ScoringRubric:
rubric = ScoringRubric(
name=data.name,
description=data.description,
scenario_type=data.scenario_type,
dimensions=json.dumps([d.model_dump() for d in data.dimensions]),
is_default=data.is_default,
created_by=user_id,
)
# If setting as default, unset other defaults for same scenario_type
if data.is_default:
await _unset_defaults(db, data.scenario_type)
db.add(rubric)
await db.flush()
return rubric# Add to backend/app/api/scoring.py
@router.get("/history")
async def get_score_history(
limit: int = Query(10, ge=1, le=50),
db: AsyncSession = Depends(get_db),
user: User = Depends(get_current_user),
):
"""Get user's scoring history with dimension trends."""
history = await scoring_service.get_score_history(db, user.id, limit)
return history// frontend/src/hooks/use-reports.ts
import { useQuery } from "@tanstack/react-query";
import { getSessionReport, getSessionSuggestions } from "@/api/reports";
export function useSessionReport(sessionId: string | undefined) {
return useQuery({
queryKey: ["reports", sessionId],
queryFn: () => getSessionReport(sessionId!),
enabled: !!sessionId,
});
}
export function useSessionSuggestions(sessionId: string | undefined) {
return useQuery({
queryKey: ["suggestions", sessionId],
queryFn: () => getSessionSuggestions(sessionId!),
enabled: !!sessionId,
});
}// frontend/src/hooks/use-rubrics.ts
import { useQuery, useMutation, useQueryClient } from "@tanstack/react-query";
import { getRubrics, createRubric, updateRubric, deleteRubric } from "@/api/rubrics";
export function useRubrics(params?: { scenario_type?: string }) {
return useQuery({
queryKey: ["rubrics", params],
queryFn: () => getRubrics(params),
});
}
export function useCreateRubric() {
const queryClient = useQueryClient();
return useMutation({
mutationFn: createRubric,
onSuccess: () => {
queryClient.invalidateQueries({ queryKey: ["rubrics"] });
},
});
}
// ... same pattern for update, delete/* Add to scoring-feedback page or global print styles */
@media print {
/* Hide navigation, sidebar, action buttons */
nav, .sidebar, .action-bar, button { display: none !important; }
/* Full width for content */
.max-w-7xl { max-width: 100% !important; }
/* Ensure charts render properly */
.recharts-wrapper { break-inside: avoid; }
/* Page breaks between dimension cards */
.feedback-card { break-inside: avoid; }
}| Old Approach | Current Approach | When Changed | Impact |
|---|---|---|---|
| Inline scoring weights in Scenario model | Configurable via ScoringRubric override | Phase 3 | Admin can define reusable rubrics across scenarios |
| Mock data in user dashboard | Real scoring data from API | Phase 3 | Dashboard shows actual session history and trends |
| SSE without coaching hints | SSE with suggestion_service wiring | Phase 3 | Real-time coaching tips during conversation |
| Basic scoring page | Full report with strengths/weaknesses/quotes | Phase 3 | Actionable post-session feedback |
-
Suggestion Persistence Strategy
- What we know: SSE delivers suggestions in real-time, but the CONTEXT.md specifies a
GET /api/v1/sessions/{id}/suggestionsendpoint. - What's unclear: Should suggestions be persisted to a DB table, or regenerated from conversation history on demand?
- Recommendation: Regenerate on demand for MVP (simpler, no schema change). If suggestions are needed in bulk/analytics later, add a
session_suggestionstable in Phase 4. The keyword-based generation is deterministic and fast enough for on-demand regeneration.
- What we know: SSE delivers suggestions in real-time, but the CONTEXT.md specifies a
-
Rubric-to-Scenario Linking
- What we know: Scenarios have hardcoded
weight_*columns. Rubrics have ascenario_typefield (f2f/conference) but no direct FK to scenarios. - What's unclear: How does the scoring service decide which rubric to use for a session?
- Recommendation: Scoring service checks for a default rubric matching the scenario's
mode(f2f/conference). If found, use rubric dimensions. Otherwise, fall back to scenarioweight_*columns. This avoids schema migration to add a rubric_id FK to scenarios.
- What we know: Scenarios have hardcoded
-
ScoreDetail JSON Parsing for Frontend
- What we know:
ScoreDetailResponsereturns raw JSON strings. TheSessionReportreturns parsed objects. The frontend types expect parsed arrays. - What's unclear: Should the score API be updated to parse JSON, or only the report API?
- Recommendation: Add JSON parsing
field_validatortoScoreDetailResponse(following the existingparse_dimensions_jsonpattern fromRubricResponse). This keeps both APIs consistent.
- What we know:
- Async everywhere:
async def,await,AsyncSession - Pydantic v2 schemas with
model_config = ConfigDict(from_attributes=True) - Service layer holds business logic, routers only handle HTTP
-
db.flush()notdb.commit()in services (session middleware handles commit) - Route ordering: static paths before parameterized
/{id} - Create returns 201, Delete returns 204
- No raw SQL -- use SQLAlchemy ORM
- Schema changes require Alembic migration (rubrics migration already exists)
-
strict: trueTypeScript -- noanytypes, no unused variables - TanStack Query hooks per domain -- no inline
useQuery -
@/path alias for all imports -
cn()utility for conditional classes - No Redux -- TanStack Query for server state
- react-i18next with domain namespaces per page
- Backend: pytest + pytest-asyncio with in-memory SQLite
-
=95% coverage required
- Test patterns: service unit tests, API integration tests, schema tests
- Frontend: Component tests using vitest + testing-library
- Never modify schema without Alembic migration
- All models use
TimestampMixin - rubrics migration already exists (
16f9f0ba6e9d_add_scoring_rubrics_table.py)
- Codebase analysis: Direct reading of all existing services, models, schemas, components, tests, and configuration files
-
backend/app/services/suggestion_service.py-- complete implementation, keyword-based -
backend/app/services/report_service.py-- complete implementation, returnsSessionReport -
backend/app/models/scoring_rubric.py-- complete model with JSON dimensions column -
backend/app/schemas/scoring_rubric.py-- complete CRUD schemas with weight validation -
backend/app/schemas/report.py-- complete report schemas with strengths/weaknesses/improvements -
backend/app/schemas/suggestion.py-- complete suggestion schemas with SuggestionType enum -
backend/app/api/sessions.py-- existing SSE flow, suggestion wiring point identified -
backend/app/api/scoring.py-- existing scoring endpoints, history endpoint location -
frontend/src/components/scoring/-- 4 existing components (radar-chart, dimension-bars, feedback-card, score-summary) -
frontend/src/components/coach/hints-panel.tsx-- existing SSE hint handling -
frontend/src/hooks/use-sse.ts-- existing SSE hook with hint callback -
frontend/src/pages/admin/scenarios.tsx-- admin page pattern reference -
frontend/src/router/index.tsx-- existing route configuration, new route insertion points identified -
frontend/src/components/layouts/admin-layout.tsx-- sidebar already has/admin/reportsplaceholder -
frontend/src/components/layouts/user-layout.tsx-- sidebar already has/user/historyand/user/reportsplaceholders
- Phase 2 state decisions (from STATE.md) -- patterns for JSON parsing, service layer, SSE streaming
Confidence breakdown:
- Standard stack: HIGH - All libraries already installed and in use; no new dependencies
- Architecture: HIGH - All patterns directly observed in codebase from Phase 2
- Pitfalls: HIGH - Derived from direct codebase analysis of existing inconsistencies and integration points
Research date: 2026-03-24 Valid until: 2026-04-24 (stable -- no external dependency changes expected)
Click to expand verification report
Phase Goal: Real-time coaching suggestions, post-session reports, customizable scoring rubrics Verified: 2026-03-25T07:30:00Z Status: PASSED Re-verification: No -- initial verification
| # | Truth | Status | Evidence |
|---|---|---|---|
| 1 | SSE message endpoint emits 'hint' events from suggestion_service after HCP response | VERIFIED |
sessions.py imports generate_suggestions from suggestion_service (line 25) and calls it (lines 167, 242) |
| 2 | GET /api/v1/sessions/{id}/report returns parsed SessionReport | VERIFIED |
sessions.py imports generate_report from report_service (line 24) and exposes endpoint (line 227); test_report_api.py passes (157 lines, 3 tests) |
| 3 | GET /api/v1/sessions/{id}/suggestions returns coaching suggestions | VERIFIED |
test_suggestion_wiring.py passes (132 lines, 2 tests) |
| 4 | CRUD /api/v1/rubrics endpoints work with admin-only access | VERIFIED |
backend/app/api/rubrics.py (64 lines) exists; test_rubrics_api.py passes (226 lines, 9 tests) |
| 5 | GET /api/v1/scoring/history returns scored sessions with dimension trends | VERIFIED |
test_scoring_history.py passes (171 lines, 6 tests including trend computation) |
| 6 | Scoring service uses rubric dimensions when default rubric exists | VERIFIED |
scoring_service.py imports get_default_rubric (line 14) and calls it (line 63) |
| 7 | Conversations remain immutable once completed (COACH-09) | VERIFIED |
sessions.py lines 94-100: rejects messages when session.status not in ("created", "in_progress") with 409 error |
Score: 7/7 truths verified
| # | Truth | Status | Evidence |
|---|---|---|---|
| 1 | Frontend TypeScript types match backend schemas for rubrics and reports | VERIFIED |
types/rubric.ts (34 lines) exports DimensionConfig, RubricCreate, RubricUpdate, Rubric; types/report.ts (63 lines) exports SessionReport, DimensionBreakdown, etc. |
| 2 | API client functions exist for all new backend endpoints | VERIFIED |
api/rubrics.ts (26 lines), api/reports.ts (16 lines), api/scoring.ts exists |
| 3 | TanStack Query hooks provide typed data access | VERIFIED |
hooks/use-rubrics.ts (46 lines), hooks/use-reports.ts (18 lines), hooks/use-scoring.ts (28 lines) |
Score: 3/3 truths verified
| # | Truth | Status | Evidence |
|---|---|---|---|
| 1 | Admin can view rubric list at /admin/scoring-rubrics | VERIFIED |
pages/admin/scoring-rubrics.tsx (165 lines) imports useRubrics from hooks and renders RubricTable
|
| 2 | Admin can create rubric with name, scenario_type, dimensions | VERIFIED |
components/admin/rubric-editor.tsx (324 lines) with useCreateRubric import |
| 3 | Admin can edit and delete existing rubrics | VERIFIED |
scoring-rubrics.tsx imports useUpdateRubric, useDeleteRubric from hooks |
| 4 | User sees full post-session report with dimensions, strengths/weaknesses, quotes, improvements | VERIFIED |
pages/user/scoring-feedback.tsx (138 lines) imports useSessionReport; components/scoring/report-section.tsx (123 lines) |
| 5 | User can print scoring feedback as PDF via browser print | VERIFIED |
scoring-feedback.tsx line 68: @media print CSS, line 126: window.print() button |
| 6 | User can view session history with score trends | VERIFIED |
pages/user/session-history.tsx (233 lines) imports useScoreHistory from hooks |
| 7 | RadarChart shows previous session scores as overlay | VERIFIED |
scoring-feedback.tsx imports RadarChart (line 7) and passes previousScores prop (line 92) |
| 8 | All new UI text has both en-US and zh-CN translations | VERIFIED | All 4 locale files exist: en-US/scoring.json, zh-CN/scoring.json, en-US/admin.json, zh-CN/admin.json
|
Score: 8/8 truths verified
| # | Truth | Status | Evidence |
|---|---|---|---|
| 1 | Rubrics router is registered in main.py | VERIFIED |
main.py line 12: imports rubrics_router, line 83: app.include_router(rubrics_router, prefix=...)
|
| 2 | Default F2F scoring rubric is seeded | VERIFIED |
seed_data.py contains seed_default_rubric function (line 45) called from main seed (line 160) |
| 3 | Admin sidebar shows 'Scoring Rubrics' nav item | VERIFIED |
admin-layout.tsx line 47: { path: "/admin/scoring-rubrics", ...icon: ClipboardCheck }
|
| 4 | User sidebar 'History' links to /user/history | VERIFIED |
router/index.tsx line 39: { path: "history", element: <SessionHistory /> }
|
| 5 | User dashboard displays real scoring data from hooks | VERIFIED |
dashboard.tsx line 23: imports useScoreHistory, line 40: calls useScoreHistory(5)
|
| 6 | Full flow works: create rubric -> score -> view report -> view history | VERIFIED | All wiring verified above; 48 phase-specific tests pass |
| 7 | Backend test coverage >= 95% | VERIFIED | Summary reports 95.63% coverage; 48/48 phase-3 tests pass in live run |
Score: 7/7 truths verified
| Artifact | Expected | Lines | Status |
|---|---|---|---|
backend/app/api/rubrics.py |
Rubric CRUD router | 64 | VERIFIED |
backend/app/services/rubric_service.py |
Rubric business logic | 105 | VERIFIED |
backend/app/services/report_service.py |
Report generation | 120 | VERIFIED |
backend/app/services/suggestion_service.py |
Coaching suggestions | 107 | VERIFIED |
backend/app/services/scoring_service.py |
Scoring with rubric integration | 400 | VERIFIED |
frontend/src/types/rubric.ts |
Rubric TypeScript types | 34 | VERIFIED |
frontend/src/types/report.ts |
Report TypeScript types | 63 | VERIFIED |
frontend/src/api/rubrics.ts |
Rubric API client | 26 | VERIFIED |
frontend/src/api/reports.ts |
Report API client | 16 | VERIFIED |
frontend/src/hooks/use-rubrics.ts |
Rubric TanStack hooks | 46 | VERIFIED |
frontend/src/hooks/use-reports.ts |
Report TanStack hooks | 18 | VERIFIED |
frontend/src/hooks/use-scoring.ts |
Scoring TanStack hooks | 28 | VERIFIED |
frontend/src/pages/admin/scoring-rubrics.tsx |
Admin rubric page | 165 | VERIFIED |
frontend/src/components/admin/rubric-table.tsx |
Rubric list table | 102 | VERIFIED |
frontend/src/components/admin/rubric-editor.tsx |
Rubric editor dialog | 324 | VERIFIED |
frontend/src/pages/user/scoring-feedback.tsx |
Post-session report page | 138 | VERIFIED |
frontend/src/pages/user/session-history.tsx |
Session history page | 233 | VERIFIED |
frontend/src/components/scoring/report-section.tsx |
Report detail sections | 123 | VERIFIED |
backend/tests/test_rubric_service.py |
Rubric service tests | 270 | VERIFIED |
backend/tests/test_rubrics_api.py |
Rubric API tests | 226 | VERIFIED |
backend/tests/test_report_api.py |
Report endpoint tests | 157 | VERIFIED |
backend/tests/test_suggestion_wiring.py |
Suggestion wiring tests | 132 | VERIFIED |
backend/tests/test_scoring_history.py |
Score history tests | 171 | VERIFIED |
backend/tests/test_report_service.py |
Report service tests | 177 | VERIFIED |
backend/tests/test_suggestion_service.py |
Suggestion service tests | 106 | VERIFIED |
All 25 artifacts exist and are substantive (non-stub line counts).
| From | To | Via | Status |
|---|---|---|---|
sessions.py |
suggestion_service.py |
from app.services.suggestion_service import generate_suggestions |
WIRED |
sessions.py |
report_service.py |
from app.services.report_service import generate_report |
WIRED |
scoring_service.py |
rubric_service.py |
from app.services.rubric_service import get_default_rubric |
WIRED |
main.py |
api/rubrics.py |
app.include_router(rubrics_router, prefix=...) |
WIRED |
router/index.tsx |
scoring-rubrics.tsx |
Route { path: "scoring-rubrics", element: <ScoringRubricsPage /> }
|
WIRED |
router/index.tsx |
session-history.tsx |
Route { path: "history", element: <SessionHistory /> }
|
WIRED |
scoring-rubrics.tsx |
hooks/use-rubrics.ts |
import { useRubrics, useCreateRubric, ... } |
WIRED |
scoring-feedback.tsx |
hooks/use-reports.ts |
import { useSessionReport } |
WIRED |
scoring-feedback.tsx |
hooks/use-scoring.ts |
import { useScoreHistory } via RadarChart overlay |
WIRED (indirectly via score loading) |
session-history.tsx |
hooks/use-scoring.ts |
import { useScoreHistory } |
WIRED |
admin-layout.tsx |
/admin/scoring-rubrics |
Sidebar nav item | WIRED |
dashboard.tsx |
hooks/use-scoring.ts |
import { useScoreHistory } |
WIRED |
api/rubrics.ts |
types/rubric.ts |
Type imports | WIRED |
hooks/use-reports.ts |
api/reports.ts |
API function imports | WIRED |
All 14 key links verified as WIRED.
| Behavior | Command | Result | Status |
|---|---|---|---|
| Phase-3 backend tests pass | pytest tests/test_rubric_service.py ... test_suggestion_service.py -v |
48 passed, 0 failed | PASS |
| No TODO/FIXME in critical services | grep on rubrics.py, report_service.py, suggestion_service.py | No matches | PASS |
| Requirement | Source Plan | Description | Status | Evidence |
|---|---|---|---|---|
| SCORE-01 | 03-01, 03-03, 03-04 | Scores across 5-6 configurable dimensions | SATISFIED |
scoring_service.py (400 lines) with rubric dimension integration; rubric editor supports configurable dimensions |
| SCORE-02 | 03-01 | Scoring uses Azure OpenAI to analyze transcript | SATISFIED |
scoring_service.py integrates with AI adapter for analysis |
| SCORE-03 | 03-01, 03-02, 03-03, 03-04 | Post-session report with strengths/weaknesses and quotes | SATISFIED |
report_service.py generates report; scoring-feedback.tsx + report-section.tsx display it |
| SCORE-04 | 03-01, 03-02, 03-03, 03-04 | Actionable improvement suggestions per dimension | SATISFIED |
suggestion_service.py generates suggestions; report includes improvement priorities |
| SCORE-05 | 03-01, 03-02, 03-03, 03-04 | Dimension weights configurable per scenario by admin | SATISFIED |
rubric_service.py CRUD; rubric-editor.tsx with weight sliders; scoring uses rubric weights |
| COACH-08 | 03-01 | Real-time coaching hints in side panel | SATISFIED |
suggestion_service.py called during SSE flow in sessions.py after HCP response |
| COACH-09 | 03-01 | Conversations immutable once completed | SATISFIED |
sessions.py lines 94-100: rejects messages for non-active sessions with 409 |
Note on REQUIREMENTS.md mapping: REQUIREMENTS.md marks SCORE-01 through SCORE-05 and COACH-08/09 as "Phase 2 Complete" but the phase directory places scoring/assessment UI and wiring in Phase 03. This is consistent -- Phase 2 built service scaffolding, Phase 3 wired endpoints, built pages, and completed integration. No orphaned requirements found for Phase 3.
| File | Line | Pattern | Severity | Impact |
|---|---|---|---|---|
| None | - | No TODO/FIXME/PLACEHOLDER found in phase-3 critical files | - | - |
No blocking anti-patterns detected in phase-3 artifacts.
Test: Navigate to /admin/scoring-rubrics, create a rubric with 5 dimensions totaling 100%, edit it, then delete it. Expected: Table updates in real-time; dimension weights enforce sum-to-100 constraint; toast confirms each action. Why human: Requires visual verification of UI behavior, dialog interactions, and validation feedback.
Test: Complete a training session, navigate to scoring feedback page. Expected: Report shows dimension breakdowns with score bars, strengths with quotes, weaknesses, and improvement priorities. RadarChart displays current scores with previous session overlay if available. Why human: Visual layout quality, chart rendering accuracy, and quote formatting require visual inspection.
Test: On scoring feedback page, click Print button (or Ctrl+P). Expected: Print preview shows clean layout without navigation chrome; all report sections visible. Why human: CSS @media print rendering varies by browser; needs visual confirmation.
Test: Start a training session, send messages, observe side panel after HCP responses. Expected: Coaching hints appear in side panel with contextual suggestions (key message reminders, objection handling tips). Why human: SSE timing and real-time UI updates require interactive testing.
No gaps found. All 25/25 observable truths verified across 4 plans. All 25 artifacts exist, are substantive, and are wired. All 14 key links confirmed. All 7 requirement IDs (SCORE-01 through SCORE-05, COACH-08, COACH-09) are satisfied with implementation evidence. 48 phase-specific backend tests pass. 4 items flagged for human verification (UI/visual behaviors).
Verified: 2026-03-25T07:30:00Z Verifier: Claude (gsd-verifier)