
Phase 21: Scoring Criteria Refactor

Auto-generated from .planning/phases/21-scoring-criteria-refactor
Last synced: 2026-04-28

Context & Decisions

Phase 21: Scoring Criteria Refactor — scoring criteria module refactor with dynamic, rubric-driven dimensions — Context

Gathered: 2026-04-27 Status: Ready for planning

## Phase Boundary

Refactor MR session scoring's hardcoded 5-dimension system so that ScoringRubric becomes the single source of truth for scoring. Administrators can freely define scoring dimension names, counts, weights, and criteria, and every scoring flow (LLM scoring, mock scoring, frontend display) reads dimensions dynamically from the rubric.

Out of scope: Dry Run Scoring (SOP coverage scoring) and Skill Quality Evaluation (Skill content quality scoring) remain unchanged -- they evaluate entirely different things.

## Implementation Decisions

D-01: Refactor scope — Session Scoring only

  • Only the 5 hardcoded dimensions of MR session scoring are refactored to become rubric-driven
  • Dry Run Scoring (dry_run_engine.py) and Skill Quality Evaluation (skill_evaluation_service.py) are unaffected
  • The mechanism that injects a Skill's ## Assessment Rubric free text into the LLM prompt is kept unchanged

D-02: Skill overlay model — custom scoring criteria are added in the Rubric

  • Administrators can add custom dimensions to a ScoringRubric on top of the defaults (e.g., "clinical data citation accuracy")
  • Different Skills/Scenarios can be linked to different Rubrics, so different Skills can carry different scoring criteria
  • The ## Assessment Rubric section in Skill markdown continues to serve as supplementary scoring guidance text for the LLM

D-03: Fully free-form dimensions

  • Administrators can freely add/remove/edit any dimension in a Rubric, including the default 5
  • No locked dimensions; maximum flexibility
  • The Rubric's dimensions JSON keeps its existing format: [{name, weight, criteria[], max_score}]
  • Weights must sum to 100 (already enforced by the existing schema validation; a validator sketch follows below)
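
A minimal sketch of the kind of weight-sum check already in place (the actual validator lives in backend/app/schemas/scoring_rubric.py; the class and field names here are assumptions):

```python
from pydantic import BaseModel, field_validator


class RubricCreate(BaseModel):
    name: str
    # Existing format: [{name, weight, criteria[], max_score}]
    dimensions: list[dict]

    @field_validator("dimensions")
    @classmethod
    def weights_must_sum_to_100(cls, dims: list[dict]) -> list[dict]:
        total = sum(d["weight"] for d in dims)
        if total != 100:
            raise ValueError(f"dimension weights must sum to 100, got {total}")
        return dims
```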

D-04: Data migration — drop the old columns

  • An Alembic migration converts each existing Scenario's 5 weight_* column values into Rubric records
  • A corresponding Rubric is auto-created for every existing Scenario (preserving its original weight configuration)
  • A rubric_id FK is added to the Scenario model
  • After the migration, the old columns weight_key_message, weight_objection_handling, weight_communication, weight_product_knowledge, and weight_scientific_info are dropped
  • The get_scoring_weights() method is removed

D-05: Mandatory Rubric association

  • Every Scenario must be linked to a Rubric (rubric_id NOT NULL); see the model sketch after this list
  • The migration automatically creates and links a Rubric for each existing Scenario
  • Creating a new Scenario requires selecting or creating a Rubric
  • The get_default_rubric() fallback is no longer needed for scoring (though it may be kept as the default suggestion when creating a new Scenario)
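
A sketch of the Scenario model after these decisions, assuming the project's SQLAlchemy 2.0 declarative Base (import path assumed) and the String(36) UUID convention used in the migration example later on this page; all other Scenario columns are omitted:

```python
import uuid

from sqlalchemy import ForeignKey, Integer, String
from sqlalchemy.orm import Mapped, mapped_column, relationship

from app.core.database import Base  # assumed location of the declarative Base


class Scenario(Base):
    __tablename__ = "scenarios"

    id: Mapped[str] = mapped_column(
        String(36), primary_key=True, default=lambda: str(uuid.uuid4())
    )
    # D-05: every Scenario must reference a rubric -- no scoring fallback chain
    rubric_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("scoring_rubrics.id"), nullable=False
    )
    pass_threshold: Mapped[int] = mapped_column(Integer, default=70)

    rubric: Mapped["ScoringRubric"] = relationship()
    # The 5 weight_* columns and get_scoring_weights() are removed (D-04)
```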

D-06: Fully dynamic scoring prompt

  • Remove the hardcoded dim_names dictionary in scoring_engine.py
  • Remove the descriptions of the 5 specific dimensions (key_message, objection_handling, etc.) from the prompt instructions
  • Dimension names, weights, and scoring guidance are all generated dynamically from the Rubric's dimensions JSON
  • The Rubric's criteria[] field is injected directly as the scoring guidance for each dimension

D-07: Fully dynamic frontend ScoringWeights handling

  • Remove the hardcoded WEIGHT_KEYS array and I18N_KEYS mapping
  • ScoringWeights fetches the dimension list from the Rubric API and generates sliders dynamically
  • After a Rubric is selected in the Scenario Editor, the weight configuration for its dimensions is displayed
  • The existing Rubric management page /admin/scoring-rubrics remains the standalone entry point for Rubric CRUD

D-08: Dynamic mock score generator

  • _generate_mock_scores() no longer hardcodes 5 dimension blocks
  • Scores for any number of dimensions are generated dynamically from the Rubric's dimensions
  • Each dimension's mock score, strengths, weaknesses, and suggestions are generated dynamically from generic templates

Claude's Discretion

  • Concrete implementation details of the Alembic migration (batch mode for SQLite, etc.)
  • Design of the generic strengths/weaknesses copy templates in the mock score generator
  • Exact interaction design of the Rubric selection UI in the Scenario Editor (dropdown vs. modal)
  • Frontend i18n handling (whether custom dimension names need i18n)
  • Test structure and mock data patterns

<canonical_refs>

Canonical References

Downstream agents MUST read these before planning or implementing.

Scoring Engine (Session Scoring)

  • backend/app/services/scoring_engine.py — LLM scoring prompt construction; where dim_names is hardcoded
  • backend/app/services/scoring_service.py — scoring orchestration, mock generator, rubric weight resolution logic
  • backend/app/services/rubric_service.py — Rubric CRUD, get_default_rubric()

Data Models

  • backend/app/models/scenario.py — Scenario model, the 5 weight_* columns, get_scoring_weights()
  • backend/app/models/scoring_rubric.py — ScoringRubric model, JSON dimensions
  • backend/app/schemas/scoring_rubric.py — Rubric Pydantic schemas, weight validation
  • backend/app/schemas/scenario.py — Scenario schemas (weight fields to be removed, rubric_id added)

Frontend Scoring Components

  • frontend/src/components/admin/scoring-weights.tsx — hardcoded WEIGHT_KEYS; needs to become dynamic
  • frontend/src/components/scoring/radar-chart.tsx — already dynamic (receives ScorePoint[])
  • frontend/src/components/scoring/dimension-bars.tsx — already dynamic (receives ScoreDetail[])
  • frontend/src/components/scoring/feedback-card.tsx — scoring feedback card
  • frontend/src/pages/admin/scoring-rubrics.tsx — Rubric management page

Skill Scoring Integration

  • backend/app/services/scoring_service.py:235 — _extract_skill_criteria() function (unchanged)

Migrations

  • backend/alembic/versions/16f9f0ba6e9d_add_scoring_rubrics_table.py — existing Rubric migration

</canonical_refs>

<code_context>

Existing Code Insights

Reusable Assets

  • ScoringRubric model: already exists; its dimensions JSON format [{name, weight, criteria[], max_score}] already supports dynamic dimensions
  • rubric_service.py: CRUD is complete (create/get/list/update/delete)
  • RadarChart + DimensionBars: already consume dynamic data (ScorePoint[] / ScoreDetail[]); no major changes needed
  • /admin/scoring-rubrics page: Rubric management UI already exists
  • Rubric schema validation: DimensionConfig already validates that weights sum to 100

Established Patterns

  • Service layer: business logic lives in services/*.py; routers handle HTTP only
  • Pydantic v2: ConfigDict(from_attributes=True),field validators
  • Alembic: batch operations for SQLite (Gotcha #1)
  • TanStack Query hooks per domain

Integration Points

  • scoring_service.py:69-77 — current rubric-vs-scenario-weights resolution logic; must become rubric-only
  • scoring_engine.py:103-113 — dim_names dict and dimensions_config generation; must become dynamic
  • scoring_service.py:298-440 — the mock scorer's 5 hardcoded dimension blocks
  • scenario.py:52-60 — get_scoring_weights() and the 5 weight_* columns
  • Scenario Editor (frontend/src/components/admin/scenario-editor.tsx) — needs a Rubric selector added

</code_context>

## Specific Ideas
  • Different Skills can be linked to different Rubrics, covering the need for "different scoring criteria when the same scenario is run with different Skills"
  • HCP influence on scoring flows only through the LLM prompt context; no numeric modifiers are needed
  • The 4 scoring contexts are fully independent: Session Scoring (this refactor), Dry Run (SOP coverage), Skill Criteria Injection (text injection), Skill Quality Eval (content quality)
## Deferred Ideas

None — discussion stayed within phase scope


Phase: 21-scoring-criteria-refactor Context gathered: 2026-04-27

Plans (3)

| Plan | File | Status |
|------|------|--------|
| 21-01 | 21-01-PLAN.md | Complete |
| 21-02 | 21-02-PLAN.md | Complete |
| 21-03 | 21-03-PLAN.md | Complete |

Research


Phase 21: Scoring Criteria Refactor - Research

Researched: 2026-04-27
Domain: Scoring system refactoring -- eliminate hardcoded dimensions, make ScoringRubric the SSOT
Confidence: HIGH

Summary

This phase is a refactoring of an existing, working scoring system. The core problem is that 5 scoring dimensions (key_message, objection_handling, communication, product_knowledge, scientific_info) are hardcoded in 7 locations across the codebase: the Scenario ORM model (5 weight columns), the scoring engine prompt template (dimension-specific instructions), the mock score generator (5 hardcoded dimension blocks), the frontend ScoringWeights component (typed to exactly 5 keys), the frontend Scenario TypeScript types, the analytics recommendation service (dimension-to-column mapping), and the i18n locale files (hardcoded dimension translations).

A ScoringRubric model already exists with a JSON dimensions field that supports arbitrary dimension names, weights, and criteria. The rubric editor UI already supports dynamic dimensions. The refactoring goal is to make this rubric the single source of truth so all scoring flows read dimensions from the rubric rather than from hardcoded scenario columns. This is a structural cleanup, not a feature addition -- the user-facing behavior (multi-dimensional scoring with configurable weights) remains the same, but becomes truly configurable.

Primary recommendation: Add a rubric_id FK to the Scenario model, remove the 5 weight_* columns via Alembic migration with data migration (converting existing weight values to rubric records), then refactor all downstream consumers (scoring engine, mock generator, frontend components) to read dimensions from the rubric. The two separate scoring systems (session scoring and Skill quality scoring) must remain independent -- they have different dimensions and different purposes.

Standard Stack

Core (existing -- no new dependencies)

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| SQLAlchemy 2.0 | existing | ORM with async sessions | Project standard [VERIFIED: codebase] |
| Alembic | existing | Schema migrations with batch mode for SQLite | Project standard [VERIFIED: codebase] |
| Pydantic v2 | existing | Request/response schemas with validators | Project standard [VERIFIED: codebase] |
| FastAPI | existing | API layer with dependency injection | Project standard [VERIFIED: codebase] |
| react-hook-form + zod | existing | Frontend form validation | Project standard [VERIFIED: codebase] |
| recharts | existing | RadarChart, charts for scoring visualization | Project standard [VERIFIED: codebase] |

No New Dependencies Required

This refactoring uses exclusively existing libraries. No new packages need to be installed. [VERIFIED: codebase audit]

Architecture Patterns

Current Architecture (Before Refactor)

Scenario model
  ├── weight_key_message: int = 30
  ├── weight_objection_handling: int = 25
  ├── weight_communication: int = 20
  ├── weight_product_knowledge: int = 15
  ├── weight_scientific_info: int = 10
  └── get_scoring_weights() -> dict  # Returns hardcoded 5-key dict

ScoringRubric model (exists but underused)
  └── dimensions: JSON  # [{name, weight, criteria[], max_score}]

scoring_service.py
  ├── Reads scenario.get_scoring_weights() as fallback
  └── Reads rubric dimensions only if a default rubric exists

scoring_engine.py
  ├── SCORING_PROMPT_TEMPLATE: hardcoded dimension instructions
  └── dim_names dict: maps 5 keys to display names

Target Architecture (After Refactor)

Scenario model
  ├── rubric_id: FK -> scoring_rubrics.id (NOT NULL per D-05)
  ├── pass_threshold: int = 70
  └── (weight_* columns REMOVED)

ScoringRubric model (SSOT)
  └── dimensions: JSON  # [{name, weight, criteria[], max_score}]

scoring_service.py
  ├── Resolves rubric: always via scenario.rubric_id (NOT NULL, no fallback needed)
  └── Passes rubric dimensions to scoring engine

scoring_engine.py
  ├── Builds prompt dynamically from rubric dimensions
  └── No hardcoded dimension names or instructions

Frontend
  ├── ScenarioEditor: rubric selector (required field) instead of ScoringWeights
  └── All scoring components: read dimensions from score.details (already dynamic)

Pattern 1: Direct Rubric Lookup (No Fallback Chain)

What: Direct rubric lookup via scenario.rubric_id (NOT NULL per D-05)
When to use: Every time a session needs to be scored
Example:

# Source: [codebase pattern from rubric_service.py, simplified per D-05]
async def resolve_rubric_dimensions(db: AsyncSession, scenario) -> list[dict]:
    """Resolve rubric dimensions for scoring.
    
    Per D-05: rubric_id is NOT NULL, so direct lookup always succeeds.
    get_default_rubric() fallback is no longer needed for scoring.
    """
    import json as _json
    from app.services.rubric_service import get_rubric
    
    rubric = await get_rubric(db, scenario.rubric_id)
    dims = rubric.dimensions
    return _json.loads(dims) if isinstance(dims, str) else dims

Pattern 2: Dynamic Prompt Building

What: Build the scoring prompt from rubric dimensions, not hardcoded names
When to use: LLM scoring engine
Example:

# Replace hardcoded dim_names dict with rubric-driven config
def build_dimensions_config(rubric_dimensions: list[dict]) -> str:
    lines = []
    for dim in rubric_dimensions:
        name = dim["name"]
        weight = dim["weight"]
        criteria = dim.get("criteria", [])
        criteria_text = "; ".join(criteria) if criteria else "General assessment"
        lines.append(f"- {name}: weight={weight}%, criteria: {criteria_text}")
    return "\n".join(lines)

Pattern 3: Dynamic Mock Score Generation

What: Generate mock scores for arbitrary dimension sets
When to use: Mock scoring fallback when the LLM is unavailable
Example:

import random

from app.models.scenario import Scenario

def _generate_mock_scores(
    rubric_dimensions: list[dict],
    scenario: Scenario,
    messages: list,
    key_messages_status: list[dict],
) -> dict:
    """Generate mock scores for N arbitrary dimensions."""
    # Illustrative base score heuristic; the real formula is at Claude's discretion
    base_score = min(85, 70 + len(messages))
    dimensions = []
    for dim_config in rubric_dimensions:
        name = dim_config["name"]
        score = min(95, max(60, base_score + random.randint(-8, 10)))
        dimensions.append({
            "dimension": name,
            "score": score,
            "weight": dim_config["weight"],
            # Generic template copy -- final wording is a design decision (D-08)
            "strengths": [f"Solid performance on {name}"],
            "weaknesses": [f"Occasional gaps in {name}"],
            "suggestions": [f"Review the rubric criteria for {name}"],
        })
    # Weighted overall: each dimension contributes score * weight / 100
    overall = sum(d["score"] * d["weight"] / 100 for d in dimensions)
    return {"overall_score": round(overall, 1), "dimensions": dimensions}

Anti-Patterns to Avoid

  • Merging session scoring with Skill quality scoring: These are two separate systems with different dimensions (5 MR-facing vs 6 content-quality). They must remain independent. [VERIFIED: codebase -- Skill scoring uses sop_completeness, knowledge_accuracy, etc.]
  • Merging with DryRun scoring: DryRun uses executability_score and coverage_percent, which are completely different metrics. Do not touch DryRun scoring. [VERIFIED: codebase]
  • Breaking backward compatibility on stored data: Existing ScoreDetail rows reference dimension names like key_message. These must remain readable even after the refactoring. New sessions will use rubric-defined names.
  • Removing ScoringWeights component entirely: Deprecate but keep the file until all references are migrated. The rubric editor already handles dynamic dimensions.

Hardcoded Dimension Locations (Complete Inventory)

| # | File | What's Hardcoded | Action |
|---|------|------------------|--------|
| 1 | backend/app/models/scenario.py | 5 weight_* columns + get_scoring_weights() method | Remove columns, add rubric_id FK |
| 2 | backend/app/schemas/scenario.py | 5 weight_* fields in Create/Update/Response + validate_weights_sum | Remove weight fields, add rubric_id |
| 3 | backend/app/services/scoring_engine.py | dim_names dict mapping 5 keys to labels, per-dimension instructions in SCORING_PROMPT_TEMPLATE | Build dynamically from rubric |
| 4 | backend/app/services/scoring_service.py | _generate_mock_scores() with 5 hardcoded dimension blocks | Rewrite as loop over rubric dimensions |
| 5 | backend/app/services/analytics_service.py | weight_map dict in get_recommended_scenarios() mapping dimension to Scenario.weight_* columns | Rewrite to query via rubric dimensions |
| 6 | backend/app/services/scenario_service.py | clone_scenario() copies 5 weight_* fields | Copy rubric_id instead |
| 7 | frontend/src/components/admin/scoring-weights.tsx | ScoringWeightsProps typed to 5 keys, WEIGHT_KEYS, I18N_KEYS | Deprecate component (rubric editor replaces it) |
| 8 | frontend/src/components/admin/scenario-editor.tsx | 5 weight fields in zod schema, ScoringWeights usage, form values | Replace with rubric selector |
| 9 | frontend/src/types/scenario.ts | ScoringWeights interface with 5 keys, Scenario/ScenarioCreate types | Remove weight fields, add rubric_id |
| 10 | frontend/public/locales/en-US/admin.json | scenarios.keyMessageDelivery etc. (5 entries) | Keep for backward compat, mark deprecated |
| 11 | frontend/public/locales/en-US/scoring.json | dimensions.keyMessage etc. (5 entries) | Keep for backward compat, add dynamic fallback |
| 12 | backend/scripts/seed_phase2.py | Scenario seeds with hardcoded weight values | Update to create rubrics and reference rubric_id |
| 13 | backend/app/startup_seed.py | Potentially seeds default rubric | Verify/update |

Components Already Dynamic (No Changes Needed)

| Component | Why It's Already Dynamic |
|-----------|--------------------------|
| RadarChart (scoring) | Reads currentScores: ScorePoint[] -- dimension comes from data |
| DimensionBars | Reads details: ScoreDetail[] -- iterates whatever is in the array |
| FeedbackCard | Reads single ScoreDetail -- displays detail.dimension as string |
| ScoreSummary | Only shows overall score and pass/fail -- no dimension awareness |
| ReportSection | Reads improvements array -- dimension comes from data |
| PerformanceRadar (analytics) | Reads currentScores: DimensionPoint[] -- dynamic |
| SkillGapHeatmap | Builds columns from data -- already fully dynamic |
| RubricEditor | Already supports dynamic dimensions with useFieldArray |
| RubricTable | Shows dimension count badge -- no hardcoded names |
| ScoreDetail model (backend) | dimension: String(50) -- already stores arbitrary names |
| SessionScore model (backend) | No dimension awareness -- stores overall score only |

Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Weight sum validation | Custom validator | Existing field_validator in RubricCreate schema | Already validated, tested, handles edge cases [VERIFIED: scoring_rubric.py:30] |
| Dynamic radar charts | Custom chart component | Existing recharts RadarChart with data-driven config | Already renders N-dimensional data from array input [VERIFIED: radar-chart.tsx] |
| Proportional weight redistribution | Manual slider math | The rubric editor's existing individual sliders | No need to port adjustWeights logic [VERIFIED: rubric-editor.tsx] |
| JSON dimension parsing | Manual JSON.parse | Existing parse_dimensions_json validator in RubricResponse | Handles both string and list inputs [VERIFIED: scoring_rubric.py:78] |

Key insight: The rubric system already has ~80% of what's needed. The refactoring is mainly about removing the parallel hardcoded system and wiring the existing rubric system as the only path.

Common Pitfalls

Pitfall 1: SQLite Batch Migration for Column Removal

What goes wrong: SQLite does not support ALTER TABLE DROP COLUMN natively. Attempting to drop the 5 weight columns will fail.
Why it happens: Alembic generates standard ALTER TABLE SQL that SQLite cannot execute.
How to avoid: Use render_as_batch=True in Alembic's env.py (already configured per CLAUDE.md Gotcha #1). The migration must use with op.batch_alter_table('scenarios') as batch_op: to recreate the table.
Warning signs: Migration fails with "near DROP: syntax error" on SQLite.

Pitfall 2: Breaking Existing ScoreDetail Records

What goes wrong: Existing scored sessions have ScoreDetail rows with dimension values like key_message, objection_handling, etc. If the frontend tries to display these using new rubric-based labels, they may show raw snake_case keys.
Why it happens: Dimension names in ScoreDetail are stored as strings, not FK references. They persist the name used at scoring time.
How to avoid: The frontend already displays detail.dimension as a raw string. Add a dimension display name mapping utility that checks the rubric first, then falls back to the i18n translation, then to the raw key. Historical data remains readable.
Warning signs: Old session reports show key_message instead of "Key Message Delivery".

Pitfall 3: Analytics Recommendation Query Breaks

What goes wrong: get_recommended_scenarios() in analytics_service.py maps dimension names to Scenario.weight_* columns to find scenarios targeting the user's weakest dimension. After the columns are removed, this query breaks.
Why it happens: The weight_map dict directly references ORM column attributes that no longer exist.
How to avoid: Rewrite the recommendation algorithm to: (1) find the user's weakest dimension from ScoreDetail records, (2) for each active scenario, load its rubric, (3) rank scenarios by the weight of the weakest dimension in their rubric. Slightly more complex, but correct; see the sketch below.
Warning signs: 500 errors on the user dashboard after migration.
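
A minimal rewrite sketch along those lines, assuming a Scenario.rubric relationship and an is_active flag (both assumptions; align with the actual analytics_service.py signature):

```python
import json

from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import selectinload

from app.models.scenario import Scenario


async def get_recommended_scenarios(
    db: AsyncSession, weakest_dimension: str, limit: int = 3
) -> list[Scenario]:
    """Rank active scenarios by how heavily their rubric weights the weakest dimension."""
    result = await db.execute(
        select(Scenario)
        .options(selectinload(Scenario.rubric))  # eager-load to avoid async lazy-load
        .where(Scenario.is_active.is_(True))
    )
    scenarios = list(result.scalars())

    def weight_for(scenario: Scenario) -> int:
        dims = scenario.rubric.dimensions
        dims = json.loads(dims) if isinstance(dims, str) else dims
        return next((d["weight"] for d in dims if d["name"] == weakest_dimension), 0)

    # Highest weight on the weak dimension first
    scenarios.sort(key=weight_for, reverse=True)
    return scenarios[:limit]
```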

Pitfall 4: Null rubric_id on Existing Scenarios

What goes wrong: After adding the rubric_id FK and removing the weight columns, existing scenarios have rubric_id=NULL and no weight data.
Why it happens: The data migration must create rubric records from existing weight values before dropping the columns.
How to avoid: Three-step migration within a single Alembic file: (1) add the rubric_id column as nullable, (2) create rubric records from existing weight values using op.execute raw SQL with uuid4() and update scenarios to point to them, (3) alter rubric_id to NOT NULL, then drop the weight columns.
Warning signs: All existing scenarios lose their scoring configuration.

Pitfall 5: Test File Explosions

What goes wrong: There are 42+ backend test files and 44+ frontend test files referencing the 5 hardcoded dimensions. Updating all of them at once creates massive, error-prone diffs.
Why it happens: Tests hardcode scenario weights and dimension names in fixtures.
How to avoid: Create a shared test fixture/factory that generates rubric-based scenarios (see the sketch below) and migrate tests to it. Tests that only assert that "some dimensions exist" (not specific names) may need minimal changes.
Warning signs: Hundreds of test failures after the model change.
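
A minimal pytest factory sketch along these lines; the db_session fixture, constructor arguments, and default criteria are assumptions to be aligned with the project's existing conftest:

```python
import json
import uuid

import pytest

from app.models.scenario import Scenario
from app.models.scoring_rubric import ScoringRubric

DEFAULT_DIMENSIONS = [
    {"name": "key_message", "weight": 30, "criteria": [], "max_score": 100.0},
    {"name": "objection_handling", "weight": 25, "criteria": [], "max_score": 100.0},
    {"name": "communication", "weight": 20, "criteria": [], "max_score": 100.0},
    {"name": "product_knowledge", "weight": 15, "criteria": [], "max_score": 100.0},
    {"name": "scientific_info", "weight": 10, "criteria": [], "max_score": 100.0},
]


@pytest.fixture
def scenario_factory(db_session):
    """Build a rubric-backed Scenario, replacing per-test weight_* fixtures."""

    async def _make(dimensions: list[dict] | None = None, **scenario_kwargs):
        rubric = ScoringRubric(
            id=str(uuid.uuid4()),
            name="Test Rubric",
            dimensions=json.dumps(dimensions or DEFAULT_DIMENSIONS),
        )
        db_session.add(rubric)
        await db_session.flush()  # project convention: flush, not commit
        scenario = Scenario(rubric_id=rubric.id, **scenario_kwargs)
        db_session.add(scenario)
        await db_session.flush()
        return scenario

    return _make
```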

Pitfall 6: Prompt Template Regression

What goes wrong: The LLM scoring prompt template has dimension-specific instructions ("For key_message, consider which key messages were delivered..."). After making the prompt dynamic, the LLM may produce lower-quality scores because it lacks domain-specific guidance.
Why it happens: Generic instructions produce generic scores. The current per-dimension instructions encode domain expertise.
How to avoid: Move the dimension-specific instructions INTO the rubric's criteria field. The prompt builder reads criteria from the rubric and includes them in the prompt. The default rubric should contain the existing detailed instructions as criteria entries, as illustrated below.
Warning signs: Score quality drops after the refactor.
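
For example, a default-rubric dimension might carry the former prompt guidance as criteria entries (the wording below is illustrative, not the actual prompt text):

```python
key_message_dimension = {
    "name": "key_message",
    "weight": 30,
    "max_score": 100.0,
    # Former hardcoded prompt instructions live on as rubric criteria,
    # which the prompt builder injects per dimension (D-06)
    "criteria": [
        "Consider which of the scenario's key messages were delivered",
        "Assess whether each key message was conveyed accurately and in context",
    ],
}
```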

Data Migration Strategy

Step 1: Add rubric_id Column (nullable initially)

# In Alembic migration, first batch_alter_table call
with op.batch_alter_table("scenarios") as batch_op:
    batch_op.add_column(
        sa.Column("rubric_id", sa.String(36),
                  sa.ForeignKey("scoring_rubrics.id"), nullable=True)
    )

Step 2: Create Rubrics from Existing Weight Combinations and Link Scenarios

# In the same Alembic migration, use op.execute / connection.execute
# For each unique weight combination in the scenarios table:
# 1. Create a ScoringRubric record with those weights as dimensions JSON
# 2. For scenarios with that weight combination, SET rubric_id to the new rubric's id
# The default 30/25/20/15/10 combination gets is_default=True

Step 3: Enforce NOT NULL and Drop Weight Columns

# After all scenarios have rubric_id populated:
with op.batch_alter_table("scenarios") as batch_op:
    batch_op.alter_column("rubric_id", nullable=False)
    batch_op.drop_column("weight_key_message")
    batch_op.drop_column("weight_objection_handling")
    batch_op.drop_column("weight_communication")
    batch_op.drop_column("weight_product_knowledge")
    batch_op.drop_column("weight_scientific_info")

Handling the Default 30/25/20/15/10 Split

Most scenarios likely use the default weights (30/25/20/15/10). The migration should:

  1. Create ONE default rubric with these weights and is_default=True
  2. Point all default-weight scenarios to this rubric
  3. Create separate rubrics only for scenarios with custom weights

Separate Scoring Systems (DO NOT MERGE)

| System | Dimensions | Used By | Location |
|--------|------------|---------|----------|
| Session Scoring (this refactor) | Configurable via rubric (default 5) | F2F + Conference scoring | scoring_service.py, scoring_engine.py |
| Skill Quality Scoring | 6 fixed (sop_completeness, knowledge_accuracy, etc.) | Skill Evaluator agent | skill-evaluator/references/evaluation-dimensions.md |
| DryRun Scoring | 2 fixed (executability_score, coverage_percent) | Dry Run results | dry_run_service.py |

These three systems are architecturally separate and must remain so. The Skill Quality dimensions evaluate content quality (is the training material good?), while Session dimensions evaluate MR performance (did the MR perform well?). DryRun dimensions evaluate SOP executability. They serve fundamentally different purposes.

Code Examples

Alembic Migration: Add rubric_id with data migration, then remove weight columns

# Source: [CLAUDE.md Gotcha #1 pattern, adapted for this use case]
import json
import uuid

import sqlalchemy as sa
from alembic import op

def upgrade() -> None:
    # Step 1: Add rubric_id column as nullable
    with op.batch_alter_table("scenarios") as batch_op:
        batch_op.add_column(
            sa.Column("rubric_id", sa.String(36), 
                      sa.ForeignKey("scoring_rubrics.id"), nullable=True)
        )
    
    # Step 2: Data migration -- create rubrics for each unique weight combo
    conn = op.get_bind()
    
    # Find unique weight combinations
    scenarios = conn.execute(sa.text(
        "SELECT id, weight_key_message, weight_objection_handling, "
        "weight_communication, weight_product_knowledge, weight_scientific_info "
        "FROM scenarios"
    )).fetchall()
    
    # Group by weight combo, create one rubric per unique combo
    weight_combos = {}
    for row in scenarios:
        combo_key = (row[1], row[2], row[3], row[4], row[5])
        if combo_key not in weight_combos:
            weight_combos[combo_key] = []
        weight_combos[combo_key].append(row[0])
    
    for (wkm, woh, wc, wpk, wsi), scenario_ids in weight_combos.items():
        rubric_id = str(uuid.uuid4())
        is_default = (wkm == 30 and woh == 25 and wc == 20 and wpk == 15 and wsi == 10)
        # The [...] placeholders stand for the per-dimension criteria text
        # migrated from the old prompt instructions (see Pitfall 6)
        dims = json.dumps([
            {"name": "key_message", "weight": wkm, "criteria": [...], "max_score": 100.0},
            {"name": "objection_handling", "weight": woh, "criteria": [...], "max_score": 100.0},
            {"name": "communication", "weight": wc, "criteria": [...], "max_score": 100.0},
            {"name": "product_knowledge", "weight": wpk, "criteria": [...], "max_score": 100.0},
            {"name": "scientific_info", "weight": wsi, "criteria": [...], "max_score": 100.0},
        ])
        conn.execute(sa.text(
            "INSERT INTO scoring_rubrics (id, name, description, scenario_type, dimensions, is_default, created_by) "
            "VALUES (:id, :name, :desc, :stype, :dims, :is_default, :created_by)"
        ), {"id": rubric_id, "name": f"Migrated {'Default' if is_default else 'Custom'} Rubric",
            "desc": "Auto-created from scenario weight columns during migration",
            "stype": "f2f", "dims": dims, "is_default": is_default, "created_by": "system"})
        
        for sid in scenario_ids:
            conn.execute(sa.text(
                "UPDATE scenarios SET rubric_id = :rid WHERE id = :sid"
            ), {"rid": rubric_id, "sid": sid})
    
    # Step 3: Enforce NOT NULL and drop weight columns
    with op.batch_alter_table("scenarios") as batch_op:
        batch_op.alter_column("rubric_id", nullable=False)
        batch_op.drop_column("weight_key_message")
        batch_op.drop_column("weight_objection_handling")
        batch_op.drop_column("weight_communication")
        batch_op.drop_column("weight_product_knowledge")
        batch_op.drop_column("weight_scientific_info")

Dynamic Scoring Prompt Builder

# Source: [adapted from existing scoring_engine.py build_scoring_prompt]
def build_dimensions_instructions(rubric_dimensions: list[dict]) -> str:
    """Build dimension-specific scoring instructions from rubric criteria."""
    lines = []
    for dim in rubric_dimensions:
        name = dim["name"]
        weight = dim["weight"]
        criteria = dim.get("criteria", [])
        lines.append(f"- {name} (weight={weight}%)")
        if criteria:
            for criterion in criteria:
                lines.append(f"  * {criterion}")
    return "\n".join(lines)

Frontend Rubric Selector (replacing ScoringWeights)

// Source: [adapted from existing scenario-editor.tsx pattern]
// In ScenarioEditor form, replace ScoringWeights with the block below.
// Assumes `rubrics` comes from the existing useRubrics() TanStack Query hook (see IC-01).
<div className="grid gap-2">
  <Label>{t("scenarios.scoringRubric")}</Label>
  <Controller
    control={form.control}
    name="rubric_id"
    render={({ field }) => (
      <Select value={field.value ?? ""} onValueChange={field.onChange}>
        <SelectTrigger>
          <SelectValue placeholder="Select scoring rubric" />
        </SelectTrigger>
        <SelectContent>
          {rubrics.map((r) => (
            <SelectItem key={r.id} value={r.id}>
              {r.name} ({r.dimensions.length} dimensions)
            </SelectItem>
          ))}
        </SelectContent>
      </Select>
    )}
  />
</div>

State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| 5 weight columns on Scenario | rubric_id FK to ScoringRubric | This phase | Unlimited configurable dimensions |
| Hardcoded prompt instructions | Rubric criteria field drives prompt | This phase | Admin controls scoring guidance |
| Mock generator with 5 blocks | Loop over rubric dimensions | This phase | Mock works with any dimension count |
| ScoringWeights component (5 sliders) | Rubric selector dropdown | This phase | Scenario editor simplified |

Deprecated after this phase:

  • ScoringWeights component -- replaced by rubric selector in ScenarioEditor
  • Scenario.get_scoring_weights() method -- replaced by rubric resolution
  • ScoringWeightsProps TypeScript interface -- no longer used
  • WEIGHT_KEYS and I18N_KEYS constants in scoring-weights.tsx -- no longer used

Assumptions Log

| # | Claim | Section | Risk if Wrong |
|---|-------|---------|---------------|
| A1 | Most existing scenarios use the default 30/25/20/15/10 weights | Data Migration Strategy | If many custom weight combos exist, the migration creates many rubrics -- not harmful but less clean |
| A2 | Scoring prompt quality is maintained by moving the existing per-dimension instructions into rubric criteria | Common Pitfalls #6 | If the criteria field is too short or the LLM ignores it, score quality may degrade |
| A3 | Frontend scoring display components are truly dynamic and need no changes | Components Already Dynamic | If any component has hidden hardcoded dimension assumptions, it will break |

Open Questions

  1. Should rubric_id be required or nullable on Scenario? (RESOLVED)

    • Decision: NOT NULL per D-05. The data migration creates rubric records for all existing scenarios before enforcing the constraint. No fallback chain is needed for scoring -- every scenario always has a rubric. get_default_rubric() is retained only as a UI convenience for pre-selecting a rubric when creating new scenarios, not as a scoring fallback.
  2. Should scenario editor show rubric dimensions inline or just a selector? (RESOLVED)

    • Decision: Selector with read-only preview per UI-SPEC IC-01. The scenario editor shows a rubric selector dropdown. When a rubric is selected, a read-only dimension preview (name + weight bar + criteria summary) appears below. Full dimension editing goes to the rubric management page via a "Manage Rubrics" link.
  3. Should the data migration run in Alembic or as a seed script? (RESOLVED)

    • Decision: Single Alembic migration with inline data migration per D-04. The migration uses raw SQL via op.get_bind() to: (1) read existing weight combinations, (2) create ScoringRubric records, (3) update scenario.rubric_id, (4) enforce NOT NULL, (5) drop weight columns. This keeps the schema change and data migration atomic. The seed scripts (seed_phase2.py, startup_seed.py) are updated separately to create scenarios with explicit rubric_id references.

Project Constraints (from CLAUDE.md)

  • NEVER modify schema without Alembic migration -- all column changes require proper migrations
  • render_as_batch for SQLite -- column drops require batch mode (Gotcha #1)
  • async with for all DB sessions -- all new service code must use async patterns
  • Service layer = business logic, routers = HTTP only -- rubric resolution belongs in service
  • Create returns 201, Delete returns 204 -- maintain API conventions
  • No raw SQL -- use SQLAlchemy ORM or Alembic for all queries (exception: Alembic data migration uses op.execute for raw SQL within the migration itself, which is the standard Alembic pattern)
  • db.flush() per project convention -- not db.commit() (session middleware handles commit)
  • Pydantic v2 schemas with from_attributes=True -- all schema updates must use ConfigDict
  • TypeScript strict: true -- no any types in frontend changes
  • TanStack Query hooks per domain -- any new hooks follow existing pattern
  • Conventional commits -- e.g., refactor(scoring): remove hardcoded dimensions from scenario model
  • server_default in migrations -- for SQLite compatibility with existing rows

Sources

Primary (HIGH confidence)

  • [Codebase audit] -- All 13 hardcoded locations identified by grep + file read
  • [backend/app/models/scenario.py] -- Current 5 weight columns
  • [backend/app/models/scoring_rubric.py] -- Existing rubric model with JSON dimensions
  • [backend/app/services/scoring_engine.py] -- Hardcoded dim_names and prompt template
  • [backend/app/services/scoring_service.py] -- Mock generator and rubric fallback logic
  • [backend/app/services/analytics_service.py] -- weight_map recommendation query
  • [frontend/src/components/admin/scoring-weights.tsx] -- 5-key typed component
  • [frontend/src/components/admin/rubric-editor.tsx] -- Already dynamic with useFieldArray
  • [frontend/src/components/scoring/radar-chart.tsx] -- Already data-driven
  • [frontend/src/components/scoring/dimension-bars.tsx] -- Already iterates ScoreDetail[]
  • [CLAUDE.md] -- Project conventions and gotchas

Secondary (MEDIUM confidence)

  • [backend/app/services/meta_skill_templates/] -- Skill quality scoring dimensions are separate
  • [backend/scripts/seed_phase2.py] -- Seed data patterns

Metadata

Confidence breakdown:

  • Hardcoded locations: HIGH -- complete grep audit of entire codebase
  • Migration strategy: HIGH -- follows established Alembic patterns in project
  • Frontend impact: HIGH -- verified each component's data flow
  • Backward compatibility: HIGH -- ScoreDetail stores dimension as string, historical data safe
  • Prompt quality after refactor: MEDIUM -- depends on criteria field quality (A2)

Research date: 2026-04-27 Valid until: 2026-05-27 (stable internal refactoring, no external dependency risk)

UI Specification


Phase 21 -- UI Design Contract

Visual and interaction contract for the Scoring Criteria Refactor frontend. Generated by gsd-ui-researcher, verified by gsd-ui-checker. This phase is a refactoring -- the primary UI change is replacing the hardcoded ScoringWeights component with a Rubric selector in the Scenario Editor. Scoring display components (RadarChart, DimensionBars, FeedbackCard) are already dynamic and require no changes.


Design System

| Property | Value |
|----------|-------|
| Tool | Custom Radix UI wrapper (shadcn-style) |
| Preset | Figma Make Design System for SaaS (medical brand override) |
| Component library | Radix UI primitives (Dialog, Select, Tabs, ScrollArea, etc.) |
| Icon library | lucide-react ^0.460.0 |
| Font | Inter + Noto Sans SC (sans), JetBrains Mono (mono) |

Source: Existing project design system (Phase 01 + Phase 10 polish), confirmed via frontend/src/styles/index.css.


Spacing Scale

Declared values (must be multiples of 4):

| Token | Value | Usage |
|-------|-------|-------|
| xs | 4px | Icon gaps, badge internal padding |
| sm | 8px | Compact element spacing, dimension preview list gap |
| md | 16px | Default form field gaps, rubric selector margin |
| lg | 24px | Card body padding, section spacing in scenario editor |
| xl | 32px | Layout gaps between major sections |
| 2xl | 48px | Page-level spacing |
| 3xl | 64px | Full-page empty state centering |

Exceptions: Dimension preview list items use 28px row height (7 x 4px) for dense dimension display without excessive vertical space.


Typography

| Role | Size | Weight | Line Height |
|------|------|--------|-------------|
| Body | 14px (text-sm) | 400 | 1.5 |
| Label | 14px (text-sm) | 500 | 1.5 |
| Heading | 20px (text-xl) | 500 | 1.2 |
| Display | 28px (text-2xl) | 600 | 1.2 |

Source: Established project tokens from frontend/src/styles/index.css @layer base rules. No new typography scales introduced in this phase.


Color

| Role | Value | Usage |
|------|-------|-------|
| Dominant (60%) | var(--background) #FFFFFF | Page backgrounds, dialog backgrounds |
| Secondary (30%) | var(--card) #FFFFFF / var(--muted) #F9FAFB | Cards, rubric preview area, dimension list bg |
| Accent (10%) | var(--primary) #1E40AF | Rubric selector focus ring, "Manage Rubrics" link text, selected rubric highlight |
| Destructive | var(--destructive) #EF4444 | Delete rubric confirmation only |

Accent reserved for:

  • Rubric selector focus ring and selected state border
  • "Manage Rubrics" navigation link text
  • Dimension weight progress bar fill in rubric preview
  • Primary CTA button ("Save Scenario")

Scoring semantic colors (unchanged, already in design system):

  • var(--strength) #22C55E -- strengths in scoring feedback
  • var(--weakness) #F97316 -- weaknesses in scoring feedback
  • var(--improvement) #A855F7 -- improvement suggestions in scoring feedback
  • var(--chart-1..5) -- RadarChart dimension colors (already dynamic, no changes)

Component Inventory

Changed Components

| Component | File | Change Description |
|-----------|------|--------------------|
| ScenarioEditor | frontend/src/components/admin/scenario-editor.tsx | Remove ScoringWeights import + 5 weight form fields. Add rubric_id Select field with Rubric selector dropdown. Add read-only dimension preview below selector. |
| ScoringWeights | frontend/src/components/admin/scoring-weights.tsx | DEPRECATED -- file retained for test reference but no longer imported by ScenarioEditor. |

New UI Elements (within existing components)

| Element | Parent | Description |
|---------|--------|-------------|
| Rubric Selector | ScenarioEditor | `Select` dropdown listing available rubrics by name with dimension count badge. Uses existing Radix Select pattern. |
| Rubric Dimension Preview | ScenarioEditor | Read-only list below the selector showing each dimension name, weight %, and criteria summary. Rendered when a rubric is selected. |
| "Manage Rubrics" Link | ScenarioEditor | Text link below rubric selector navigating to /admin/scoring-rubrics. Uses text-primary color. |

Unchanged Components (already dynamic, verified)

| Component | File | Why No Change |
|-----------|------|---------------|
| RadarChart | frontend/src/components/scoring/radar-chart.tsx | Reads ScorePoint[] -- dimension comes from data |
| DimensionBars | frontend/src/components/scoring/dimension-bars.tsx | Iterates ScoreDetail[] -- any dimension count works |
| FeedbackCard | frontend/src/components/scoring/feedback-card.tsx | Displays single ScoreDetail.dimension as string |
| ScoreSummary | frontend/src/components/scoring/score-summary.tsx | Only shows overall score, no dimension awareness |
| ReportSection | frontend/src/components/scoring/report-section.tsx | Reads improvements array from data |
| RubricEditor | frontend/src/components/admin/rubric-editor.tsx | Already supports dynamic dimensions with useFieldArray |
| RubricTable | frontend/src/components/admin/rubric-table.tsx | Shows dimension count badge, no hardcoded names |
| ScoringRubricsPage | frontend/src/pages/admin/scoring-rubrics.tsx | Full Rubric CRUD already functional |

Interaction Contracts

IC-01: Rubric Selector in Scenario Editor

Trigger: Admin opens Scenario Editor dialog (create or edit).

Layout: The rubric selector replaces the ScoringWeights card. Position in form: after Key Messages, before Pass Threshold.

Behavior:

  1. Selector loads available rubrics via useRubrics() TanStack Query hook (already exists).
  2. Each <SelectItem> shows: rubric name + dimension count badge (e.g., "Default F2F Rubric (5 dimensions)").
  3. Default rubrics (where is_default === true) are listed first with a "(Default)" suffix.
  4. When editing an existing scenario, the selector pre-fills with scenario.rubric_id.
  5. When creating a new scenario, the selector defaults to the default rubric for the selected mode (f2f or conference).
  6. Changing the selected rubric immediately updates the dimension preview below.

Dimension Preview (read-only):

  • Rendered inside a <Card> with bg-muted/50 background, padding 16px.
  • Each dimension row: dimension name (left, text-sm font-medium), weight percentage (right, text-sm text-muted-foreground), and a thin progress bar (h-1.5 bg-primary rounded-full) showing weight relative to 100.
  • Criteria text shown as text-xs text-muted-foreground truncated to 1 line with ellipsis per dimension.
  • If no rubric selected, show placeholder text: "Select a scoring rubric to see dimensions".

"Manage Rubrics" link:

  • Below the dimension preview card.
  • Text: "Manage Rubrics" (en-US) / "管理评分标准" (zh-CN).
  • Style: text-sm text-primary hover:underline cursor-pointer.
  • Behavior: opens /admin/scoring-rubrics in the same tab (uses navigate()).

IC-02: Scenario Type Data Flow

Trigger: Admin selects a scenario mode (f2f or conference).

Behavior: When mode changes, if rubric_id is currently the default rubric for the previous mode, auto-switch to the default rubric for the new mode. If rubric_id was manually selected (non-default), keep it unchanged.

IC-03: Historical Score Dimension Display

Trigger: User views scoring feedback for any session (old or new).

Behavior:

  • New sessions: dimension names come from the rubric associated with the scenario at scoring time. Names display as stored in ScoreDetail.dimension.
  • Old sessions (pre-refactor): dimension names are stored as snake_case keys like key_message. Display using a fallback chain:
    1. Check i18n scoring:dimensions.{key} -- if translated, use translation.
    2. Otherwise, convert snake_case to Title Case (e.g., key_message becomes "Key Message").
  • This fallback utility function is shared across RadarChart, DimensionBars, and FeedbackCard.

IC-04: Form Validation Changes

Trigger: Admin submits Scenario Editor form.

Validation:

  • rubric_id is required (zod: z.string().min(1, "Scoring rubric is required")).
  • The 5 weight_* fields are removed from the zod schema entirely.
  • pass_threshold remains unchanged (0-100 number).

Copywriting Contract

| Element | en-US | zh-CN |
|---------|-------|-------|
| Rubric selector label | Scoring Rubric * | 评分标准 * |
| Rubric selector placeholder | Select scoring rubric | 选择评分标准 |
| Default rubric suffix | (Default) | (默认) |
| Dimension count badge | {N} dimensions | {N} 个维度 |
| Dimension preview empty | Select a scoring rubric to see dimensions | 选择评分标准以查看评分维度 |
| Manage rubrics link | Manage Rubrics | 管理评分标准 |
| Rubric required error | Scoring rubric is required | 评分标准不能为空 |
| Deprecated weight removal notice (admin toast) | Scoring weights moved to rubric configuration | 评分权重已移至评分标准配置 |

Existing Copy Retained (backward compatibility)

| Element | Key | Status |
|---------|-----|--------|
| Key Message Delivery | scoring:dimensions.keyMessage | KEEP -- used for historical score display |
| Objection Handling | scoring:dimensions.objectionHandling | KEEP -- used for historical score display |
| Communication Skills | scoring:dimensions.communicationSkills | KEEP -- used for historical score display |
| Product Knowledge | scoring:dimensions.productKnowledge | KEEP -- used for historical score display |
| Scientific Information | scoring:dimensions.scientificInfo | KEEP -- used for historical score display |
| Scoring weights admin labels | admin:scenarios.keyMessageDelivery etc. | KEEP -- mark as deprecated in comments |

Type Changes Contract

Removed from frontend/src/types/scenario.ts

// REMOVE: ScoringWeights interface
// REMOVE from Scenario: weight_key_message, weight_objection_handling, 
//   weight_communication, weight_product_knowledge, weight_scientific_info
// REMOVE from ScenarioCreate: all weight_* optional fields
// REMOVE from ScenarioUpdate: inherited weight_* fields

Added to frontend/src/types/scenario.ts

// ADD to Scenario:
rubric_id: string;
rubric?: Rubric;  // optional populated relation

// ADD to ScenarioCreate:
rubric_id: string;  // required

// ADD to ScenarioUpdate:
rubric_id?: string;

No Changes to frontend/src/types/rubric.ts

The Rubric, RubricCreate, RubricUpdate, DimensionConfig types are already correct.


i18n Namespace Changes

| Namespace | Action | Keys |
|-----------|--------|------|
| admin | ADD | scenarios.scoringRubric, scenarios.selectRubric, scenarios.rubricDefault, scenarios.dimensionCount, scenarios.dimensionPreviewEmpty, scenarios.manageRubrics, scenarios.rubricRequired |
| admin | DEPRECATE (keep) | scenarios.scoringWeights, scenarios.keyMessageDelivery, scenarios.objectionHandling, scenarios.communicationSkills, scenarios.productKnowledge, scenarios.scientificInfo |
| scoring | KEEP | dimensions.keyMessage, dimensions.objectionHandling, dimensions.communicationSkills, dimensions.productKnowledge, dimensions.scientificInfo -- needed for historical display fallback |

Registry Safety

| Registry | Blocks Used | Safety Gate |
|----------|-------------|-------------|
| shadcn official (Radix wrappers) | Select, Card, Label, Badge, Dialog, Button, Input, Slider (all pre-existing) | Not required -- already in project |
| No third-party registries | none | Not applicable |

No new UI components are installed from any registry for this phase. All UI elements are composed from existing project components.


Dimension Display Name Utility

A shared utility function is required for consistent dimension name display across all scoring components.

File: frontend/src/lib/dimension-display.ts

Contract:

/**
 * Resolve a display-friendly name for a scoring dimension.
 * 
 * Priority chain:
 * 1. i18n translation key `scoring:dimensions.{camelCase(key)}`
 * 2. Title Case conversion of the raw key (snake_case -> Title Case)
 * 
 * This ensures backward compatibility: old sessions with dimension
 * names like "key_message" display as "Key Message Delivery" via i18n,
 * while new sessions with rubric-defined names like "Clinical Data Accuracy"
 * display as-is (no i18n key exists, Title Case of the original is the name).
 */
export function getDimensionDisplayName(dimension: string, t: TFunction): string;

Checker Sign-Off

  • Dimension 1 Copywriting: PASS
  • Dimension 2 Visuals: PASS
  • Dimension 3 Color: PASS
  • Dimension 4 Typography: PASS
  • Dimension 5 Spacing: PASS
  • Dimension 6 Registry Safety: PASS

Approval: pending
