Planning Phase 21 - huqianghui/AI-Coach-vibe-coding GitHub Wiki
Auto-generated from `.planning/phases/21-scoring-criteria-refactor`
Last synced: 2026-04-28
Gathered: 2026-04-27 | Status: Ready for planning
## Phase Boundary

Refactor the hardcoded 5-dimension MR session scoring system so that ScoringRubric becomes the single source of truth (SSOT) for scoring. Admins can freely define dimension names, counts, weights, and criteria, and all scoring flows (LLM scoring, mock scoring, frontend display) read dynamically from the rubric.

Out of scope: Dry Run Scoring (SOP coverage) and Skill Quality Evaluation (Skill content quality) remain unchanged: they evaluate entirely different things.
## Implementation Decisions

- Only the 5 hardcoded MR session scoring dimensions are refactored to be rubric-driven.
- Dry Run Scoring (`dry_run_engine.py`) and Skill Quality Evaluation (`skill_evaluation_service.py`) are unaffected.
- The mechanism that injects the Skill's `## Assessment Rubric` free text into the LLM prompt is kept unchanged.
- Admins can add custom dimensions to a ScoringRubric on top of the defaults (e.g. "clinical data citation accuracy").
- Different Skills/Scenarios can be linked to different rubrics, so different Skills can use different scoring standards.
- The `## Assessment Rubric` section in Skill markdown continues to serve as additional scoring guidance text for the LLM.
- Admins can freely add/remove/edit any dimension in a rubric, including the 5 defaults.
- No locked dimensions; maximum flexibility.
- The rubric's `dimensions` JSON keeps its existing shape: `[{name, weight, criteria[], max_score}]`.
- Weights must sum to 100 (already enforced by the existing schema validation).
- An Alembic migration converts each existing Scenario's 5 `weight_*` columns into rubric records, automatically creating a rubric per existing Scenario (preserving the original weight configuration).
- Add a `rubric_id` FK to the Scenario model.
- After migration, drop the old `weight_key_message`, `weight_objection_handling`, `weight_communication`, `weight_product_knowledge`, and `weight_scientific_info` columns.
- Remove the `get_scoring_weights()` method.
- Every Scenario must be linked to a rubric (`rubric_id` NOT NULL); the migration automatically creates and links rubrics for existing Scenarios.
- Creating a new Scenario requires selecting or creating a rubric.
- The `get_default_rubric()` fallback is no longer needed for scoring (it may be kept as the default recommendation when creating a new Scenario).
- Remove the hardcoded `dim_names` dict in `scoring_engine.py`, and remove the descriptions of the 5 specific dimensions (key_message, objection_handling, etc.) from the prompt instructions.
- Dimension names, weights, and scoring guidance are generated entirely from the rubric's `dimensions` JSON; each dimension's `criteria[]` is injected directly as its scoring guidance.
- Remove the hardcoded `WEIGHT_KEYS` array and `I18N_KEYS` mapping; ScoringWeights fetches the dimension list from the Rubric API and renders sliders dynamically.
- After a rubric is selected in the Scenario Editor, the corresponding dimension weight configuration is displayed.
- The existing rubric management page `/admin/scoring-rubrics` remains the standalone rubric CRUD entry point.
- `_generate_mock_scores()` no longer hardcodes 5 dimension blocks; it generates scores for any number of dimensions from the rubric's `dimensions`.
- Each dimension's mock score, strengths, weaknesses, and suggestions are generated from generic templates.

Deferred to planning:

- Concrete Alembic migration details (batch mode for SQLite, etc.)
- Generic strengths/weaknesses copy templates for the mock score generator
- Rubric selection UI interaction in the Scenario Editor (dropdown vs. modal)
- Frontend i18n handling (whether custom dimension names need i18n)
- Test structure and mock data patterns
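The dimensions JSON shape and the weights-sum-to-100 rule above are already enforced by the existing Rubric schema validation. As a minimal sketch of that rule (the helper below is illustrative, not the project's actual DimensionConfig validator):

```python
# Hypothetical helper illustrating the rule the existing Pydantic
# field_validator enforces: dimension weights must sum to exactly 100.
def validate_dimensions(dimensions: list[dict]) -> list[dict]:
    """Validate a rubric dimensions payload: [{name, weight, criteria[], max_score}]."""
    total = sum(d["weight"] for d in dimensions)
    if total != 100:
        raise ValueError(f"dimension weights must sum to 100, got {total}")
    for d in dimensions:
        if not d.get("name"):
            raise ValueError("every dimension needs a non-empty name")
    return dimensions
```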
<canonical_refs>
Downstream agents MUST read these before planning or implementing.
- `backend/app/services/scoring_engine.py` -- LLM scoring prompt construction; hardcoded `dim_names` location
- `backend/app/services/scoring_service.py` -- scoring orchestration, mock generator, rubric weight resolution logic
- `backend/app/services/rubric_service.py` -- rubric CRUD, `get_default_rubric()`
- `backend/app/models/scenario.py` -- Scenario model, 5 `weight_*` columns, `get_scoring_weights()`
- `backend/app/models/scoring_rubric.py` -- ScoringRubric model, JSON `dimensions`
- `backend/app/schemas/scoring_rubric.py` -- rubric Pydantic schemas, weight validation
- `backend/app/schemas/scenario.py` -- Scenario schemas (remove weight fields, add `rubric_id`)
- `frontend/src/components/admin/scoring-weights.tsx` -- hardcoded `WEIGHT_KEYS`, needs refactoring to dynamic
- `frontend/src/components/scoring/radar-chart.tsx` -- already dynamic (receives `ScorePoint[]`)
- `frontend/src/components/scoring/dimension-bars.tsx` -- already dynamic (receives `ScoreDetail[]`)
- `frontend/src/components/scoring/feedback-card.tsx` -- scoring feedback card
- `frontend/src/pages/admin/scoring-rubrics.tsx` -- rubric management page
- `backend/app/services/scoring_service.py:235` -- `_extract_skill_criteria()` function (unchanged)
- `backend/alembic/versions/16f9f0ba6e9d_add_scoring_rubrics_table.py` -- existing rubric migration
</canonical_refs>
<code_context>
- `ScoringRubric` model: already exists; the `dimensions` JSON format `[{name, weight, criteria[], max_score}]` already supports dynamic dimensions
- `rubric_service.py`: CRUD is complete (create/get/list/update/delete)
- `RadarChart` + `DimensionBars`: already receive dynamic data (`ScorePoint[]` / `ScoreDetail[]`); no major changes needed
- `/admin/scoring-rubrics` page: rubric management UI already exists
- Rubric schema validation: `DimensionConfig` already validates that weights sum to 100
- Service layer: business logic lives in `services/*.py`; routers only handle HTTP
- Pydantic v2: `ConfigDict(from_attributes=True)`, field validators
- Alembic: batch operations for SQLite (Gotcha #1)
- TanStack Query hooks per domain
- `scoring_service.py:69-77` -- current rubric-vs-scenario-weights resolution logic; must become rubric-only
- `scoring_engine.py:103-113` -- `dim_names` dict and `dimensions_config` generation; must become dynamic
- `scoring_service.py:298-440` -- the 5 hardcoded dimension blocks in mock scoring
- `scenario.py:52-60` -- `get_scoring_weights()` and the 5 `weight_*` columns
- Scenario Editor (`frontend/src/components/admin/scenario-editor.tsx`) -- needs a rubric selector
</code_context>
## Specific Ideas

- Different Skills can be linked to different rubrics, supporting "the same scenario is scored differently depending on the Skill".
- HCP influence on scoring happens only through LLM prompt context; no numeric modifiers are needed.
- The 4 scoring flows are fully independent: Session Scoring (this refactor), Dry Run (SOP coverage), Skill Criteria Injection (text injection), Skill Quality Eval (content quality).

None -- discussion stayed within phase scope

Phase: 21-scoring-criteria-refactor | Context gathered: 2026-04-27
| # | Plan File | Status |
|---|---|---|
| 21-01 | 21-01-PLAN.md | Complete |
| 21-02 | 21-02-PLAN.md | Complete |
| 21-03 | 21-03-PLAN.md | Complete |
Click to expand research notes
Researched: 2026-04-27 | Domain: Scoring system refactoring -- eliminate hardcoded dimensions, make ScoringRubric the SSOT | Confidence: HIGH
This phase is a refactoring of an existing, working scoring system. The core problem is that 5 scoring dimensions (key_message, objection_handling, communication, product_knowledge, scientific_info) are hardcoded in 7 locations across the codebase: the Scenario ORM model (5 weight columns), the scoring engine prompt template (dimension-specific instructions), the mock score generator (5 hardcoded dimension blocks), the frontend ScoringWeights component (typed to exactly 5 keys), the frontend Scenario TypeScript types, the analytics recommendation service (dimension-to-column mapping), and the i18n locale files (hardcoded dimension translations).
A ScoringRubric model already exists with a JSON dimensions field that supports arbitrary dimension names, weights, and criteria. The rubric editor UI already supports dynamic dimensions. The refactoring goal is to make this rubric the single source of truth so all scoring flows read dimensions from the rubric rather than from hardcoded scenario columns. This is a structural cleanup, not a feature addition -- the user-facing behavior (multi-dimensional scoring with configurable weights) remains the same, but becomes truly configurable.
Primary recommendation: Add a rubric_id FK to the Scenario model, remove the 5 weight_* columns via Alembic migration with data migration (converting existing weight values to rubric records), then refactor all downstream consumers (scoring engine, mock generator, frontend components) to read dimensions from the rubric. The two separate scoring systems (session scoring and Skill quality scoring) must remain independent -- they have different dimensions and different purposes.
| Library | Version | Purpose | Why Standard |
|---|---|---|---|
| SQLAlchemy 2.0 | existing | ORM with async sessions | Project standard [VERIFIED: codebase] |
| Alembic | existing | Schema migrations with batch mode for SQLite | Project standard [VERIFIED: codebase] |
| Pydantic v2 | existing | Request/response schemas with validators | Project standard [VERIFIED: codebase] |
| FastAPI | existing | API layer with dependency injection | Project standard [VERIFIED: codebase] |
| react-hook-form + zod | existing | Frontend form validation | Project standard [VERIFIED: codebase] |
| recharts | existing | RadarChart, charts for scoring visualization | Project standard [VERIFIED: codebase] |
This refactoring uses exclusively existing libraries. No new packages need to be installed. [VERIFIED: codebase audit]
Current (before refactor):
Scenario model
├── weight_key_message: int = 30
├── weight_objection_handling: int = 25
├── weight_communication: int = 20
├── weight_product_knowledge: int = 15
├── weight_scientific_info: int = 10
└── get_scoring_weights() -> dict # Returns hardcoded 5-key dict
ScoringRubric model (exists but underused)
└── dimensions: JSON # [{name, weight, criteria[], max_score}]
scoring_service.py
├── Reads scenario.get_scoring_weights() as fallback
└── Reads rubric dimensions only if a default rubric exists
scoring_engine.py
├── SCORING_PROMPT_TEMPLATE: hardcoded dimension instructions
└── dim_names dict: maps 5 keys to display names
Target (after refactor):
Scenario model
├── rubric_id: FK -> scoring_rubrics.id (NOT NULL per D-05)
├── pass_threshold: int = 70
└── (weight_* columns REMOVED)
ScoringRubric model (SSOT)
└── dimensions: JSON # [{name, weight, criteria[], max_score}]
scoring_service.py
├── Resolves rubric: always via scenario.rubric_id (NOT NULL, no fallback needed)
└── Passes rubric dimensions to scoring engine
scoring_engine.py
├── Builds prompt dynamically from rubric dimensions
└── No hardcoded dimension names or instructions
Frontend
├── ScenarioEditor: rubric selector (required field) instead of ScoringWeights
└── All scoring components: read dimensions from score.details (already dynamic)
What: Direct rubric lookup via scenario.rubric_id (NOT NULL per D-05)
When to use: Every time a session needs to be scored
Example:

# Source: [codebase pattern from rubric_service.py, simplified per D-05]
import json

from sqlalchemy.ext.asyncio import AsyncSession

from app.services.rubric_service import get_rubric


async def resolve_rubric_dimensions(db: AsyncSession, scenario) -> list[dict]:
    """Resolve rubric dimensions for scoring.

    Per D-05: rubric_id is NOT NULL, so direct lookup always succeeds.
    get_default_rubric() fallback is no longer needed for scoring.
    """
    rubric = await get_rubric(db, scenario.rubric_id)
    dims = rubric.dimensions
    # dimensions may be stored as a JSON string or an already-parsed list
    return json.loads(dims) if isinstance(dims, str) else dims

What: Build scoring prompt from rubric dimensions, not hardcoded names
When to use: LLM scoring engine
Example:
# Replace hardcoded dim_names dict with rubric-driven config
def build_dimensions_config(rubric_dimensions: list[dict]) -> str:
    lines = []
    for dim in rubric_dimensions:
        name = dim["name"]
        weight = dim["weight"]
        criteria = dim.get("criteria", [])
        criteria_text = "; ".join(criteria) if criteria else "General assessment"
        lines.append(f"- {name}: weight={weight}%, criteria: {criteria_text}")
    return "\n".join(lines)

What: Generate mock scores for arbitrary dimension sets
When to use: Mock scoring fallback when LLM unavailable
Example:
def _generate_mock_scores(
    rubric_dimensions: list[dict],
    scenario: Scenario,
    messages: list,
    key_messages_status: list[dict],
) -> dict:
    """Generate mock scores for N arbitrary dimensions."""
    dimensions = []
    for dim_config in rubric_dimensions:
        # base_score is derived from session signals elsewhere in the service
        score = min(95, max(60, base_score + random.randint(-8, 10)))
        dimensions.append({
            "dimension": dim_config["name"],
            "score": score,
            "weight": dim_config["weight"],
            "strengths": [...],
            "weaknesses": [...],
            "suggestions": [...],
        })
    # Calculate weighted overall
    overall = sum(d["score"] * d["weight"] / 100 for d in dimensions)
    ...

- Merging session scoring with Skill quality scoring: These are two separate systems with different dimensions (5 MR-facing vs 6 content-quality). They must remain independent. [VERIFIED: codebase -- Skill scoring uses sop_completeness, knowledge_accuracy, etc.]
- Merging with DryRun scoring: DryRun uses executability_score and coverage_percent, which are completely different metrics. Do not touch DryRun scoring. [VERIFIED: codebase]
- Breaking backward compatibility on stored data: Existing ScoreDetail rows reference dimension names like key_message. These must remain readable even after the refactoring. New sessions will use rubric-defined names.
- Removing the ScoringWeights component entirely: Deprecate but keep the file until all references are migrated. The rubric editor already handles dynamic dimensions.
| # | File | What's Hardcoded | Action |
|---|---|---|---|
| 1 | backend/app/models/scenario.py | 5 weight_* columns + get_scoring_weights() method | Remove columns, add rubric_id FK |
| 2 | backend/app/schemas/scenario.py | 5 weight_* fields in Create/Update/Response + validate_weights_sum | Remove weight fields, add rubric_id |
| 3 | backend/app/services/scoring_engine.py | dim_names dict mapping 5 keys to labels, per-dimension instructions in SCORING_PROMPT_TEMPLATE | Build dynamically from rubric |
| 4 | backend/app/services/scoring_service.py | _generate_mock_scores() with 5 hardcoded dimension blocks | Rewrite as loop over rubric dimensions |
| 5 | backend/app/services/analytics_service.py | weight_map dict in get_recommended_scenarios() mapping dimension to Scenario.weight_* columns | Rewrite to query via rubric dimensions |
| 6 | backend/app/services/scenario_service.py | clone_scenario() copies 5 weight_* fields | Copy rubric_id instead |
| 7 | frontend/src/components/admin/scoring-weights.tsx | ScoringWeightsProps typed to 5 keys, WEIGHT_KEYS, I18N_KEYS | Deprecate component (rubric editor replaces it) |
| 8 | frontend/src/components/admin/scenario-editor.tsx | 5 weight fields in zod schema, ScoringWeights usage, form values | Replace with rubric selector |
| 9 | frontend/src/types/scenario.ts | ScoringWeights interface with 5 keys, Scenario/ScenarioCreate types | Remove weight fields, add rubric_id |
| 10 | frontend/public/locales/en-US/admin.json | scenarios.keyMessageDelivery etc. (5 entries) | Keep for backward compat, mark deprecated |
| 11 | frontend/public/locales/en-US/scoring.json | dimensions.keyMessage etc. (5 entries) | Keep for backward compat, add dynamic fallback |
| 12 | backend/scripts/seed_phase2.py | Scenario seeds with hardcoded weight values | Update to create rubrics and reference rubric_id |
| 13 | backend/app/startup_seed.py | Potentially seeds default rubric | Verify/update |
| Component | Why It's Already Dynamic |
|---|---|
| RadarChart (scoring) | Reads currentScores: ScorePoint[] -- dimension comes from data |
| DimensionBars | Reads details: ScoreDetail[] -- iterates whatever is in the array |
| FeedbackCard | Reads single ScoreDetail -- displays detail.dimension as string |
| ScoreSummary | Only shows overall score and pass/fail -- no dimension awareness |
| ReportSection | Reads improvements array -- dimension comes from data |
| PerformanceRadar (analytics) | Reads currentScores: DimensionPoint[] -- dynamic |
| SkillGapHeatmap | Builds columns from data -- already fully dynamic |
| RubricEditor | Already supports dynamic dimensions with useFieldArray |
| RubricTable | Shows dimension count badge -- no hardcoded names |
| ScoreDetail model (backend) | dimension: String(50) -- already stores arbitrary names |
| SessionScore model (backend) | No dimension awareness -- stores overall score only |
| Problem | Don't Build | Use Instead | Why |
|---|---|---|---|
| Weight sum validation | Custom validator | Existing field_validator in RubricCreate schema | Already validated, tested, handles edge cases [VERIFIED: scoring_rubric.py:30] |
| Dynamic radar charts | Custom chart component | Existing recharts.RadarChart with data-driven config | Already renders N-dimensional data from array input [VERIFIED: radar-chart.tsx] |
| Proportional weight redistribution | Manual slider math | The rubric editor already handles this via individual sliders | No need to port adjustWeights logic [VERIFIED: rubric-editor.tsx] |
| JSON dimension parsing | Manual JSON.parse | Existing parse_dimensions_json validator in RubricResponse | Handles both string and list inputs [VERIFIED: scoring_rubric.py:78] |
Key insight: The rubric system already has ~80% of what's needed. The refactoring is mainly about removing the parallel hardcoded system and wiring the existing rubric system as the only path.
What goes wrong: SQLite does not support ALTER TABLE DROP COLUMN natively. Attempting to drop the 5 weight columns will fail.
Why it happens: Alembic generates standard ALTER TABLE SQL that SQLite cannot execute.
How to avoid: Use render_as_batch=True in Alembic's env.py (already configured per CLAUDE.md Gotcha #1). The migration must use with op.batch_alter_table('scenarios') as batch_op: to recreate the table.
Warning signs: Migration fails with "near DROP: syntax error" on SQLite.
What goes wrong: Existing scored sessions have ScoreDetail rows with dimension values like key_message, objection_handling, etc. If the frontend tries to display these using new rubric-based labels, they may show raw snake_case keys.
Why it happens: Dimension names in ScoreDetail are stored as strings, not FK references. They persist the name used at scoring time.
How to avoid: The frontend already displays detail.dimension as a raw string. Add a dimension display name mapping utility that checks rubric first, then falls back to i18n translation, then to the raw key. Historical data remains readable.
Warning signs: Old session reports show key_message instead of "Key Message Delivery".
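The fallback chain described above could look roughly like this. A sketch only: the real utility belongs in the frontend, and the function name is hypothetical; the chain itself (rubric name, then i18n translation, then raw key) is language-agnostic.

```python
# Hypothetical display-name resolver for historical ScoreDetail keys:
# rubric dimension name first, then i18n mapping, then the raw stored key.
def resolve_dimension_label(
    stored_key: str,
    rubric_dimensions: list[dict],
    i18n_labels: dict[str, str],
) -> str:
    """Resolve a display label: rubric -> i18n -> raw key."""
    for dim in rubric_dimensions:
        if dim["name"] == stored_key:
            return dim["name"]
    if stored_key in i18n_labels:
        return i18n_labels[stored_key]
    return stored_key
```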
What goes wrong: get_recommended_scenarios() in analytics_service.py maps dimension names to Scenario.weight_* columns to find scenarios targeting the user's weakest dimension. After column removal, this query breaks.
Why it happens: The weight_map dict directly references ORM column attributes that no longer exist.
How to avoid: Rewrite the recommendation algorithm to: (1) find user's weakest dimension from ScoreDetail records, (2) for each active scenario, load its rubric, (3) rank scenarios by the weight of the weakest dimension in their rubric. This is slightly more complex but correct.
Warning signs: 500 errors on user dashboard after migration.
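The ranking step of the rewritten recommendation algorithm might be sketched as follows. Names and data shapes are assumptions; the real implementation would load rubrics through the service layer with async sessions.

```python
# Rank scenarios by how heavily their rubric weights the user's
# weakest dimension (higher weight = more targeted practice).
def rank_scenarios_by_weak_dimension(
    weakest_dimension: str,
    scenarios_with_rubrics: list[tuple[str, list[dict]]],  # (scenario_id, rubric dims)
) -> list[str]:
    def weight_of(dimensions: list[dict]) -> int:
        # 0 if the scenario's rubric does not include the dimension at all
        return next(
            (d["weight"] for d in dimensions if d["name"] == weakest_dimension), 0
        )

    ranked = sorted(
        scenarios_with_rubrics, key=lambda item: weight_of(item[1]), reverse=True
    )
    return [scenario_id for scenario_id, _ in ranked]
```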
What goes wrong: After adding rubric_id FK and removing weight columns, existing scenarios have rubric_id=NULL and no weight data.
Why it happens: The data migration must create rubric records from existing weight values before dropping the columns.
How to avoid: Three-step migration within a single Alembic file: (1) Add rubric_id column as nullable, (2) create rubric records from existing weight values using op.execute raw SQL with uuid4() and update scenarios to point to them, (3) alter rubric_id to NOT NULL, then drop weight columns.
Warning signs: All existing scenarios lose their scoring configuration.
What goes wrong: There are 42+ backend test files and 44+ frontend test files referencing the 5 hardcoded dimensions. Updating all at once creates massive, error-prone diffs.
Why it happens: Tests hardcode scenario weights and dimension names in fixtures.
How to avoid: Create a shared test fixture/factory that generates rubric-based scenarios. Update tests to use the factory. Tests that only assert on "some dimensions exist" (not specific names) may need minimal changes.
Warning signs: Hundreds of test failures after model change.
What goes wrong: The LLM scoring prompt template has dimension-specific instructions ("For key_message, consider which key messages were delivered..."). After making it dynamic, the LLM may produce lower-quality scores because it lacks domain-specific guidance.
Why it happens: Generic instructions produce generic scores. The current per-dimension instructions encode domain expertise.
How to avoid: Move the dimension-specific instructions INTO the rubric's criteria field. The prompt builder reads criteria from the rubric and includes them in the prompt. The default rubric should contain the existing detailed instructions as criteria entries.
Warning signs: Score quality drops after refactoring.
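To make this concrete, the default rubric's dimensions could carry the former per-dimension instructions as criteria entries, along these lines (the wording below is illustrative, not the actual prompt text from scoring_engine.py):

```python
# Illustrative shape only: criteria entries preserve the domain-specific
# guidance that previously lived in the hardcoded prompt template.
DEFAULT_RUBRIC_DIMENSIONS = [
    {
        "name": "key_message",
        "weight": 30,
        "criteria": [
            "Consider which key messages were delivered and how completely",
            "Check whether delivery was timed appropriately in the conversation",
        ],
        "max_score": 100.0,
    },
    {
        "name": "objection_handling",
        "weight": 25,
        "criteria": ["Assess how HCP objections were acknowledged and addressed"],
        "max_score": 100.0,
    },
    # ... remaining defaults: communication 20, product_knowledge 15, scientific_info 10
]
```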
# In Alembic migration, first batch_alter_table call
with op.batch_alter_table("scenarios") as batch_op:
    batch_op.add_column(
        sa.Column("rubric_id", sa.String(36),
                  sa.ForeignKey("scoring_rubrics.id"), nullable=True)
    )

# In the same Alembic migration, use op.execute / connection.execute
# For each unique weight combination in the scenarios table:
# 1. Create a ScoringRubric record with those weights as dimensions JSON
# 2. For scenarios with that weight combination, SET rubric_id to the new rubric's id
# The default 30/25/20/15/10 combination gets is_default=True

# After all scenarios have rubric_id populated:
with op.batch_alter_table("scenarios") as batch_op:
    batch_op.alter_column("rubric_id", nullable=False)
    batch_op.drop_column("weight_key_message")
    batch_op.drop_column("weight_objection_handling")
    batch_op.drop_column("weight_communication")
    batch_op.drop_column("weight_product_knowledge")
    batch_op.drop_column("weight_scientific_info")

Most scenarios likely use the default weights (30/25/20/15/10). The migration should:
- Create ONE default rubric with these weights and is_default=True
- Point all default-weight scenarios to this rubric
- Create separate rubrics only for scenarios with custom weights
| System | Dimensions | Used By | Location |
|---|---|---|---|
| Session Scoring (this refactor) | configurable via rubric (default 5) | F2F + Conference scoring | scoring_service.py, scoring_engine.py |
| Skill Quality Scoring | 6 fixed (sop_completeness, knowledge_accuracy, etc.) | Skill Evaluator agent | skill-evaluator/references/evaluation-dimensions.md |
| DryRun Scoring | 2 fixed (executability_score, coverage_percent) | Dry Run results | dry_run_service.py |
These three systems are architecturally separate and must remain so. The Skill Quality dimensions evaluate content quality (is the training material good?), while Session dimensions evaluate MR performance (did the MR perform well?). DryRun dimensions evaluate SOP executability. They serve fundamentally different purposes.
# Source: [CLAUDE.md Gotcha #1 pattern, adapted for this use case]
import uuid
import json


def upgrade() -> None:
    # Step 1: Add rubric_id column as nullable
    with op.batch_alter_table("scenarios") as batch_op:
        batch_op.add_column(
            sa.Column("rubric_id", sa.String(36),
                      sa.ForeignKey("scoring_rubrics.id"), nullable=True)
        )

    # Step 2: Data migration -- create rubrics for each unique weight combo
    conn = op.get_bind()
    # Find unique weight combinations
    scenarios = conn.execute(sa.text(
        "SELECT id, weight_key_message, weight_objection_handling, "
        "weight_communication, weight_product_knowledge, weight_scientific_info "
        "FROM scenarios"
    )).fetchall()

    # Group by weight combo, create one rubric per unique combo
    weight_combos = {}
    for row in scenarios:
        combo_key = (row[1], row[2], row[3], row[4], row[5])
        if combo_key not in weight_combos:
            weight_combos[combo_key] = []
        weight_combos[combo_key].append(row[0])

    for (wkm, woh, wc, wpk, wsi), scenario_ids in weight_combos.items():
        rubric_id = str(uuid.uuid4())
        is_default = (wkm == 30 and woh == 25 and wc == 20 and wpk == 15 and wsi == 10)
        dims = json.dumps([
            {"name": "key_message", "weight": wkm, "criteria": [...], "max_score": 100.0},
            {"name": "objection_handling", "weight": woh, "criteria": [...], "max_score": 100.0},
            {"name": "communication", "weight": wc, "criteria": [...], "max_score": 100.0},
            {"name": "product_knowledge", "weight": wpk, "criteria": [...], "max_score": 100.0},
            {"name": "scientific_info", "weight": wsi, "criteria": [...], "max_score": 100.0},
        ])
        conn.execute(sa.text(
            "INSERT INTO scoring_rubrics (id, name, description, scenario_type, dimensions, is_default, created_by) "
            "VALUES (:id, :name, :desc, :stype, :dims, :is_default, :created_by)"
        ), {"id": rubric_id,
            "name": f"Migrated {'Default' if is_default else 'Custom'} Rubric",
            "desc": "Auto-created from scenario weight columns during migration",
            "stype": "f2f", "dims": dims, "is_default": is_default,
            "created_by": "system"})
        for sid in scenario_ids:
            conn.execute(sa.text(
                "UPDATE scenarios SET rubric_id = :rid WHERE id = :sid"
            ), {"rid": rubric_id, "sid": sid})

    # Step 3: Enforce NOT NULL and drop weight columns
    # (batch mode alter_column needs existing_type on SQLite)
    with op.batch_alter_table("scenarios") as batch_op:
        batch_op.alter_column("rubric_id", existing_type=sa.String(36), nullable=False)
        batch_op.drop_column("weight_key_message")
        batch_op.drop_column("weight_objection_handling")
        batch_op.drop_column("weight_communication")
        batch_op.drop_column("weight_product_knowledge")
        batch_op.drop_column("weight_scientific_info")

# Source: [adapted from existing scoring_engine.py build_scoring_prompt]
def build_dimensions_instructions(rubric_dimensions: list[dict]) -> str:
    """Build dimension-specific scoring instructions from rubric criteria."""
    lines = []
    for dim in rubric_dimensions:
        name = dim["name"]
        weight = dim["weight"]
        criteria = dim.get("criteria", [])
        lines.append(f"- {name} (weight={weight}%)")
        if criteria:
            for criterion in criteria:
                lines.append(f"  * {criterion}")
    return "\n".join(lines)

// Source: [adapted from existing scenario-editor.tsx pattern]
// In ScenarioEditor form, replace ScoringWeights with:
<div className="grid gap-2">
  <Label>{t("scenarios.scoringRubric")}</Label>
  <Controller
    control={form.control}
    name="rubric_id"
    render={({ field }) => (
      <Select value={field.value ?? ""} onValueChange={field.onChange}>
        <SelectTrigger>
          <SelectValue placeholder="Select scoring rubric" />
        </SelectTrigger>
        <SelectContent>
          {rubrics.map((r) => (
            <SelectItem key={r.id} value={r.id}>
              {r.name} ({r.dimensions.length} dimensions)
            </SelectItem>
          ))}
        </SelectContent>
      </Select>
    )}
  />
</div>

| Old Approach | Current Approach | When Changed | Impact |
|---|---|---|---|
| 5 weight columns on Scenario | rubric_id FK to ScoringRubric | This phase | Unlimited configurable dimensions |
| Hardcoded prompt instructions | Rubric criteria field drives prompt | This phase | Admin controls scoring guidance |
| Mock generator with 5 blocks | Loop over rubric dimensions | This phase | Mock works with any dimension count |
| ScoringWeights component (5 sliders) | Rubric selector dropdown | This phase | Scenario editor simplified |
Deprecated after this phase:
- `ScoringWeights` component -- replaced by rubric selector in ScenarioEditor
- `Scenario.get_scoring_weights()` method -- replaced by rubric resolution
- `ScoringWeightsProps` TypeScript interface -- no longer used
- `WEIGHT_KEYS` and `I18N_KEYS` constants in scoring-weights.tsx -- no longer used
| # | Claim | Section | Risk if Wrong |
|---|---|---|---|
| A1 | Most existing scenarios use the default 30/25/20/15/10 weights | Data Migration Strategy | If many custom weight combos exist, migration creates many rubrics -- not harmful but less clean |
| A2 | The scoring prompt quality will be maintained by putting existing per-dimension instructions into rubric criteria | Common Pitfalls #6 | If criteria field is too short or LLM ignores it, score quality may degrade |
| A3 | Frontend scoring display components are truly dynamic and need no changes | Components Already Dynamic | If any component has hidden hardcoded dimension assumptions, it will break |
- Should rubric_id be required or nullable on Scenario? (RESOLVED)
  - Decision: NOT NULL per D-05. rubric_id is NOT NULL on Scenario. The data migration creates rubric records for all existing scenarios before enforcing the constraint. No fallback chain needed for scoring -- every scenario always has a rubric. get_default_rubric() is retained only as a UI convenience for pre-selecting a rubric when creating new scenarios, not as a scoring fallback.
- Should the scenario editor show rubric dimensions inline or just a selector? (RESOLVED)
  - Decision: Selector with read-only preview per UI-SPEC IC-01. The scenario editor shows a rubric selector dropdown. When a rubric is selected, a read-only dimension preview (name + weight bar + criteria summary) appears below. Full dimension editing goes to the rubric management page via a "Manage Rubrics" link.
- Should the data migration run in Alembic or as a seed script? (RESOLVED)
  - Decision: Single Alembic migration with inline data migration per D-04. The migration uses raw SQL via op.get_bind() to: (1) read existing weight combinations, (2) create ScoringRubric records, (3) update scenario.rubric_id, (4) enforce NOT NULL, (5) drop weight columns. This keeps the schema change and data migration atomic. The seed scripts (seed_phase2.py, startup_seed.py) are updated separately to create scenarios with explicit rubric_id references.
- NEVER modify schema without Alembic migration -- all column changes require proper migrations
- render_as_batch for SQLite -- column drops require batch mode (Gotcha #1)
- async with for all DB sessions -- all new service code must use async patterns
- Service layer = business logic, routers = HTTP only -- rubric resolution belongs in service
- Create returns 201, Delete returns 204 -- maintain API conventions
- No raw SQL -- use SQLAlchemy ORM or Alembic for all queries (exception: Alembic data migration uses op.execute for raw SQL within the migration itself, which is the standard Alembic pattern)
- db.flush() per project convention -- not db.commit() (session middleware handles commit)
- Pydantic v2 schemas with from_attributes=True -- all schema updates must use ConfigDict
- TypeScript strict: true -- no `any` types in frontend changes
- TanStack Query hooks per domain -- any new hooks follow existing pattern
- Conventional commits -- e.g., refactor(scoring): remove hardcoded dimensions from scenario model
- server_default in migrations -- for SQLite compatibility with existing rows
- [Codebase audit] -- All 13 hardcoded locations identified by grep + file read
- [backend/app/models/scenario.py] -- Current 5 weight columns
- [backend/app/models/scoring_rubric.py] -- Existing rubric model with JSON dimensions
- [backend/app/services/scoring_engine.py] -- Hardcoded dim_names and prompt template
- [backend/app/services/scoring_service.py] -- Mock generator and rubric fallback logic
- [backend/app/services/analytics_service.py] -- weight_map recommendation query
- [frontend/src/components/admin/scoring-weights.tsx] -- 5-key typed component
- [frontend/src/components/admin/rubric-editor.tsx] -- Already dynamic with useFieldArray
- [frontend/src/components/scoring/radar-chart.tsx] -- Already data-driven
- [frontend/src/components/scoring/dimension-bars.tsx] -- Already iterates ScoreDetail[]
- [CLAUDE.md] -- Project conventions and gotchas
- [backend/app/services/meta_skill_templates/] -- Skill quality scoring dimensions are separate
- [backend/scripts/seed_phase2.py] -- Seed data patterns
Confidence breakdown:
- Hardcoded locations: HIGH -- complete grep audit of entire codebase
- Migration strategy: HIGH -- follows established Alembic patterns in project
- Frontend impact: HIGH -- verified each component's data flow
- Backward compatibility: HIGH -- ScoreDetail stores dimension as string, historical data safe
- Prompt quality after refactor: MEDIUM -- depends on criteria field quality (A2)
Research date: 2026-04-27 Valid until: 2026-05-27 (stable internal refactoring, no external dependency risk)
Click to expand UI spec
Visual and interaction contract for the Scoring Criteria Refactor frontend. Generated by gsd-ui-researcher, verified by gsd-ui-checker. This phase is a refactoring -- the primary UI change is replacing the hardcoded ScoringWeights component with a Rubric selector in the Scenario Editor. Scoring display components (RadarChart, DimensionBars, FeedbackCard) are already dynamic and require no changes.
| Property | Value |
|---|---|
| Tool | Custom Radix UI wrapper (shadcn-style) |
| Preset | Figma Make Design System for SaaS (medical brand override) |
| Component library | Radix UI primitives (Dialog, Select, Tabs, ScrollArea, etc.) |
| Icon library | lucide-react ^0.460.0 |
| Font | Inter + Noto Sans SC (sans), JetBrains Mono (mono) |
Source: Existing project design system (Phase 01 + Phase 10 polish), confirmed via frontend/src/styles/index.css.
Declared values (must be multiples of 4):
| Token | Value | Usage |
|---|---|---|
| xs | 4px | Icon gaps, badge internal padding |
| sm | 8px | Compact element spacing, dimension preview list gap |
| md | 16px | Default form field gaps, rubric selector margin |
| lg | 24px | Card body padding, section spacing in scenario editor |
| xl | 32px | Layout gaps between major sections |
| 2xl | 48px | Page-level spacing |
| 3xl | 64px | Full-page empty state centering |
Exceptions: Dimension preview list items use 28px row height (7 x 4px) for dense dimension display without excessive vertical space.
| Role | Size | Weight | Line Height |
|---|---|---|---|
| Body | 14px (text-sm) | 400 | 1.5 |
| Label | 14px (text-sm) | 500 | 1.5 |
| Heading | 20px (text-xl) | 500 | 1.2 |
| Display | 28px (text-2xl) | 600 | 1.2 |
Source: Established project tokens from frontend/src/styles/index.css @layer base rules. No new typography scales introduced in this phase.
| Role | Value | Usage |
|---|---|---|
| Dominant (60%) | var(--background) #FFFFFF | Page backgrounds, dialog backgrounds |
| Secondary (30%) | var(--card) #FFFFFF / var(--muted) #F9FAFB | Cards, rubric preview area, dimension list bg |
| Accent (10%) | var(--primary) #1E40AF | Rubric selector focus ring, "Manage Rubrics" link text, selected rubric highlight |
| Destructive | var(--destructive) #EF4444 | Delete rubric confirmation only |
Accent reserved for:
- Rubric selector focus ring and selected state border
- "Manage Rubrics" navigation link text
- Dimension weight progress bar fill in rubric preview
- Primary CTA button ("Save Scenario")
Scoring semantic colors (unchanged, already in design system):
- var(--strength) #22C55E -- strengths in scoring feedback
- var(--weakness) #F97316 -- weaknesses in scoring feedback
- var(--improvement) #A855F7 -- improvement suggestions in scoring feedback
- var(--chart-1..5) -- RadarChart dimension colors (already dynamic, no changes)
| Component | File | Change Description |
|---|---|---|
| ScenarioEditor | frontend/src/components/admin/scenario-editor.tsx | Remove ScoringWeights import + 5 weight form fields. Add `rubric_id` Select field with Rubric selector dropdown. Add read-only dimension preview below selector. |
| ScoringWeights | frontend/src/components/admin/scoring-weights.tsx | DEPRECATED -- file retained for test reference but no longer imported by ScenarioEditor. |
| Element | Parent | Description |
|---|---|---|
| Rubric Selector | ScenarioEditor | `<Select>` dropdown listing available rubrics by name with dimension count badge. Uses existing Radix Select pattern. |
| Rubric Dimension Preview | ScenarioEditor | Read-only list below the selector showing each dimension name, weight %, and criteria summary. Rendered when a rubric is selected. |
| "Manage Rubrics" Link | ScenarioEditor | Text link below rubric selector navigating to /admin/scoring-rubrics. Uses text-primary color. |
| Component | File | Why No Change |
|---|---|---|
| RadarChart | frontend/src/components/scoring/radar-chart.tsx | Reads `ScorePoint[]` -- dimension comes from data |
| DimensionBars | frontend/src/components/scoring/dimension-bars.tsx | Iterates `ScoreDetail[]` -- any dimension count works |
| FeedbackCard | frontend/src/components/scoring/feedback-card.tsx | Displays single `ScoreDetail.dimension` as string |
| ScoreSummary | frontend/src/components/scoring/score-summary.tsx | Only shows overall score, no dimension awareness |
| ReportSection | frontend/src/components/scoring/report-section.tsx | Reads improvements array from data |
| RubricEditor | frontend/src/components/admin/rubric-editor.tsx | Already supports dynamic dimensions with useFieldArray |
| RubricTable | frontend/src/components/admin/rubric-table.tsx | Shows dimension count badge, no hardcoded names |
| ScoringRubricsPage | frontend/src/pages/admin/scoring-rubrics.tsx | Full Rubric CRUD already functional |
Trigger: Admin opens Scenario Editor dialog (create or edit).
Layout: The rubric selector replaces the ScoringWeights card. Position in form: after Key Messages, before Pass Threshold.
Behavior:
- Selector loads available rubrics via the `useRubrics()` TanStack Query hook (already exists).
- Each `<SelectItem>` shows: rubric name + dimension count badge (e.g., "Default F2F Rubric (5 dimensions)").
- Default rubrics (where `is_default === true`) are listed first with a "(Default)" suffix.
- When editing an existing scenario, the selector pre-fills with `scenario.rubric_id`.
- When creating a new scenario, the selector defaults to the default rubric for the selected mode (f2f or conference).
- Changing the selected rubric immediately updates the dimension preview below.
Dimension Preview (read-only):
- Rendered inside a `<Card>` with `bg-muted/50` background, padding 16px.
- Each dimension row: dimension name (left, `text-sm font-medium`), weight percentage (right, `text-sm text-muted-foreground`), and a thin progress bar (`h-1.5 bg-primary rounded-full`) showing weight relative to 100.
- Criteria text shown as `text-xs text-muted-foreground`, truncated to 1 line with ellipsis per dimension.
- If no rubric is selected, show placeholder text: "Select a scoring rubric to see dimensions".
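The preview row logic above can be sketched as two pure helpers (no JSX): the weight bar fill is the dimension weight clamped to 0-100, and the criteria are collapsed to one truncated line. The `DimensionConfig` shape follows the documented rubric dimensions JSON; the helper names and the 60-character truncation length are assumptions for illustration, not existing project exports.

```typescript
// Shape per the documented rubric dimensions JSON: [{name, weight, criteria[], max_score}]
interface DimensionConfig {
  name: string;
  weight: number;
  criteria: string[];
  max_score: number;
}

// CSS width for the h-1.5 progress bar: weight as a percentage of 100, clamped.
function weightBarWidth(weight: number): string {
  return `${Math.min(Math.max(weight, 0), 100)}%`;
}

// Single-line criteria summary with ellipsis (maxLen is an assumed constant).
function criteriaSummary(criteria: string[], maxLen = 60): string {
  const joined = criteria.join("; ");
  return joined.length <= maxLen ? joined : joined.slice(0, maxLen - 1) + "…";
}
```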
"Manage Rubrics" link:
- Below the dimension preview card.
- Text: "Manage Rubrics" (en-US) / "管理评分标准" (zh-CN).
- Style: `text-sm text-primary hover:underline cursor-pointer`.
- Behavior: opens `/admin/scoring-rubrics` in the same tab (uses `navigate()`).
Trigger: Admin selects a scenario mode (f2f or conference).
Behavior: When mode changes, if rubric_id is currently the default rubric for the previous mode, auto-switch to the default rubric for the new mode. If rubric_id was manually selected (non-default), keep it unchanged.
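The mode-change rule above can be sketched as a pure function. The `Rubric` shape and the `defaultRubricIdFor` helper are assumptions for illustration, not existing project exports:

```typescript
interface Rubric {
  id: string;
  mode: "f2f" | "conference";
  is_default: boolean;
}

// Find the default rubric id for a mode, if one exists.
function defaultRubricIdFor(mode: "f2f" | "conference", rubrics: Rubric[]): string | undefined {
  return rubrics.find((r) => r.mode === mode && r.is_default)?.id;
}

// Returns the rubric_id the form should hold after a mode change.
function rubricIdAfterModeChange(
  currentRubricId: string,
  prevMode: "f2f" | "conference",
  nextMode: "f2f" | "conference",
  rubrics: Rubric[],
): string {
  const prevDefault = defaultRubricIdFor(prevMode, rubrics);
  if (currentRubricId === prevDefault) {
    // No explicit admin choice: follow the new mode's default.
    return defaultRubricIdFor(nextMode, rubrics) ?? currentRubricId;
  }
  return currentRubricId; // manual selection is preserved
}
```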
Trigger: User views scoring feedback for any session (old or new).
Behavior:
- New sessions: dimension names come from the rubric associated with the scenario at scoring time. Names display as stored in `ScoreDetail.dimension`.
- Old sessions (pre-refactor): dimension names are stored as snake_case keys like `key_message`. Display using a fallback chain:
  1. Check i18n `scoring:dimensions.{key}` -- if translated, use the translation.
  2. Otherwise, convert snake_case to Title Case (e.g., `key_message` becomes "Key Message").
- This fallback utility function is shared across RadarChart, DimensionBars, and FeedbackCard.
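A minimal sketch of that shared fallback, assuming react-i18next's `t` with a `defaultValue` option; the local `TFunction` alias and the `camelCase`/`titleCase` helpers are illustrative, not existing project exports:

```typescript
// Narrow stand-in for react-i18next's TFunction, enough for this sketch.
type TFunction = (key: string, options?: { defaultValue?: string }) => string;

// key_message -> keyMessage (matches the scoring:dimensions.* key style)
function camelCase(key: string): string {
  return key.replace(/_([a-z])/g, (_, c: string) => c.toUpperCase());
}

// key_message -> "Key Message"
function titleCase(key: string): string {
  return key
    .split("_")
    .map((w) => w.charAt(0).toUpperCase() + w.slice(1))
    .join(" ");
}

export function getDimensionDisplayName(dimension: string, t: TFunction): string {
  const i18nKey = `scoring:dimensions.${camelCase(dimension)}`;
  // Empty defaultValue signals "no translation registered" for this key.
  const translated = t(i18nKey, { defaultValue: "" });
  return translated !== "" ? translated : titleCase(dimension);
}
```

Rubric-defined names like "Clinical Data Accuracy" have no i18n key, so they fall through and display essentially as stored.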
Trigger: Admin submits Scenario Editor form.
Validation:
- `rubric_id` is required (zod: `z.string().min(1, "Scoring rubric is required")`).
- The 5 `weight_*` fields are removed from the zod schema entirely.
- `pass_threshold` remains unchanged (0-100 number).
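The same rules, expressed as a standalone validator so the sketch carries no dependency (in the project this lives in the editor's zod schema); the function name and error strings beyond "Scoring rubric is required" are assumptions:

```typescript
interface ScenarioFormValues {
  rubric_id: string;
  pass_threshold: number;
}

// Returns the list of validation error messages (empty = valid).
function validateScenarioForm(values: ScenarioFormValues): string[] {
  const errors: string[] = [];
  // Mirrors z.string().min(1, "Scoring rubric is required")
  if (values.rubric_id.length < 1) errors.push("Scoring rubric is required");
  // pass_threshold stays a 0-100 number, as before the refactor
  if (values.pass_threshold < 0 || values.pass_threshold > 100)
    errors.push("Pass threshold must be between 0 and 100");
  // No weight_* checks: those fields no longer exist in the schema.
  return errors;
}
```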
| Element | en-US | zh-CN |
|---|---|---|
| Rubric selector label | Scoring Rubric * | 评分标准 * |
| Rubric selector placeholder | Select scoring rubric | 选择评分标准 |
| Default rubric suffix | (Default) | (默认) |
| Dimension count badge | {N} dimensions | {N} 个维度 |
| Dimension preview empty | Select a scoring rubric to see dimensions | 选择评分标准以查看评分维度 |
| Manage rubrics link | Manage Rubrics | 管理评分标准 |
| Rubric required error | Scoring rubric is required | 评分标准不能为空 |
| Deprecated weight removal notice (admin toast) | Scoring weights moved to rubric configuration | 评分权重已移至评分标准配置 |
| Element | Key | Status |
|---|---|---|
| Key Message Delivery | scoring:dimensions.keyMessage | KEEP -- used for historical score display |
| Objection Handling | scoring:dimensions.objectionHandling | KEEP -- used for historical score display |
| Communication Skills | scoring:dimensions.communicationSkills | KEEP -- used for historical score display |
| Product Knowledge | scoring:dimensions.productKnowledge | KEEP -- used for historical score display |
| Scientific Information | scoring:dimensions.scientificInfo | KEEP -- used for historical score display |
| Scoring weights admin labels | admin:scenarios.keyMessageDelivery etc. | KEEP -- mark as deprecated in comments |
```typescript
// REMOVE: ScoringWeights interface
// REMOVE from Scenario: weight_key_message, weight_objection_handling,
//   weight_communication, weight_product_knowledge, weight_scientific_info
// REMOVE from ScenarioCreate: all weight_* optional fields
// REMOVE from ScenarioUpdate: inherited weight_* fields

// ADD to Scenario:
rubric_id: string;
rubric?: Rubric; // optional populated relation

// ADD to ScenarioCreate:
rubric_id: string; // required

// ADD to ScenarioUpdate:
rubric_id?: string;
```

The Rubric, RubricCreate, RubricUpdate, DimensionConfig types are already correct.
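As a concrete illustration, the post-refactor Scenario shape might look like this (field list abridged; fields beyond those documented here, such as scenario content fields, are omitted, and the exact Rubric/DimensionConfig field names beyond the documented `[{name, weight, criteria[], max_score}]` JSON are assumptions):

```typescript
interface DimensionConfig {
  name: string;
  weight: number;      // weights across a rubric must sum to 100
  criteria: string[];
  max_score: number;
}

interface Rubric {
  id: string;
  name: string;
  is_default: boolean;
  dimensions: DimensionConfig[];
}

interface Scenario {
  id: string;
  mode: "f2f" | "conference";
  rubric_id: string;   // NOT NULL FK after the migration
  rubric?: Rubric;     // optional populated relation
  pass_threshold: number; // unchanged, 0-100
}
```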
| Namespace | Action | Keys |
|---|---|---|
| admin | ADD | scenarios.scoringRubric, scenarios.selectRubric, scenarios.rubricDefault, scenarios.dimensionCount, scenarios.dimensionPreviewEmpty, scenarios.manageRubrics, scenarios.rubricRequired |
| admin | DEPRECATE (keep) | scenarios.scoringWeights, scenarios.keyMessageDelivery, scenarios.objectionHandling, scenarios.communicationSkills, scenarios.productKnowledge, scenarios.scientificInfo |
| scoring | KEEP | dimensions.keyMessage, dimensions.objectionHandling, dimensions.communicationSkills, dimensions.productKnowledge, dimensions.scientificInfo -- needed for historical display fallback |
| Registry | Blocks Used | Safety Gate |
|---|---|---|
| shadcn official (Radix wrappers) | Select, Card, Label, Badge, Dialog, Button, Input, Slider (all pre-existing) | not required -- already in project |
| No third-party registries | none | not applicable |
No new UI components are installed from any registry for this phase. All UI elements are composed from existing project components.
A shared utility function is required for consistent dimension name display across all scoring components.
File: frontend/src/lib/dimension-display.ts
Contract:
```typescript
/**
 * Resolve a display-friendly name for a scoring dimension.
 *
 * Priority chain:
 * 1. i18n translation key `scoring:dimensions.{camelCase(key)}`
 * 2. Title Case conversion of the raw key (snake_case -> Title Case)
 *
 * This ensures backward compatibility: old sessions with dimension
 * names like "key_message" display as "Key Message Delivery" via i18n,
 * while new sessions with rubric-defined names like "Clinical Data Accuracy"
 * display as-is (no i18n key exists, Title Case of the original is the name).
 */
export function getDimensionDisplayName(dimension: string, t: TFunction): string;
```

- Dimension 1 Copywriting: PASS
- Dimension 2 Visuals: PASS
- Dimension 3 Color: PASS
- Dimension 4 Typography: PASS
- Dimension 5 Spacing: PASS
- Dimension 6 Registry Safety: PASS
Approval: pending