Planning Phase 12 - huqianghui/AI-Coach-vibe-coding GitHub Wiki
Auto-generated from `.planning/phases/12-voice-realtime-api-agent`
Last synced: 2026-04-28
Gathered: 2026-04-02 Status: Ready for planning
## Phase Boundary

Extend HCP profiles into complete "digital persona" configurations: each HCP stores Voice Live API settings (voice name, conversation parameters) and Avatar settings (character, custom avatar) alongside the existing AI Foundry Agent. When an MR selects an HCP and starts a session, the system auto-configures the voice connection with per-HCP settings and defaults to Digital Human Realtime Agent mode, with automatic fallback to voice-only or text.
## Implementation Decisions

- D-01: Full Voice Live settings stored per HCP profile: voice name, avatar character/style, temperature, turn detection (Server VAD), noise suppression, echo cancellation, EOU detection, recognition language, custom voice toggle, custom avatar toggle
- D-02: Agent instructions are auto-generated from HCP personality fields but admin can view and override the generated text in the HCP editor
- D-03: Avatar supports both predefined Azure Avatar characters (Lisa, Lori, Harry, etc. in dropdown) and custom avatars (character name with `customized: true` toggle), matching the reference repo pattern
- D-04: New HCPs get smart defaults: voice "Ava", avatar "Lori-casual", temp 0.9, Server VAD, noise suppression off, echo cancellation off, EOU detection disabled, recognition language "Auto Detect". Admin can override per-HCP
- D-05: HCP editor uses tabbed layout with 3 tabs: "Profile" (existing personality/specialty/objections fields), "Voice & Avatar" (voice name, avatar character, conversation parameters), "Agent" (auto-generated + editable instructions text, agent sync status)
- D-06: HCP table adds a Voice+Avatar column showing voice name + avatar character as badges (e.g. "Ava / Lori-casual") or "Not configured" if missing
- D-07: Table maintains existing columns from Phase 11 (Name, Specialty, Personality, Agent Status) plus new Voice+Avatar column
- D-08: Token broker API returns all HCP voice/avatar settings (voice name, avatar character, conversation params) alongside auth token/endpoint. Frontend auto-configures WebSocket and Avatar connection from this single response
- D-09: MR cannot override HCP voice/avatar settings during a session — settings are locked per-HCP for consistent experience
- D-10: Default to Digital Human Realtime Agent mode (best experience). MR does NOT see a mode picker — system auto-selects based on HCP config and service availability
- D-11: Fallback chain: Digital Human Realtime Agent → Voice-only Realtime → Text mode. Triggered when avatar service unavailable or network degraded
- D-12: Fallback notification: toast alert for the initial fallback event ("Avatar unavailable, switching to voice mode") PLUS persistent status indicator showing current active mode throughout the session
- Exact DB column types and migration details for new HCP voice/avatar fields
- Default avatar/voice options list (can derive from Azure documentation)
- Tab component implementation details (reuse existing Tabs from UI library)
- WebSocket reconnection strategy on network recovery
- Status indicator component design
<canonical_refs>
Downstream agents MUST read these before planning or implementing.
- User's screenshot of Voice Live Agent demo — shows full settings panel (Instructions, Connection Settings, Conversation Settings, Voice, Avatar) with Digital Human avatar rendering and chat
- `backend/app/models/hcp_profile.py`: HcpProfile ORM model (extend with voice/avatar fields)
- `backend/app/schemas/hcp_profile.py`: HcpProfileCreate/Update/Response schemas (extend)
- `backend/app/api/hcp_profiles.py`: HCP profile CRUD router
- `backend/app/services/hcp_profile_service.py`: HCP profile service layer
- `backend/app/services/agent_sync_service.py`: Agent sync (extend to sync voice/avatar config)
- `backend/app/services/voice_live_service.py`: Token broker (extend to return per-HCP voice/avatar settings)
- `backend/app/schemas/voice_live.py`: VoiceLiveTokenResponse (extend with voice/avatar fields)
- `backend/app/services/agents/adapters/azure_voice_live.py`: Agent/Model mode parse/encode
- `backend/app/api/voice_live.py`: Voice Live API routes
- `frontend/src/hooks/use-voice-live.ts`: RTClient WebSocket hook (consume per-HCP settings)
- `frontend/src/hooks/use-avatar-stream.ts`: Avatar WebRTC hook (consume per-HCP avatar config)
- `frontend/src/components/voice/voice-session.tsx`: VoiceSession container
- `frontend/src/components/voice/mode-selector.tsx`: Current mode selector (replace with auto-mode + fallback)
- `frontend/src/components/voice/avatar-view.tsx`: Avatar renderer
- `frontend/src/pages/admin/hcp-profiles.tsx`: HCP profiles admin page (add tabs)
- `frontend/src/pages/admin/hcp-profile-editor.tsx`: HCP editor (extend with tabs)
- `frontend/src/components/admin/hcp-table.tsx`: HCP table (add Voice+Avatar column)
- `frontend/src/types/hcp.ts`: HCP TypeScript types (extend)
- `backend/app/services/config_service.py`: AI Foundry unified config
- `backend/app/services/connection_tester.py`: Connection testing patterns
</canonical_refs>
<code_context>
- `HcpProfile` model already has `agent_id`, `agent_sync_status` fields from Phase 11: extend with voice/avatar columns
- `agent_sync_service.py`: Pattern for auto-syncing on HCP CRUD, reuse for voice/avatar validation
- `VoiceLiveTokenResponse`: Already returns endpoint, api_key, agent_id; extend with voice/avatar settings
- `Tabs` component in UI library: reuse for HCP editor tabbed layout
- `useVoiceLive` hook: Already handles WebSocket connection, needs to accept per-HCP conversation params
- `useAvatarStream` hook: Already handles WebRTC, needs to accept per-HCP avatar character
- `mode-selector.tsx`: Has the 7-mode mapping, will be replaced by auto-mode logic
- Per-domain TanStack Query hooks with mutation invalidation
- Alembic migration with `server_default` for SQLite compatibility
- i18n namespaces per domain (admin, voice)
- Token broker pattern: backend generates config, frontend consumes directly
- Full-screen session pages without UserLayout
- HcpProfile model → add ~12 new columns for voice/avatar settings
- Token broker → extend response to include all voice/avatar params from HCP
- VoiceSession container → consume per-HCP settings instead of global config
- Mode selector → replace with auto-mode + fallback chain logic
- HCP editor page → add tabbed layout with Voice & Avatar tab
- HCP table → add Voice+Avatar column
</code_context>
## Specific Ideas

- Reference implementation screenshot shows the exact settings panel: Instructions, Connection Settings, Conversation Settings (Recognition Language, Noise suppression, Echo cancellation, Turn detection, EOU detection, Temperature), Voice (custom voice toggle, voice name), Avatar (toggle, custom avatar toggle, character)
- Each HCP becomes a complete "digital persona" — personality + voice + appearance
- Smart defaults mean new HCPs work immediately for demo without manual configuration
- Fallback chain matches the user's note: "voice+avatar as default, fallback to voice or text if service unavailable or network bad"
- Token broker is the single integration point — frontend gets everything it needs in one call
- Developer mode toggle for MRs to override HCP settings during debug sessions — future enhancement
- Per-session provider override — always use HCP-level config for now
- Azure AD token auth (DefaultAzureCredential) for Entra token acquisition — future phase
- Multiple avatar characters per HCP (wardrobe selection) — future enhancement
- Voice cloning / custom neural voice training — future phase
Phase: 12-voice-realtime-api-agent Context gathered: 2026-04-02
| # | Plan File | Status |
|---|---|---|
| 12-01 | 12-01-PLAN.md | Complete |
| 12-02 | 12-02-PLAN.md | Complete |
| 12-03 | 12-03-PLAN.md | Complete |
| 12-04 | 12-04-PLAN.md | Complete |
Click to expand research notes
Researched: 2026-04-02 Domain: HCP digital persona configuration, Voice Live API session wiring, auto-mode + fallback chain Confidence: HIGH
Phase 12 extends HCP profiles into complete "digital persona" configurations that bundle voice, avatar, and conversation parameters alongside the existing AI Foundry Agent. The token broker API becomes the single integration point: it reads all per-HCP settings and returns them to the frontend, which auto-configures WebSocket and Avatar connections without manual mode selection. The fallback chain (Digital Human Realtime Agent -> Voice-only Realtime -> Text) replaces the current 7-mode ModeSelector with automatic degradation.
The codebase is well-structured for this extension. The HcpProfile ORM model needs ~12 new columns for voice/avatar settings. The VoiceLiveTokenResponse schema already returns voice_name, avatar_character, and agent_id -- these just need to be sourced from HCP profile data instead of global config. The frontend VoiceSession container already implements a basic fallback chain (avatar failure -> voice-only -> text); it needs refinement to consume per-HCP settings from the token broker and display a persistent mode status indicator.
Primary recommendation: Work bottom-up: database migration first, then backend schema/service extension, then frontend HCP editor tabs, then session wiring with auto-mode + fallback, then integration testing.
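The fallback chain from D-11 can be read as a small ordered state machine. The sketch below models that ordering in Python purely for illustration; the real logic lives in the TypeScript `VoiceSession` container. The `SessionMode` enum, `next_fallback`, and `run_with_fallback` are hypothetical names, not part of the codebase.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable


class SessionMode(Enum):
    # Ordered from best experience to last resort (D-11). Names are illustrative.
    DIGITAL_HUMAN = "digital_human_realtime_agent"
    VOICE_ONLY = "voice_realtime_agent"
    TEXT = "text"


FALLBACK_ORDER = [SessionMode.DIGITAL_HUMAN, SessionMode.VOICE_ONLY, SessionMode.TEXT]


def next_fallback(current: SessionMode) -> SessionMode | None:
    """Return the mode to try after `current` fails, or None when exhausted."""
    idx = FALLBACK_ORDER.index(current)
    return FALLBACK_ORDER[idx + 1] if idx + 1 < len(FALLBACK_ORDER) else None


def run_with_fallback(
    connect: Callable[[SessionMode], bool],
    start: SessionMode = SessionMode.DIGITAL_HUMAN,
) -> tuple[SessionMode, list[SessionMode]]:
    """Try each mode in order; `connect(mode)` returns True on success.

    Returns (active_mode, skipped_modes). A caller could raise the toast when
    `skipped_modes` is non-empty and show `active_mode` in the persistent
    status indicator (D-12).
    """
    mode: SessionMode | None = start
    skipped: list[SessionMode] = []
    while mode is not None:
        if connect(mode):
            return mode, skipped
        skipped.append(mode)
        mode = next_fallback(mode)
    raise RuntimeError("no session mode could be established")
```

Returning the skipped modes alongside the active one keeps the notification decision out of the connection logic: the caller decides how to present the degradation.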
<user_constraints>
- D-01: Full Voice Live settings stored per HCP profile: voice name, avatar character/style, temperature, turn detection (Server VAD), noise suppression, echo cancellation, EOU detection, recognition language, custom voice toggle, custom avatar toggle
- D-02: Agent instructions are auto-generated from HCP personality fields but admin can view and override the generated text in the HCP editor
- D-03: Avatar supports both predefined Azure Avatar characters (Lisa, Lori, Harry, etc. in dropdown) and custom avatars (character name with `customized: true` toggle) -- matching reference repo pattern
- D-04: New HCPs get smart defaults: voice "Ava", avatar "Lori-casual", temp 0.9, Server VAD, noise suppression off, echo cancellation off, EOU detection disabled, recognition language "Auto Detect". Admin can override per-HCP
- D-05: HCP editor uses tabbed layout with 3 tabs: "Profile" (existing personality/specialty/objections fields), "Voice & Avatar" (voice name, avatar character, conversation parameters), "Agent" (auto-generated + editable instructions text, agent sync status)
- D-06: HCP table adds a Voice+Avatar column showing voice name + avatar character as badges (e.g. "Ava / Lori-casual") or "Not configured" if missing
- D-07: Table maintains existing columns from Phase 11 (Name, Specialty, Personality, Agent Status) plus new Voice+Avatar column
- D-08: Token broker API returns all HCP voice/avatar settings (voice name, avatar character, conversation params) alongside auth token/endpoint. Frontend auto-configures WebSocket and Avatar connection from this single response
- D-09: MR cannot override HCP voice/avatar settings during a session -- settings are locked per-HCP for consistent experience
- D-10: Default to Digital Human Realtime Agent mode (best experience). MR does NOT see a mode picker -- system auto-selects based on HCP config and service availability
- D-11: Fallback chain: Digital Human Realtime Agent -> Voice-only Realtime -> Text mode. Triggered when avatar service unavailable or network degraded
- D-12: Fallback notification: toast alert for the initial fallback event ("Avatar unavailable, switching to voice mode") PLUS persistent status indicator showing current active mode throughout the session
- Exact DB column types and migration details for new HCP voice/avatar fields
- Default avatar/voice options list (can derive from Azure documentation)
- Tab component implementation details (reuse existing Tabs from UI library)
- WebSocket reconnection strategy on network recovery
- Status indicator component design
- Developer mode toggle for MRs to override HCP settings during debug sessions
- Per-session provider override -- always use HCP-level config for now
- Azure AD token auth (DefaultAzureCredential) for Entra token acquisition
- Multiple avatar characters per HCP (wardrobe selection)
- Voice cloning / custom neural voice training
</user_constraints>
| Library | Version | Purpose | Why Standard |
|---|---|---|---|
| SQLAlchemy 2.0 (async) | >=2.0.0 | ORM model extension for voice/avatar fields | Already in use, async throughout |
| Alembic | >=1.13.0 | Database migration for new columns | Required by project rules |
| Pydantic v2 | >=2.0.0 | Schema extension for voice/avatar fields | Already in use for all schemas |
| @radix-ui/react-tabs | (via project UI lib) | Tabbed HCP editor layout | Already available as Tabs component |
| react-hook-form + zod | (via project) | Form validation for voice/avatar settings tab | Already used in HCP editor |
| rt-client | 0.5.2 | Voice Live WebSocket connection | Already installed from reference repo |
| Library | Version | Purpose | When to Use |
|---|---|---|---|
| sonner | (via project) | Toast notifications for fallback alerts | Fallback chain notifications |
| lucide-react | >=0.460.0 | Icons for mode status indicator | Status indicator component |
None -- this phase extends existing infrastructure rather than introducing new libraries.
New/modified files organized by domain:
```
backend/
  alembic/versions/
    i12a_add_voice_avatar_fields_to_hcp_profile.py  # NEW: migration
  app/
    models/hcp_profile.py                       # EXTEND: ~12 new columns
    schemas/hcp_profile.py                      # EXTEND: voice/avatar fields
    schemas/voice_live.py                       # EXTEND: per-HCP fields in response
    services/voice_live_service.py              # EXTEND: source settings from HCP
    services/hcp_profile_service.py             # EXTEND: handle voice/avatar in CRUD
    api/voice_live.py                           # EXTEND: accept hcp_profile_id param
frontend/
  src/
    types/hcp.ts                                # EXTEND: voice/avatar fields
    types/voice-live.ts                         # EXTEND: new token response fields
    pages/admin/hcp-profile-editor.tsx          # REWRITE: tabbed layout
    components/admin/hcp-table.tsx              # EXTEND: Voice+Avatar column
    components/admin/voice-avatar-tab.tsx       # NEW: Voice & Avatar settings tab
    components/admin/agent-tab.tsx              # NEW: Agent instructions tab
    components/voice/voice-session.tsx          # EXTEND: auto-mode + per-HCP config
    components/voice/mode-status-indicator.tsx  # NEW: persistent mode badge
    components/voice/mode-selector.tsx          # REMOVE: no longer needed (auto-mode)
    hooks/use-voice-token.ts                    # EXTEND: pass hcp_profile_id
    api/voice-live.ts                           # EXTEND: pass hcp_profile_id to token
```
What: Token broker reads HCP profile to source voice/avatar/conversation settings instead of global config.
When to use: Every voice session start.
Example:

```python
# Source: existing voice_live_service.py pattern, extended per D-08
async def get_voice_live_token(
    db: AsyncSession,
    hcp_profile_id: str | None = None,
) -> VoiceLiveTokenResponse:
    # ... existing config fetch ...

    # Lazy import inside the function body avoids the circular import (Phase 11 decision)
    from app.services import hcp_profile_service

    # Fallback values so every name is bound when no HCP is given
    # (D-04 defaults shown; the existing global config may supply them instead)
    voice_name = "en-US-AvaNeural"
    avatar_character = "lori"
    avatar_style = "casual"
    avatar_customized = False
    voice_temperature = 0.9

    # Source voice/avatar from HCP profile (D-08)
    if hcp_profile_id:
        profile = await hcp_profile_service.get_hcp_profile(db, hcp_profile_id)
        voice_name = profile.voice_name or voice_name
        avatar_character = profile.avatar_character or avatar_character
        avatar_style = profile.avatar_style or avatar_style
        avatar_customized = profile.avatar_customized
        # Explicit None check: `or` would discard a legitimate temperature of 0.0
        if profile.voice_temperature is not None:
            voice_temperature = profile.voice_temperature
        # ... etc for all conversation params

    return VoiceLiveTokenResponse(
        # ... existing fields ...
        voice_name=voice_name,
        avatar_character=avatar_character,
        avatar_style=avatar_style,
        avatar_customized=avatar_customized,
        voice_temperature=voice_temperature,  # matches the schema field name
        turn_detection_type=turn_detection_type,
        noise_suppression=noise_suppression,
        echo_cancellation=echo_cancellation,
        eou_detection=eou_detection,
        recognition_language=recognition_language,
    )
```

What: Frontend automatically selects the best mode based on HCP config and service availability. No ModeSelector exposed to MR.
When to use: Session initialization in VoiceSession container.
Example:
```typescript
// Source: existing voice-session.tsx fallback pattern, refined per D-10/D-11
const resolveMode = (tokenData: VoiceLiveToken): SessionMode => {
  // D-10: Default to Digital Human Realtime Agent (best experience)
  if (tokenData.avatar_enabled && tokenData.agent_id) {
    return "digital_human_realtime_agent";
  }
  if (tokenData.avatar_enabled) {
    return "digital_human_realtime_model";
  }
  if (tokenData.agent_id) {
    return "voice_realtime_agent";
  }
  return "voice_realtime_model";
};
// D-11: Fallback chain on connection failure
// Avatar fails -> voice-only; Voice fails -> text
```

What: Replace current single-page editor with 3-tab layout using existing Radix Tabs.
When to use: HCP profile create/edit page.
Example:
```tsx
// Source: existing Tabs component from @/components/ui/tabs
<Tabs defaultValue="profile">
  <TabsList>
    <TabsTrigger value="profile">Profile</TabsTrigger>
    <TabsTrigger value="voice-avatar">Voice & Avatar</TabsTrigger>
    <TabsTrigger value="agent">Agent</TabsTrigger>
  </TabsList>
  <TabsContent value="profile">
    {/* Existing personality/specialty/objections fields */}
  </TabsContent>
  <TabsContent value="voice-avatar">
    <VoiceAvatarTab form={form} />
  </TabsContent>
  <TabsContent value="agent">
    <AgentTab profile={profile} onRetrySync={handleRetrySync} />
  </TabsContent>
</Tabs>
```

- Exposing mode picker to MR (D-09/D-10): MRs must NOT manually select voice/avatar modes. System auto-selects.
- Global voice/avatar config fallback: Always source from HCP profile. Only fall back to global defaults when HCP has no configuration.
- Mixing tab state with form state: All voice/avatar fields must be part of the single react-hook-form instance, not separate state.
- Storing avatar settings in a separate table: Keep all HCP digital persona fields in the same `hcp_profiles` table -- simpler queries, no joins needed.
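The "only fall back to global defaults when the HCP has no configuration" rule is easiest to get right as a per-field merge. The following is a minimal sketch under assumptions: `VoiceSettings` and `resolve_settings` are hypothetical names, the field set is a subset of D-01, and a missing value is represented as `None`.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class VoiceSettings:
    # Illustrative subset of the D-01 fields with the D-04 defaults
    voice_name: str = "en-US-AvaNeural"
    avatar_character: str = "lori"
    avatar_style: str = "casual"
    voice_temperature: float = 0.9
    noise_suppression: bool = False


def resolve_settings(hcp: Optional[dict], global_defaults: VoiceSettings) -> VoiceSettings:
    """Per-field merge: the HCP value wins unless it is missing (None)."""
    hcp = hcp or {}
    resolved = {}
    for f in fields(VoiceSettings):
        value = hcp.get(f.name)
        # `is None` (not truthiness) so that 0.0 and False survive the merge
        resolved[f.name] = getattr(global_defaults, f.name) if value is None else value
    return VoiceSettings(**resolved)


settings = resolve_settings(
    {"voice_name": "zh-CN-XiaoxiaoNeural", "voice_temperature": 0.0},
    VoiceSettings(),
)
```

Checking for `None` rather than relying on `or` matters for numeric and boolean fields, where `0.0` and `False` are valid per-HCP choices.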
| Problem | Don't Build | Use Instead | Why |
|---|---|---|---|
| Tabbed layout | Custom tab switching logic | Radix Tabs (`@/components/ui/tabs`) | Already in UI library, accessible, keyboard-navigable |
| Avatar character list | Hardcoded constants | Azure standard avatars list from docs | Authoritative source, characters updated by Microsoft |
| Form validation for new fields | Manual validation in handlers | zod schema extension in existing HCP form | Already established pattern in hcp-profile-editor.tsx |
| WebSocket session config | Manual JSON construction | Extend existing `useVoiceLive` hook | Hook already builds session config from tokenData |
| Persistent mode indicator | Custom status component | Badge + `cn()` from existing UI primitives | Consistent with existing badge patterns in the project |
What goes wrong: Alembic op.add_column() fails on SQLite for certain operations.
Why it happens: SQLite doesn't fully support ALTER TABLE. The project already uses batch operations.
How to avoid: Use with op.batch_alter_table("hcp_profiles") as batch_op: for all column additions, with server_default on every column.
Warning signs: Migration fails locally but would work on PostgreSQL.
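The reason every new column needs a server-side default can be reproduced with the standard-library `sqlite3` module. This is only a demonstration of the database behaviour, using raw SQL against a throwaway in-memory table whose shape is an assumption; the project's actual migrations go through Alembic's `batch_alter_table`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hcp_profiles (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO hcp_profiles VALUES ('1', 'Dr. Zhang')")

# SQLite rejects adding a NOT NULL column that has no non-NULL default.
try:
    conn.execute("ALTER TABLE hcp_profiles ADD COLUMN avatar_character TEXT NOT NULL")
    added_without_default = True
except sqlite3.OperationalError:
    added_without_default = False

# With a default the column is added and existing rows are backfilled.
conn.execute(
    "ALTER TABLE hcp_profiles ADD COLUMN avatar_character TEXT NOT NULL DEFAULT 'lori'"
)
row = conn.execute("SELECT name, avatar_character FROM hcp_profiles").fetchone()
```

The second statement is what `server_default` produces in the generated DDL, which is why an ORM-level `default=` alone is not enough for existing rows.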
What goes wrong: Token broker returns global config instead of per-HCP settings because hcp_profile_id is not passed.
Why it happens: The current POST /voice-live/token endpoint doesn't accept hcp_profile_id. The voice session page gets session data which includes scenario_id, and scenario has hcp_profile_id.
How to avoid: Extend the token endpoint to accept hcp_profile_id as a query parameter or request body field. Wire it through from VoiceSessionPage -> useVoiceToken -> fetchVoiceLiveToken -> API.
Warning signs: All HCPs use the same voice/avatar during sessions.
What goes wrong: Avatar character and style are concatenated or confused (e.g., "lori-casual" vs character="lori" style="casual").
Why it happens: Azure Avatar API requires character and style as separate fields in the session config JSON. The reference screenshots show them combined in UI display.
How to avoid: Store avatar_character and avatar_style as separate DB columns. Display combined in table badges. Send separate in WebSocket session config.
Warning signs: Avatar fails to render because character name includes the style suffix.
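A small pair of helpers makes the "store split, display combined" rule concrete. This sketch assumes the legacy combined format is `Character-style` with the character before the first hyphen (as in "Lisa-casual-sitting"); `split_avatar` and `badge_label` are hypothetical names.

```python
from __future__ import annotations


def split_avatar(combined: str) -> tuple[str, str]:
    """Split a legacy combined value into (character, style).

    Everything before the first hyphen is the character; the rest is the style,
    so multi-part styles such as "casual-sitting" stay intact.
    """
    character, _, style = combined.partition("-")
    return character.lower(), style.lower()


def badge_label(voice: str | None, character: str | None, style: str | None) -> str:
    """Combined text for the Voice+Avatar table column (D-06)."""
    if not voice or not character:
        return "Not configured"
    avatar = character.capitalize() + (f"-{style}" if style else "")
    return f"{voice} / {avatar}"
```

For example, `split_avatar("Lisa-casual-sitting")` yields `("lisa", "casual-sitting")`, and `badge_label("Ava", "lori", "casual")` yields `"Ava / Lori-casual"`. Only the split values should reach the WebSocket session config.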
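What goes wrong: Switching tabs resets form fields if each tab has its own form state.
Why it happens: Multiple form instances, or tab-local state that is lost when inactive tab content unmounts.
How to avoid: Use a single react-hook-form instance that spans all tabs. Radix Tabs unmounts inactive TabsContent by default, but react-hook-form retains the values of unmounted fields as long as `shouldUnregister` stays at its default of `false`, so form state persists across tab switches. If fields must stay mounted (for example, uncontrolled inputs), pass `forceMount` to TabsContent and hide inactive panels with CSS.
Warning signs: Admin fills voice settings, switches to Profile tab, switches back, and settings are gone.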
What goes wrong: Circular import error when voice_live_service imports hcp_profile_service at module level.
Why it happens: Already documented as Phase 11 decision -- voice_live_service uses lazy import inside the function body.
How to avoid: Continue using the existing lazy import pattern: from app.services import hcp_profile_service inside the function, not at module level.
Warning signs: ImportError on server startup.
What goes wrong: Avatar doesn't render because session config JSON structure doesn't match Azure Voice Live API expected format.
Why it happens: The avatar config in session.update requires specific nested structure: { character, style, customized, video: { codec, crop, resolution } }.
How to avoid: Use the exact Azure API structure from the Voice Live how-to docs. The existing useVoiceLive hook already sends avatar config but without style and customized fields -- extend it.
Warning signs: WebSocket connection succeeds but avatar video stream never starts.
```json
{
  "instructions": "You are Dr. Zhang, an Oncology specialist...",
  "turn_detection": {
    "type": "server_vad",
    "silence_duration_ms": 500
  },
  "input_audio_noise_reduction": {"type": "azure_deep_noise_suppression"},
  "input_audio_echo_cancellation": {"type": "server_echo_cancellation"},
  "voice": {
    "name": "en-US-Ava:DragonHDLatestNeural",
    "type": "azure-standard",
    "temperature": 0.9
  },
  "input_audio_transcription": {
    "model": "azure-speech",
    "language": "zh-CN"
  },
  "avatar": {
    "character": "lori",
    "style": "casual",
    "customized": false,
    "video": {
      "codec": "h264",
      "crop": {"top_left": [560, 0], "bottom_right": [1360, 1080]}
    }
  },
  "agent_id": "dr-zhang-oncology",
  "project_name": "ai-coach-project"
}
```

Source: Azure Voice Live API how-to docs (https://learn.microsoft.com/en-us/azure/ai-services/speech-service/voice-live-how-to)
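To show how the flat per-HCP fields map onto this nested structure, here is a hedged Python sketch. In the project this mapping happens in the TypeScript `useVoiceLive` hook; `build_session_config` is a hypothetical name, the input keys follow the token response fields, and treating "omit the key" as "feature off" is an assumption rather than documented API behaviour.

```python
def build_session_config(s: dict) -> dict:
    """Map flat per-HCP settings onto the nested session config shape."""
    config = {
        "turn_detection": {"type": s["turn_detection_type"]},
        "voice": {
            "name": s["voice_name"],
            "type": s["voice_type"],
            "temperature": s["voice_temperature"],
        },
        # character and style stay separate; never send "lori-casual"
        "avatar": {
            "character": s["avatar_character"],
            "style": s["avatar_style"],
            "customized": s["avatar_customized"],
        },
    }
    # Boolean toggles become typed objects; assumed to be omitted when off
    if s["noise_suppression"]:
        config["input_audio_noise_reduction"] = {"type": "azure_deep_noise_suppression"}
    if s["echo_cancellation"]:
        config["input_audio_echo_cancellation"] = {"type": "server_echo_cancellation"}
    return config


defaults = {
    "turn_detection_type": "server_vad",
    "voice_name": "en-US-AvaNeural",
    "voice_type": "azure-standard",
    "voice_temperature": 0.9,
    "avatar_character": "lori",
    "avatar_style": "casual",
    "avatar_customized": False,
    "noise_suppression": False,
    "echo_cancellation": False,
}
session = build_session_config(defaults)
```

The `video` block (codec, crop) is left out here because it does not vary per HCP in the decisions above.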
```typescript
// Source: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/standard-avatars
const AVATAR_VIDEO_CHARACTERS = [
  { character: "harry", styles: ["business", "casual", "youthful"] },
  { character: "jeff", styles: ["business", "formal"] },
  { character: "lisa", styles: ["casual-sitting", "graceful-sitting", "graceful-standing", "technical-sitting", "technical-standing"] },
  { character: "lori", styles: ["casual", "graceful", "formal"] },
  { character: "max", styles: ["business", "casual", "formal"] },
  { character: "meg", styles: ["formal", "casual", "business"] },
] as const;
// Note: Photo avatars (Adrian, Amara, Bianca, etc.) are also available but only at 512x512 resolution.
// Video avatars are recommended for this project due to 1920x1080 resolution.
```

```python
# Source: Derived from D-01 and Azure Voice Live session config
# ORM-level defaults are shown here; the Alembic migration must also set a
# matching server_default on every column so existing rows are backfilled on SQLite

# Voice settings
voice_name: Mapped[str] = mapped_column(String(200), default="en-US-AvaNeural")
voice_type: Mapped[str] = mapped_column(String(50), default="azure-standard")
voice_temperature: Mapped[float] = mapped_column(default=0.9)
voice_custom: Mapped[bool] = mapped_column(Boolean, default=False)

# Avatar settings
avatar_character: Mapped[str] = mapped_column(String(100), default="lori")
avatar_style: Mapped[str] = mapped_column(String(100), default="casual")
avatar_customized: Mapped[bool] = mapped_column(Boolean, default=False)

# Conversation parameters
turn_detection_type: Mapped[str] = mapped_column(String(50), default="server_vad")
noise_suppression: Mapped[bool] = mapped_column(Boolean, default=False)
echo_cancellation: Mapped[bool] = mapped_column(Boolean, default=False)
eou_detection: Mapped[bool] = mapped_column(Boolean, default=False)
recognition_language: Mapped[str] = mapped_column(String(20), default="auto")

# Agent instruction override (D-02)
agent_instructions_override: Mapped[str] = mapped_column(Text, default="")
```

```python
# Source: Extend existing backend/app/schemas/voice_live.py
class VoiceLiveTokenResponse(BaseModel):
    # Existing fields
    endpoint: str
    token: str
    region: str
    model: str
    avatar_enabled: bool
    avatar_character: str
    voice_name: str
    agent_id: str | None = None
    project_name: str | None = None

    # New per-HCP fields (D-08)
    avatar_style: str = "casual"
    avatar_customized: bool = False
    voice_type: str = "azure-standard"
    voice_temperature: float = 0.9
    turn_detection_type: str = "server_vad"
    noise_suppression: bool = False
    echo_cancellation: bool = False
    eou_detection: bool = False
    recognition_language: str = "auto"
```

```typescript
// Source: Azure Voice Live API how-to docs
const TURN_DETECTION_TYPES = [
  { value: "server_vad", label: "Server VAD" },
  { value: "semantic_vad", label: "Semantic VAD (gpt-realtime only)" },
  { value: "azure_semantic_vad", label: "Azure Semantic VAD (all models)" },
  { value: "azure_semantic_vad_multilingual", label: "Azure Semantic VAD Multilingual" },
] as const;
```

```typescript
// Source: Azure Speech TTS voice list (commonly used for Chinese + English)
const VOICE_NAME_OPTIONS = [
  // English voices
  { value: "en-US-AvaNeural", label: "Ava (EN-US)" },
  { value: "en-US-Ava:DragonHDLatestNeural", label: "Ava HD (EN-US)" },
  { value: "en-US-AndrewNeural", label: "Andrew (EN-US)" },
  { value: "en-US-JennyNeural", label: "Jenny (EN-US)" },
  // Chinese voices
  { value: "zh-CN-XiaoxiaoMultilingualNeural", label: "Xiaoxiao Multilingual (ZH-CN)" },
  { value: "zh-CN-XiaoxiaoNeural", label: "Xiaoxiao (ZH-CN)" },
  { value: "zh-CN-YunxiNeural", label: "Yunxi (ZH-CN)" },
  { value: "zh-CN-YunjianNeural", label: "Yunjian (ZH-CN)" },
] as const;
```

| Old Approach | Current Approach | When Changed | Impact |
|---|---|---|---|
| Global voice/avatar config | Per-HCP voice/avatar config | Phase 12 | Each HCP is a complete digital persona |
| 7-mode manual selector | Auto-mode with fallback chain | Phase 12 | MRs never see mode picker |
| server_vad only | Multiple turn detection types | Voice Live API 2025-10 | azure_semantic_vad works with all models |
| Single avatar character globally | Per-HCP avatar character + style | Phase 12 | Different HCPs look different |
| h264 only codec | h264 remains default (Video Avatar) | Current | Photo Avatar supports vp9 but lower res |
Azure Voice Live API supported models (current):
- gpt-realtime, gpt-realtime-mini, gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat, phi4-mm-realtime, phi4-mini
Turn detection types available:
- `server_vad` (default, all models)
- `semantic_vad` (gpt-realtime/gpt-realtime-mini only)
- `azure_semantic_vad` (all models, Voice Live specific)
- `azure_semantic_vad_multilingual` (all models, multilingual support)
- Avatar style naming format
  - What we know: Azure API uses separate `character` and `style` fields (e.g., character="lisa", style="casual-sitting"). The existing codebase stores `avatar_character` as a combined string like "Lisa-casual-sitting".
  - What's unclear: Should we store combined (backward compatible) or split (matches API)?
  - Recommendation: Store split (`avatar_character` + `avatar_style`) to match Azure API structure. Combine for display only. The migration can default `avatar_character="lori"` and `avatar_style="casual"`.
- Recognition language "Auto Detect" value
  - What we know: Azure Voice Live docs show `"language": "en"` for explicit language. D-04 says default "Auto Detect".
  - What's unclear: The exact value for auto-detect in the Azure API (empty string? omit the field?).
  - Recommendation: Use empty string `""` or omit the `language` field from the `input_audio_transcription` config when "auto" is selected. Store `"auto"` in DB, translate to API format at WebSocket config time.
- Whether to keep ModeSelector component
  - What we know: D-10 says MR does NOT see a mode picker. But the admin/debug use case was deferred.
  - What's unclear: Should mode-selector.tsx be deleted or just hidden from MR view?
  - Recommendation: Keep the file but do not render it in the voice session. The auto-mode logic replaces its function. The component can be restored later if developer mode is implemented.
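The "store `auto`, translate at config time" recommendation for the recognition language could look like the sketch below. It implements the omit-the-field option only; whether the Azure API expects omission or an empty string is still the open question above, so treat this as one candidate. `transcription_config` is a hypothetical helper name.

```python
def transcription_config(recognition_language: str) -> dict:
    """Build the input_audio_transcription block from the stored DB value.

    "auto" (or an empty value) is a DB-side sentinel and is never sent;
    the language key is simply left out in that case.
    """
    config = {"model": "azure-speech"}
    if recognition_language and recognition_language != "auto":
        config["language"] = recognition_language
    return config
```

Keeping the sentinel out of the payload means the DB value can stay stable even if the API-side representation of auto-detect turns out to be different.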
- Async everywhere: all backend functions must be `async def`
- Pydantic v2 schemas with `model_config = ConfigDict(from_attributes=True)`
- Route ordering: static paths before parameterized (`/{id}`)
- Service layer holds business logic, routers only handle HTTP
- No raw SQL -- use SQLAlchemy ORM
- TypeScript strict mode: no `any`, no unused variables
- TanStack Query hooks per domain, no inline useQuery
- `cn()` for conditional class composition
- i18n: all UI text externalized via react-i18next
- Conventional commits: `feat:`, `fix:`, `docs:`, `test:`
- NEVER modify schema without Alembic migration
- All models use TimestampMixin
- batch_alter_table with server_default for SQLite compatibility
- Current Alembic head: `b820e86271f8`
- Backend: `ruff check .`, `ruff format --check .`, `pytest -v`
- Frontend: `npx tsc -b`, `npm run build`
- Azure Voice Live API how-to: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/voice-live-how-to -- session.update config structure, turn detection types, voice config, avatar config, noise suppression, echo cancellation
- Azure Standard Avatars: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/standard-avatars -- full character list with styles (Harry, Jeff, Lisa, Lori, Max, Meg + photo avatars)
- Azure Voice Live overview: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/voice-live -- supported models, pricing tiers, feature list
- Existing codebase files (all read directly):
  - `backend/app/models/hcp_profile.py` -- current ORM model
  - `backend/app/schemas/hcp_profile.py` -- current Pydantic schemas
  - `backend/app/services/voice_live_service.py` -- current token broker
  - `backend/app/schemas/voice_live.py` -- current token response schema
  - `backend/app/services/hcp_profile_service.py` -- CRUD with agent sync hooks
  - `backend/app/services/agent_sync_service.py` -- agent instructions builder
  - `frontend/src/hooks/use-voice-live.ts` -- WebSocket session config builder
  - `frontend/src/hooks/use-avatar-stream.ts` -- WebRTC avatar connection
  - `frontend/src/components/voice/voice-session.tsx` -- session container with fallback
  - `frontend/src/components/voice/mode-selector.tsx` -- 7-mode selector (to be replaced)
  - `frontend/src/pages/admin/hcp-profile-editor.tsx` -- current editor layout
  - `frontend/src/components/admin/hcp-table.tsx` -- current table columns
  - `frontend/src/types/hcp.ts` -- HCP TypeScript types
  - `frontend/src/types/voice-live.ts` -- Voice Live types
  - `frontend/src/components/ui/tabs.tsx` -- Radix Tabs available in UI library
  - `backend/app/services/region_capabilities.py` -- region/service availability maps
- Azure OpenAI Realtime API reference (linked from Voice Live docs) -- base event format that Voice Live extends
- Voice name list is a commonly-used subset, not exhaustive. Azure has 600+ standard voices. The admin should have a text input with the dropdown as suggestions, not a locked select.
Confidence breakdown:
- Standard stack: HIGH - all libraries already in the project, no new dependencies
- Architecture: HIGH - extending well-established patterns (token broker, HCP CRUD, form hooks)
- Pitfalls: HIGH - based on direct codebase reading and established project conventions
- Azure API config structure: HIGH - verified from official Microsoft documentation (updated 2026-02-04 / 2026-03-16)
Research date: 2026-04-02 Valid until: 2026-05-02 (stable -- Azure Voice Live API is GA, avatar characters list stable)
Click to expand UI spec
Visual and interaction contract for the Voice Realtime API & Agent Mode Integration phase. Generated by gsd-ui-researcher, verified by gsd-ui-checker.
| Property | Value |
|---|---|
| Tool | none (Tailwind CSS v4 with @theme inline custom properties) |
| Preset | not applicable |
| Component library | Radix UI (via project @/components/ui/* wrappers) |
| Icon library | lucide-react >=0.460.0 |
| Font | Inter + Noto Sans SC (sans-serif), JetBrains Mono (monospace) |
Source: Existing frontend/src/styles/index.css @theme inline block, established in Phase 01. No new design system installations required.
Declared values (must be multiples of 4):
| Token | Value | Usage in Phase 12 |
|---|---|---|
| xs | 4px | Icon gaps, inline badge padding within Voice+Avatar column (gap-1), switch-to-label gap |
| sm | 8px | Compact element spacing, tab trigger padding, form field gaps within a row, dot-to-text gap in ModeStatusIndicator (gap-2) |
| md | 16px | Default element spacing, card content padding, tab content top margin, form field vertical gaps (space-y-4) |
| lg | 24px | Section padding within cards, gap between form sections inside a tab (space-y-6) |
| xl | 32px | Gap between major card sections in the editor, header-to-content gap |
| 2xl | 48px | Page-level top/bottom padding |
| 3xl | 64px | Not used in this phase |
Exceptions: Touch target minimum 44px for voice session controls (mic button, end session button) per existing Phase 08 pattern.
| Role | Size | Weight | Line Height | Phase 12 Usage |
|---|---|---|---|---|
| Badge/Indicator | 12px (text-xs) | 400 or 600 | 1.5 | Badge text in HCP table Voice+Avatar column, ModeStatusIndicator text (font-semibold), agent sync status badges |
| Body | 14px (text-sm) | 400 (normal) | 1.5 | Form field values, table cell text, Textarea content, transcript text |
| Label | 14px (text-sm) | 400 (normal) | 1.5 | FormLabel text, Switch labels, Select labels. Differentiated from body via text-muted-foreground color, not weight |
| Heading | 16px (text-base) | 600 (semibold) | 1.5 | CardTitle in each form section (Voice Settings, Avatar Settings, etc.), tab triggers |
| Display | 24px (text-2xl) | 600 (semibold) | 1.5 | Not used in this phase (no page-level display headings introduced) |
Two weights only: 400 (normal) for body text and labels, 600 (semibold) for headings and for emphasized badge text such as the ModeStatusIndicator.
| Role | Value | Usage in Phase 12 |
|---|---|---|
| Dominant (60%) | var(--background) #FFFFFF | Page background, tab content background, form input backgrounds |
| Secondary (30%) | var(--card) #FFFFFF / var(--muted) #ececf0 | Cards in HCP editor, table header row bg-slate-50/50, tab list background bg-muted, disabled Textarea bg-muted/50 |
| Accent (10%) | var(--primary) #1E40AF | Save Profile primary button (bg-primary), active tab trigger shadow highlight |
| Destructive | var(--destructive) #EF4444 | Delete HCP action, End Session button, failed agent sync status badge (bg-red-100 text-red-700), disconnected mode status dot |
Accent reserved for:
- Save Profile primary button (bg-primary)
- Active TabsTrigger state (uses bg-background with shadow per Radix default, not direct accent fill)
Additional semantic colors used in this phase (already established):
| Token | Value | Usage |
|---|---|---|
| Green | bg-green-500 (dot) / bg-green-100 text-green-700 (badge) | Connected mode status dot, synced agent badge, avatar active indicator |
| Amber | bg-amber-500 (dot) / bg-amber-100 text-amber-700 (badge) | Degraded mode status dot, pending agent sync badge |
| Red | bg-destructive (dot) / bg-red-100 text-red-700 (badge) | Disconnected mode status dot, failed agent sync badge |
| Muted foreground | var(--muted-foreground) #717182 | "Not configured" badge text, disabled form labels, placeholder text |
Source: Existing CSS custom properties in index.css, Phase 10 theme system. ModeStatusIndicator dot colors verified from implementation: bg-green-500, bg-amber-500, bg-destructive.
| Screen | Primary Focal Point | Rationale |
|---|---|---|
| HCP Profile Editor | Save Profile button (top-right of header bar) | The single CTA that commits all tab changes; placed in persistent header outside tabs so it remains visible regardless of active tab |
| Voice Session | ModeStatusIndicator (center of VoiceSessionHeader) | Communicates the live connection state and active mode; center placement ensures MR always knows session health at a glance |
| HCP Table | Voice & Avatar column badges | New column added in this phase; draws attention to per-HCP digital persona configuration status |
| Component | Location | Description |
|---|---|---|
| VoiceAvatarTab | frontend/src/components/admin/voice-avatar-tab.tsx | Form tab content for Voice & Avatar settings. Contains: voice name Select/Input with custom voice Switch toggle, avatar character Select with avatar style Select (linked -- style options filter by selected character using AVATAR_VIDEO_CHARACTERS constant), custom avatar Switch toggle, conversation parameters (temperature Slider 0.0-1.0 step 0.1, turn detection Select with 4 options, boolean Switches for noise suppression / echo cancellation / EOU detection, recognition language Select). Uses UseFormReturn<HcpFormValues> from parent form instance. |
| AgentTab | frontend/src/components/admin/agent-tab.tsx | Form tab content for Agent instructions and sync status. Contains: agent status Card (icon + status label + agent_id with Tooltip + Retry Sync Button + View in Azure Portal link), auto-generated instructions preview via buildPreviewInstructions() (disabled Textarea), editable override Textarea (agent_instructions_override form field). Uses AGENT_STATUS_CONFIG constant for status icon/color/bg mapping. |
| ModeStatusIndicator | frontend/src/components/voice/mode-status-indicator.tsx | Persistent session mode badge replacing the center Badge in VoiceSessionHeader. Shows current active mode label (from voice:modeBadge.* i18n keys) with colored dot: green (bg-green-500) when at optimal mode (currentMode === initialMode), amber (bg-amber-500) when degraded (currentMode !== initialMode), red (bg-destructive) when disconnected/error. Uses Badge variant="outline" with role="status" aria-live="polite". Dot is size-2 shrink-0 rounded-full. Gap between dot and text: gap-2 (8px). |
| Component | Changes |
|---|---|
| hcp-profile-editor.tsx | Replaced single-page form layout with 3-tab Tabs layout. Form wraps Tabs (not individual TabsContent) for cross-tab state persistence via single useForm<HcpFormValues> instance. Profile tab wraps existing Identity/Personality/Knowledge/Interaction Cards. Voice & Avatar tab renders VoiceAvatarTab. Agent tab renders AgentTab. Zod schema extended with 13 voice/avatar fields. Header with Save button remains outside tabs. |
| hcp-table.tsx | Added "Voice & Avatar" column after "Agent Status" column. Renders voice name (via getVoiceLabel() helper) and avatar character-style as inline Badge pair (variant="outline", text-xs), or text-muted-foreground "Not configured" text when defaults. Column is non-sortable. |
| voice-session.tsx | Removed mode prop from external interface. Added hcpProfileId prop. Auto-resolves mode from token broker response via resolveMode(tokenData) function (D-10). Implements fallback chain (D-11): avatar fail -> voice-only -> text with toast.warning() notifications. Passes hcpProfileId to useVoiceToken hook. Passes currentMode, initialMode, connectionState to ModeStatusIndicator. |
| voice-session-header.tsx | Replaced center static Badge with ModeStatusIndicator component. Passes currentMode, initialMode, and connectionState as props. |
| mode-selector.tsx | File retained but component no longer rendered in voice session pages (D-10). Not deleted to allow future developer-mode restoration. |
| Component | Usage in Phase 12 |
|---|---|
| Tabs / TabsList / TabsTrigger / TabsContent | HCP editor 3-tab layout |
| Select / SelectTrigger / SelectContent / SelectItem | Voice name, avatar character, avatar style, turn detection, recognition language dropdowns |
| Switch | Custom voice toggle, custom avatar toggle, noise suppression, echo cancellation, EOU detection |
| Slider | Temperature (0.0 - 1.0, step 0.1) |
| Input | Custom voice name text input (shown when custom voice toggle is on) |
| Badge | Voice+Avatar column in HCP table, mode status in session header |
| Card / CardHeader / CardTitle / CardContent | Form section containers within each tab |
| Tooltip / TooltipTrigger / TooltipContent | Agent ID display, agent status error details |
| Dialog | End session confirmation (existing) |
| toast (sonner) | Fallback notifications (D-12), save success/error, sync success/error |
| Form / FormField / FormItem / FormLabel / FormControl / FormMessage | All form fields in all three tabs |
| Textarea | Auto-generated instructions (disabled), override instructions (editable) |
| Button | Save Profile, Retry Sync, View in Azure Portal, back navigation |
| Label | Switch companion labels using htmlFor binding |
Trigger: Admin clicks a tab trigger (Profile / Voice & Avatar / Agent).
Behavior: Radix Tabs switches visible content instantly. All three TabsContent panels remain mounted in DOM (Radix default behavior). Form state from react-hook-form persists across tab switches because a single useForm<HcpFormValues> instance wraps all tabs at the <Form> level above <Tabs>.
Visual: Active tab trigger shows bg-background with shadow (default TabsTrigger style from data-[state=active]). Inactive triggers show text-muted-foreground.
Constraint: Tab switching must NOT trigger form validation. Validation only runs on Save button click via form.handleSubmit().
Trigger: Admin interacts with voice name field in Voice & Avatar tab.
Behavior: When "Custom voice" Switch is OFF (default), show a Select dropdown with preset voice options from VOICE_NAME_OPTIONS constant (8 options: 4 English, 4 Chinese). When toggled ON, show a text Input for free-form voice name entry. Default value: "en-US-AvaNeural" per D-04.
Visual: Custom voice Switch at top of Voice Settings card with flex items-center justify-between layout. Select dropdown below, or Input when custom mode enabled, with placeholder "e.g., en-US-Ava:DragonHDLatestNeural".
Trigger: Admin selects avatar character in Voice & Avatar tab.
Behavior: Two linked Select dropdowns. Character dropdown shows 6 video avatar characters from AVATAR_VIDEO_CHARACTERS constant (harry, jeff, lisa, lori, max, meg). When character changes, style dropdown filters to show only valid styles for that character via useMemo. When "Custom avatar" Switch is ON, character becomes a text Input. Default: character "lori", style "casual" per D-04.
Visual: Side-by-side Select dropdowns using grid grid-cols-2 gap-4. Character label left column, style label right column. Custom avatar Switch below with same flex items-center justify-between layout as custom voice.
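The linked character/style behavior above can be sketched as a pure lookup. This is a minimal sketch, not the project's code: the shape of AVATAR_VIDEO_CHARACTERS, the style lists for every character other than the lori/casual default from D-04, and the helper names stylesFor and reconcileStyle are assumptions for illustration. In the component, the result of stylesFor is what useMemo would cache per selected character.

```typescript
// Hypothetical shape: character id -> styles offered for that character.
// Only "lori" + "casual" is confirmed by the spec; the other lists are placeholders.
const AVATAR_VIDEO_CHARACTERS: Record<string, readonly string[]> = {
  harry: ["business", "casual", "youthful"],
  jeff: ["business", "formal"],
  lisa: ["casual-sitting", "graceful-sitting", "technical-sitting"],
  lori: ["casual", "graceful", "formal"],
  max: ["business", "casual", "formal"],
  meg: ["formal", "casual", "business"],
};

// Styles valid for the selected character (empty for unknown/custom names).
function stylesFor(character: string): readonly string[] {
  return AVATAR_VIDEO_CHARACTERS[character] ?? [];
}

// Assumed behavior: keep the current style if still valid after a character
// change, otherwise fall back to the first style of the new character.
function reconcileStyle(character: string, currentStyle: string): string {
  const styles = stylesFor(character);
  if (styles.includes(currentStyle)) return currentStyle;
  return styles[0] ?? "";
}
```

Resetting the style when the character changes avoids submitting a character/style pair that the style dropdown no longer offers.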
Trigger: Admin adjusts conversation parameters in Voice & Avatar tab.
Behavior: Temperature uses Slider (min 0.0, max 1.0, step 0.1, default 0.9). Turn detection uses Select with 4 options from TURN_DETECTION_TYPES constant (server_vad default). Noise suppression, echo cancellation, EOU detection each use Switch (all default OFF per D-04). Recognition language uses Select with options from RECOGNITION_LANGUAGES constant including "Auto Detect" (default "auto").
Visual: Stacked form fields within a Conversation Parameters Card. Fields use space-y-4. Switch rows use flex items-center justify-between. Temperature shows current numeric value to the right of the Slider.
Trigger: Admin views Agent tab.
Behavior: Auto-generated instructions text (built by buildPreviewInstructions() from current form values including name, specialty, personality, objections, expertise) displayed in a disabled Textarea with muted background. Below it, an editable Textarea (agent_instructions_override form field) allows admin to write custom instructions. If override is non-empty, it takes priority when syncing to AI Foundry (checked in backend build_agent_instructions).
Visual: Two Textareas stacked vertically within an Agent Instructions Card. Top one: disabled with bg-muted/50 appearance, rows=6. Bottom one: standard input style, rows=6, placeholder text from admin:hcp.overridePlaceholder.
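The override priority described above reduces to a small rule. The sketch below restates it in TypeScript for illustration only; the real check lives in the backend build_agent_instructions, and both the helper name effectiveInstructions and the choice to treat whitespace-only overrides as empty are assumptions.

```typescript
// Returns the text that would be synced to the agent:
// a non-empty override wins, otherwise the auto-generated instructions are used.
function effectiveInstructions(
  autoGenerated: string,
  override?: string | null,
): string {
  // Assumption: an override containing only whitespace counts as "empty".
  const hasOverride = typeof override === "string" && override.trim().length > 0;
  return hasOverride ? (override as string) : autoGenerated;
}
```

This matches the placeholder copy for the override field ("Leave empty to use auto-generated instructions").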
Trigger: Table renders with HCP profile data.
Behavior: New column after "Agent Status". Shows voice name (shortened via getVoiceLabel() helper -- e.g., "en-US-AvaNeural" becomes "Ava") and avatar character+style combined as two inline Badge elements. If HCP has no voice_name or avatar_character, shows "Not configured" text.
Visual: Two Badge variant="outline" side by side with gap-1 (4px). Both badges use text-xs. Voice badge shows short name (e.g., "Ava"). Avatar badge shows combined character-style (e.g., "Lori-casual"). When not configured: plain text-xs text-muted-foreground text.
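One way the column content could be derived is sketched below. Only the name getVoiceLabel and the "en-US-AvaNeural" -> "Ava" example come from the spec; the parsing rule, the HcpVoiceAvatar shape, and the voiceAvatarBadges helper are assumptions, as is treating a missing voice_name or a missing avatar_character as "Not configured".

```typescript
// Trimmed, hypothetical view of the HCP fields the column needs.
interface HcpVoiceAvatar {
  voice_name?: string | null;
  avatar_character?: string | null;
  avatar_style?: string | null;
}

// Shortens an Azure voice id to its display name, e.g. "en-US-AvaNeural" -> "Ava".
function getVoiceLabel(voiceName: string): string {
  const colon = voiceName.indexOf(":");
  const base = colon === -1 ? voiceName : voiceName.slice(0, colon); // drop ":DragonHD..." suffix
  const lastSegment = base.slice(base.lastIndexOf("-") + 1); // "AvaNeural"
  const short = lastSegment.replace(/(Multilingual)?Neural$/, "");
  return short.length > 0 ? short : voiceName;
}

// Returns [voiceBadge, avatarBadge], or null when the cell should show "Not configured".
function voiceAvatarBadges(hcp: HcpVoiceAvatar): [string, string] | null {
  if (!hcp.voice_name || !hcp.avatar_character) return null;
  const character =
    hcp.avatar_character.charAt(0).toUpperCase() + hcp.avatar_character.slice(1);
  const avatar = hcp.avatar_style ? `${character}-${hcp.avatar_style}` : character;
  return [getVoiceLabel(hcp.voice_name), avatar];
}
```

The table cell would render the two strings as outline badges, or the admin:hcp.notConfigured copy when the helper returns null.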
Trigger: Voice session starts, token broker response received via useVoiceToken hook.
Behavior: resolveMode(tokenData) determines the best available mode:
- If tokenData.avatar_enabled && tokenData.agent_id -> "digital_human_realtime_agent"
- If tokenData.avatar_enabled -> "digital_human_realtime_model"
- If tokenData.agent_id -> "voice_realtime_agent"
- Otherwise -> "voice_realtime_model"
MR never sees a mode picker. Mode is auto-selected. initialMode captured via useRef for degradation detection.
Visual: No ModeSelector rendered in voice session. ModeStatusIndicator in header shows resolved mode label.
Trigger: Avatar connection fails during session, or voice connection degrades.
Behavior: Three-level fallback: Digital Human Realtime Agent -> Voice-only Realtime -> Text mode. Each fallback triggers:
- toast.warning() notification via sonner with descriptive text from voice:error.avatarFallback or voice:error.voiceFallback
- currentMode state updates to new degraded mode
- ModeStatusIndicator updates automatically (dot turns amber when currentMode !== initialMode)
Visual: Toast uses sonner warning styling (amber tint). ModeStatusIndicator badge text updates dynamically to reflect new mode label from voice:modeBadge.*.
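Mode auto-selection and the fallback chain can be sketched together as two pure functions. The resolveMode branches follow the rules listed for the token broker response; everything else is an assumption made for illustration: the trimmed TokenData shape, the nextFallback helper, and in particular the choice that an avatar failure keeps the agent/model variant (the spec only says "voice-only"). The pipeline modes from the retired 7-mode selector are left out.

```typescript
type SessionMode =
  | "digital_human_realtime_agent"
  | "digital_human_realtime_model"
  | "voice_realtime_agent"
  | "voice_realtime_model"
  | "text";

// Trimmed, hypothetical view of the token broker response.
interface TokenData {
  avatar_enabled: boolean;
  agent_id?: string | null;
}

// Picks the best available mode from what the token broker reports.
function resolveMode(tokenData: TokenData): SessionMode {
  const hasAgent = Boolean(tokenData.agent_id);
  if (tokenData.avatar_enabled && hasAgent) return "digital_human_realtime_agent";
  if (tokenData.avatar_enabled) return "digital_human_realtime_model";
  if (hasAgent) return "voice_realtime_agent";
  return "voice_realtime_model";
}

interface FallbackStep {
  nextMode: SessionMode;
  toastKey: "voice:error.avatarFallback" | "voice:error.voiceFallback";
}

// One step down the chain: avatar -> voice-only -> text. Returns null at the bottom.
function nextFallback(current: SessionMode): FallbackStep | null {
  if (current === "digital_human_realtime_agent") {
    return { nextMode: "voice_realtime_agent", toastKey: "voice:error.avatarFallback" };
  }
  if (current === "digital_human_realtime_model") {
    return { nextMode: "voice_realtime_model", toastKey: "voice:error.avatarFallback" };
  }
  if (current === "voice_realtime_agent" || current === "voice_realtime_model") {
    return { nextMode: "text", toastKey: "voice:error.voiceFallback" };
  }
  return null; // already in text mode; nothing left to degrade to
}
```

In the session component, a failure handler would call nextFallback(currentMode), show toast.warning() with the returned key, and set currentMode to nextMode, leaving initialMode untouched so degradation stays detectable.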
Trigger: Always visible during voice session in VoiceSessionHeader center.
Behavior: Renders as <Badge variant="outline"> with prepended colored dot. Shows:
- Mode label text from voice:modeBadge.{currentMode} i18n key (e.g., "Digital Human Agent", "Voice Agent", "Voice Realtime")
- Status text from voice:modeStatus.* (Connected / Degraded / Disconnected)
- Format: "{mode label} - {status text}"
- Dot color logic: isDisconnected -> bg-destructive, isDegraded (currentMode !== initialMode) -> bg-amber-500, else -> bg-green-500
Visual: Badge with flex items-center gap-2 text-xs font-semibold. Dot is size-2 shrink-0 rounded-full.
ARIA: role="status" aria-live="polite" for screen reader announcements.
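The status and dot-color rules for the indicator can be written as a small pure helper. This is a sketch under assumptions: the ConnectionState values, the modeStatus and badgeText helper names, and the DOT_CLASS map are illustrative; only the precedence (disconnected, then degraded, then connected), the Tailwind classes, and the "{mode label} - {status text}" format come from the spec.

```typescript
// Assumed connection states; the spec only names "disconnected/error" as the red case.
type ConnectionState = "connected" | "disconnected" | "error";
type ModeStatus = "connected" | "degraded" | "disconnected";

// Disconnected wins over degraded, which wins over connected.
function modeStatus(
  currentMode: string,
  initialMode: string,
  connectionState: ConnectionState,
): ModeStatus {
  if (connectionState === "disconnected" || connectionState === "error") {
    return "disconnected";
  }
  return currentMode !== initialMode ? "degraded" : "connected";
}

// Tailwind class for the dot, keyed by status.
const DOT_CLASS: Record<ModeStatus, string> = {
  connected: "bg-green-500",
  degraded: "bg-amber-500",
  disconnected: "bg-destructive",
};

// Badge text in the "{mode label} - {status text}" format.
function badgeText(modeLabel: string, statusLabel: string): string {
  return `${modeLabel} - ${statusLabel}`;
}
```

The labels passed to badgeText would come from the voice:modeBadge.* and voice:modeStatus.* i18n keys.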
Trigger: MR starts a voice session for a scenario with an HCP.
Behavior: Frontend passes hcpProfileId to useVoiceToken hook, which passes it to POST /api/v1/voice-live/token endpoint. Backend reads HCP profile voice/avatar settings and returns them in VoiceLiveTokenResponse (voice_name, avatar_character, avatar_style, avatar_customized, voice_temperature, turn_detection_type, noise_suppression, echo_cancellation, eou_detection, recognition_language). Falls back to global defaults when no HCP profile or on exception. useVoiceLive and useAvatarStream hooks consume per-HCP settings from token response.
Visual: No visible UI change from user perspective. Settings are applied transparently -- MR sees the correct avatar character and hears the correct voice for each HCP.
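The "per-HCP settings with global fallback" rule can be sketched as a field-by-field merge. The real logic is in the Python token broker; this TypeScript version is illustrative only. The field names follow the VoiceLiveTokenResponse list above, while the resolveVoiceSettings helper, the HcpVoiceOverrides type, and the assumption that the global defaults equal the D-04 smart defaults are not confirmed by the source.

```typescript
interface VoiceSettings {
  voice_name: string;
  avatar_character: string;
  avatar_style: string;
  avatar_customized: boolean;
  voice_temperature: number;
  turn_detection_type: string;
  noise_suppression: boolean;
  echo_cancellation: boolean;
  eou_detection: boolean;
  recognition_language: string;
}

// A profile may leave any field unset (undefined or null).
type HcpVoiceOverrides = { [K in keyof VoiceSettings]?: VoiceSettings[K] | null };

// Assumption: global defaults mirror the D-04 smart defaults.
const GLOBAL_DEFAULTS: VoiceSettings = {
  voice_name: "en-US-AvaNeural",
  avatar_character: "lori",
  avatar_style: "casual",
  avatar_customized: false,
  voice_temperature: 0.9,
  turn_detection_type: "server_vad",
  noise_suppression: false,
  echo_cancellation: false,
  eou_detection: false,
  recognition_language: "auto",
};

// Uses the profile value when present, otherwise the global default.
// `??` (not `||`) keeps explicit false / 0 values from the profile.
function resolveVoiceSettings(profile?: HcpVoiceOverrides | null): VoiceSettings {
  const p: HcpVoiceOverrides = profile ?? {};
  return {
    voice_name: p.voice_name ?? GLOBAL_DEFAULTS.voice_name,
    avatar_character: p.avatar_character ?? GLOBAL_DEFAULTS.avatar_character,
    avatar_style: p.avatar_style ?? GLOBAL_DEFAULTS.avatar_style,
    avatar_customized: p.avatar_customized ?? GLOBAL_DEFAULTS.avatar_customized,
    voice_temperature: p.voice_temperature ?? GLOBAL_DEFAULTS.voice_temperature,
    turn_detection_type: p.turn_detection_type ?? GLOBAL_DEFAULTS.turn_detection_type,
    noise_suppression: p.noise_suppression ?? GLOBAL_DEFAULTS.noise_suppression,
    echo_cancellation: p.echo_cancellation ?? GLOBAL_DEFAULTS.echo_cancellation,
    eou_detection: p.eou_detection ?? GLOBAL_DEFAULTS.eou_detection,
    recognition_language: p.recognition_language ?? GLOBAL_DEFAULTS.recognition_language,
  };
}
```

Calling resolveVoiceSettings(null) models the "no HCP profile or on exception" path and returns the defaults unchanged.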
Trigger: Voice session starts and tokenData.avatar_enabled === true.
Behavior: Voice session page renders the unified avatar+chat layout (L-06). Avatar Display Area shows:
- Azure AI Avatar video stream -- <video> element connected to the avatar stream (WebRTC, per use-avatar-stream.ts). Avatar lip-syncs with agent TTS output. Video auto-plays, muted (audio comes from TTS stream separately).
- Static image fallback -- If avatar video stream fails but avatar_character is configured, show a static avatar image (<img> from avatar character asset URL). ModeStatusIndicator shows amber "Degraded" state.
- No avatar -- If avatar_enabled === false or avatar_character not configured, render the standard voice-only layout without avatar area.
Chat panel on the right shows the real-time conversation transcript alongside the avatar. Both update simultaneously — user sees avatar speaking while reading the text.
Visual: Avatar centered in its container with neutral background. Smooth fade-in transition (transition-opacity duration-300) when avatar stream connects. Loading state shows skeleton pulse animation in the avatar area. Chat bubbles: AI messages use bg-card with left alignment, user messages use bg-primary/10 with right alignment.
Constraint: Avatar video must maintain aspect ratio (never stretch/distort). Use object-contain to fit within container bounds.
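The three display cases plus the loading state can be expressed as a single decision function. This is a sketch, not the component's code: the AvatarStreamState values, the AvatarView names, and the avatarView helper are assumptions introduced for illustration.

```typescript
// Assumed lifecycle of the avatar stream connection.
type AvatarStreamState = "connecting" | "connected" | "failed";
// What the Avatar Display Area renders.
type AvatarView = "video" | "skeleton" | "static_image" | "none";

// Trimmed, hypothetical view of the token response fields involved.
interface AvatarTokenFields {
  avatar_enabled: boolean;
  avatar_character?: string | null;
}

function avatarView(token: AvatarTokenFields, stream: AvatarStreamState): AvatarView {
  // No avatar configured: standard voice-only layout, no avatar area.
  if (!token.avatar_enabled || !token.avatar_character) return "none";
  // Live stream: <video autoPlay muted playsInline> with object-contain.
  if (stream === "connected") return "video";
  // Still connecting: skeleton pulse in the avatar area.
  if (stream === "connecting") return "skeleton";
  // Stream failed but a character is configured: static <img> fallback.
  return "static_image";
}
```

Keeping this decision out of the JSX makes the fallback order easy to unit test without mounting the video element.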
Trigger: MR clicks End Session button.
Behavior: Dialog confirmation. On confirm: flush pending transcripts via pendingFlushesRef with Promise.all, disconnect voice/avatar, call endSession API, navigate to scoring page.
Visual: Existing Dialog pattern from Phase 08. No changes in Phase 12.
All copy externalized via react-i18next. English (en-US) and Chinese (zh-CN) values verified against actual implementation.
| Element | i18n Key | en-US Copy | zh-CN Copy |
|---|---|---|---|
| Tab: Profile | admin:hcp.tabProfile | Profile | 基本信息 |
| Tab: Voice & Avatar | admin:hcp.tabVoiceAvatar | Voice & Avatar | 语音和数字人 |
| Tab: Agent | admin:hcp.tabAgent | Agent | AI 代理 |
| Voice section title | admin:hcp.voiceSettings | Voice Settings | 语音设置 |
| Avatar section title | admin:hcp.avatarSettings | Avatar Settings | 数字人设置 |
| Conversation params title | admin:hcp.conversationParams | Conversation Parameters | 对话参数 |
| Custom voice toggle | admin:hcp.customVoice | Custom voice | 自定义语音 |
| Custom avatar toggle | admin:hcp.customAvatar | Custom avatar | 自定义数字人 |
| Voice name label | admin:hcp.voiceName | Voice Name | 语音名称 |
| Avatar character label | admin:hcp.avatarCharacter | Avatar Character | 数字人角色 |
| Avatar style label | admin:hcp.avatarStyle | Avatar Style | 数字人风格 |
| Temperature label | admin:hcp.temperature | Temperature | 对话温度 |
| Turn detection label | admin:hcp.turnDetection | Turn Detection | 轮次检测 |
| Noise suppression label | admin:hcp.noiseSuppression | Noise Suppression | 噪声抑制 |
| Echo cancellation label | admin:hcp.echoCancellation | Echo Cancellation | 回声消除 |
| EOU detection label | admin:hcp.eouDetection | End-of-Utterance Detection | 语音终止检测 |
| Recognition language label | admin:hcp.recognitionLanguage | Recognition Language | 识别语言 |
| Auto detect option | admin:hcp.autoDetect | Auto Detect | 自动检测 |
| Agent instructions (auto) | admin:hcp.autoInstructions | Auto-generated Instructions | 自动生成指令 |
| Agent instructions (override) | admin:hcp.overrideInstructions | Override Instructions | 自定义指令 |
| Override placeholder | admin:hcp.overridePlaceholder | Leave empty to use auto-generated instructions | 留空则使用自动生成的指令 |
| Table column header | admin:hcp.voiceAvatarCol | Voice & Avatar | 语音和数字人 |
| Not configured text | admin:hcp.notConfigured | Not configured | 未配置 |
| Primary CTA | admin:hcp.save | Save Profile | 保存配置 |
| Empty state heading | admin:hcp.emptyTitle | No HCP Profiles | 暂无 HCP 配置 |
| Empty state body | admin:hcp.emptyBody | Create your first HCP profile to start building training scenarios. | 创建第一个 HCP 配置以开始培训。 |
| Delete with agent | admin:hcp.deleteConfirmWithAgent | Delete HCP Profile: This will permanently remove this profile, delete its AI Foundry agent, and unassign it from all scenarios. This action cannot be undone. | 删除 HCP 配置:将永久删除此配置、其 AI Foundry Agent 以及所有关联场景分配。此操作不可撤销。 |
| Delete with agent (short) | admin:hcp.deleteConfirmAgent | Delete this HCP profile? This will also delete the linked AI Foundry Agent. | 确定删除此 HCP 配置?关联的 AI Foundry Agent 也将被删除。 |
| Error: save failed | admin:errors.hcpSaveFailed | Failed to save HCP profile. Please try again. | 保存 HCP 配置失败,请重试。 |
| Element | i18n Key | en-US Copy | zh-CN Copy |
|---|---|---|---|
| Fallback toast: avatar | voice:error.avatarFallback | Avatar unavailable, switching to voice mode | 数字人不可用,已切换为语音模式 |
| Fallback toast: voice | voice:error.voiceFallback | Voice unavailable, switching to text mode | 语音不可用,已切换为文字模式 |
| Mode: connected | voice:modeStatus.connected | Connected | 已连接 |
| Mode: degraded | voice:modeStatus.degraded | Degraded | 降级模式 |
| Mode: disconnected | voice:modeStatus.disconnected | Disconnected | 已断开 |
| Mode badge: text | voice:modeBadge.text | Text Mode | 文字模式 |
| Mode badge: voice pipeline | voice:modeBadge.voice_pipeline | Voice Pipeline | 语音管线 |
| Mode badge: DH pipeline | voice:modeBadge.digital_human_pipeline | Digital Human Pipeline | 数字人管线 |
| Mode badge: voice RT model | voice:modeBadge.voice_realtime_model | Voice Realtime | 语音实时 |
| Mode badge: DH RT model | voice:modeBadge.digital_human_realtime_model | Digital Human Realtime | 数字人实时 |
| Mode badge: voice RT agent | voice:modeBadge.voice_realtime_agent | Voice Agent | 语音代理 |
| Mode badge: DH RT agent | voice:modeBadge.digital_human_realtime_agent | Digital Human Agent | 数字人代理 |
| Avatar loading | voice:avatar.loading | Connecting to avatar... | 正在连接数字人... |
| Avatar failed | voice:avatar.failed | Avatar unavailable | 数字人不可用 |
| Transcript label | voice:transcript | Transcript | 对话记录 |
| Chat input placeholder | voice:chatPlaceholder | Type a message or use the mic... | 输入消息或使用麦克风... |
Focal point: Save Profile button (top-right of header bar). Remains visible and accessible regardless of active tab.
+-----------------------------------------------------------+
| [<-] Create/Edit HCP Profile [Test Chat] [Save] | <- Header bar (fixed, outside tabs)
+-----------------------------------------------------------+
| [Profile] [Voice & Avatar] [Agent] | <- TabsList (h-9, bg-muted, rounded-lg)
+-----------------------------------------------------------+
| |
| Tab content area (scrollable, max-w-4xl mx-auto) |
| Cards stacked vertically with space-y-6 (24px gap) |
| |
+-----------------------------------------------------------+
Single <Form> wraps <Tabs>. Previous 3-column grid layout (2-col form + 1-col sidebar) replaced by full-width tabs. Agent status card and timestamps card moved into Agent tab content.
+-----------------------------------------------------------+
| Card: Voice Settings |
| Custom voice: [OFF ----] [Switch] |
| Voice Name: [Select dropdown \/] |
| (or Input if custom voice ON) |
+-----------------------------------------------------------+
| space-y-6 |
+-----------------------------------------------------------+
| Card: Avatar Settings |
| Custom avatar: [OFF ----] [Switch] |
| Character: [Select \/] Style: [Select \/] |
| (grid grid-cols-2 gap-4) |
+-----------------------------------------------------------+
| space-y-6 |
+-----------------------------------------------------------+
| Card: Conversation Parameters |
| Temperature: [=====O=====] 0.9 (Slider + value) |
| Turn Detection: [Select \/] |
| Noise Suppression: [label] [Switch] |
| Echo Cancellation: [label] [Switch] |
| EOU Detection: [label] [Switch] |
| Recognition Language: [Select \/] |
+-----------------------------------------------------------+
Each section is a Card. Fields within cards use space-y-4. Switch rows use flex items-center justify-between. Character and style dropdowns are grid grid-cols-2 gap-4.
+-----------------------------------------------------------+
| Card: Agent Status (bg matches sync status config) |
| [Icon] Status: Synced / Pending / Failed / None |
| Agent ID: asst_xxxxx (Tooltip for full ID) |
| [Retry Sync button] [View in Azure Portal link] |
+-----------------------------------------------------------+
| space-y-6 |
+-----------------------------------------------------------+
| Card: Agent Instructions |
| Auto-generated Instructions (label): |
| [================================] |
| [ You are Dr. Zhang, an... ] (disabled Textarea) |
| [================================] |
| space-y-4 |
| Override Instructions (label): |
| [================================] |
| [ (editable) ] (active Textarea) |
| [================================] |
+-----------------------------------------------------------+
| space-y-6 |
+-----------------------------------------------------------+
| Card: Metadata |
| Created: 2026-04-01 10:00 |
| Last Updated: 2026-04-02 14:30 |
+-----------------------------------------------------------+
Agent status Card uses AGENT_STATUS_CONFIG for dynamic bg + border + icon + color per status value.
Focal point: ModeStatusIndicator (center of header). Communicates live connection state.
+---[ Timer | Scenario Title ]---[ ModeStatusIndicator ]---[ ConnectionStatus | View | End ]---+
| h-16 (64px) |
ModeStatusIndicator replaces the previous static Badge in center position. Format: [dot] {mode label} - {status}. Width auto-fits content. Dot size-2, text text-xs font-semibold, gap gap-2.
Focal point: Avatar video/image (center-left). When HCP has avatar configured, avatar and agent conversation display on the same page.
Condition: Rendered when tokenData.avatar_enabled === true AND avatar connection is active. Falls back to L-04 voice-only layout when avatar is not configured or avatar connection fails.
+---[ Header: Timer | Scenario Title | ModeStatusIndicator | End ]---+ <- h-16
+--------------------------------------------------------------------+
| | | |
| Scenario | Avatar Display Area | Chat / Transcript |
| Panel | (center, flex-1) | Panel |
| (w-64, | | (w-[400px], |
| optional, | +----------------------+ | flex flex-col) |
| collaps- | | | | |
| ible) | | [Avatar Video/Img] | | +---------------+ |
| | | (aspect-[3/4] or | | | Chat messages | |
| | | object-contain, | | | (flex-1, | |
| | | max-h-[70vh], | | | overflow-y- | |
| | | mx-auto) | | | auto) | |
| | | | | +---------------+ |
| | +----------------------+ | | [Input] [Mic] | |
| | | +---------------+ |
+--------------------------------------------------------------------+
Avatar Display Area:
- Container: flex items-center justify-center bg-neutral-50 dark:bg-neutral-900 rounded-lg overflow-hidden
- Video element (when Azure AI Avatar streaming): <video> tag with autoPlay muted playsInline, sized to max-h-[70vh] w-auto mx-auto
- Static image fallback (when avatar image configured but no video stream): <img> with object-contain max-h-[70vh] mx-auto
- Empty state (avatar loading): Skeleton with pulsing animation, same aspect ratio
- Background: subtle neutral to frame the avatar cleanly
Chat Panel (right side):
- Shows real-time transcript messages (AI responses + user utterances)
- Messages styled as chat bubbles: AI messages left-aligned (white/card bg), user messages right-aligned (primary/muted bg)
- Text input at bottom with mic button for push-to-talk or toggle
- Panel header: optional "Transcript" label or hidden for clean look
Interaction: Avatar lip-syncs or animates with agent speech. Chat transcript updates simultaneously with text-to-speech output. User can type or speak — both channels active.
Constraint: Avatar area must never overlap or obscure the chat panel. On narrow viewports, chat panel overlays avatar with semi-transparent background or stacks below.
| Name | Specialty | Personality | Comm Style | Agent Status | Voice & Avatar | Actions |
|------|-----------|-------------|------------|--------------|----------------|---------|
| Dr.Z | Oncology | [friendly] | 50 (Ind.) | [Synced] | [Ava][Lori-c] | E R D |
| Dr.L | Hematol. | [skeptical] | 30 (Dir.) | [Failed] | Not configured | E R D |
New column positioned after Agent Status, before Actions. Column width: auto (content-driven). Badge pair uses gap-1 (4px). Both badges variant="outline" with text-xs.
| State | Type | Location | Purpose |
|---|---|---|---|
| HCP form values (all tabs) | useForm<HcpFormValues> (react-hook-form + zod) | hcp-profile-editor.tsx | Single form instance across Profile / Voice & Avatar / Agent tabs. Zod schema includes all 13 voice/avatar fields. Prevents data loss on tab switch. |
| Active tab | Radix Tabs internal (defaultValue="profile") | hcp-profile-editor.tsx | Uncontrolled. No external state needed. |
| Current session mode | useState<SessionMode> | voice-session.tsx | Auto-resolved from token broker via resolveMode(), updated on fallback chain trigger. |
| Initial session mode | useRef<SessionMode> | voice-session.tsx | Captured at session start. Used by ModeStatusIndicator to detect degradation (initial vs current). |
| Token broker response | TanStack Query mutation via useVoiceToken | use-voice-token.ts | Extended to pass hcpProfileId. Returns all per-HCP voice/avatar/conversation params in VoiceLiveToken. |
| Avatar style options | useMemo derived from selected character | voice-avatar-tab.tsx | Filters AVATAR_VIDEO_CHARACTERS styles when character selection changes. |
| Requirement | Implementation |
|---|---|
| Tab keyboard navigation | Radix Tabs handles Arrow key navigation, Enter/Space activation automatically |
| Form labels | All voice/avatar fields use FormLabel via react-hook-form FormField pattern. Switch labels use companion <Label htmlFor> |
| Switch ARIA | Each Switch has adjacent label text and aria-checked state (Radix default) |
| Mode status announcements | Badge uses role="status" and aria-live="polite" to announce mode changes to screen readers |
| Color not sole indicator | Mode status uses text label ("Connected" / "Degraded" / "Disconnected") alongside colored dot. Agent status uses icon + text label alongside background color |
| Fallback toast | sonner toasts include descriptive text, not just color. Warning level provides distinct styling |
| Tooltip for truncated content | Agent ID shown in Tooltip when truncated in Agent tab |
| Registry | Blocks Used | Safety Gate |
|---|---|---|
| shadcn official | Not applicable (components already installed manually as Radix wrappers) | not required |
| Third-party | none | not applicable |
No new component installations needed. All required UI primitives (Tabs, Select, Switch, Slider, Badge, Card, Form, Input, Textarea, Dialog, Tooltip, Button, Label) are already present in frontend/src/components/ui/.
| Breakpoint | HCP Editor | Voice Session (with Avatar) | Voice Session (no Avatar) | HCP Table |
|---|---|---|---|---|
| Desktop (>=1024px) | Tabs full-width, max-w-4xl mx-auto, all cards visible | 3-panel: Scenario sidebar (w-64, collapsible) + Avatar center (flex-1) + Chat right (w-[400px]). Avatar video fills center with max-h-[70vh] | 3-panel layout (existing ScenarioPanel + center + HintsPanel) | All 7 columns visible |
| Tablet (768-1023px) | Same as desktop, narrower content area | 2-panel: Avatar top (50vh) + Chat bottom (50vh). Scenario sidebar hidden (accessible via hamburger). Avatar scales down proportionally | 3-panel stacks vertically (existing lg:flex-row pattern) | Hide Comm Style column; Voice & Avatar badges stack vertically |
| Mobile (<768px) | Full-width tabs, cards stack, grid-cols-2 for avatar character/style collapses to grid-cols-1 | Chat panel overlays avatar with semi-transparent bg + drag handle to resize. Or tab toggle: [Avatar] [Chat] tabs at bottom. Mic button always visible as floating action | Single panel with collapsible side panels (existing pattern) | Horizontal scroll on table, or hide Voice & Avatar and Comm Style columns |
- Dimension 1 Copywriting: PASS
- Dimension 2 Visuals: PASS
- Dimension 3 Color: PASS
- Dimension 4 Typography: PASS (FLAG resolved — added text-xs to table)
- Dimension 5 Spacing: PASS
- Dimension 6 Registry Safety: PASS
Approval: APPROVED (2026-04-02)
Phase Goal: Each HCP profile becomes a complete "digital persona" with per-HCP voice, avatar, and conversation parameters. The token broker returns all settings in one response. MRs get automatic mode selection (Digital Human Realtime Agent as default) with graceful fallback to voice-only or text. Admin configures HCP digital personas via a tabbed editor.
Verified: 2026-04-02T14:15:00Z
Status: passed
Re-verification: Yes -- after gap closure (commit 8126313 fixed voice-session.test.tsx)
| # | Truth | Status | Evidence |
|---|---|---|---|
| 1 | Admin can configure per-HCP voice settings, avatar settings, and conversation parameters via tabbed HCP editor | VERIFIED | `voice-avatar-tab.tsx` (438 lines): 3 Cards (Voice Settings, Avatar Settings, Conversation Parameters) with Select dropdowns for voice name (8 options), avatar character (6 options) with dynamic style filtering, temperature Slider, 3 Switch controls (noise suppression, echo cancellation, EOU detection), turn detection Select, recognition language Select. `hcp-profile-editor.tsx` imports and renders VoiceAvatarTab in TabsContent. |
| 2 | Token broker returns all per-HCP voice/avatar settings when `hcp_profile_id` is provided, falls back to global defaults when not | VERIFIED | `voice_live_service.py` lines 82-106: sources all 13 fields from `profile.voice_name`, `profile.avatar_character`, etc. when `hcp_profile_id` provided. Lines 65-79: initializes defaults before the if-block. Lines 108-130: returns all fields in VoiceLiveTokenResponse. |
| 3 | New HCPs get smart defaults (voice "Ava", avatar "Lori-casual", temp 0.9, Server VAD) without manual configuration | VERIFIED | `hcp_profile.py` model defaults: `voice_name="en-US-AvaNeural"`, `avatar_character="lori"`, `avatar_style="casual"`, `voice_temperature=0.9`, `turn_detection_type="server_vad"`. Migration i12b has matching `server_default` on all 13 columns. |
| 4 | MR does NOT see a mode picker -- system auto-selects best mode based on HCP config and service availability | VERIFIED | `voice-session.tsx`: `resolveMode(tokenData)` function at line 49 derives mode from `avatar_enabled` and `agent_id`. No ModeSelector import or render found. Props interface uses `hcpProfileId: string`, not `mode: SessionMode`. |
| 5 | Fallback chain works: Digital Human Realtime Agent -> Voice-only Realtime -> Text, with toast notification and persistent mode status indicator | VERIFIED | `voice-session.tsx`: avatar connect failure triggers `toast.warning(t("error.avatarFallback"))` (line 193) and falls back to voice-only. Voice connection failure triggers `toast.warning(t("error.voiceFallback"))` (lines 142, 210) and falls back to text. `mode-status-indicator.tsx`: green/amber/red dot with `role="status"` and `aria-live="polite"`. |
| 6 | HCP table shows Voice & Avatar column with badge pair showing per-HCP configuration | VERIFIED | `hcp-table.tsx`: column header `t("hcp.voiceAvatarCol")` at line 181. Cell renders two Badge elements with `getVoiceLabel(profile.voice_name)` and `profile.avatar_character`-`profile.avatar_style`. |
| 7 | Agent instructions support admin override via Agent tab (D-02) | VERIFIED | `agent-tab.tsx`: disabled Textarea showing `buildPreviewInstructions()` auto-generated preview, editable Textarea for `agent_instructions_override` with i18n placeholder. Backend `agent_sync_service.py`: checks override first, returns trimmed text if non-empty. 5 dedicated override tests pass. |
| 8 | All new UI text externalized to i18n in both en-US and zh-CN | VERIFIED | `admin.json` (en-US): 21+ keys including tabProfile, tabVoiceAvatar, tabAgent, voiceSettings, avatarSettings, voiceAvatarCol, notConfigured. `admin.json` (zh-CN): matching keys with Chinese translations. `voice.json` (en-US): modeStatus.connected/degraded/disconnected, error.avatarFallback/voiceFallback. `voice.json` (zh-CN): matching Chinese translations. |
Score: 8/8 truths verified
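Truths 2 and 3 describe a defaults-then-override pattern: defaults are initialized first, then replaced by profile values when an `hcp_profile_id` resolves. The following is a minimal sketch of that pattern, not the code in `voice_live_service.py`. The `VoiceSettings` dataclass, the in-memory `PROFILES` dict, and the function name `resolve_voice_settings` are all hypothetical, and only five of the 13 fields are shown.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical subset of the per-HCP fields, with the reported defaults."""
    voice_name: str = "en-US-AvaNeural"
    avatar_character: str = "lori"
    avatar_style: str = "casual"
    voice_temperature: float = 0.9
    turn_detection_type: str = "server_vad"


# Hypothetical stand-in for the HCP profile table: only overridden fields are stored.
PROFILES: dict[str, dict[str, Any]] = {
    "hcp-1": {"voice_name": "zh-CN-YunxiNeural", "voice_temperature": 0.7},
}


def resolve_voice_settings(hcp_profile_id: Optional[str] = None) -> VoiceSettings:
    """Start from defaults, then layer profile values on top when a profile exists."""
    settings = VoiceSettings()  # defaults are in place before any lookup
    if hcp_profile_id is None:
        return settings
    profile = PROFILES.get(hcp_profile_id)
    if profile is None:
        return settings  # unknown profile: keep the global defaults
    return replace(settings, **profile)
```

Calling `resolve_voice_settings("hcp-1")` yields the Yunxi voice at temperature 0.7 while the avatar fields keep their defaults; calling it with no argument returns the defaults unchanged.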
| Artifact | Expected | Status | Details |
|---|---|---|---|
| `backend/alembic/versions/i12b_add_voice_avatar_fields_to_hcp_profile.py` | Migration adding 13 columns | VERIFIED | 13 `add_column` calls with `server_default` on all, `batch_alter_table` for SQLite compat |
| `backend/app/models/hcp_profile.py` | ORM model with voice/avatar columns | VERIFIED | 13 new Mapped columns (voice_name, voice_type, voice_temperature, voice_custom, avatar_character, avatar_style, avatar_customized, turn_detection_type, noise_suppression, echo_cancellation, eou_detection, recognition_language, agent_instructions_override) |
| `backend/app/schemas/hcp_profile.py` | Extended Pydantic schemas | VERIFIED | HcpProfileCreate, HcpProfileUpdate, HcpProfileResponse all include 13 voice/avatar fields |
| `backend/app/schemas/voice_live.py` | VoiceLiveTokenResponse with per-HCP fields | VERIFIED | 11 per-HCP fields added |
| `backend/app/services/voice_live_service.py` | Token broker with per-HCP sourcing | VERIFIED | Sources all fields from profile when `hcp_profile_id` provided, falls back to defaults |
| `backend/app/api/voice_live.py` | Endpoint with `hcp_profile_id` query param | VERIFIED | `hcp_profile_id: str \| …` |
| `backend/app/services/agent_sync_service.py` | Agent instructions override (D-02) | VERIFIED | `build_agent_instructions` checks override first, returns trimmed text if non-empty |
| `backend/app/api/hcp_profiles.py` | HcpProfileOut with voice/avatar fields | VERIFIED | 13 voice/avatar fields added to HcpProfileOut response model |
| `frontend/src/types/hcp.ts` | Extended TypeScript types | VERIFIED | HcpProfile has 13 voice/avatar fields, HcpProfileCreate has all optional |
| `frontend/src/types/voice-live.ts` | VoiceLiveToken with per-HCP fields | VERIFIED | 11 per-HCP optional fields added |
| `frontend/src/api/voice-live.ts` | API client with hcpProfileId | VERIFIED | `fetchVoiceLiveToken(hcpProfileId?: string)` passes as query param |
| `frontend/src/hooks/use-voice-token.ts` | Mutation accepts hcpProfileId | VERIFIED | `useMutation<VoiceLiveToken, Error, string \| …` |
| `frontend/src/components/admin/voice-avatar-tab.tsx` | Voice & Avatar tab component | VERIFIED | 438 lines, 3 Cards, all form fields wired to react-hook-form |
| `frontend/src/components/admin/agent-tab.tsx` | Agent tab component | VERIFIED | 281 lines, AGENT_STATUS_CONFIG, preview + override textareas, metadata card |
| `frontend/src/pages/admin/hcp-profile-editor.tsx` | Tabbed HCP editor | VERIFIED | 3 TabsTrigger values (profile, voice-avatar, agent), imports VoiceAvatarTab + AgentTab |
| `frontend/src/components/admin/hcp-table.tsx` | HCP table with Voice+Avatar column | VERIFIED | voiceAvatarCol header, Badge pair display |
| `frontend/src/components/voice/mode-status-indicator.tsx` | Mode status badge | VERIFIED | Green/amber/red dot, i18n labels, `role="status"`, `aria-live="polite"` |
| `frontend/src/components/voice/voice-session.tsx` | Auto-mode + fallback chain | VERIFIED | `resolveMode` function, `hcpProfileId` prop (no `mode` prop), fallback with toast warnings |
| `frontend/src/components/voice/voice-session-header.tsx` | Header with ModeStatusIndicator | VERIFIED | currentMode/initialMode props, ModeStatusIndicator rendered |
| `frontend/src/hooks/use-voice-live.ts` | Per-HCP session config | VERIFIED | Uses `tokenData.voice_temperature`, `turn_detection_type`, `noise_suppression`, `avatar_style` |
| `frontend/src/pages/user/voice-session.tsx` | Page passes hcpProfileId | VERIFIED | `hcpProfileId={hcpProfileId}` from scenario |
| `backend/tests/test_voice_live_per_hcp.py` | Per-HCP token broker tests | VERIFIED | 8 tests passing |
| `backend/tests/test_hcp_profile_voice.py` | HCP CRUD voice field tests | VERIFIED | 10 tests passing |
| `backend/tests/test_agent_sync_service.py` | Agent instruction override tests | VERIFIED | 5 new override tests passing (27 total in file) |
| `backend/scripts/seed_phase2.py` | Seed data with voice/avatar configs | VERIFIED | 5 HCP profiles with distinct `voice_name` and `avatar_character` values |
| `frontend/src/components/voice/voice-session.test.tsx` | Updated test for new props | VERIFIED | Uses `hcpProfileId: "hcp-1"` prop (line 277). No stale `mode` prop references. `tsc -b` passes cleanly with 0 errors. |
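The override rule reported for `agent_sync_service.py` ("checks override first, returns trimmed text if non-empty") can be sketched as below. This is an illustration under assumptions: the function name `resolve_agent_instructions`, its parameters, and the `generate_instructions` template are hypothetical and are not taken from the repository.

```python
from typing import Optional


def generate_instructions(name: str, specialty: str, personality: str) -> str:
    """Hypothetical stand-in for the auto-generated instruction text."""
    return f"You are {name}. Specialty: {specialty}. Personality: {personality}."


def resolve_agent_instructions(
    name: str,
    specialty: str,
    personality: str,
    override: Optional[str] = None,
) -> str:
    """Prefer the admin override when it contains non-whitespace text."""
    if override is not None:
        trimmed = override.strip()
        if trimmed:
            return trimmed  # admin-supplied text wins, with surrounding whitespace removed
    # No override, or an override that is blank after trimming: use generated text.
    return generate_instructions(name, specialty, personality)
```

Treating a whitespace-only override as absent means an admin can clear the textarea to return to the generated instructions without a separate reset control.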
| From | To | Via | Status | Details |
|---|---|---|---|---|
| `voice_live.py` (API) | `voice_live_service.py` | `hcp_profile_id` pass-through | WIRED | `hcp_profile_id=hcp_profile_id` |
| `voice_live_service.py` | `hcp_profile.py` (model) | Lazy import `hcp_profile_service` | WIRED | `from app.services import hcp_profile_service; profile = await hcp_profile_service.get_hcp_profile(db, hcp_profile_id)` |
| `hcp-profile-editor.tsx` | `voice-avatar-tab.tsx` | Import and render in TabsContent | WIRED | Import + `<VoiceAvatarTab form={form} />` |
| `hcp-profile-editor.tsx` | `agent-tab.tsx` | Import and render in TabsContent | WIRED | Import + `<AgentTab ...>` |
| `voice-live.ts` (API) | Backend `POST /voice-live/token` | `hcp_profile_id` query param | WIRED | `params = hcpProfileId ? { hcp_profile_id: hcpProfileId } : {}` |
| `voice-session-page.tsx` | `voice-session.tsx` | `hcpProfileId` prop | WIRED | `hcpProfileId={hcpProfileId}` |
| `voice-session.tsx` | `use-voice-token.ts` | `mutateAsync(hcpProfileId)` | WIRED | `tokenMutation.mutateAsync(hcpProfileId)` |
| `use-voice-live.ts` | VoiceLiveToken per-HCP fields | Session config from tokenData | WIRED | `tokenData.voice_temperature`, `tokenData.turn_detection_type`, `tokenData.noise_suppression`, `tokenData.avatar_style` confirmed |
| Artifact | Data Variable | Source | Produces Real Data | Status |
|---|---|---|---|---|
| `voice-avatar-tab.tsx` | `form` (UseFormReturn) | Parent `hcp-profile-editor.tsx` react-hook-form | Yes - populated from HCP profile API response via useQuery | FLOWING |
| `agent-tab.tsx` | `form` + `profile` | Parent form + useQuery HCP profile | Yes - profile from API, form from react-hook-form | FLOWING |
| `mode-status-indicator.tsx` | `currentMode`, `initialMode`, `connectionState` | Props from `voice-session.tsx` state | Yes - derived from token broker response via `resolveMode()` | FLOWING |
| `hcp-table.tsx` | `profile.voice_name`, `avatar_character` | HCP profiles from useHcpProfiles query | Yes - DB-backed via API | FLOWING |
| `voice-session.tsx` | `tokenData` | `tokenMutation.mutateAsync(hcpProfileId)` | Yes - token broker API call | FLOWING |
| Behavior | Command | Result | Status |
|---|---|---|---|
| Frontend tsc -b (gap fix) | `npx tsc -b --noEmit` | 0 errors, clean exit | PASS |
| Frontend Vite build | `npm run build` | Built in 4.46s, dist/ output generated | PASS |
| Backend tests (45 total) | `pytest tests/test_voice_live_per_hcp.py tests/test_hcp_profile_voice.py tests/test_agent_sync_service.py -x -v` | 45 passed in 34.50s | PASS |
| Test file uses hcpProfileId prop | grep for `hcpProfileId` in test | Line 277: `hcpProfileId: "hcp-1"` | PASS |
| Test file has no stale mode prop | grep for `mode:` in test | Only `mode: "f2f"` in mockScenarioData (Scenario type, not VoiceSessionProps) | PASS |
| Requirement | Source Plan | Description | Status | Evidence |
|---|---|---|---|---|
| VOICE-12-01 | 12-01 | Per-HCP digital persona model (voice/avatar columns) | SATISFIED | 13 columns on HcpProfile model with ORM + Pydantic + migration |
| VOICE-12-02 | 12-01 | Token broker per-HCP wiring | SATISFIED | voice_live_service sources all fields from HCP profile |
| VOICE-12-03 | 12-02 | Admin tabbed HCP editor with Voice & Avatar tab | SATISFIED | 3-tab layout with VoiceAvatarTab and AgentTab components |
| VOICE-12-04 | 12-03 | Auto-mode resolution (no manual mode picker) | SATISFIED | resolveMode() function, hcpProfileId prop replaces mode |
| VOICE-12-05 | 12-02 | HCP table Voice+Avatar column, i18n | SATISFIED | Badge pair display, 21+ i18n keys in both locales |
| VOICE-12-06 | 12-03 | Fallback chain with toast notifications and ModeStatusIndicator | SATISFIED | 3-level fallback with toast.warning, green/amber/red indicator |
Note: VOICE-12-01 through VOICE-12-06 are referenced in ROADMAP.md but NOT formally defined in REQUIREMENTS.md. They are phase-specific IDs created for Phase 12. No orphaned requirements exist -- REQUIREMENTS.md maps no additional IDs to Phase 12.
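Requirements VOICE-12-04 and VOICE-12-06 together describe a small state machine: pick an initial mode from the token data, then step down the chain on each connection failure. The frontend implements this in TypeScript; the sketch below uses Python only to illustrate the decision logic. The `Mode` enum, the `can_connect` callback, and the rule inside `resolve_mode` (avatar mode requires both `avatar_enabled` and an `agent_id`) are assumptions, since the report states only which fields the mode is derived from. The two warning keys are the i18n keys named in the report.

```python
from enum import Enum
from typing import Callable, Optional


class Mode(Enum):
    AVATAR = "Digital Human Realtime Agent"
    VOICE = "Voice-only Realtime"
    TEXT = "Text"


# Next mode to try when the current one fails; Text has no further fallback.
FALLBACK = {Mode.AVATAR: Mode.VOICE, Mode.VOICE: Mode.TEXT}
# i18n key for the toast shown when leaving each mode.
WARNING_KEY = {Mode.AVATAR: "error.avatarFallback", Mode.VOICE: "error.voiceFallback"}


def resolve_mode(avatar_enabled: bool, agent_id: Optional[str]) -> Mode:
    """Assumed rule: avatar mode needs both an enabled avatar and an agent."""
    return Mode.AVATAR if avatar_enabled and agent_id else Mode.VOICE


def connect_with_fallback(
    initial: Mode, can_connect: Callable[[Mode], bool]
) -> tuple[Mode, list[str]]:
    """Walk the chain until a mode connects; collect one warning per downgrade."""
    mode, warnings = initial, []
    while mode in FALLBACK and not can_connect(mode):
        warnings.append(WARNING_KEY[mode])
        mode = FALLBACK[mode]
    return mode, warnings


def indicator_status(initial: Mode, current: Mode, connected: bool) -> str:
    """Map session state to the connected/degraded/disconnected indicator labels."""
    if not connected:
        return "disconnected"
    return "connected" if current is initial else "degraded"
```

Keeping the chain in a lookup table means that adding or reordering a mode changes only the two dictionaries, not the loop.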
| File | Line | Pattern | Severity | Impact |
|---|---|---|---|---|
| `voice_live_service.py` | 105-106 | `except Exception: pass` (silent fallback) | Info | Intentional design: falls back to defaults when HCP profile lookup fails. Prevents service outage from profile issues. |
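The silent-fallback pattern flagged above can be illustrated with a short sketch. It assumes a single field and an injected `lookup` callable; `voice_name_for` and `broken_lookup` are hypothetical names used only for this example.

```python
from typing import Callable, Optional

DEFAULT_VOICE_NAME = "en-US-AvaNeural"


def broken_lookup(profile_id: str) -> str:
    """Hypothetical lookup that always fails, standing in for a database error."""
    raise LookupError(f"profile {profile_id} not found")


def voice_name_for(profile_id: Optional[str], lookup: Callable[[str], str]) -> str:
    voice_name = DEFAULT_VOICE_NAME  # default is set before the lookup is attempted
    if profile_id is not None:
        try:
            voice_name = lookup(profile_id)
        except Exception:
            pass  # silent fallback: keep the default rather than fail the request
    return voice_name
```

With this shape, a failing lookup still produces a usable voice name. The trade-off is that a broken profile becomes indistinguishable from an unconfigured one unless the failure is recorded somewhere else.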
Test: Open HCP editor, fill in Profile tab fields, switch to Voice & Avatar tab, configure voice/avatar settings, switch to Agent tab, verify override textarea works, switch back to Profile tab. Expected: All form data persists across tab switches. No data loss. Why human: Cross-tab form state persistence requires interactive browser testing.
Test: In Voice & Avatar tab, change avatar character dropdown from "lori" to "lisa". Check if style dropdown options update dynamically. Expected: Style options change to lisa-specific styles (casual-sitting, graceful-sitting, etc.). Previously selected style resets to first available. Why human: Dynamic dropdown filtering requires visual interaction.
Test: Start a voice session, observe the ModeStatusIndicator badge color and text during connection, degradation (if simulated), and disconnection. Expected: Green dot + "Connected" when at optimal mode, amber dot + "Degraded" when fallen back, red dot + "Disconnected" on error. Why human: Real-time visual state changes during live WebSocket/Avatar connections.
Test: Start a voice session where avatar service is unavailable but voice works. Then start one where voice is also unavailable. Expected: First scenario: toast warning "Avatar unavailable, switching to voice mode". Second: toast warning "Voice unavailable, switching to text mode". Why human: Requires simulating service unavailability with real Azure connections.
Test: View HCP list page with multiple profiles that have different voice/avatar configurations. Expected: Badge pairs show short voice label (e.g., "Ava", "Yunxi") and avatar character-style (e.g., "lori-casual"). Profiles without config show "Not configured". Why human: Visual layout, badge rendering, and label formatting need visual confirmation.
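The badge formatting expected in the last test can be sketched as follows. The report names a `getVoiceLabel` helper in the frontend but does not show how it derives the short label, so the string-trimming approach here (drop the locale prefix and a trailing "Neural") is an assumption, as are the Python function names and the rule that all three fields must be present.

```python
from typing import Optional


def get_voice_label(voice_name: str) -> str:
    """Shorten a voice ID such as 'en-US-AvaNeural' to 'Ava' (assumed rule)."""
    short = voice_name.split("-")[-1]  # drop the locale prefix
    return short.removesuffix("Neural") or voice_name


def voice_avatar_badges(
    voice_name: Optional[str],
    avatar_character: Optional[str],
    avatar_style: Optional[str],
) -> list[str]:
    """Return the badge texts for one table row."""
    if not (voice_name and avatar_character and avatar_style):
        return ["Not configured"]
    return [get_voice_label(voice_name), f"{avatar_character}-{avatar_style}"]
```

For example, `voice_avatar_badges("en-US-AvaNeural", "lori", "casual")` gives the pair `["Ava", "lori-casual"]`. A lookup table keyed by voice ID would be an equally plausible implementation and would handle irregular voice names more predictably.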
Previous gap: voice-session.test.tsx referenced the removed mode prop from pre-Phase 12-03 VoiceSessionProps interface, producing 12 TypeScript TS2353 errors. tsc -b failed across the full frontend project.
Fix: Commit 8126313 ("fix(12): update voice-session.test.tsx for auto-mode props (mode -> hcpProfileId)") updated the test file to:
- Replace the `mode: "voice_pipeline"` prop with `hcpProfileId: "hcp-1"` in `defaultProps` (line 277)
- Update mock VoiceSessionHeader to check the new props pattern
- Remove all references to the removed `mode` prop on `VoiceSessionProps`
Verification of fix:
- `npx tsc -b --noEmit` now completes with 0 errors
- grep confirms no stale `mode` prop references in test (only `mode: "f2f"` in `mockScenarioData`, which is the Scenario type field, not VoiceSessionProps)
- `npm run build` succeeds in 4.46s
Regression check: All 8 previously-verified truths remain verified. All artifacts remain present and substantive. No regressions detected.
Verified: 2026-04-02T14:15:00Z Verifier: Claude (gsd-verifier)