Planning Phase 12 - huqianghui/AI-Coach-vibe-coding GitHub Wiki

Phase 12: Voice Realtime API Agent

Auto-generated from .planning/phases/12-voice-realtime-api-agent
Last synced: 2026-04-28

Context & Decisions

Phase 12: Voice Realtime API & Agent Mode Integration - Context

Gathered: 2026-04-02
Status: Ready for planning

## Phase Boundary

Extend HCP profiles to be complete "digital persona" configurations — each HCP stores Voice Live API settings (voice name, conversation parameters) and Avatar settings (character, custom avatar) alongside the existing AI Foundry Agent. When an MR selects an HCP and starts a session, the system auto-configures the voice connection with per-HCP settings and defaults to Digital Human Realtime Agent mode with automatic fallback to voice-only or text.

## Implementation Decisions

HCP Voice/Avatar Configuration Scope

  • D-01: Full Voice Live settings stored per HCP profile: voice name, avatar character/style, temperature, turn detection (Server VAD), noise suppression, echo cancellation, EOU detection, recognition language, custom voice toggle, custom avatar toggle
  • D-02: Agent instructions are auto-generated from HCP personality fields but admin can view and override the generated text in the HCP editor
  • D-03: Avatar supports both predefined Azure Avatar characters (Lisa, Lori, Harry, etc. in dropdown) and custom avatars (character name with customized: true toggle) — matching reference repo pattern
  • D-04: New HCPs get smart defaults: voice "Ava", avatar "Lori-casual", temp 0.9, Server VAD, noise suppression off, echo cancellation off, EOU detection disabled, recognition language "Auto Detect". Admin can override per-HCP
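As a rough illustration of D-04, the smart defaults can live in one mapping that the HCP create path merges under admin-supplied values. Field names here are assumptions only; the exact column names are left to implementation per the Claude's Discretion notes in this phase.

```python
# Sketch only: field names are assumptions, not the final column names.
HCP_PERSONA_DEFAULTS = {
    "voice_name": "en-US-AvaNeural",      # "Ava" (D-04)
    "avatar_character": "lori",
    "avatar_style": "casual",             # shown combined as "Lori-casual"
    "voice_temperature": 0.9,
    "turn_detection_type": "server_vad",
    "noise_suppression": False,
    "echo_cancellation": False,
    "eou_detection": False,
    "recognition_language": "auto",       # "Auto Detect"
}

def with_persona_defaults(admin_values: dict) -> dict:
    """Admin-supplied values win; unset (None/missing) fields fall back."""
    merged = dict(HCP_PERSONA_DEFAULTS)
    merged.update({k: v for k, v in admin_values.items() if v is not None})
    return merged
```

This keeps D-04's "works immediately for demo" property: a brand-new HCP with no voice/avatar input still produces a complete configuration.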

Admin UX — HCP Editor Redesign

  • D-05: HCP editor uses tabbed layout with 3 tabs: "Profile" (existing personality/specialty/objections fields), "Voice & Avatar" (voice name, avatar character, conversation parameters), "Agent" (auto-generated + editable instructions text, agent sync status)
  • D-06: HCP table adds a Voice+Avatar column showing voice name + avatar character as badges (e.g. "Ava / Lori-casual") or "Not configured" if missing
  • D-07: Table maintains existing columns from Phase 11 (Name, Specialty, Personality, Agent Status) plus new Voice+Avatar column

Session Wiring

  • D-08: Token broker API returns all HCP voice/avatar settings (voice name, avatar character, conversation params) alongside auth token/endpoint. Frontend auto-configures WebSocket and Avatar connection from this single response
  • D-09: MR cannot override HCP voice/avatar settings during a session — settings are locked per-HCP for consistent experience

Mode Simplification & Fallback

  • D-10: Default to Digital Human Realtime Agent mode (best experience). MR does NOT see a mode picker — system auto-selects based on HCP config and service availability
  • D-11: Fallback chain: Digital Human Realtime Agent → Voice-only Realtime → Text mode. Triggered when avatar service unavailable or network degraded
  • D-12: Fallback notification: toast alert for the initial fallback event ("Avatar unavailable, switching to voice mode") PLUS persistent status indicator showing current active mode throughout the session

Claude's Discretion

  • Exact DB column types and migration details for new HCP voice/avatar fields
  • Default avatar/voice options list (can derive from Azure documentation)
  • Tab component implementation details (reuse existing Tabs from UI library)
  • WebSocket reconnection strategy on network recovery
  • Status indicator component design

<canonical_refs>

Canonical References

Downstream agents MUST read these before planning or implementing.

Reference Implementation

  • User's screenshot of Voice Live Agent demo — shows full settings panel (Instructions, Connection Settings, Conversation Settings, Voice, Avatar) with Digital Human avatar rendering and chat

HCP Profile Model & API (Phase 11 output)

  • backend/app/models/hcp_profile.py — HcpProfile ORM model (extend with voice/avatar fields)
  • backend/app/schemas/hcp_profile.py — HcpProfileCreate/Update/Response schemas (extend)
  • backend/app/api/hcp_profiles.py — HCP profile CRUD router
  • backend/app/services/hcp_profile_service.py — HCP profile service layer
  • backend/app/services/agent_sync_service.py — Agent sync (extend to sync voice/avatar config)

Voice Live Infrastructure (Phase 08/09 output)

  • backend/app/services/voice_live_service.py — Token broker (extend to return per-HCP voice/avatar settings)
  • backend/app/schemas/voice_live.py — VoiceLiveTokenResponse (extend with voice/avatar fields)
  • backend/app/services/agents/adapters/azure_voice_live.py — Agent/Model mode parse/encode
  • backend/app/api/voice_live.py — Voice Live API routes

Frontend Voice Components (Phase 08 output)

  • frontend/src/hooks/use-voice-live.ts — RTClient WebSocket hook (consume per-HCP settings)
  • frontend/src/hooks/use-avatar-stream.ts — Avatar WebRTC hook (consume per-HCP avatar config)
  • frontend/src/components/voice/voice-session.tsx — VoiceSession container
  • frontend/src/components/voice/mode-selector.tsx — Current mode selector (replace with auto-mode + fallback)
  • frontend/src/components/voice/avatar-view.tsx — Avatar renderer

Frontend Admin (Phase 11 output)

  • frontend/src/pages/admin/hcp-profiles.tsx — HCP profiles admin page (add tabs)
  • frontend/src/pages/admin/hcp-profile-editor.tsx — HCP editor (extend with tabs)
  • frontend/src/components/admin/hcp-table.tsx — HCP table (add Voice+Avatar column)
  • frontend/src/types/hcp.ts — HCP TypeScript types (extend)

Config & Auth

  • backend/app/services/config_service.py — AI Foundry unified config
  • backend/app/services/connection_tester.py — Connection testing patterns

</canonical_refs>

<code_context>

Existing Code Insights

Reusable Assets

  • HcpProfile model already has agent_id, agent_sync_status fields from Phase 11 — extend with voice/avatar columns
  • agent_sync_service.py — Pattern for auto-syncing on HCP CRUD, reuse for voice/avatar validation
  • VoiceLiveTokenResponse — Already returns endpoint, api_key, agent_id — extend with voice/avatar settings
  • Tabs component in UI library — reuse for HCP editor tabbed layout
  • useVoiceLive hook — Already handles WebSocket connection, needs to accept per-HCP conversation params
  • useAvatarStream hook — Already handles WebRTC, needs to accept per-HCP avatar character
  • mode-selector.tsx — Has the 7-mode mapping, will be replaced by auto-mode logic

Established Patterns

  • Per-domain TanStack Query hooks with mutation invalidation
  • Alembic migration with server_default for SQLite compatibility
  • i18n namespaces per domain (admin, voice)
  • Token broker pattern: backend generates config, frontend consumes directly
  • Full-screen session pages without UserLayout

Integration Points

  • HcpProfile model → add ~12 new columns for voice/avatar settings
  • Token broker → extend response to include all voice/avatar params from HCP
  • VoiceSession container → consume per-HCP settings instead of global config
  • Mode selector → replace with auto-mode + fallback chain logic
  • HCP editor page → add tabbed layout with Voice & Avatar tab
  • HCP table → add Voice+Avatar column

</code_context>

## Specific Ideas
  • Reference implementation screenshot shows the exact settings panel: Instructions, Connection Settings, Conversation Settings (Recognition Language, Noise suppression, Echo cancellation, Turn detection, EOU detection, Temperature), Voice (custom voice toggle, voice name), Avatar (toggle, custom avatar toggle, character)
  • Each HCP becomes a complete "digital persona" — personality + voice + appearance
  • Smart defaults mean new HCPs work immediately for demo without manual configuration
  • Fallback chain matches the user's note: "voice+avatar as default, fallback to voice or text if service unavailable or network bad"
  • Token broker is the single integration point — frontend gets everything it needs in one call
## Deferred Ideas
  • Developer mode toggle for MRs to override HCP settings during debug sessions — future enhancement
  • Per-session provider override — always use HCP-level config for now
  • Azure AD token auth (DefaultAzureCredential) for Entra token acquisition — future phase
  • Multiple avatar characters per HCP (wardrobe selection) — future enhancement
  • Voice cloning / custom neural voice training — future phase

Phase: 12-voice-realtime-api-agent Context gathered: 2026-04-02

Plans (4)

| # | Plan File | Status |
|-------|---------------|----------|
| 12-01 | 12-01-PLAN.md | Complete |
| 12-02 | 12-02-PLAN.md | Complete |
| 12-03 | 12-03-PLAN.md | Complete |
| 12-04 | 12-04-PLAN.md | Complete |

Research


Phase 12: Voice Realtime API & Agent Mode Integration - Research

Researched: 2026-04-02
Domain: HCP digital persona configuration, Voice Live API session wiring, auto-mode + fallback chain
Confidence: HIGH

Summary

Phase 12 extends HCP profiles into complete "digital persona" configurations that bundle voice, avatar, and conversation parameters alongside the existing AI Foundry Agent. The token broker API becomes the single integration point: it reads all per-HCP settings and returns them to the frontend, which auto-configures WebSocket and Avatar connections without manual mode selection. The fallback chain (Digital Human Realtime Agent -> Voice-only Realtime -> Text) replaces the current 7-mode ModeSelector with automatic degradation.

The codebase is well-structured for this extension. The HcpProfile ORM model needs ~12 new columns for voice/avatar settings. The VoiceLiveTokenResponse schema already returns voice_name, avatar_character, and agent_id -- these just need to be sourced from HCP profile data instead of global config. The frontend VoiceSession container already implements a basic fallback chain (avatar failure -> voice-only -> text); it needs refinement to consume per-HCP settings from the token broker and display a persistent mode status indicator.

Primary recommendation: work bottom-up. Database migration first, then backend schema/service extension, then frontend HCP editor tabs, then session wiring with auto-mode + fallback, and finally integration testing.

<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

  • D-01: Full Voice Live settings stored per HCP profile: voice name, avatar character/style, temperature, turn detection (Server VAD), noise suppression, echo cancellation, EOU detection, recognition language, custom voice toggle, custom avatar toggle
  • D-02: Agent instructions are auto-generated from HCP personality fields but admin can view and override the generated text in the HCP editor
  • D-03: Avatar supports both predefined Azure Avatar characters (Lisa, Lori, Harry, etc. in dropdown) and custom avatars (character name with customized: true toggle) -- matching reference repo pattern
  • D-04: New HCPs get smart defaults: voice "Ava", avatar "Lori-casual", temp 0.9, Server VAD, noise suppression off, echo cancellation off, EOU detection disabled, recognition language "Auto Detect". Admin can override per-HCP
  • D-05: HCP editor uses tabbed layout with 3 tabs: "Profile" (existing personality/specialty/objections fields), "Voice & Avatar" (voice name, avatar character, conversation parameters), "Agent" (auto-generated + editable instructions text, agent sync status)
  • D-06: HCP table adds a Voice+Avatar column showing voice name + avatar character as badges (e.g. "Ava / Lori-casual") or "Not configured" if missing
  • D-07: Table maintains existing columns from Phase 11 (Name, Specialty, Personality, Agent Status) plus new Voice+Avatar column
  • D-08: Token broker API returns all HCP voice/avatar settings (voice name, avatar character, conversation params) alongside auth token/endpoint. Frontend auto-configures WebSocket and Avatar connection from this single response
  • D-09: MR cannot override HCP voice/avatar settings during a session -- settings are locked per-HCP for consistent experience
  • D-10: Default to Digital Human Realtime Agent mode (best experience). MR does NOT see a mode picker -- system auto-selects based on HCP config and service availability
  • D-11: Fallback chain: Digital Human Realtime Agent -> Voice-only Realtime -> Text mode. Triggered when avatar service unavailable or network degraded
  • D-12: Fallback notification: toast alert for the initial fallback event ("Avatar unavailable, switching to voice mode") PLUS persistent status indicator showing current active mode throughout the session

Claude's Discretion

  • Exact DB column types and migration details for new HCP voice/avatar fields
  • Default avatar/voice options list (can derive from Azure documentation)
  • Tab component implementation details (reuse existing Tabs from UI library)
  • WebSocket reconnection strategy on network recovery
  • Status indicator component design

Deferred Ideas (OUT OF SCOPE)

  • Developer mode toggle for MRs to override HCP settings during debug sessions
  • Per-session provider override -- always use HCP-level config for now
  • Azure AD token auth (DefaultAzureCredential) for Entra token acquisition
  • Multiple avatar characters per HCP (wardrobe selection)
  • Voice cloning / custom neural voice training

</user_constraints>

Standard Stack

Core

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| SQLAlchemy 2.0 (async) | >=2.0.0 | ORM model extension for voice/avatar fields | Already in use, async throughout |
| Alembic | >=1.13.0 | Database migration for new columns | Required by project rules |
| Pydantic v2 | >=2.0.0 | Schema extension for voice/avatar fields | Already in use for all schemas |
| @radix-ui/react-tabs | (via project UI lib) | Tabbed HCP editor layout | Already available as Tabs component |
| react-hook-form + zod | (via project) | Form validation for voice/avatar settings tab | Already used in HCP editor |
| rt-client | 0.5.2 | Voice Live WebSocket connection | Already installed from reference repo |

Supporting

| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| sonner | (via project) | Toast notifications for fallback alerts | Fallback chain notifications |
| lucide-react | >=0.460.0 | Icons for mode status indicator | Status indicator component |

Alternatives Considered

None -- this phase extends existing infrastructure rather than introducing new libraries.

Architecture Patterns

Recommended Project Structure

New/modified files organized by domain:

```text
backend/
  alembic/versions/
    i12a_add_voice_avatar_fields_to_hcp_profile.py   # NEW: migration
  app/
    models/hcp_profile.py                              # EXTEND: ~12 new columns
    schemas/hcp_profile.py                             # EXTEND: voice/avatar fields
    schemas/voice_live.py                              # EXTEND: per-HCP fields in response
    services/voice_live_service.py                     # EXTEND: source settings from HCP
    services/hcp_profile_service.py                    # EXTEND: handle voice/avatar in CRUD
    api/voice_live.py                                  # EXTEND: accept hcp_profile_id param

frontend/
  src/
    types/hcp.ts                                       # EXTEND: voice/avatar fields
    types/voice-live.ts                                # EXTEND: new token response fields
    pages/admin/hcp-profile-editor.tsx                 # REWRITE: tabbed layout
    components/admin/hcp-table.tsx                     # EXTEND: Voice+Avatar column
    components/admin/voice-avatar-tab.tsx              # NEW: Voice & Avatar settings tab
    components/admin/agent-tab.tsx                     # NEW: Agent instructions tab
    components/voice/voice-session.tsx                 # EXTEND: auto-mode + per-HCP config
    components/voice/mode-status-indicator.tsx         # NEW: persistent mode badge
    components/voice/mode-selector.tsx                 # REMOVE: no longer needed (auto-mode)
    hooks/use-voice-token.ts                           # EXTEND: pass hcp_profile_id
    api/voice-live.ts                                  # EXTEND: pass hcp_profile_id to token
```

Pattern 1: Per-HCP Token Broker Extension

What: Token broker reads the HCP profile to source voice/avatar/conversation settings instead of global config.
When to use: Every voice session start.
Example:

```python
# Source: existing voice_live_service.py pattern, extended per D-08
async def get_voice_live_token(
    db: AsyncSession,
    hcp_profile_id: str | None = None,
) -> VoiceLiveTokenResponse:
    # ... existing config fetch ...

    # Source voice/avatar from HCP profile (D-08)
    if hcp_profile_id:
        profile = await hcp_profile_service.get_hcp_profile(db, hcp_profile_id)
        voice_name = profile.voice_name or "en-US-AvaNeural"
        avatar_character = profile.avatar_character or "lori"
        avatar_style = profile.avatar_style or "casual"
        avatar_customized = profile.avatar_customized
        temperature = profile.voice_temperature or 0.9
        # ... etc for all conversation params

    return VoiceLiveTokenResponse(
        # ... existing fields ...
        voice_name=voice_name,
        avatar_character=avatar_character,
        avatar_style=avatar_style,
        avatar_customized=avatar_customized,
        temperature=temperature,
        turn_detection_type=turn_detection_type,
        noise_suppression=noise_suppression,
        echo_cancellation=echo_cancellation,
        eou_detection=eou_detection,
        recognition_language=recognition_language,
    )
```

Pattern 2: Auto-Mode with Fallback Chain (D-10, D-11)

What: Frontend automatically selects the best mode based on HCP config and service availability. No ModeSelector exposed to MR.
When to use: Session initialization in the VoiceSession container.
Example:

```typescript
// Source: existing voice-session.tsx fallback pattern, refined per D-10/D-11
const resolveMode = (tokenData: VoiceLiveToken): SessionMode => {
  // D-10: Default to Digital Human Realtime Agent (best experience)
  if (tokenData.avatar_enabled && tokenData.agent_id) {
    return "digital_human_realtime_agent";
  }
  if (tokenData.avatar_enabled) {
    return "digital_human_realtime_model";
  }
  if (tokenData.agent_id) {
    return "voice_realtime_agent";
  }
  return "voice_realtime_model";
};

// D-11: Fallback chain on connection failure
// Avatar fails -> voice-only; Voice fails -> text
```

Pattern 3: Tabbed HCP Editor (D-05)

What: Replace the current single-page editor with a 3-tab layout using the existing Radix Tabs.
When to use: HCP profile create/edit page.
Example:

```tsx
// Source: existing Tabs component from @/components/ui/tabs
<Tabs defaultValue="profile">
  <TabsList>
    <TabsTrigger value="profile">Profile</TabsTrigger>
    <TabsTrigger value="voice-avatar">Voice & Avatar</TabsTrigger>
    <TabsTrigger value="agent">Agent</TabsTrigger>
  </TabsList>
  <TabsContent value="profile">
    {/* Existing personality/specialty/objections fields */}
  </TabsContent>
  <TabsContent value="voice-avatar">
    <VoiceAvatarTab form={form} />
  </TabsContent>
  <TabsContent value="agent">
    <AgentTab profile={profile} onRetrySync={handleRetrySync} />
  </TabsContent>
</Tabs>
```

Anti-Patterns to Avoid

  • Exposing mode picker to MR (D-09/D-10): MRs must NOT manually select voice/avatar modes. System auto-selects.
  • Global voice/avatar config fallback: Always source from HCP profile. Only fall back to global defaults when HCP has no configuration.
  • Mixing tab state with form state: All voice/avatar fields must be part of the single react-hook-form instance, not separate state.
  • Storing avatar settings in a separate table: Keep all HCP digital persona fields in the same hcp_profiles table -- simpler queries, no joins needed.

Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Tabbed layout | Custom tab switching logic | Radix Tabs (@/components/ui/tabs) | Already in UI library, accessible, keyboard-navigable |
| Avatar character list | Hardcoded constants | Azure standard avatars list from docs | Authoritative source, characters updated by Microsoft |
| Form validation for new fields | Manual validation in handlers | zod schema extension in existing HCP form | Already established pattern in hcp-profile-editor.tsx |
| WebSocket session config | Manual JSON construction | Extend existing useVoiceLive hook | Hook already builds session config from tokenData |
| Persistent mode indicator | Custom status component | Badge + cn() from existing UI primitives | Consistent with existing badge patterns in the project |

Common Pitfalls

Pitfall 1: SQLite batch_alter_table Required for Adding Columns

  • What goes wrong: Alembic op.add_column() fails on SQLite for certain operations.
  • Why it happens: SQLite doesn't fully support ALTER TABLE. The project already uses batch operations.
  • How to avoid: Use with op.batch_alter_table("hcp_profiles") as batch_op: for all column additions, with server_default on every column.
  • Warning signs: Migration fails locally but would work on PostgreSQL.
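A minimal sketch of such a migration, assuming hypothetical column names; the real column set, types, and revision ids are at Claude's discretion per the context above.

```python
# Hypothetical migration sketch -- column names are assumptions; only the
# batch_alter_table + server_default shape is the point.
import sqlalchemy as sa
from alembic import op

def upgrade() -> None:
    # batch_alter_table makes Alembic rebuild the table behind the scenes,
    # which is how column additions with defaults work on SQLite
    with op.batch_alter_table("hcp_profiles") as batch_op:
        batch_op.add_column(
            sa.Column("voice_name", sa.String(200), nullable=False,
                      server_default="en-US-AvaNeural")
        )
        batch_op.add_column(
            sa.Column("avatar_character", sa.String(100), nullable=False,
                      server_default="lori")
        )
        # ... remaining voice/avatar columns follow the same pattern

def downgrade() -> None:
    with op.batch_alter_table("hcp_profiles") as batch_op:
        batch_op.drop_column("avatar_character")
        batch_op.drop_column("voice_name")
```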

Pitfall 2: Token Broker Must Pass hcp_profile_id from Frontend

  • What goes wrong: Token broker returns global config instead of per-HCP settings because hcp_profile_id is not passed.
  • Why it happens: The current POST /voice-live/token endpoint doesn't accept hcp_profile_id. The voice session page gets session data which includes scenario_id, and the scenario has hcp_profile_id.
  • How to avoid: Extend the token endpoint to accept hcp_profile_id as a query parameter or request body field. Wire it through from VoiceSessionPage -> useVoiceToken -> fetchVoiceLiveToken -> API.
  • Warning signs: All HCPs use the same voice/avatar during sessions.

Pitfall 3: Avatar Character vs Style are Separate Fields

  • What goes wrong: Avatar character and style are concatenated or confused (e.g., "lori-casual" vs character="lori" style="casual").
  • Why it happens: The Azure Avatar API requires character and style as separate fields in the session config JSON. The reference screenshots show them combined in the UI display.
  • How to avoid: Store avatar_character and avatar_style as separate DB columns. Combine them only for table badge display. Send them as separate fields in the WebSocket session config.
  • Warning signs: Avatar fails to render because the character name includes the style suffix.
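To keep the two representations straight, display formatting and API payload construction can be two separate helpers. Names below are hypothetical, for illustration only.

```python
# Hypothetical helpers for this pitfall: store/send character and style
# split, combine them only for UI display (the D-06 table badge).
def avatar_badge(character: str, style: str) -> str:
    """Combined display form, e.g. 'Lori-casual'."""
    return f"{character.capitalize()}-{style}"

def avatar_config(character: str, style: str, customized: bool) -> dict:
    """Split form required by the Azure session.update avatar payload."""
    return {"character": character, "style": style, "customized": customized}
```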

Pitfall 4: Form Reset on Tab Switch Loses Unsaved Changes

  • What goes wrong: Switching tabs resets form fields if each tab has its own form state.
  • Why it happens: Multiple form instances, or conditional rendering that unmounts tab content.
  • How to avoid: Use a single react-hook-form instance that spans all tabs. Note that Radix TabsContent unmounts inactive panels by default; either pass forceMount (hiding inactive panels with CSS) or rely on react-hook-form v7's default shouldUnregister: false, which preserves values when inputs unmount.
  • Warning signs: Admin fills voice settings, switches to the Profile tab, switches back, and the settings are gone.

Pitfall 5: Lazy Import for hcp_profile_service in voice_live_service

  • What goes wrong: Circular import error when voice_live_service imports hcp_profile_service at module level.
  • Why it happens: Already documented as a Phase 11 decision -- voice_live_service uses a lazy import inside the function body.
  • How to avoid: Continue the existing lazy import pattern: from app.services import hcp_profile_service inside the function, not at module level.
  • Warning signs: ImportError on server startup.

Pitfall 6: Avatar Session Config Structure Must Match Azure API

  • What goes wrong: Avatar doesn't render because the session config JSON structure doesn't match the format the Azure Voice Live API expects.
  • Why it happens: The avatar config in session.update requires a specific nested structure: { character, style, customized, video: { codec, crop, resolution } }.
  • How to avoid: Use the exact Azure API structure from the Voice Live how-to docs. The existing useVoiceLive hook already sends avatar config but without the style and customized fields -- extend it.
  • Warning signs: WebSocket connection succeeds but the avatar video stream never starts.

Code Examples

Azure Voice Live Session Config with Per-HCP Settings

```json
{
  "instructions": "You are Dr. Zhang, an Oncology specialist...",
  "turn_detection": {
    "type": "server_vad",
    "silence_duration_ms": 500
  },
  "input_audio_noise_reduction": {"type": "azure_deep_noise_suppression"},
  "input_audio_echo_cancellation": {"type": "server_echo_cancellation"},
  "voice": {
    "name": "en-US-Ava:DragonHDLatestNeural",
    "type": "azure-standard",
    "temperature": 0.9
  },
  "input_audio_transcription": {
    "model": "azure-speech",
    "language": "zh-CN"
  },
  "avatar": {
    "character": "lori",
    "style": "casual",
    "customized": false,
    "video": {
      "codec": "h264",
      "crop": {"top_left": [560, 0], "bottom_right": [1360, 1080]}
    }
  },
  "agent_id": "dr-zhang-oncology",
  "project_name": "ai-coach-project"
}
```

Source: Azure Voice Live API how-to docs (https://learn.microsoft.com/en-us/azure/ai-services/speech-service/voice-live-how-to)

Azure Standard Video Avatar Characters (for dropdown)

```typescript
// Source: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/standard-avatars
const AVATAR_VIDEO_CHARACTERS = [
  { character: "harry",  styles: ["business", "casual", "youthful"] },
  { character: "jeff",   styles: ["business", "formal"] },
  { character: "lisa",   styles: ["casual-sitting", "graceful-sitting", "graceful-standing", "technical-sitting", "technical-standing"] },
  { character: "lori",   styles: ["casual", "graceful", "formal"] },
  { character: "max",    styles: ["business", "casual", "formal"] },
  { character: "meg",    styles: ["formal", "casual", "business"] },
] as const;

// Note: Photo avatars (Adrian, Amara, Bianca, etc.) are also available but only at 512x512 resolution.
// Video avatars are recommended for this project due to 1920x1080 resolution.
```

HCP Profile Voice/Avatar DB Columns

```python
# Source: Derived from D-01 and Azure Voice Live session config
# Note: the ORM defaults below apply to new objects only; the Alembic
# migration must additionally set server_default on every column so
# existing SQLite rows are backfilled (project DB rule)

# Voice settings
voice_name: Mapped[str] = mapped_column(String(200), default="en-US-AvaNeural")
voice_type: Mapped[str] = mapped_column(String(50), default="azure-standard")
voice_temperature: Mapped[float] = mapped_column(default=0.9)
voice_custom: Mapped[bool] = mapped_column(Boolean, default=False)

# Avatar settings
avatar_character: Mapped[str] = mapped_column(String(100), default="lori")
avatar_style: Mapped[str] = mapped_column(String(100), default="casual")
avatar_customized: Mapped[bool] = mapped_column(Boolean, default=False)

# Conversation parameters
turn_detection_type: Mapped[str] = mapped_column(String(50), default="server_vad")
noise_suppression: Mapped[bool] = mapped_column(Boolean, default=False)
echo_cancellation: Mapped[bool] = mapped_column(Boolean, default=False)
eou_detection: Mapped[bool] = mapped_column(Boolean, default=False)
recognition_language: Mapped[str] = mapped_column(String(20), default="auto")

# Agent instruction override (D-02)
agent_instructions_override: Mapped[str] = mapped_column(Text, default="")
```

Extended VoiceLiveTokenResponse Schema

```python
# Source: Extend existing backend/app/schemas/voice_live.py
class VoiceLiveTokenResponse(BaseModel):
    # Existing fields
    endpoint: str
    token: str
    region: str
    model: str
    avatar_enabled: bool
    avatar_character: str
    voice_name: str
    agent_id: str | None = None
    project_name: str | None = None

    # New per-HCP fields (D-08)
    avatar_style: str = "casual"
    avatar_customized: bool = False
    voice_type: str = "azure-standard"
    voice_temperature: float = 0.9
    turn_detection_type: str = "server_vad"
    noise_suppression: bool = False
    echo_cancellation: bool = False
    eou_detection: bool = False
    recognition_language: str = "auto"
```

Turn Detection Types (for dropdown)

```typescript
// Source: Azure Voice Live API how-to docs
const TURN_DETECTION_TYPES = [
  { value: "server_vad", label: "Server VAD" },
  { value: "semantic_vad", label: "Semantic VAD (gpt-realtime only)" },
  { value: "azure_semantic_vad", label: "Azure Semantic VAD (all models)" },
  { value: "azure_semantic_vad_multilingual", label: "Azure Semantic VAD Multilingual" },
] as const;
```

Voice Name Options (common Azure TTS voices)

```typescript
// Source: Azure Speech TTS voice list (commonly used for Chinese + English)
const VOICE_NAME_OPTIONS = [
  // English voices
  { value: "en-US-AvaNeural", label: "Ava (EN-US)" },
  { value: "en-US-Ava:DragonHDLatestNeural", label: "Ava HD (EN-US)" },
  { value: "en-US-AndrewNeural", label: "Andrew (EN-US)" },
  { value: "en-US-JennyNeural", label: "Jenny (EN-US)" },
  // Chinese voices
  { value: "zh-CN-XiaoxiaoMultilingualNeural", label: "Xiaoxiao Multilingual (ZH-CN)" },
  { value: "zh-CN-XiaoxiaoNeural", label: "Xiaoxiao (ZH-CN)" },
  { value: "zh-CN-YunxiNeural", label: "Yunxi (ZH-CN)" },
  { value: "zh-CN-YunjianNeural", label: "Yunjian (ZH-CN)" },
] as const;
```

State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Global voice/avatar config | Per-HCP voice/avatar config | Phase 12 | Each HCP is a complete digital persona |
| 7-mode manual selector | Auto-mode with fallback chain | Phase 12 | MRs never see mode picker |
| server_vad only | Multiple turn detection types | Voice Live API 2025-10 | azure_semantic_vad works with all models |
| Single avatar character globally | Per-HCP avatar character + style | Phase 12 | Different HCPs look different |
| h264 only codec | h264 remains default (Video Avatar) | Current | Photo Avatar supports vp9 but lower res |

Azure Voice Live API supported models (current):

  • gpt-realtime, gpt-realtime-mini, gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat, phi4-mm-realtime, phi4-mini

Turn detection types available:

  • server_vad (default, all models)
  • semantic_vad (gpt-realtime/gpt-realtime-mini only)
  • azure_semantic_vad (all models, Voice Live specific)
  • azure_semantic_vad_multilingual (all models, multilingual support)

Open Questions

  1. Avatar style naming format

    • What we know: Azure API uses separate character and style fields (e.g., character="lisa", style="casual-sitting"). The existing codebase stores avatar_character as a combined string like "Lisa-casual-sitting".
    • What's unclear: Should we store combined (backward compatible) or split (matches API)?
    • Recommendation: Store split (avatar_character + avatar_style) to match Azure API structure. Combine for display only. The migration can default avatar_character="lori" and avatar_style="casual".
  2. Recognition language "Auto Detect" value

    • What we know: Azure Voice Live docs show "language": "en" for explicit language. D-04 says default "Auto Detect".
    • What's unclear: The exact value for auto-detect in the Azure API (empty string? omit the field?).
    • Recommendation: Use empty string "" or omit language field from input_audio_transcription config when "auto" is selected. Store "auto" in DB, translate to API format at WebSocket config time.
  3. Whether to keep ModeSelector component

    • What we know: D-10 says MR does NOT see a mode picker. But the admin/debug use case was deferred.
    • What's unclear: Should mode-selector.tsx be deleted or just hidden from MR view?
    • Recommendation: Keep the file but do not render it in the voice session. The auto-mode logic replaces its function. The component can be restored later if developer mode is implemented.
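Following the recommendation in question 2, the "auto" translation at config-build time could look like the helper below. The field shape mirrors the session config example earlier; whether omitting "language" actually triggers service-side auto-detect is the open question, so treat this as an assumption.

```python
# Assumption: omitting "language" lets the service auto-detect; the exact
# Azure contract for auto-detect is the open question above.
def transcription_config(recognition_language: str) -> dict:
    config = {"model": "azure-speech"}
    if recognition_language and recognition_language != "auto":
        # explicit language stored on the HCP, e.g. "zh-CN"
        config["language"] = recognition_language
    return config
```

Storing "auto" in the DB keeps the admin UI value stable even if the API-side representation changes later.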

Project Constraints (from CLAUDE.md)

Coding Standards

  • Async everywhere: all backend functions must be async def
  • Pydantic v2 schemas with model_config = ConfigDict(from_attributes=True)
  • Route ordering: static paths before parameterized (/{id})
  • Service layer holds business logic, routers only handle HTTP
  • No raw SQL -- use SQLAlchemy ORM
  • TypeScript strict mode: no any, no unused variables
  • TanStack Query hooks per domain, no inline useQuery
  • cn() for conditional class composition
  • i18n: all UI text externalized via react-i18next
  • Conventional commits: feat:, fix:, docs:, test:

Database Rules

  • NEVER modify schema without Alembic migration
  • All models use TimestampMixin
  • batch_alter_table with server_default for SQLite compatibility
  • Current Alembic head: b820e86271f8

Pre-Commit Checklist

  • Backend: ruff check ., ruff format --check ., pytest -v
  • Frontend: npx tsc -b, npm run build

Sources

Primary (HIGH confidence)

  • Azure Voice Live API how-to: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/voice-live-how-to -- session.update config structure, turn detection types, voice config, avatar config, noise suppression, echo cancellation
  • Azure Standard Avatars: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/standard-avatars -- full character list with styles (Harry, Jeff, Lisa, Lori, Max, Meg + photo avatars)
  • Azure Voice Live overview: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/voice-live -- supported models, pricing tiers, feature list
  • Existing codebase files (all read directly):
    • backend/app/models/hcp_profile.py -- current ORM model
    • backend/app/schemas/hcp_profile.py -- current Pydantic schemas
    • backend/app/services/voice_live_service.py -- current token broker
    • backend/app/schemas/voice_live.py -- current token response schema
    • backend/app/services/hcp_profile_service.py -- CRUD with agent sync hooks
    • backend/app/services/agent_sync_service.py -- agent instructions builder
    • frontend/src/hooks/use-voice-live.ts -- WebSocket session config builder
    • frontend/src/hooks/use-avatar-stream.ts -- WebRTC avatar connection
    • frontend/src/components/voice/voice-session.tsx -- session container with fallback
    • frontend/src/components/voice/mode-selector.tsx -- 7-mode selector (to be replaced)
    • frontend/src/pages/admin/hcp-profile-editor.tsx -- current editor layout
    • frontend/src/components/admin/hcp-table.tsx -- current table columns
    • frontend/src/types/hcp.ts -- HCP TypeScript types
    • frontend/src/types/voice-live.ts -- Voice Live types
    • frontend/src/components/ui/tabs.tsx -- Radix Tabs available in UI library
    • backend/app/services/region_capabilities.py -- region/service availability maps

Secondary (MEDIUM confidence)

  • Azure OpenAI Realtime API reference (linked from Voice Live docs) -- base event format that Voice Live extends

Tertiary (LOW confidence)

  • Voice name list is a commonly-used subset, not exhaustive. Azure has 600+ standard voices. The admin should have a text input with the dropdown as suggestions, not a locked select.

Metadata

Confidence breakdown:

  • Standard stack: HIGH - all libraries already in the project, no new dependencies
  • Architecture: HIGH - extending well-established patterns (token broker, HCP CRUD, form hooks)
  • Pitfalls: HIGH - based on direct codebase reading and established project conventions
  • Azure API config structure: HIGH - verified from official Microsoft documentation (updated 2026-02-04 / 2026-03-16)

Research date: 2026-04-02. Valid until: 2026-05-02 (stable -- Azure Voice Live API is GA, avatar characters list stable).

UI Specification


Phase 12 -- UI Design Contract

Visual and interaction contract for the Voice Realtime API & Agent Mode Integration phase. Generated by gsd-ui-researcher, verified by gsd-ui-checker.


Design System

| Property | Value |
|---|---|
| Tool | none (Tailwind CSS v4 with @theme inline custom properties) |
| Preset | not applicable |
| Component library | Radix UI (via project @/components/ui/* wrappers) |
| Icon library | lucide-react >=0.460.0 |
| Font | Inter + Noto Sans SC (sans-serif), JetBrains Mono (monospace) |

Source: Existing frontend/src/styles/index.css @theme inline block, established in Phase 01. No new design system installations required.


Spacing Scale

Declared values (must be multiples of 4):

| Token | Value | Usage in Phase 12 |
|---|---|---|
| xs | 4px | Icon gaps, inline badge padding within Voice+Avatar column (gap-1), switch-to-label gap |
| sm | 8px | Compact element spacing, tab trigger padding, form field gaps within a row, dot-to-text gap in ModeStatusIndicator (gap-2) |
| md | 16px | Default element spacing, card content padding, tab content top margin, form field vertical gaps (space-y-4) |
| lg | 24px | Section padding within cards, gap between form sections inside a tab (space-y-6) |
| xl | 32px | Gap between major card sections in the editor, header-to-content gap |
| 2xl | 48px | Page-level top/bottom padding |
| 3xl | 64px | Not used in this phase |

Exceptions: Touch target minimum 44px for voice session controls (mic button, end session button) per existing Phase 08 pattern.


Typography

| Role | Size | Weight | Line Height | Phase 12 Usage |
|---|---|---|---|---|
| Badge/Indicator | 12px (text-xs) | 400 or 600 | 1.5 | Badge text in HCP table Voice+Avatar column, ModeStatusIndicator text (font-semibold), agent sync status badges |
| Body | 14px (text-sm) | 400 (normal) | 1.5 | Form field values, table cell text, Textarea content, transcript text |
| Label | 14px (text-sm) | 400 (normal) | 1.5 | FormLabel text, Switch labels, Select labels. Differentiated from body via text-muted-foreground color, not weight |
| Heading | 16px (text-base) | 600 (semibold) | 1.5 | CardTitle in each form section (Voice Settings, Avatar Settings, etc.), tab triggers |
| Display | 24px (text-2xl) | 600 (semibold) | 1.5 | Not used in this phase (no page-level display headings introduced) |

Two weights only: 400 (normal) for body text and labels, 600 (semibold) for headings.


Color

| Role | Value | Usage in Phase 12 |
|---|---|---|
| Dominant (60%) | var(--background) #FFFFFF | Page background, tab content background, form input backgrounds |
| Secondary (30%) | var(--card) #FFFFFF / var(--muted) #ececf0 | Cards in HCP editor, table header row bg-slate-50/50, tab list background bg-muted, disabled Textarea bg-muted/50 |
| Accent (10%) | var(--primary) #1E40AF | Save Profile primary button (bg-primary), active tab trigger shadow highlight |
| Destructive | var(--destructive) #EF4444 | Delete HCP action, End Session button, failed agent sync status badge (bg-red-100 text-red-700), disconnected mode status dot |

Accent reserved for:

  • Save Profile primary button (bg-primary)
  • Active TabsTrigger state (uses bg-background with shadow per Radix default, not direct accent fill)

Additional semantic colors used in this phase (already established):

| Token | Value | Usage |
|---|---|---|
| Green | bg-green-500 (dot) / bg-green-100 text-green-700 (badge) | Connected mode status dot, synced agent badge, avatar active indicator |
| Amber | bg-amber-500 (dot) / bg-amber-100 text-amber-700 (badge) | Degraded mode status dot, pending agent sync badge |
| Red | bg-destructive (dot) / bg-red-100 text-red-700 (badge) | Disconnected mode status dot, failed agent sync badge |
| Muted foreground | var(--muted-foreground) #717182 | "Not configured" badge text, disabled form labels, placeholder text |

Source: Existing CSS custom properties in index.css, Phase 10 theme system. ModeStatusIndicator dot colors verified from implementation: bg-green-500, bg-amber-500, bg-destructive.


Focal Points

| Screen | Primary Focal Point | Rationale |
|---|---|---|
| HCP Profile Editor | Save Profile button (top-right of header bar) | The single CTA that commits all tab changes; placed in a persistent header outside the tabs so it remains visible regardless of active tab |
| Voice Session | ModeStatusIndicator (center of VoiceSessionHeader) | Communicates the live connection state and active mode; center placement ensures the MR always knows session health at a glance |
| HCP Table | Voice & Avatar column badges | New column added in this phase; draws attention to per-HCP digital persona configuration status |

Component Inventory

New Components

| Component | Location | Description |
|---|---|---|
| VoiceAvatarTab | frontend/src/components/admin/voice-avatar-tab.tsx | Form tab content for Voice & Avatar settings. Contains: voice name Select/Input with custom voice Switch toggle, avatar character Select with avatar style Select (linked -- style options filter by selected character using the AVATAR_VIDEO_CHARACTERS constant), custom avatar Switch toggle, conversation parameters (temperature Slider 0.0-1.0 step 0.1, turn detection Select with 4 options, boolean Switches for noise suppression / echo cancellation / EOU detection, recognition language Select). Uses UseFormReturn<HcpFormValues> from the parent form instance. |
| AgentTab | frontend/src/components/admin/agent-tab.tsx | Form tab content for Agent instructions and sync status. Contains: agent status Card (icon + status label + agent_id with Tooltip + Retry Sync Button + View in Azure Portal link), auto-generated instructions preview via buildPreviewInstructions() (disabled Textarea), editable override Textarea (agent_instructions_override form field). Uses the AGENT_STATUS_CONFIG constant for status icon/color/bg mapping. |
| ModeStatusIndicator | frontend/src/components/voice/mode-status-indicator.tsx | Persistent session mode badge replacing the center Badge in VoiceSessionHeader. Shows the current active mode label (from voice:modeBadge.* i18n keys) with a colored dot: green (bg-green-500) when at optimal mode (currentMode === initialMode), amber (bg-amber-500) when degraded (currentMode !== initialMode), red (bg-destructive) when disconnected/error. Uses Badge variant="outline" with role="status" aria-live="polite". Dot is size-2 shrink-0 rounded-full. Gap between dot and text: gap-2 (8px). |

Modified Components

| Component | Changes |
|---|---|
| hcp-profile-editor.tsx | Replaced single-page form layout with a 3-tab Tabs layout. Form wraps Tabs (not individual TabsContent) for cross-tab state persistence via a single useForm<HcpFormValues> instance. Profile tab wraps existing Identity/Personality/Knowledge/Interaction Cards. Voice & Avatar tab renders VoiceAvatarTab. Agent tab renders AgentTab. Zod schema extended with 13 voice/avatar fields. Header with Save button remains outside the tabs. |
| hcp-table.tsx | Added "Voice & Avatar" column after the "Agent Status" column. Renders voice name (via getVoiceLabel() helper) and avatar character-style as an inline Badge pair (variant="outline", text-xs), or text-muted-foreground "Not configured" text when defaults. Column is non-sortable. |
| voice-session.tsx | Removed mode prop from the external interface. Added hcpProfileId prop. Auto-resolves mode from the token broker response via the resolveMode(tokenData) function (D-10). Implements fallback chain (D-11): avatar fail -> voice-only -> text with toast.warning() notifications. Passes hcpProfileId to the useVoiceToken hook. Passes currentMode, initialMode, connectionState to ModeStatusIndicator. |
| voice-session-header.tsx | Replaced center static Badge with the ModeStatusIndicator component. Passes currentMode, initialMode, and connectionState as props. |
| mode-selector.tsx | File retained but component no longer rendered in voice session pages (D-10). Not deleted, to allow future developer-mode restoration. |
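The "13 voice/avatar fields" added to the editor schema can be sketched as a plain values slice with the D-04 smart defaults. This is a hypothetical shape, not the shipped Zod schema: the field names are assumptions mirroring the token-response fields listed in I-10.

```typescript
// Hypothetical voice/avatar slice of HcpFormValues, with D-04 defaults.
interface VoiceAvatarValues {
  voice_name: string;
  custom_voice: boolean;
  avatar_character: string;
  avatar_style: string;
  avatar_customized: boolean;
  voice_temperature: number;
  turn_detection_type: string;
  noise_suppression: boolean;
  echo_cancellation: boolean;
  eou_detection: boolean;
  recognition_language: string;
}

function voiceAvatarDefaults(): VoiceAvatarValues {
  return {
    voice_name: "en-US-AvaNeural", // D-04: voice "Ava"
    custom_voice: false,
    avatar_character: "lori",      // D-04: avatar "Lori-casual"
    avatar_style: "casual",
    avatar_customized: false,
    voice_temperature: 0.9,        // D-04: temp 0.9
    turn_detection_type: "server_vad",
    noise_suppression: false,
    echo_cancellation: false,
    eou_detection: false,
    recognition_language: "auto",  // stored sentinel for "Auto Detect"
  };
}
```

In the real editor these defaults would feed `useForm`'s `defaultValues` so a brand-new HCP opens with a valid digital persona configuration.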

Reused Components (no changes)

| Component | Usage in Phase 12 |
|---|---|
| Tabs / TabsList / TabsTrigger / TabsContent | HCP editor 3-tab layout |
| Select / SelectTrigger / SelectContent / SelectItem | Voice name, avatar character, avatar style, turn detection, recognition language dropdowns |
| Switch | Custom voice toggle, custom avatar toggle, noise suppression, echo cancellation, EOU detection |
| Slider | Temperature (0.0 - 1.0, step 0.1) |
| Input | Custom voice name text input (shown when custom voice toggle is on) |
| Badge | Voice+Avatar column in HCP table, mode status in session header |
| Card / CardHeader / CardTitle / CardContent | Form section containers within each tab |
| Tooltip / TooltipTrigger / TooltipContent | Agent ID display, agent status error details |
| Dialog | End session confirmation (existing) |
| toast (sonner) | Fallback notifications (D-12), save success/error, sync success/error |
| Form / FormField / FormItem / FormLabel / FormControl / FormMessage | All form fields in all three tabs |
| Textarea | Auto-generated instructions (disabled), override instructions (editable) |
| Button | Save Profile, Retry Sync, View in Azure Portal, back navigation |
| Label | Switch companion labels using htmlFor binding |

Interaction Contracts

I-01: HCP Editor Tab Navigation (D-05)

Trigger: Admin clicks a tab trigger (Profile / Voice & Avatar / Agent).

Behavior: Radix Tabs switches visible content instantly. All three TabsContent panels remain mounted in the DOM (Radix default behavior). Form state from react-hook-form persists across tab switches because a single useForm<HcpFormValues> instance wraps all tabs at the <Form> level above <Tabs>.

Visual: The active tab trigger shows bg-background with shadow (default TabsTrigger style from data-[state=active]). Inactive triggers show text-muted-foreground.

Constraint: Tab switching must NOT trigger form validation. Validation only runs on Save button click via form.handleSubmit().

I-02: Voice Name Selection (D-01, D-04)

Trigger: Admin interacts with the voice name field in the Voice & Avatar tab.

Behavior: When the "Custom voice" Switch is OFF (default), show a Select dropdown with preset voice options from the VOICE_NAME_OPTIONS constant (8 options: 4 English, 4 Chinese). When toggled ON, show a text Input for free-form voice name entry. Default value: "en-US-AvaNeural" per D-04.

Visual: Custom voice Switch at the top of the Voice Settings card with flex items-center justify-between layout. Select dropdown below, or Input when custom mode is enabled, with placeholder "e.g., en-US-Ava:DragonHDLatestNeural".

I-03: Avatar Character + Style Selection (D-03)

Trigger: Admin selects an avatar character in the Voice & Avatar tab.

Behavior: Two linked Select dropdowns. The character dropdown shows 6 video avatar characters from the AVATAR_VIDEO_CHARACTERS constant (harry, jeff, lisa, lori, max, meg). When the character changes, the style dropdown filters to show only valid styles for that character via useMemo. When the "Custom avatar" Switch is ON, character becomes a text Input. Default: character "lori", style "casual" per D-04.

Visual: Side-by-side Select dropdowns using grid grid-cols-2 gap-4. Character label in the left column, style label in the right. Custom avatar Switch below with the same flex items-center justify-between layout as custom voice.
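The character-to-style linkage can be sketched as a lookup plus a pure filter. The style lists below are illustrative placeholders, not the authoritative Azure set -- verify against the Azure standard-avatars doc cited in Sources; the real component reads the project's AVATAR_VIDEO_CHARACTERS constant.

```typescript
// Illustrative character -> styles map (placeholder style lists).
const AVATAR_VIDEO_CHARACTERS: Record<string, readonly string[]> = {
  harry: ["business", "casual", "youthful"],
  jeff: ["business", "formal"],
  lisa: ["casual-sitting"],
  lori: ["casual", "formal", "graceful"],
  max: ["business", "casual", "formal"],
  meg: ["formal", "casual", "business"],
};

// In the component this lookup is wrapped in useMemo keyed on the
// selected character, so the style Select re-renders only on change.
function stylesForCharacter(character: string): readonly string[] {
  return AVATAR_VIDEO_CHARACTERS[character] ?? [];
}
```

Returning an empty array for unknown characters (e.g. a custom avatar name) lets the style dropdown render disabled rather than crash.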

I-04: Conversation Parameters (D-01)

Trigger: Admin adjusts conversation parameters in the Voice & Avatar tab.

Behavior: Temperature uses a Slider (min 0.0, max 1.0, step 0.1, default 0.9). Turn detection uses a Select with 4 options from the TURN_DETECTION_TYPES constant (server_vad default). Noise suppression, echo cancellation, and EOU detection each use a Switch (all default OFF per D-04). Recognition language uses a Select with options from the RECOGNITION_LANGUAGES constant, including "Auto Detect" (default "auto").

Visual: Stacked form fields within a Conversation Parameters Card. Fields use space-y-4. Switch rows use flex items-center justify-between. Temperature shows the current numeric value to the right of the Slider.

I-05: Agent Instructions Override (D-02)

Trigger: Admin views the Agent tab.

Behavior: Auto-generated instructions text (built by buildPreviewInstructions() from current form values including name, specialty, personality, objections, expertise) is displayed in a disabled Textarea with a muted background. Below it, an editable Textarea (agent_instructions_override form field) lets the admin write custom instructions. If the override is non-empty, it takes priority when syncing to AI Foundry (checked in the backend build_agent_instructions).

Visual: Two Textareas stacked vertically within an Agent Instructions Card. Top: disabled with bg-muted/50 appearance, rows=6. Bottom: standard input style, rows=6, placeholder text from admin:hcp.overridePlaceholder.

I-06: HCP Table Voice+Avatar Column (D-06, D-07)

Trigger: Table renders with HCP profile data.

Behavior: New column after "Agent Status". Shows the voice name (shortened via the getVoiceLabel() helper -- e.g., "en-US-AvaNeural" becomes "Ava") and the avatar character+style combined, as two inline Badge elements. If the HCP has no voice_name or avatar_character, shows "Not configured" text.

Visual: Two Badge variant="outline" elements side by side with gap-1 (4px). Both badges use text-xs. The voice badge shows the short name (e.g., "Ava"). The avatar badge shows the combined character-style (e.g., "Lori-casual"). When not configured: plain text-xs text-muted-foreground text.
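A hypothetical implementation of the getVoiceLabel() helper described above, matching the "en-US-AvaNeural" -> "Ava" behavior. It assumes the Azure {locale}-{Name}Neural short-name convention; anything else (custom voice names, photo-avatar IDs) falls through unchanged.

```typescript
// Hypothetical helper: shorten an Azure voice short-name for badge display.
// "en-US-AvaNeural" -> "Ava"; unrecognized formats are returned as-is.
function getVoiceLabel(voiceName: string): string {
  const match = voiceName.match(/^[a-z]{2,3}-[A-Z]{2}-(.+?)(?:Neural)?$/);
  return match ? match[1] : voiceName;
}
```

For example, `getVoiceLabel("zh-CN-XiaoxiaoNeural")` yields "Xiaoxiao", while a free-form custom voice string is displayed verbatim.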

I-07: Auto-Mode Resolution (D-10)

Trigger: Voice session starts; the token broker response is received via the useVoiceToken hook.

Behavior: resolveMode(tokenData) determines the best available mode:

  1. If tokenData.avatar_enabled && tokenData.agent_id -> "digital_human_realtime_agent"
  2. If tokenData.avatar_enabled -> "digital_human_realtime_model"
  3. If tokenData.agent_id -> "voice_realtime_agent"
  4. Otherwise -> "voice_realtime_model"

The MR never sees a mode picker; the mode is auto-selected. initialMode is captured via useRef for degradation detection.

Visual: No ModeSelector rendered in the voice session. ModeStatusIndicator in the header shows the resolved mode label.
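The four-step chain above can be sketched directly. The mode strings come from the voice:modeBadge keys in this spec; the token field names (avatar_enabled, agent_id) match those listed under I-10.

```typescript
// Sketch of resolveMode(tokenData) per the I-07 decision chain.
type SessionMode =
  | "digital_human_realtime_agent"
  | "digital_human_realtime_model"
  | "voice_realtime_agent"
  | "voice_realtime_model";

interface TokenCapabilities {
  avatar_enabled: boolean;
  agent_id?: string | null;
}

function resolveMode(tokenData: TokenCapabilities): SessionMode {
  if (tokenData.avatar_enabled && tokenData.agent_id) return "digital_human_realtime_agent";
  if (tokenData.avatar_enabled) return "digital_human_realtime_model";
  if (tokenData.agent_id) return "voice_realtime_agent";
  return "voice_realtime_model";
}
```

Because the function is pure, the component can call it once on token arrival and store the result in both currentMode state and the initialMode ref.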

I-08: Fallback Chain (D-11, D-12)

Trigger: Avatar connection fails during the session, or the voice connection degrades.

Behavior: Three-level fallback: Digital Human Realtime Agent -> Voice-only Realtime -> Text mode. Each fallback triggers:

  1. toast.warning() notification via sonner with descriptive text from voice:error.avatarFallback or voice:error.voiceFallback
  2. currentMode state updates to new degraded mode
  3. ModeStatusIndicator updates automatically (dot turns amber when currentMode !== initialMode)

Visual: Toast uses sonner warning styling (amber tint). The ModeStatusIndicator badge text updates dynamically to reflect the new mode label from voice:modeBadge.*.
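The degradation step can be expressed as a pure function over the mode strings defined in this spec. This is a sketch only: the real handler also fires toast.warning() and tears down the failed transport; the observation that each digital-human mode has a voice-only sibling is an assumption drawn from the mode naming, not confirmed implementation.

```typescript
// Sketch of the I-08 fallback step: digital human -> voice-only sibling -> text.
function nextFallbackMode(current: string): string | null {
  if (current.startsWith("digital_human")) {
    // Avatar failed: drop to the voice-only sibling of the same mode.
    return current.replace("digital_human", "voice");
  }
  if (current.startsWith("voice")) {
    return "text"; // Voice failed: fall back to text mode.
  }
  return null; // Already at text; nowhere left to fall back.
}
```

Keeping the step pure makes the chain easy to unit-test and guarantees it terminates: each call strictly moves down the three-level hierarchy.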

I-09: Mode Status Indicator (D-12)

Trigger: Always visible during the voice session in the VoiceSessionHeader center.

Behavior: Renders as <Badge variant="outline"> with a prepended colored dot. Shows:

  • Mode label text from voice:modeBadge.{currentMode} i18n key (e.g., "Digital Human Agent", "Voice Agent", "Voice Realtime")
  • Status text from voice:modeStatus.* (Connected / Degraded / Disconnected)
  • Format: "{mode label} - {status text}"
  • Dot color logic: isDisconnected -> bg-destructive, isDegraded (currentMode !== initialMode) -> bg-amber-500, else -> bg-green-500

Visual: Badge with flex items-center gap-2 text-xs font-semibold. Dot is size-2 shrink-0 rounded-full. ARIA: role="status" aria-live="polite" for screen reader announcements.

I-10: Token Broker Per-HCP Wiring (D-08)

Trigger: MR starts a voice session for a scenario with an HCP.

Behavior: The frontend passes hcpProfileId to the useVoiceToken hook, which passes it to the POST /api/v1/voice-live/token endpoint. The backend reads the HCP profile's voice/avatar settings and returns them in VoiceLiveTokenResponse (voice_name, avatar_character, avatar_style, avatar_customized, voice_temperature, turn_detection_type, noise_suppression, echo_cancellation, eou_detection, recognition_language). Falls back to global defaults when there is no HCP profile or on exception. The useVoiceLive and useAvatarStream hooks consume per-HCP settings from the token response.

Visual: No visible UI change from the user's perspective. Settings are applied transparently -- the MR sees the correct avatar character and hears the correct voice for each HCP.

I-12: Avatar Display in Voice Session (NEW)

Trigger: Voice session starts and tokenData.avatar_enabled === true.

Behavior: The voice session page renders the unified avatar+chat layout (L-06). The Avatar Display Area shows:

  1. Azure AI Avatar video stream -- a <video> element connected to the avatar stream (via use-avatar-stream.ts). The avatar lip-syncs with agent TTS output. Video auto-plays, muted (audio comes from the TTS stream separately).
  2. Static image fallback -- if the avatar video stream fails but avatar_character is configured, show a static avatar image (<img> from the avatar character asset URL). ModeStatusIndicator shows the amber "Degraded" state.
  3. No avatar -- if avatar_enabled === false or avatar_character is not configured, render the standard voice-only layout without the avatar area.

The chat panel on the right shows the real-time conversation transcript alongside the avatar. Both update simultaneously -- the user sees the avatar speaking while reading the text.

Visual: Avatar centered in its container with neutral background. Smooth fade-in transition (transition-opacity duration-300) when avatar stream connects. Loading state shows skeleton pulse animation in the avatar area. Chat bubbles: AI messages use bg-card with left alignment, user messages use bg-primary/10 with right alignment.

Constraint: Avatar video must maintain aspect ratio (never stretch/distort). Use object-contain to fit within container bounds.

I-11: End Session with Flush (existing, unchanged)

Trigger: MR clicks the End Session button.

Behavior: Dialog confirmation. On confirm: flush pending transcripts via pendingFlushesRef with Promise.all, disconnect voice/avatar, call the endSession API, navigate to the scoring page.

Visual: Existing Dialog pattern from Phase 08. No changes in Phase 12.


Copywriting Contract

All copy externalized via react-i18next. English (en-US) and Chinese (zh-CN) values verified against actual implementation.

Admin Namespace (admin:hcp.*)

| Element | i18n Key | en-US Copy | zh-CN Copy |
|---|---|---|---|
| Tab: Profile | admin:hcp.tabProfile | Profile | 基本信息 |
| Tab: Voice & Avatar | admin:hcp.tabVoiceAvatar | Voice & Avatar | 语音和数字人 |
| Tab: Agent | admin:hcp.tabAgent | Agent | AI 代理 |
| Voice section title | admin:hcp.voiceSettings | Voice Settings | 语音设置 |
| Avatar section title | admin:hcp.avatarSettings | Avatar Settings | 数字人设置 |
| Conversation params title | admin:hcp.conversationParams | Conversation Parameters | 对话参数 |
| Custom voice toggle | admin:hcp.customVoice | Custom voice | 自定义语音 |
| Custom avatar toggle | admin:hcp.customAvatar | Custom avatar | 自定义数字人 |
| Voice name label | admin:hcp.voiceName | Voice Name | 语音名称 |
| Avatar character label | admin:hcp.avatarCharacter | Avatar Character | 数字人角色 |
| Avatar style label | admin:hcp.avatarStyle | Avatar Style | 数字人风格 |
| Temperature label | admin:hcp.temperature | Temperature | 对话温度 |
| Turn detection label | admin:hcp.turnDetection | Turn Detection | 轮次检测 |
| Noise suppression label | admin:hcp.noiseSuppression | Noise Suppression | 噪声抑制 |
| Echo cancellation label | admin:hcp.echoCancellation | Echo Cancellation | 回声消除 |
| EOU detection label | admin:hcp.eouDetection | End-of-Utterance Detection | 语音终止检测 |
| Recognition language label | admin:hcp.recognitionLanguage | Recognition Language | 识别语言 |
| Auto detect option | admin:hcp.autoDetect | Auto Detect | 自动检测 |
| Agent instructions (auto) | admin:hcp.autoInstructions | Auto-generated Instructions | 自动生成指令 |
| Agent instructions (override) | admin:hcp.overrideInstructions | Override Instructions | 自定义指令 |
| Override placeholder | admin:hcp.overridePlaceholder | Leave empty to use auto-generated instructions | 留空则使用自动生成的指令 |
| Table column header | admin:hcp.voiceAvatarCol | Voice & Avatar | 语音和数字人 |
| Not configured text | admin:hcp.notConfigured | Not configured | 未配置 |
| Primary CTA | admin:hcp.save | Save Profile | 保存配置 |
| Empty state heading | admin:hcp.emptyTitle | No HCP Profiles | 暂无 HCP 配置 |
| Empty state body | admin:hcp.emptyBody | Create your first HCP profile to start building training scenarios. | 创建第一个 HCP 配置以开始培训。 |
| Delete with agent | admin:hcp.deleteConfirmWithAgent | Delete HCP Profile: This will permanently remove this profile, delete its AI Foundry agent, and unassign it from all scenarios. This action cannot be undone. | 删除 HCP 配置:将永久删除此配置、其 AI Foundry Agent 以及所有关联场景分配。此操作不可撤销。 |
| Delete with agent (short) | admin:hcp.deleteConfirmAgent | Delete this HCP profile? This will also delete the linked AI Foundry Agent. | 确定删除此 HCP 配置?关联的 AI Foundry Agent 也将被删除。 |
| Error: save failed | admin:errors.hcpSaveFailed | Failed to save HCP profile. Please try again. | 保存 HCP 配置失败,请重试。 |

Voice Namespace (voice:*)

| Element | i18n Key | en-US Copy | zh-CN Copy |
|---|---|---|---|
| Fallback toast: avatar | voice:error.avatarFallback | Avatar unavailable, switching to voice mode | 数字人不可用,已切换为语音模式 |
| Fallback toast: voice | voice:error.voiceFallback | Voice unavailable, switching to text mode | 语音不可用,已切换为文字模式 |
| Mode: connected | voice:modeStatus.connected | Connected | 已连接 |
| Mode: degraded | voice:modeStatus.degraded | Degraded | 降级模式 |
| Mode: disconnected | voice:modeStatus.disconnected | Disconnected | 已断开 |
| Mode badge: text | voice:modeBadge.text | Text Mode | 文字模式 |
| Mode badge: voice pipeline | voice:modeBadge.voice_pipeline | Voice Pipeline | 语音管线 |
| Mode badge: DH pipeline | voice:modeBadge.digital_human_pipeline | Digital Human Pipeline | 数字人管线 |
| Mode badge: voice RT model | voice:modeBadge.voice_realtime_model | Voice Realtime | 语音实时 |
| Mode badge: DH RT model | voice:modeBadge.digital_human_realtime_model | Digital Human Realtime | 数字人实时 |
| Mode badge: voice RT agent | voice:modeBadge.voice_realtime_agent | Voice Agent | 语音代理 |
| Mode badge: DH RT agent | voice:modeBadge.digital_human_realtime_agent | Digital Human Agent | 数字人代理 |
| Avatar loading | voice:avatar.loading | Connecting to avatar... | 正在连接数字人... |
| Avatar failed | voice:avatar.failed | Avatar unavailable | 数字人不可用 |
| Transcript label | voice:transcript | Transcript | 对话记录 |
| Chat input placeholder | voice:chatPlaceholder | Type a message or use the mic... | 输入消息或使用麦克风... |

Layout Contracts

L-01: HCP Profile Editor (Tabbed, D-05)

Focal point: Save Profile button (top-right of header bar). Remains visible and accessible regardless of active tab.

+-----------------------------------------------------------+
| [<-] Create/Edit HCP Profile     [Test Chat] [Save]       |  <- Header bar (fixed, outside tabs)
+-----------------------------------------------------------+
| [Profile] [Voice & Avatar] [Agent]                         |  <- TabsList (h-9, bg-muted, rounded-lg)
+-----------------------------------------------------------+
|                                                            |
|  Tab content area (scrollable, max-w-4xl mx-auto)          |
|  Cards stacked vertically with space-y-6 (24px gap)        |
|                                                            |
+-----------------------------------------------------------+

Single <Form> wraps <Tabs>. Previous 3-column grid layout (2-col form + 1-col sidebar) replaced by full-width tabs. Agent status card and timestamps card moved into Agent tab content.

L-02: Voice & Avatar Tab Content

+-----------------------------------------------------------+
| Card: Voice Settings                                       |
|   Custom voice: [OFF ----]                     [Switch]    |
|   Voice Name: [Select dropdown  \/]                        |
|   (or Input if custom voice ON)                            |
+-----------------------------------------------------------+
|                         space-y-6                          |
+-----------------------------------------------------------+
| Card: Avatar Settings                                      |
|   Custom avatar: [OFF ----]                    [Switch]    |
|   Character: [Select \/]    Style: [Select \/]             |
|              (grid grid-cols-2 gap-4)                      |
+-----------------------------------------------------------+
|                         space-y-6                          |
+-----------------------------------------------------------+
| Card: Conversation Parameters                              |
|   Temperature: [=====O=====] 0.9      (Slider + value)    |
|   Turn Detection: [Select \/]                              |
|   Noise Suppression: [label]           [Switch]            |
|   Echo Cancellation: [label]           [Switch]            |
|   EOU Detection: [label]              [Switch]             |
|   Recognition Language: [Select \/]                        |
+-----------------------------------------------------------+

Each section is a Card. Fields within cards use space-y-4. Switch rows use flex items-center justify-between. Character and style dropdowns are grid grid-cols-2 gap-4.

L-03: Agent Tab Content

+-----------------------------------------------------------+
| Card: Agent Status (bg matches sync status config)         |
|   [Icon] Status: Synced / Pending / Failed / None          |
|   Agent ID: asst_xxxxx (Tooltip for full ID)               |
|   [Retry Sync button]  [View in Azure Portal link]         |
+-----------------------------------------------------------+
|                         space-y-6                          |
+-----------------------------------------------------------+
| Card: Agent Instructions                                   |
|   Auto-generated Instructions (label):                     |
|   [================================]                       |
|   [  You are Dr. Zhang, an...      ]  (disabled Textarea)  |
|   [================================]                       |
|                      space-y-4                             |
|   Override Instructions (label):                           |
|   [================================]                       |
|   [  (editable)                    ]  (active Textarea)    |
|   [================================]                       |
+-----------------------------------------------------------+
|                         space-y-6                          |
+-----------------------------------------------------------+
| Card: Metadata                                             |
|   Created: 2026-04-01 10:00                                |
|   Last Updated: 2026-04-02 14:30                           |
+-----------------------------------------------------------+

Agent status Card uses AGENT_STATUS_CONFIG for dynamic bg + border + icon + color per status value.

L-04: Voice Session Header with Mode Status (D-12)

Focal point: ModeStatusIndicator (center of header). Communicates live connection state.

+---[ Timer | Scenario Title ]---[ ModeStatusIndicator ]---[ ConnectionStatus | View | End ]---+
|  h-16 (64px)                                                                                  |

ModeStatusIndicator replaces the previous static Badge in center position. Format: [dot] {mode label} - {status}. Width auto-fits content. Dot size-2, text text-xs font-semibold, gap gap-2.

L-06: Voice Session with Avatar (Unified Page Layout)

Focal point: Avatar video/image (center-left). When HCP has avatar configured, avatar and agent conversation display on the same page.

Condition: Rendered when tokenData.avatar_enabled === true AND avatar connection is active. Falls back to L-04 voice-only layout when avatar is not configured or avatar connection fails.

+---[ Header: Timer | Scenario Title | ModeStatusIndicator | End ]---+  <- h-16
+--------------------------------------------------------------------+
|              |                              |                       |
|  Scenario    |    Avatar Display Area       |   Chat / Transcript   |
|  Panel       |    (center, flex-1)          |   Panel               |
|  (w-64,      |                              |   (w-[400px],         |
|   optional,  |  +----------------------+    |    flex flex-col)     |
|   collaps-   |  |                      |    |                       |
|   ible)      |  |  [Avatar Video/Img]  |    |   +---------------+   |
|              |  |  (aspect-[3/4] or    |    |   | Chat messages |   |
|              |  |   object-contain,    |    |   | (flex-1,      |   |
|              |  |   max-h-[70vh],      |    |   |  overflow-y-  |   |
|              |  |   mx-auto)           |    |   |  auto)        |   |
|              |  |                      |    |   +---------------+   |
|              |  +----------------------+    |   | [Input] [Mic] |   |
|              |                              |   +---------------+   |
+--------------------------------------------------------------------+

Avatar Display Area:

  • Container: flex items-center justify-center bg-neutral-50 dark:bg-neutral-900 rounded-lg overflow-hidden
  • Video element (when Azure AI Avatar streaming): <video> tag with autoPlay muted playsInline, sized to max-h-[70vh] w-auto mx-auto
  • Static image fallback (when avatar image configured but no video stream): <img> with object-contain max-h-[70vh] mx-auto
  • Empty state (avatar loading): Skeleton with pulsing animation, same aspect ratio
  • Background: subtle neutral to frame the avatar cleanly

Chat Panel (right side):

  • Shows real-time transcript messages (AI responses + user utterances)
  • Messages styled as chat bubbles: AI messages left-aligned (white/card bg), user messages right-aligned (primary/muted bg)
  • Text input at bottom with mic button for push-to-talk or toggle
  • Panel header: optional "Transcript" label or hidden for clean look

Interaction: Avatar lip-syncs or animates with agent speech. Chat transcript updates simultaneously with text-to-speech output. User can type or speak — both channels active.

Constraint: Avatar area must never overlap or obscure the chat panel. On narrow viewports, chat panel overlays avatar with semi-transparent background or stacks below.

L-05: HCP Table with Voice+Avatar Column (D-06)

| Name | Specialty | Personality | Comm Style | Agent Status | Voice & Avatar | Actions |
|------|-----------|-------------|------------|--------------|----------------|---------|
| Dr.Z | Oncology  | [friendly]  | 50 (Ind.)  | [Synced]     | [Ava][Lori-c]  | E R D   |
| Dr.L | Hematol.  | [skeptical] | 30 (Dir.)  | [Failed]     | Not configured  | E R D   |

New column positioned after Agent Status, before Actions. Column width: auto (content-driven). Badge pair uses gap-1 (4px). Both badges variant="outline" with text-xs.


State Management

| State | Type | Location | Purpose |
|-------|------|----------|---------|
| HCP form values (all tabs) | `useForm<HcpFormValues>` (react-hook-form + zod) | hcp-profile-editor.tsx | Single form instance across Profile / Voice & Avatar / Agent tabs. Zod schema includes all 13 voice/avatar fields. Prevents data loss on tab switch. |
| Active tab | Radix Tabs internal (`defaultValue="profile"`) | hcp-profile-editor.tsx | Uncontrolled. No external state needed. |
| Current session mode | `useState<SessionMode>` | voice-session.tsx | Auto-resolved from token broker via `resolveMode()`, updated when the fallback chain triggers. |
| Initial session mode | `useRef<SessionMode>` | voice-session.tsx | Captured at session start. Used by ModeStatusIndicator to detect degradation (initial vs current). |
| Token broker response | TanStack Query mutation via `useVoiceToken` | use-voice-token.ts | Extended to pass `hcpProfileId`. Returns all per-HCP voice/avatar/conversation params in `VoiceLiveToken`. |
| Avatar style options | `useMemo` derived from selected character | voice-avatar-tab.tsx | Filters `AVATAR_VIDEO_CHARACTERS` styles when the character selection changes. |
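The mode auto-resolution mentioned above can be sketched as a pure function over the token broker response. This is an illustrative sketch, not the actual voice-session.tsx code: the `SessionMode` value names are assumptions, while `avatar_enabled` and `agent_id` are the token fields the verification cites.

```typescript
// Hypothetical SessionMode values -- the real union lives in voice-session.tsx.
type SessionMode = "digital_human" | "voice_only" | "text";

// Minimal shape of the token broker response fields this sketch needs.
interface VoiceLiveTokenLike {
  avatar_enabled?: boolean;
  agent_id?: string | null;
}

function resolveMode(token: VoiceLiveTokenLike): SessionMode {
  // Digital Human requires both an avatar config and a synced agent.
  if (token.avatar_enabled && token.agent_id) return "digital_human";
  // A synced agent without an avatar still supports voice-only realtime.
  if (token.agent_id) return "voice_only";
  // No agent available: fall back to text chat.
  return "text";
}
```

The session keeps the result in `useState<SessionMode>`, so later fallback transitions replace it without re-running resolution.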

Accessibility

| Requirement | Implementation |
|-------------|----------------|
| Tab keyboard navigation | Radix Tabs handles Arrow-key navigation and Enter/Space activation automatically |
| Form labels | All voice/avatar fields use FormLabel via the react-hook-form FormField pattern. Switch labels use a companion `<Label htmlFor>` |
| Switch ARIA | Each Switch has adjacent label text and an aria-checked state (Radix default) |
| Mode status announcements | Badge uses `role="status"` and `aria-live="polite"` to announce mode changes to screen readers |
| Color not sole indicator | Mode status pairs a text label ("Connected" / "Degraded" / "Disconnected") with the colored dot. Agent status uses icon + text label alongside background color |
| Fallback toast | sonner toasts include descriptive text, not just color. Warning level provides distinct styling |
| Tooltip for truncated content | Agent ID shown in a Tooltip when truncated in the Agent tab |

Registry Safety

| Registry | Blocks Used | Safety Gate |
|----------|-------------|-------------|
| shadcn official | Not applicable (components already installed manually as Radix wrappers) | Not required |
| Third-party | None | Not applicable |

No new component installations needed. All required UI primitives (Tabs, Select, Switch, Slider, Badge, Card, Form, Input, Textarea, Dialog, Tooltip, Button, Label) are already present in frontend/src/components/ui/.


Responsive Behavior

| Breakpoint | HCP Editor | Voice Session (with Avatar) | Voice Session (no Avatar) | HCP Table |
|------------|------------|-----------------------------|---------------------------|-----------|
| Desktop (>=1024px) | Tabs full-width, max-w-4xl mx-auto, all cards visible | 3-panel: Scenario sidebar (w-64, collapsible) + Avatar center (flex-1) + Chat right (w-[400px]). Avatar video fills center with max-h-[70vh] | 3-panel layout (existing ScenarioPanel + center + HintsPanel) | All 7 columns visible |
| Tablet (768-1023px) | Same as desktop, narrower content area | 2-panel: Avatar top (50vh) + Chat bottom (50vh). Scenario sidebar hidden (accessible via hamburger). Avatar scales down proportionally | 3-panel stacks vertically (existing lg:flex-row pattern) | Hide Comm Style column; Voice & Avatar badges stack vertically |
| Mobile (<768px) | Full-width tabs, cards stack; grid-cols-2 for avatar character/style collapses to grid-cols-1 | Chat panel overlays avatar with semi-transparent bg and a drag handle to resize, or a tab toggle ([Avatar] / [Chat] tabs at bottom). Mic button always visible as a floating action | Single panel with collapsible side panels (existing pattern) | Horizontal scroll, or hide Voice & Avatar and Comm Style columns |

Checker Sign-Off

  • Dimension 1 Copywriting: PASS
  • Dimension 2 Visuals: PASS
  • Dimension 3 Color: PASS
  • Dimension 4 Typography: PASS (FLAG resolved — added text-xs to table)
  • Dimension 5 Spacing: PASS
  • Dimension 6 Registry Safety: PASS

Approval: APPROVED (2026-04-02)

Verification


Phase 12: Voice Realtime API & Agent Mode Integration Verification Report

Phase Goal: Each HCP profile becomes a complete "digital persona" with per-HCP voice, avatar, and conversation parameters. The token broker returns all settings in one response. MRs get automatic mode selection (Digital Human Realtime Agent as default) with graceful fallback to voice-only or text. Admin configures HCP digital personas via a tabbed editor.

Verified: 2026-04-02T14:15:00Z Status: passed Re-verification: Yes -- after gap closure (commit 8126313 fixed voice-session.test.tsx)

Goal Achievement

Observable Truths

| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 1 | Admin can configure per-HCP voice settings, avatar settings, and conversation parameters via tabbed HCP editor | VERIFIED | voice-avatar-tab.tsx (438 lines): 3 Cards (Voice Settings, Avatar Settings, Conversation Parameters) with Select dropdowns for voice name (8 options) and avatar character (6 options) with dynamic style filtering, temperature Slider, 3 Switch controls (noise suppression, echo cancellation, EOU detection), turn detection Select, recognition language Select. hcp-profile-editor.tsx imports and renders VoiceAvatarTab in TabsContent. |
| 2 | Token broker returns all per-HCP voice/avatar settings when `hcp_profile_id` is provided, falls back to global defaults when not | VERIFIED | voice_live_service.py lines 82-106: sources all 13 fields from profile.voice_name, profile.avatar_character, etc. when `hcp_profile_id` provided. Lines 65-79: initializes defaults before the if-block. Lines 108-130: returns all fields in VoiceLiveTokenResponse. |
| 3 | New HCPs get smart defaults (voice "Ava", avatar "Lori-casual", temp 0.9, Server VAD) without manual configuration | VERIFIED | hcp_profile.py model defaults: voice_name="en-US-AvaNeural", avatar_character="lori", avatar_style="casual", voice_temperature=0.9, turn_detection_type="server_vad". Migration i12b has matching server_default on all 13 columns. |
| 4 | MR does NOT see a mode picker -- system auto-selects best mode based on HCP config and service availability | VERIFIED | voice-session.tsx: `resolveMode(tokenData)` function at line 49 derives mode from avatar_enabled and agent_id. No ModeSelector import or render found. Props interface uses `hcpProfileId: string`, not `mode: SessionMode`. |
| 5 | Fallback chain works: Digital Human Realtime Agent -> Voice-only Realtime -> Text, with toast notification and persistent mode status indicator | VERIFIED | voice-session.tsx: avatar connect failure triggers `toast.warning(t("error.avatarFallback"))` (line 193) and falls back to voice-only. Voice connection failure triggers `toast.warning(t("error.voiceFallback"))` (lines 142, 210) and falls back to text. mode-status-indicator.tsx: green/amber/red dot with `role="status"` and `aria-live="polite"`. |
| 6 | HCP table shows Voice & Avatar column with badge pair showing per-HCP configuration | VERIFIED | hcp-table.tsx: column header `t("hcp.voiceAvatarCol")` at line 181. Cell renders two Badge elements with `getVoiceLabel(profile.voice_name)` and profile.avatar_character-profile.avatar_style. |
| 7 | Agent instructions support admin override via Agent tab (D-02) | VERIFIED | agent-tab.tsx: disabled Textarea showing the `buildPreviewInstructions()` auto-generated preview, editable Textarea for agent_instructions_override with i18n placeholder. Backend agent_sync_service.py: checks override first, returns trimmed text if non-empty. 5 dedicated override tests pass. |
| 8 | All new UI text externalized to i18n in both en-US and zh-CN | VERIFIED | admin.json (en-US): 21+ keys including tabProfile, tabVoiceAvatar, tabAgent, voiceSettings, avatarSettings, voiceAvatarCol, notConfigured. admin.json (zh-CN): matching keys with Chinese translations. voice.json (en-US): modeStatus.connected/degraded/disconnected, error.avatarFallback/voiceFallback. voice.json (zh-CN): matching Chinese translations. |

Score: 8/8 truths verified
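The fallback chain verified in truth 5 can be sketched as follows. This is a hedged simplification: the connect functions and the `warn` callback stand in for the real avatar/voice connectors and `toast.warning(t(key))`, and the synchronous shape abstracts over the async implementation in voice-session.tsx.

```typescript
// Hypothetical SessionMode values; only the i18n keys below come from the report.
type SessionMode = "digital_human" | "voice_only" | "text";

function startSession(
  connectAvatar: () => void, // throws when the avatar service is unavailable
  connectVoice: () => void,  // throws when the voice service is unavailable
  warn: (i18nKey: string) => void, // stand-in for toast.warning(t(key))
): SessionMode {
  try {
    connectAvatar();
    return "digital_human"; // optimal mode
  } catch {
    warn("error.avatarFallback"); // "Avatar unavailable, switching to voice mode"
  }
  try {
    connectVoice();
    return "voice_only";
  } catch {
    warn("error.voiceFallback"); // "Voice unavailable, switching to text mode"
  }
  return "text"; // last-resort mode, always available
}
```

Because the initial mode is captured in a `useRef` at session start, ModeStatusIndicator can compare it with the returned mode to show "Degraded" after any fallback.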

Required Artifacts

| Artifact | Expected | Status | Details |
|----------|----------|--------|---------|
| backend/alembic/versions/i12b_add_voice_avatar_fields_to_hcp_profile.py | Migration adding 13 columns | VERIFIED | 13 add_column calls with server_default on all, batch_alter_table for SQLite compat |
| backend/app/models/hcp_profile.py | ORM model with voice/avatar columns | VERIFIED | 13 new Mapped columns (voice_name, voice_type, voice_temperature, voice_custom, avatar_character, avatar_style, avatar_customized, turn_detection_type, noise_suppression, echo_cancellation, eou_detection, recognition_language, agent_instructions_override) |
| backend/app/schemas/hcp_profile.py | Extended Pydantic schemas | VERIFIED | HcpProfileCreate, HcpProfileUpdate, HcpProfileResponse all include 13 voice/avatar fields |
| backend/app/schemas/voice_live.py | VoiceLiveTokenResponse with per-HCP fields | VERIFIED | 11 per-HCP fields added |
| backend/app/services/voice_live_service.py | Token broker with per-HCP sourcing | VERIFIED | Sources all fields from profile when hcp_profile_id provided, falls back to defaults |
| backend/app/api/voice_live.py | Endpoint with hcp_profile_id query param | VERIFIED | `hcp_profile_id: str \| None` optional query parameter |
| backend/app/services/agent_sync_service.py | Agent instructions override (D-02) | VERIFIED | build_agent_instructions checks override first, returns trimmed text if non-empty |
| backend/app/api/hcp_profiles.py | HcpProfileOut with voice/avatar fields | VERIFIED | 13 voice/avatar fields added to HcpProfileOut response model |
| frontend/src/types/hcp.ts | Extended TypeScript types | VERIFIED | HcpProfile has 13 voice/avatar fields, HcpProfileCreate has all optional |
| frontend/src/types/voice-live.ts | VoiceLiveToken with per-HCP fields | VERIFIED | 11 per-HCP optional fields added |
| frontend/src/api/voice-live.ts | API client with hcpProfileId | VERIFIED | `fetchVoiceLiveToken(hcpProfileId?: string)` passes it as a query param |
| frontend/src/hooks/use-voice-token.ts | Mutation accepts hcpProfileId | VERIFIED | `useMutation<VoiceLiveToken, Error, string \| undefined>` |
| frontend/src/components/admin/voice-avatar-tab.tsx | Voice & Avatar tab component | VERIFIED | 438 lines, 3 Cards, all form fields wired to react-hook-form |
| frontend/src/components/admin/agent-tab.tsx | Agent tab component | VERIFIED | 281 lines, AGENT_STATUS_CONFIG, preview + override textareas, metadata card |
| frontend/src/pages/admin/hcp-profile-editor.tsx | Tabbed HCP editor | VERIFIED | 3 TabsTrigger values (profile, voice-avatar, agent), imports VoiceAvatarTab + AgentTab |
| frontend/src/components/admin/hcp-table.tsx | HCP table with Voice+Avatar column | VERIFIED | voiceAvatarCol header, Badge pair display |
| frontend/src/components/voice/mode-status-indicator.tsx | Mode status badge | VERIFIED | Green/amber/red dot, i18n labels, role="status", aria-live="polite" |
| frontend/src/components/voice/voice-session.tsx | Auto-mode + fallback chain | VERIFIED | resolveMode function, hcpProfileId prop (no mode prop), fallback with toast warnings |
| frontend/src/components/voice/voice-session-header.tsx | Header with ModeStatusIndicator | VERIFIED | currentMode/initialMode props, ModeStatusIndicator rendered |
| frontend/src/hooks/use-voice-live.ts | Per-HCP session config | VERIFIED | Uses tokenData.voice_temperature, turn_detection_type, noise_suppression, avatar_style |
| frontend/src/pages/user/voice-session.tsx | Page passes hcpProfileId | VERIFIED | `hcpProfileId={hcpProfileId}` from scenario |
| backend/tests/test_voice_live_per_hcp.py | Per-HCP token broker tests | VERIFIED | 8 tests passing |
| backend/tests/test_hcp_profile_voice.py | HCP CRUD voice field tests | VERIFIED | 10 tests passing |
| backend/tests/test_agent_sync_service.py | Agent instruction override tests | VERIFIED | 5 new override tests passing (27 total in file) |
| backend/scripts/seed_phase2.py | Seed data with voice/avatar configs | VERIFIED | 5 HCP profiles with distinct voice_name and avatar_character values |
| frontend/src/components/voice/voice-session.test.tsx | Updated test for new props | VERIFIED | Uses `hcpProfileId: "hcp-1"` prop (line 277). No stale mode prop references. tsc -b passes cleanly with 0 errors. |
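The override-first rule verified for agent_sync_service.py can be sketched as below. The real implementation is Python; this TypeScript version, with a hypothetical `generate` callback standing in for the personality-driven instruction builder, only illustrates the rule: a non-empty trimmed admin override wins, otherwise the instructions are auto-generated (D-02).

```typescript
// Minimal profile shape for this sketch; the real model has 13 voice/avatar fields.
interface HcpLike {
  agent_instructions_override?: string | null;
}

function buildAgentInstructions(
  hcp: HcpLike,
  generate: () => string, // stand-in for the auto-generation from personality fields
): string {
  const override = hcp.agent_instructions_override?.trim();
  // A non-empty admin override replaces the generated text entirely.
  if (override) return override;
  return generate();
}
```

Treating whitespace-only overrides as empty keeps an accidentally saved blank textarea from wiping the generated instructions.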

Key Link Verification

| From | To | Via | Status | Details |
|------|----|-----|--------|---------|
| voice_live.py (API) | voice_live_service.py | hcp_profile_id pass-through | WIRED | `hcp_profile_id=hcp_profile_id` |
| voice_live_service.py | hcp_profile.py (model) | Lazy import of hcp_profile_service | WIRED | `from app.services import hcp_profile_service; profile = await hcp_profile_service.get_hcp_profile(db, hcp_profile_id)` |
| hcp-profile-editor.tsx | voice-avatar-tab.tsx | Import and render in TabsContent | WIRED | Import + `<VoiceAvatarTab form={form} />` |
| hcp-profile-editor.tsx | agent-tab.tsx | Import and render in TabsContent | WIRED | Import + `<AgentTab ...>` |
| voice-live.ts (API) | Backend POST /voice-live/token | hcp_profile_id query param | WIRED | `params = hcpProfileId ? { hcp_profile_id: hcpProfileId } : {}` |
| voice-session-page.tsx | voice-session.tsx | hcpProfileId prop | WIRED | `hcpProfileId={hcpProfileId}` |
| voice-session.tsx | use-voice-token.ts | mutateAsync(hcpProfileId) | WIRED | `tokenMutation.mutateAsync(hcpProfileId)` |
| use-voice-live.ts | VoiceLiveToken per-HCP fields | Session config from tokenData | WIRED | tokenData.voice_temperature, tokenData.turn_detection_type, tokenData.noise_suppression, tokenData.avatar_style confirmed |

Data-Flow Trace (Level 4)

| Artifact | Data Variable | Source | Produces Real Data | Status |
|----------|---------------|--------|--------------------|--------|
| voice-avatar-tab.tsx | form (UseFormReturn) | Parent hcp-profile-editor.tsx react-hook-form | Yes - populated from HCP profile API response via useQuery | FLOWING |
| agent-tab.tsx | form + profile | Parent form + useQuery HCP profile | Yes - profile from API, form from react-hook-form | FLOWING |
| mode-status-indicator.tsx | currentMode, initialMode, connectionState | Props from voice-session.tsx state | Yes - derived from token broker response via resolveMode() | FLOWING |
| hcp-table.tsx | profile.voice_name, avatar_character | HCP profiles from useHcpProfiles query | Yes - DB-backed via API | FLOWING |
| voice-session.tsx | tokenData | tokenMutation.mutateAsync(hcpProfileId) | Yes - token broker API call | FLOWING |

Behavioral Spot-Checks

| Behavior | Command | Result | Status |
|----------|---------|--------|--------|
| Frontend tsc -b (gap fix) | `npx tsc -b --noEmit` | 0 errors, clean exit | PASS |
| Frontend Vite build | `npm run build` | Built in 4.46s, dist/ output generated | PASS |
| Backend tests (45 total) | `pytest tests/test_voice_live_per_hcp.py tests/test_hcp_profile_voice.py tests/test_agent_sync_service.py -x -v` | 45 passed in 34.50s | PASS |
| Test file uses hcpProfileId prop | grep for hcpProfileId in test | Line 277: `hcpProfileId: "hcp-1"` | PASS |
| Test file has no stale mode prop | grep for `mode:` in test | Only `mode: "f2f"` in mockScenarioData (Scenario type, not VoiceSessionProps) | PASS |

Requirements Coverage

| Requirement | Source Plan | Description | Status | Evidence |
|-------------|-------------|-------------|--------|----------|
| VOICE-12-01 | 12-01 | Per-HCP digital persona model (voice/avatar columns) | SATISFIED | 13 columns on HcpProfile model with ORM + Pydantic + migration |
| VOICE-12-02 | 12-01 | Token broker per-HCP wiring | SATISFIED | voice_live_service sources all fields from HCP profile |
| VOICE-12-03 | 12-02 | Admin tabbed HCP editor with Voice & Avatar tab | SATISFIED | 3-tab layout with VoiceAvatarTab and AgentTab components |
| VOICE-12-04 | 12-03 | Auto-mode resolution (no manual mode picker) | SATISFIED | resolveMode() function, hcpProfileId prop replaces mode |
| VOICE-12-05 | 12-02 | HCP table Voice+Avatar column, i18n | SATISFIED | Badge pair display, 21+ i18n keys in both locales |
| VOICE-12-06 | 12-03 | Fallback chain with toast notifications and ModeStatusIndicator | SATISFIED | 3-level fallback with toast.warning, green/amber/red indicator |

Note: VOICE-12-01 through VOICE-12-06 are referenced in ROADMAP.md but NOT formally defined in REQUIREMENTS.md. They are phase-specific IDs created for Phase 12. No orphaned requirements exist -- REQUIREMENTS.md maps no additional IDs to Phase 12.

Anti-Patterns Found

| File | Line | Pattern | Severity | Impact |
|------|------|---------|----------|--------|
| voice_live_service.py | 105-106 | `except Exception: pass` (silent fallback) | Info | Intentional design: falls back to defaults when HCP profile lookup fails, preventing a profile issue from becoming a service outage. |
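The defaults-then-overlay pattern behind this intentional silent fallback can be sketched as below. This is an illustrative TypeScript rendering of behavior implemented in Python; `resolveVoiceSettings` and `lookupProfile` are stand-ins, not the service's actual API, and only two of the 13 fields are shown.

```typescript
// Two representative fields; the real response carries 13 voice/avatar settings.
interface VoiceSettings {
  voice_name: string;
  voice_temperature: number;
}

// Smart defaults from D-04 (voice "Ava", temperature 0.9).
const DEFAULTS: VoiceSettings = { voice_name: "en-US-AvaNeural", voice_temperature: 0.9 };

function resolveVoiceSettings(
  hcpProfileId: string | undefined,
  lookupProfile: (id: string) => Partial<VoiceSettings>, // may throw
): VoiceSettings {
  // Initialize defaults first, mirroring the service's defaults-before-if-block order.
  let settings = { ...DEFAULTS };
  if (hcpProfileId) {
    try {
      settings = { ...settings, ...lookupProfile(hcpProfileId) };
    } catch {
      // Intentional silent fallback: a broken profile must not block token issuance.
    }
  }
  return settings;
}
```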

Human Verification Required

1. HCP Editor Tab Navigation

Test: Open HCP editor, fill in Profile tab fields, switch to Voice & Avatar tab, configure voice/avatar settings, switch to Agent tab, verify override textarea works, switch back to Profile tab. Expected: All form data persists across tab switches. No data loss. Why human: Cross-tab form state persistence requires interactive browser testing.

2. Avatar Character-Style Dynamic Filtering

Test: In Voice & Avatar tab, change avatar character dropdown from "lori" to "lisa". Check if style dropdown options update dynamically. Expected: Style options change to lisa-specific styles (casual-sitting, graceful-sitting, etc.). Previously selected style resets to first available. Why human: Dynamic dropdown filtering requires visual interaction.
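The expected filtering behavior can be sketched as below; the style lists are illustrative examples, not the actual AVATAR_VIDEO_CHARACTERS data.

```typescript
// Hypothetical character -> styles catalog standing in for AVATAR_VIDEO_CHARACTERS.
const AVATAR_STYLES: Record<string, string[]> = {
  lori: ["casual", "formal"],
  lisa: ["casual-sitting", "graceful-sitting"],
};

function stylesFor(character: string): string[] {
  return AVATAR_STYLES[character] ?? [];
}

// On character change: keep the current style if still valid for the new
// character, otherwise reset to the first available style.
function nextStyle(character: string, current: string): string | undefined {
  const styles = stylesFor(character);
  return styles.includes(current) ? current : styles[0];
}
```

In the component this derivation lives in a `useMemo` keyed on the selected character, so the Select options and the form value update together.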

3. ModeStatusIndicator Visual States

Test: Start a voice session, observe the ModeStatusIndicator badge color and text during connection, degradation (if simulated), and disconnection. Expected: Green dot + "Connected" when at optimal mode, amber dot + "Degraded" when fallen back, red dot + "Disconnected" on error. Why human: Real-time visual state changes during live WebSocket/Avatar connections.

4. Fallback Chain Toast Notifications

Test: Start a voice session where avatar service is unavailable but voice works. Then start one where voice is also unavailable. Expected: First scenario: toast warning "Avatar unavailable, switching to voice mode". Second: toast warning "Voice unavailable, switching to text mode". Why human: Requires simulating service unavailability with real Azure connections.

5. HCP Table Voice & Avatar Column

Test: View HCP list page with multiple profiles that have different voice/avatar configurations. Expected: Badge pairs show short voice label (e.g., "Ava", "Yunxi") and avatar character-style (e.g., "lori-casual"). Profiles without config show "Not configured". Why human: Visual layout, badge rendering, and label formatting need visual confirmation.

Re-verification: Gap Closure Details

Previous gap: voice-session.test.tsx referenced the removed mode prop from pre-Phase 12-03 VoiceSessionProps interface, producing 12 TypeScript TS2353 errors. tsc -b failed across the full frontend project.

Fix: Commit 8126313 ("fix(12): update voice-session.test.tsx for auto-mode props (mode -> hcpProfileId)") updated the test file to:

  • Replace mode: "voice_pipeline" prop with hcpProfileId: "hcp-1" in defaultProps (line 277)
  • Update mock VoiceSessionHeader to check the new props pattern
  • Remove all references to the removed mode prop on VoiceSessionProps

Verification of fix:

  • npx tsc -b --noEmit now completes with 0 errors
  • grep confirms no stale mode prop references in test (only mode: "f2f" in mockScenarioData which is the Scenario type field, not VoiceSessionProps)
  • npm run build succeeds in 4.46s

Regression check: All 8 previously-verified truths remain verified. All artifacts remain present and substantive. No regressions detected.


Verified: 2026-04-02T14:15:00Z Verifier: Claude (gsd-verifier)
