Planning Phase 12 - huqianghui/AI-Coach-vibe-coding GitHub Wiki

Phase 12: Voice Realtime API Agent

Auto-generated from .planning/phases/12-voice-realtime-api-agent
Last synced: 2026-04-28

Context & Decisions

Phase 12: Voice Realtime API & Agent Mode Integration - Context

Gathered: 2026-04-02
Status: Ready for planning

## Phase Boundary

Extend HCP profiles to be complete "digital persona" configurations — each HCP stores Voice Live API settings (voice name, conversation parameters) and Avatar settings (character, custom avatar) alongside the existing AI Foundry Agent. When an MR selects an HCP and starts a session, the system auto-configures the voice connection with per-HCP settings and defaults to Digital Human Realtime Agent mode with automatic fallback to voice-only or text.

## Implementation Decisions

HCP Voice/Avatar Configuration Scope

  • D-01: Full Voice Live settings stored per HCP profile: voice name, avatar character/style, temperature, turn detection (Server VAD), noise suppression, echo cancellation, EOU detection, recognition language, custom voice toggle, custom avatar toggle
  • D-02: Agent instructions are auto-generated from HCP personality fields but admin can view and override the generated text in the HCP editor
  • D-03: Avatar supports both predefined Azure Avatar characters (Lisa, Lori, Harry, etc. in dropdown) and custom avatars (character name with customized: true toggle) — matching reference repo pattern
  • D-04: New HCPs get smart defaults: voice "Ava", avatar "Lori-casual", temp 0.9, Server VAD, noise suppression off, echo cancellation off, EOU detection disabled, recognition language "Auto Detect". Admin can override per-HCP
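As a rough illustration of D-04, the smart defaults can live in one mapping that the HCP create path merges under admin-supplied values. Field names here are assumptions only; the exact column names are left to implementation per the Claude's Discretion notes in this phase.

```python
# Sketch only: field names are assumptions, not the final column names.
HCP_PERSONA_DEFAULTS = {
    "voice_name": "en-US-AvaNeural",      # "Ava" (D-04)
    "avatar_character": "lori",
    "avatar_style": "casual",             # shown combined as "Lori-casual"
    "voice_temperature": 0.9,
    "turn_detection_type": "server_vad",
    "noise_suppression": False,
    "echo_cancellation": False,
    "eou_detection": False,
    "recognition_language": "auto",       # "Auto Detect"
}

def with_persona_defaults(admin_values: dict) -> dict:
    """Admin-supplied values win; unset (None/missing) fields fall back."""
    merged = dict(HCP_PERSONA_DEFAULTS)
    merged.update({k: v for k, v in admin_values.items() if v is not None})
    return merged
```

This keeps D-04's "works immediately for demo" property: a brand-new HCP with no voice/avatar input still produces a complete configuration.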

Admin UX — HCP Editor Redesign

  • D-05: HCP editor uses tabbed layout with 3 tabs: "Profile" (existing personality/specialty/objections fields), "Voice & Avatar" (voice name, avatar character, conversation parameters), "Agent" (auto-generated + editable instructions text, agent sync status)
  • D-06: HCP table adds a Voice+Avatar column showing voice name + avatar character as badges (e.g. "Ava / Lori-casual") or "Not configured" if missing
  • D-07: Table maintains existing columns from Phase 11 (Name, Specialty, Personality, Agent Status) plus new Voice+Avatar column

Session Wiring

  • D-08: Token broker API returns all HCP voice/avatar settings (voice name, avatar character, conversation params) alongside auth token/endpoint. Frontend auto-configures WebSocket and Avatar connection from this single response
  • D-09: MR cannot override HCP voice/avatar settings during a session — settings are locked per-HCP for consistent experience

Mode Simplification & Fallback

  • D-10: Default to Digital Human Realtime Agent mode (best experience). MR does NOT see a mode picker — system auto-selects based on HCP config and service availability
  • D-11: Fallback chain: Digital Human Realtime Agent → Voice-only Realtime → Text mode. Triggered when avatar service unavailable or network degraded
  • D-12: Fallback notification: toast alert for the initial fallback event ("Avatar unavailable, switching to voice mode") PLUS persistent status indicator showing current active mode throughout the session

Claude's Discretion

  • Exact DB column types and migration details for new HCP voice/avatar fields
  • Default avatar/voice options list (can derive from Azure documentation)
  • Tab component implementation details (reuse existing Tabs from UI library)
  • WebSocket reconnection strategy on network recovery
  • Status indicator component design

<canonical_refs>

Canonical References

Downstream agents MUST read these before planning or implementing.

Reference Implementation

  • User's screenshot of Voice Live Agent demo — shows full settings panel (Instructions, Connection Settings, Conversation Settings, Voice, Avatar) with Digital Human avatar rendering and chat

HCP Profile Model & API (Phase 11 output)

  • backend/app/models/hcp_profile.py — HcpProfile ORM model (extend with voice/avatar fields)
  • backend/app/schemas/hcp_profile.py — HcpProfileCreate/Update/Response schemas (extend)
  • backend/app/api/hcp_profiles.py — HCP profile CRUD router
  • backend/app/services/hcp_profile_service.py — HCP profile service layer
  • backend/app/services/agent_sync_service.py — Agent sync (extend to sync voice/avatar config)

Voice Live Infrastructure (Phase 08/09 output)

  • backend/app/services/voice_live_service.py — Token broker (extend to return per-HCP voice/avatar settings)
  • backend/app/schemas/voice_live.py — VoiceLiveTokenResponse (extend with voice/avatar fields)
  • backend/app/services/agents/adapters/azure_voice_live.py — Agent/Model mode parse/encode
  • backend/app/api/voice_live.py — Voice Live API routes

Frontend Voice Components (Phase 08 output)

  • frontend/src/hooks/use-voice-live.ts — RTClient WebSocket hook (consume per-HCP settings)
  • frontend/src/hooks/use-avatar-stream.ts — Avatar WebRTC hook (consume per-HCP avatar config)
  • frontend/src/components/voice/voice-session.tsx — VoiceSession container
  • frontend/src/components/voice/mode-selector.tsx — Current mode selector (replace with auto-mode + fallback)
  • frontend/src/components/voice/avatar-view.tsx — Avatar renderer

Frontend Admin (Phase 11 output)

  • frontend/src/pages/admin/hcp-profiles.tsx — HCP profiles admin page (add tabs)
  • frontend/src/pages/admin/hcp-profile-editor.tsx — HCP editor (extend with tabs)
  • frontend/src/components/admin/hcp-table.tsx — HCP table (add Voice+Avatar column)
  • frontend/src/types/hcp.ts — HCP TypeScript types (extend)

Config & Auth

  • backend/app/services/config_service.py — AI Foundry unified config
  • backend/app/services/connection_tester.py — Connection testing patterns

</canonical_refs>

<code_context>

Existing Code Insights

Reusable Assets

  • HcpProfile model already has agent_id, agent_sync_status fields from Phase 11 — extend with voice/avatar columns
  • agent_sync_service.py — Pattern for auto-syncing on HCP CRUD, reuse for voice/avatar validation
  • VoiceLiveTokenResponse — Already returns endpoint, api_key, agent_id — extend with voice/avatar settings
  • Tabs component in UI library — reuse for HCP editor tabbed layout
  • useVoiceLive hook — Already handles WebSocket connection, needs to accept per-HCP conversation params
  • useAvatarStream hook — Already handles WebRTC, needs to accept per-HCP avatar character
  • mode-selector.tsx — Has the 7-mode mapping, will be replaced by auto-mode logic

Established Patterns

  • Per-domain TanStack Query hooks with mutation invalidation
  • Alembic migration with server_default for SQLite compatibility
  • i18n namespaces per domain (admin, voice)
  • Token broker pattern: backend generates config, frontend consumes directly
  • Full-screen session pages without UserLayout

Integration Points

  • HcpProfile model → add ~12 new columns for voice/avatar settings
  • Token broker → extend response to include all voice/avatar params from HCP
  • VoiceSession container → consume per-HCP settings instead of global config
  • Mode selector → replace with auto-mode + fallback chain logic
  • HCP editor page → add tabbed layout with Voice & Avatar tab
  • HCP table → add Voice+Avatar column

</code_context>

## Specific Ideas
  • Reference implementation screenshot shows the exact settings panel: Instructions, Connection Settings, Conversation Settings (Recognition Language, Noise suppression, Echo cancellation, Turn detection, EOU detection, Temperature), Voice (custom voice toggle, voice name), Avatar (toggle, custom avatar toggle, character)
  • Each HCP becomes a complete "digital persona" — personality + voice + appearance
  • Smart defaults mean new HCPs work immediately for demo without manual configuration
  • Fallback chain matches the user's note: "voice+avatar as default, fallback to voice or text if service unavailable or network bad"
  • Token broker is the single integration point — frontend gets everything it needs in one call
## Deferred Ideas
  • Developer mode toggle for MRs to override HCP settings during debug sessions — future enhancement
  • Per-session provider override — always use HCP-level config for now
  • Azure AD token auth (DefaultAzureCredential) for Entra token acquisition — future phase
  • Multiple avatar characters per HCP (wardrobe selection) — future enhancement
  • Voice cloning / custom neural voice training — future phase

Phase: 12-voice-realtime-api-agent Context gathered: 2026-04-02

Plans (4)

| # | Plan File | Status |
|-------|---------------|----------|
| 12-01 | 12-01-PLAN.md | Complete |
| 12-02 | 12-02-PLAN.md | Complete |
| 12-03 | 12-03-PLAN.md | Complete |
| 12-04 | 12-04-PLAN.md | Complete |

Research


Phase 12: Voice Realtime API & Agent Mode Integration - Research

Researched: 2026-04-02
Domain: HCP digital persona configuration, Voice Live API session wiring, auto-mode + fallback chain
Confidence: HIGH

Summary

Phase 12 extends HCP profiles into complete "digital persona" configurations that bundle voice, avatar, and conversation parameters alongside the existing AI Foundry Agent. The token broker API becomes the single integration point: it reads all per-HCP settings and returns them to the frontend, which auto-configures WebSocket and Avatar connections without manual mode selection. The fallback chain (Digital Human Realtime Agent -> Voice-only Realtime -> Text) replaces the current 7-mode ModeSelector with automatic degradation.

The codebase is well-structured for this extension. The HcpProfile ORM model needs ~12 new columns for voice/avatar settings. The VoiceLiveTokenResponse schema already returns voice_name, avatar_character, and agent_id -- these just need to be sourced from HCP profile data instead of global config. The frontend VoiceSession container already implements a basic fallback chain (avatar failure -> voice-only -> text); it needs refinement to consume per-HCP settings from the token broker and display a persistent mode status indicator.

Primary recommendation: work bottom-up. Database migration first, then backend schema/service extension, then frontend HCP editor tabs, then session wiring with auto-mode + fallback, and finally integration testing.

<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

  • D-01: Full Voice Live settings stored per HCP profile: voice name, avatar character/style, temperature, turn detection (Server VAD), noise suppression, echo cancellation, EOU detection, recognition language, custom voice toggle, custom avatar toggle
  • D-02: Agent instructions are auto-generated from HCP personality fields but admin can view and override the generated text in the HCP editor
  • D-03: Avatar supports both predefined Azure Avatar characters (Lisa, Lori, Harry, etc. in dropdown) and custom avatars (character name with customized: true toggle) -- matching reference repo pattern
  • D-04: New HCPs get smart defaults: voice "Ava", avatar "Lori-casual", temp 0.9, Server VAD, noise suppression off, echo cancellation off, EOU detection disabled, recognition language "Auto Detect". Admin can override per-HCP
  • D-05: HCP editor uses tabbed layout with 3 tabs: "Profile" (existing personality/specialty/objections fields), "Voice & Avatar" (voice name, avatar character, conversation parameters), "Agent" (auto-generated + editable instructions text, agent sync status)
  • D-06: HCP table adds a Voice+Avatar column showing voice name + avatar character as badges (e.g. "Ava / Lori-casual") or "Not configured" if missing
  • D-07: Table maintains existing columns from Phase 11 (Name, Specialty, Personality, Agent Status) plus new Voice+Avatar column
  • D-08: Token broker API returns all HCP voice/avatar settings (voice name, avatar character, conversation params) alongside auth token/endpoint. Frontend auto-configures WebSocket and Avatar connection from this single response
  • D-09: MR cannot override HCP voice/avatar settings during a session -- settings are locked per-HCP for consistent experience
  • D-10: Default to Digital Human Realtime Agent mode (best experience). MR does NOT see a mode picker -- system auto-selects based on HCP config and service availability
  • D-11: Fallback chain: Digital Human Realtime Agent -> Voice-only Realtime -> Text mode. Triggered when avatar service unavailable or network degraded
  • D-12: Fallback notification: toast alert for the initial fallback event ("Avatar unavailable, switching to voice mode") PLUS persistent status indicator showing current active mode throughout the session

Claude's Discretion

  • Exact DB column types and migration details for new HCP voice/avatar fields
  • Default avatar/voice options list (can derive from Azure documentation)
  • Tab component implementation details (reuse existing Tabs from UI library)
  • WebSocket reconnection strategy on network recovery
  • Status indicator component design

Deferred Ideas (OUT OF SCOPE)

  • Developer mode toggle for MRs to override HCP settings during debug sessions
  • Per-session provider override -- always use HCP-level config for now
  • Azure AD token auth (DefaultAzureCredential) for Entra token acquisition
  • Multiple avatar characters per HCP (wardrobe selection)
  • Voice cloning / custom neural voice training

</user_constraints>

Standard Stack

Core

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| SQLAlchemy 2.0 (async) | >=2.0.0 | ORM model extension for voice/avatar fields | Already in use, async throughout |
| Alembic | >=1.13.0 | Database migration for new columns | Required by project rules |
| Pydantic v2 | >=2.0.0 | Schema extension for voice/avatar fields | Already in use for all schemas |
| @radix-ui/react-tabs | (via project UI lib) | Tabbed HCP editor layout | Already available as Tabs component |
| react-hook-form + zod | (via project) | Form validation for voice/avatar settings tab | Already used in HCP editor |
| rt-client | 0.5.2 | Voice Live WebSocket connection | Already installed from reference repo |

Supporting

| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| sonner | (via project) | Toast notifications for fallback alerts | Fallback chain notifications |
| lucide-react | >=0.460.0 | Icons for mode status indicator | Status indicator component |

Alternatives Considered

None -- this phase extends existing infrastructure rather than introducing new libraries.

Architecture Patterns

Recommended Project Structure

New/modified files organized by domain:

```text
backend/
  alembic/versions/
    i12a_add_voice_avatar_fields_to_hcp_profile.py   # NEW: migration
  app/
    models/hcp_profile.py                              # EXTEND: ~12 new columns
    schemas/hcp_profile.py                             # EXTEND: voice/avatar fields
    schemas/voice_live.py                              # EXTEND: per-HCP fields in response
    services/voice_live_service.py                     # EXTEND: source settings from HCP
    services/hcp_profile_service.py                    # EXTEND: handle voice/avatar in CRUD
    api/voice_live.py                                  # EXTEND: accept hcp_profile_id param

frontend/
  src/
    types/hcp.ts                                       # EXTEND: voice/avatar fields
    types/voice-live.ts                                # EXTEND: new token response fields
    pages/admin/hcp-profile-editor.tsx                 # REWRITE: tabbed layout
    components/admin/hcp-table.tsx                     # EXTEND: Voice+Avatar column
    components/admin/voice-avatar-tab.tsx              # NEW: Voice & Avatar settings tab
    components/admin/agent-tab.tsx                     # NEW: Agent instructions tab
    components/voice/voice-session.tsx                 # EXTEND: auto-mode + per-HCP config
    components/voice/mode-status-indicator.tsx         # NEW: persistent mode badge
    components/voice/mode-selector.tsx                 # REMOVE: no longer needed (auto-mode)
    hooks/use-voice-token.ts                           # EXTEND: pass hcp_profile_id
    api/voice-live.ts                                  # EXTEND: pass hcp_profile_id to token
```

Pattern 1: Per-HCP Token Broker Extension

What: Token broker reads the HCP profile to source voice/avatar/conversation settings instead of global config.
When to use: Every voice session start.
Example:

```python
# Source: existing voice_live_service.py pattern, extended per D-08
async def get_voice_live_token(
    db: AsyncSession,
    hcp_profile_id: str | None = None,
) -> VoiceLiveTokenResponse:
    # ... existing config fetch ...

    # Source voice/avatar from HCP profile (D-08)
    if hcp_profile_id:
        profile = await hcp_profile_service.get_hcp_profile(db, hcp_profile_id)
        voice_name = profile.voice_name or "en-US-AvaNeural"
        avatar_character = profile.avatar_character or "lori"
        avatar_style = profile.avatar_style or "casual"
        avatar_customized = profile.avatar_customized
        temperature = profile.voice_temperature or 0.9
        # ... etc for all conversation params

    return VoiceLiveTokenResponse(
        # ... existing fields ...
        voice_name=voice_name,
        avatar_character=avatar_character,
        avatar_style=avatar_style,
        avatar_customized=avatar_customized,
        temperature=temperature,
        turn_detection_type=turn_detection_type,
        noise_suppression=noise_suppression,
        echo_cancellation=echo_cancellation,
        eou_detection=eou_detection,
        recognition_language=recognition_language,
    )
```

Pattern 2: Auto-Mode with Fallback Chain (D-10, D-11)

What: Frontend automatically selects the best mode based on HCP config and service availability. No ModeSelector exposed to MR.
When to use: Session initialization in the VoiceSession container.
Example:

```typescript
// Source: existing voice-session.tsx fallback pattern, refined per D-10/D-11
const resolveMode = (tokenData: VoiceLiveToken): SessionMode => {
  // D-10: Default to Digital Human Realtime Agent (best experience)
  if (tokenData.avatar_enabled && tokenData.agent_id) {
    return "digital_human_realtime_agent";
  }
  if (tokenData.avatar_enabled) {
    return "digital_human_realtime_model";
  }
  if (tokenData.agent_id) {
    return "voice_realtime_agent";
  }
  return "voice_realtime_model";
};

// D-11: Fallback chain on connection failure
// Avatar fails -> voice-only; Voice fails -> text
```

Pattern 3: Tabbed HCP Editor (D-05)

What: Replace the current single-page editor with a 3-tab layout using the existing Radix Tabs.
When to use: HCP profile create/edit page.
Example:

```tsx
// Source: existing Tabs component from @/components/ui/tabs
<Tabs defaultValue="profile">
  <TabsList>
    <TabsTrigger value="profile">Profile</TabsTrigger>
    <TabsTrigger value="voice-avatar">Voice & Avatar</TabsTrigger>
    <TabsTrigger value="agent">Agent</TabsTrigger>
  </TabsList>
  <TabsContent value="profile">
    {/* Existing personality/specialty/objections fields */}
  </TabsContent>
  <TabsContent value="voice-avatar">
    <VoiceAvatarTab form={form} />
  </TabsContent>
  <TabsContent value="agent">
    <AgentTab profile={profile} onRetrySync={handleRetrySync} />
  </TabsContent>
</Tabs>
```

Anti-Patterns to Avoid

  • Exposing mode picker to MR (D-09/D-10): MRs must NOT manually select voice/avatar modes. System auto-selects.
  • Global voice/avatar config fallback: Always source from HCP profile. Only fall back to global defaults when HCP has no configuration.
  • Mixing tab state with form state: All voice/avatar fields must be part of the single react-hook-form instance, not separate state.
  • Storing avatar settings in a separate table: Keep all HCP digital persona fields in the same hcp_profiles table -- simpler queries, no joins needed.

Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Tabbed layout | Custom tab switching logic | Radix Tabs (@/components/ui/tabs) | Already in UI library, accessible, keyboard-navigable |
| Avatar character list | Hardcoded constants | Azure standard avatars list from docs | Authoritative source, characters updated by Microsoft |
| Form validation for new fields | Manual validation in handlers | zod schema extension in existing HCP form | Already established pattern in hcp-profile-editor.tsx |
| WebSocket session config | Manual JSON construction | Extend existing useVoiceLive hook | Hook already builds session config from tokenData |
| Persistent mode indicator | Custom status component | Badge + cn() from existing UI primitives | Consistent with existing badge patterns in the project |

Common Pitfalls

Pitfall 1: SQLite batch_alter_table Required for Adding Columns

  • What goes wrong: Alembic op.add_column() fails on SQLite for certain operations.
  • Why it happens: SQLite doesn't fully support ALTER TABLE. The project already uses batch operations.
  • How to avoid: Use with op.batch_alter_table("hcp_profiles") as batch_op: for all column additions, with server_default on every column.
  • Warning signs: Migration fails locally but would work on PostgreSQL.
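A minimal sketch of such a migration, assuming hypothetical column names; the real column set, types, and revision ids are at Claude's discretion per the context above.

```python
# Hypothetical migration sketch -- column names are assumptions; only the
# batch_alter_table + server_default shape is the point.
import sqlalchemy as sa
from alembic import op

def upgrade() -> None:
    # batch_alter_table makes Alembic rebuild the table behind the scenes,
    # which is how column additions with defaults work on SQLite
    with op.batch_alter_table("hcp_profiles") as batch_op:
        batch_op.add_column(
            sa.Column("voice_name", sa.String(200), nullable=False,
                      server_default="en-US-AvaNeural")
        )
        batch_op.add_column(
            sa.Column("avatar_character", sa.String(100), nullable=False,
                      server_default="lori")
        )
        # ... remaining voice/avatar columns follow the same pattern

def downgrade() -> None:
    with op.batch_alter_table("hcp_profiles") as batch_op:
        batch_op.drop_column("avatar_character")
        batch_op.drop_column("voice_name")
```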

Pitfall 2: Token Broker Must Pass hcp_profile_id from Frontend

  • What goes wrong: Token broker returns global config instead of per-HCP settings because hcp_profile_id is not passed.
  • Why it happens: The current POST /voice-live/token endpoint doesn't accept hcp_profile_id. The voice session page gets session data which includes scenario_id, and the scenario has hcp_profile_id.
  • How to avoid: Extend the token endpoint to accept hcp_profile_id as a query parameter or request body field. Wire it through from VoiceSessionPage -> useVoiceToken -> fetchVoiceLiveToken -> API.
  • Warning signs: All HCPs use the same voice/avatar during sessions.

Pitfall 3: Avatar Character vs Style are Separate Fields

  • What goes wrong: Avatar character and style are concatenated or confused (e.g., "lori-casual" vs character="lori" style="casual").
  • Why it happens: The Azure Avatar API requires character and style as separate fields in the session config JSON. The reference screenshots show them combined in the UI display.
  • How to avoid: Store avatar_character and avatar_style as separate DB columns. Combine them only for table badge display. Send them as separate fields in the WebSocket session config.
  • Warning signs: Avatar fails to render because the character name includes the style suffix.
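To keep the two representations straight, display formatting and API payload construction can be two separate helpers. Names below are hypothetical, for illustration only.

```python
# Hypothetical helpers for this pitfall: store/send character and style
# split, combine them only for UI display (the D-06 table badge).
def avatar_badge(character: str, style: str) -> str:
    """Combined display form, e.g. 'Lori-casual'."""
    return f"{character.capitalize()}-{style}"

def avatar_config(character: str, style: str, customized: bool) -> dict:
    """Split form required by the Azure session.update avatar payload."""
    return {"character": character, "style": style, "customized": customized}
```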

Pitfall 4: Form Reset on Tab Switch Loses Unsaved Changes

  • What goes wrong: Switching tabs resets form fields if each tab has its own form state.
  • Why it happens: Multiple form instances, or conditional rendering that unmounts tab content.
  • How to avoid: Use a single react-hook-form instance that spans all tabs. Note that Radix TabsContent unmounts inactive panels by default; either pass forceMount (hiding inactive panels with CSS) or rely on react-hook-form v7's default shouldUnregister: false, which preserves values when inputs unmount.
  • Warning signs: Admin fills voice settings, switches to the Profile tab, switches back, and the settings are gone.

Pitfall 5: Lazy Import for hcp_profile_service in voice_live_service

  • What goes wrong: Circular import error when voice_live_service imports hcp_profile_service at module level.
  • Why it happens: Already documented as a Phase 11 decision -- voice_live_service uses a lazy import inside the function body.
  • How to avoid: Continue the existing lazy import pattern: from app.services import hcp_profile_service inside the function, not at module level.
  • Warning signs: ImportError on server startup.

Pitfall 6: Avatar Session Config Structure Must Match Azure API

  • What goes wrong: Avatar doesn't render because the session config JSON structure doesn't match the format the Azure Voice Live API expects.
  • Why it happens: The avatar config in session.update requires a specific nested structure: { character, style, customized, video: { codec, crop, resolution } }.
  • How to avoid: Use the exact Azure API structure from the Voice Live how-to docs. The existing useVoiceLive hook already sends avatar config but without the style and customized fields -- extend it.
  • Warning signs: WebSocket connection succeeds but the avatar video stream never starts.

Code Examples

Azure Voice Live Session Config with Per-HCP Settings

```json
{
  "instructions": "You are Dr. Zhang, an Oncology specialist...",
  "turn_detection": {
    "type": "server_vad",
    "silence_duration_ms": 500
  },
  "input_audio_noise_reduction": {"type": "azure_deep_noise_suppression"},
  "input_audio_echo_cancellation": {"type": "server_echo_cancellation"},
  "voice": {
    "name": "en-US-Ava:DragonHDLatestNeural",
    "type": "azure-standard",
    "temperature": 0.9
  },
  "input_audio_transcription": {
    "model": "azure-speech",
    "language": "zh-CN"
  },
  "avatar": {
    "character": "lori",
    "style": "casual",
    "customized": false,
    "video": {
      "codec": "h264",
      "crop": {"top_left": [560, 0], "bottom_right": [1360, 1080]}
    }
  },
  "agent_id": "dr-zhang-oncology",
  "project_name": "ai-coach-project"
}
```

Source: Azure Voice Live API how-to docs (https://learn.microsoft.com/en-us/azure/ai-services/speech-service/voice-live-how-to)

Azure Standard Video Avatar Characters (for dropdown)

```typescript
// Source: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/standard-avatars
const AVATAR_VIDEO_CHARACTERS = [
  { character: "harry",  styles: ["business", "casual", "youthful"] },
  { character: "jeff",   styles: ["business", "formal"] },
  { character: "lisa",   styles: ["casual-sitting", "graceful-sitting", "graceful-standing", "technical-sitting", "technical-standing"] },
  { character: "lori",   styles: ["casual", "graceful", "formal"] },
  { character: "max",    styles: ["business", "casual", "formal"] },
  { character: "meg",    styles: ["formal", "casual", "business"] },
] as const;

// Note: Photo avatars (Adrian, Amara, Bianca, etc.) are also available but only at 512x512 resolution.
// Video avatars are recommended for this project due to 1920x1080 resolution.
```

HCP Profile Voice/Avatar DB Columns

```python
# Source: Derived from D-01 and Azure Voice Live session config
# Note: the ORM defaults below apply to new objects only; the Alembic
# migration must additionally set server_default on every column so
# existing SQLite rows are backfilled (project DB rule)

# Voice settings
voice_name: Mapped[str] = mapped_column(String(200), default="en-US-AvaNeural")
voice_type: Mapped[str] = mapped_column(String(50), default="azure-standard")
voice_temperature: Mapped[float] = mapped_column(default=0.9)
voice_custom: Mapped[bool] = mapped_column(Boolean, default=False)

# Avatar settings
avatar_character: Mapped[str] = mapped_column(String(100), default="lori")
avatar_style: Mapped[str] = mapped_column(String(100), default="casual")
avatar_customized: Mapped[bool] = mapped_column(Boolean, default=False)

# Conversation parameters
turn_detection_type: Mapped[str] = mapped_column(String(50), default="server_vad")
noise_suppression: Mapped[bool] = mapped_column(Boolean, default=False)
echo_cancellation: Mapped[bool] = mapped_column(Boolean, default=False)
eou_detection: Mapped[bool] = mapped_column(Boolean, default=False)
recognition_language: Mapped[str] = mapped_column(String(20), default="auto")

# Agent instruction override (D-02)
agent_instructions_override: Mapped[str] = mapped_column(Text, default="")
```

Extended VoiceLiveTokenResponse Schema

```python
# Source: Extend existing backend/app/schemas/voice_live.py
class VoiceLiveTokenResponse(BaseModel):
    # Existing fields
    endpoint: str
    token: str
    region: str
    model: str
    avatar_enabled: bool
    avatar_character: str
    voice_name: str
    agent_id: str | None = None
    project_name: str | None = None

    # New per-HCP fields (D-08)
    avatar_style: str = "casual"
    avatar_customized: bool = False
    voice_type: str = "azure-standard"
    voice_temperature: float = 0.9
    turn_detection_type: str = "server_vad"
    noise_suppression: bool = False
    echo_cancellation: bool = False
    eou_detection: bool = False
    recognition_language: str = "auto"
```

Turn Detection Types (for dropdown)

```typescript
// Source: Azure Voice Live API how-to docs
const TURN_DETECTION_TYPES = [
  { value: "server_vad", label: "Server VAD" },
  { value: "semantic_vad", label: "Semantic VAD (gpt-realtime only)" },
  { value: "azure_semantic_vad", label: "Azure Semantic VAD (all models)" },
  { value: "azure_semantic_vad_multilingual", label: "Azure Semantic VAD Multilingual" },
] as const;
```

Voice Name Options (common Azure TTS voices)

```typescript
// Source: Azure Speech TTS voice list (commonly used for Chinese + English)
const VOICE_NAME_OPTIONS = [
  // English voices
  { value: "en-US-AvaNeural", label: "Ava (EN-US)" },
  { value: "en-US-Ava:DragonHDLatestNeural", label: "Ava HD (EN-US)" },
  { value: "en-US-AndrewNeural", label: "Andrew (EN-US)" },
  { value: "en-US-JennyNeural", label: "Jenny (EN-US)" },
  // Chinese voices
  { value: "zh-CN-XiaoxiaoMultilingualNeural", label: "Xiaoxiao Multilingual (ZH-CN)" },
  { value: "zh-CN-XiaoxiaoNeural", label: "Xiaoxiao (ZH-CN)" },
  { value: "zh-CN-YunxiNeural", label: "Yunxi (ZH-CN)" },
  { value: "zh-CN-YunjianNeural", label: "Yunjian (ZH-CN)" },
] as const;
```

State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Global voice/avatar config | Per-HCP voice/avatar config | Phase 12 | Each HCP is a complete digital persona |
| 7-mode manual selector | Auto-mode with fallback chain | Phase 12 | MRs never see mode picker |
| server_vad only | Multiple turn detection types | Voice Live API 2025-10 | azure_semantic_vad works with all models |
| Single avatar character globally | Per-HCP avatar character + style | Phase 12 | Different HCPs look different |
| h264 only codec | h264 remains default (Video Avatar) | Current | Photo Avatar supports vp9 but lower res |

Azure Voice Live API supported models (current):

  • gpt-realtime, gpt-realtime-mini, gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat, phi4-mm-realtime, phi4-mini

Turn detection types available:

  • server_vad (default, all models)
  • semantic_vad (gpt-realtime/gpt-realtime-mini only)
  • azure_semantic_vad (all models, Voice Live specific)
  • azure_semantic_vad_multilingual (all models, multilingual support)

Open Questions

  1. Avatar style naming format

    • What we know: Azure API uses separate character and style fields (e.g., character="lisa", style="casual-sitting"). The existing codebase stores avatar_character as a combined string like "Lisa-casual-sitting".
    • What's unclear: Should we store combined (backward compatible) or split (matches API)?
    • Recommendation: Store split (avatar_character + avatar_style) to match Azure API structure. Combine for display only. The migration can default avatar_character="lori" and avatar_style="casual".
  2. Recognition language "Auto Detect" value

    • What we know: Azure Voice Live docs show "language": "en" for explicit language. D-04 says default "Auto Detect".
    • What's unclear: The exact value for auto-detect in the Azure API (empty string? omit the field?).
    • Recommendation: Use empty string "" or omit language field from input_audio_transcription config when "auto" is selected. Store "auto" in DB, translate to API format at WebSocket config time.
  3. Whether to keep ModeSelector component

    • What we know: D-10 says MR does NOT see a mode picker. But the admin/debug use case was deferred.
    • What's unclear: Should mode-selector.tsx be deleted or just hidden from MR view?
    • Recommendation: Keep the file but do not render it in the voice session. The auto-mode logic replaces its function. The component can be restored later if developer mode is implemented.
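Following the recommendation in question 2, the "auto" translation at config-build time could look like the helper below. The field shape mirrors the session config example earlier; whether omitting "language" actually triggers service-side auto-detect is the open question, so treat this as an assumption.

```python
# Assumption: omitting "language" lets the service auto-detect; the exact
# Azure contract for auto-detect is the open question above.
def transcription_config(recognition_language: str) -> dict:
    config = {"model": "azure-speech"}
    if recognition_language and recognition_language != "auto":
        # explicit language stored on the HCP, e.g. "zh-CN"
        config["language"] = recognition_language
    return config
```

Storing "auto" in the DB keeps the admin UI value stable even if the API-side representation changes later.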

Project Constraints (from CLAUDE.md)

Coding Standards

  • Async everywhere: all backend functions must be async def
  • Pydantic v2 schemas with model_config = ConfigDict(from_attributes=True)
  • Route ordering: static paths before parameterized (/{id})
  • Service layer holds business logic, routers only handle HTTP
  • No raw SQL -- use SQLAlchemy ORM
  • TypeScript strict mode: no any, no unused variables
  • TanStack Query hooks per domain, no inline useQuery
  • cn() for conditional class composition
  • i18n: all UI text externalized via react-i18next
  • Conventional commits: feat:, fix:, docs:, test:

Database Rules

  • NEVER modify schema without Alembic migration
  • All models use TimestampMixin
  • batch_alter_table with server_default for SQLite compatibility
  • Current Alembic head: b820e86271f8

Pre-Commit Checklist

  • Backend: ruff check ., ruff format --check ., pytest -v
  • Frontend: npx tsc -b, npm run build

Sources

Primary (HIGH confidence)

  • Azure Voice Live API how-to: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/voice-live-how-to -- session.update config structure, turn detection types, voice config, avatar config, noise suppression, echo cancellation
  • Azure Standard Avatars: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/standard-avatars -- full character list with styles (Harry, Jeff, Lisa, Lori, Max, Meg + photo avatars)
  • Azure Voice Live overview: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/voice-live -- supported models, pricing tiers, feature list
  • Existing codebase files (all read directly):
    • backend/app/models/hcp_profile.py -- current ORM model
    • backend/app/schemas/hcp_profile.py -- current Pydantic schemas
    • backend/app/services/voice_live_service.py -- current token broker
    • backend/app/schemas/voice_live.py -- current token response schema
    • backend/app/services/hcp_profile_service.py -- CRUD with agent sync hooks
    • backend/app/services/agent_sync_service.py -- agent instructions builder
    • frontend/src/hooks/use-voice-live.ts -- WebSocket session config builder
    • frontend/src/hooks/use-avatar-stream.ts -- WebRTC avatar connection
    • frontend/src/components/voice/voice-session.tsx -- session container with fallback
    • frontend/src/components/voice/mode-selector.tsx -- 7-mode selector (to be replaced)
    • frontend/src/pages/admin/hcp-profile-editor.tsx -- current editor layout
    • frontend/src/components/admin/hcp-table.tsx -- current table columns
    • frontend/src/types/hcp.ts -- HCP TypeScript types
    • frontend/src/types/voice-live.ts -- Voice Live types
    • frontend/src/components/ui/tabs.tsx -- Radix Tabs available in UI library
    • backend/app/services/region_capabilities.py -- region/service availability maps

Secondary (MEDIUM confidence)

  • Azure OpenAI Realtime API reference (linked from Voice Live docs) -- base event format that Voice Live extends

Tertiary (LOW confidence)

  • Voice name list is a commonly-used subset, not exhaustive. Azure has 600+ standard voices. The admin should have a text input with the dropdown as suggestions, not a locked select.

Metadata

Confidence breakdown:

  • Standard stack: HIGH - all libraries already in the project, no new dependencies
  • Architecture: HIGH - extending well-established patterns (token broker, HCP CRUD, form hooks)
  • Pitfalls: HIGH - based on direct codebase reading and established project conventions
  • Azure API config structure: HIGH - verified from official Microsoft documentation (updated 2026-02-04 / 2026-03-16)

Research date: 2026-04-02. Valid until: 2026-05-02 (stable -- Azure Voice Live API is GA, avatar characters list stable).

UI Specification


Phase 12 -- UI Design Contract

Visual and interaction contract for the Voice Realtime API & Agent Mode Integration phase. Generated by gsd-ui-researcher, verified by gsd-ui-checker.


Design System

| Property | Value |
|---|---|
| Tool | none (Tailwind CSS v4 with @theme inline custom properties) |
| Preset | not applicable |
| Component library | Radix UI (via project @/components/ui/* wrappers) |
| Icon library | lucide-react >=0.460.0 |
| Font | Inter + Noto Sans SC (sans-serif), JetBrains Mono (monospace) |

Source: Existing frontend/src/styles/index.css @theme inline block, established in Phase 01. No new design system installations required.


Spacing Scale

Declared values (must be multiples of 4):

| Token | Value | Usage in Phase 12 |
|---|---|---|
| xs | 4px | Icon gaps, inline badge padding within Voice+Avatar column (gap-1), switch-to-label gap |
| sm | 8px | Compact element spacing, tab trigger padding, form field gaps within a row, dot-to-text gap in ModeStatusIndicator (gap-2) |
| md | 16px | Default element spacing, card content padding, tab content top margin, form field vertical gaps (space-y-4) |
| lg | 24px | Section padding within cards, gap between form sections inside a tab (space-y-6) |
| xl | 32px | Gap between major card sections in the editor, header-to-content gap |
| 2xl | 48px | Page-level top/bottom padding |
| 3xl | 64px | Not used in this phase |

Exceptions: Touch target minimum 44px for voice session controls (mic button, end session button) per existing Phase 08 pattern.


Typography

| Role | Size | Weight | Line Height | Phase 12 Usage |
|---|---|---|---|---|
| Badge/Indicator | 12px (text-xs) | 400 or 600 | 1.5 | Badge text in HCP table Voice+Avatar column, ModeStatusIndicator text (font-semibold), agent sync status badges |
| Body | 14px (text-sm) | 400 (normal) | 1.5 | Form field values, table cell text, Textarea content, transcript text |
| Label | 14px (text-sm) | 400 (normal) | 1.5 | FormLabel text, Switch labels, Select labels. Differentiated from body via text-muted-foreground color, not weight |
| Heading | 16px (text-base) | 600 (semibold) | 1.5 | CardTitle in each form section (Voice Settings, Avatar Settings, etc.), tab triggers |
| Display | 24px (text-2xl) | 600 (semibold) | 1.5 | Not used in this phase (no page-level display headings introduced) |

Two weights only: 400 (normal) for body text and labels, 600 (semibold) for headings.


Color

| Role | Value | Usage in Phase 12 |
|---|---|---|
| Dominant (60%) | var(--background) #FFFFFF | Page background, tab content background, form input backgrounds |
| Secondary (30%) | var(--card) #FFFFFF / var(--muted) #ececf0 | Cards in HCP editor, table header row bg-slate-50/50, tab list background bg-muted, disabled Textarea bg-muted/50 |
| Accent (10%) | var(--primary) #1E40AF | Save Profile primary button (bg-primary), active tab trigger shadow highlight |
| Destructive | var(--destructive) #EF4444 | Delete HCP action, End Session button, failed agent sync status badge (bg-red-100 text-red-700), disconnected mode status dot |

Accent reserved for:

  • Save Profile primary button (bg-primary)
  • Active TabsTrigger state (uses bg-background with shadow per Radix default, not direct accent fill)

Additional semantic colors used in this phase (already established):

| Token | Value | Usage |
|---|---|---|
| Green | bg-green-500 (dot) / bg-green-100 text-green-700 (badge) | Connected mode status dot, synced agent badge, avatar active indicator |
| Amber | bg-amber-500 (dot) / bg-amber-100 text-amber-700 (badge) | Degraded mode status dot, pending agent sync badge |
| Red | bg-destructive (dot) / bg-red-100 text-red-700 (badge) | Disconnected mode status dot, failed agent sync badge |
| Muted foreground | var(--muted-foreground) #717182 | "Not configured" badge text, disabled form labels, placeholder text |

Source: Existing CSS custom properties in index.css, Phase 10 theme system. ModeStatusIndicator dot colors verified from implementation: bg-green-500, bg-amber-500, bg-destructive.


Focal Points

| Screen | Primary Focal Point | Rationale |
|---|---|---|
| HCP Profile Editor | Save Profile button (top-right of header bar) | The single CTA that commits all tab changes; placed in a persistent header outside the tabs so it remains visible regardless of active tab |
| Voice Session | ModeStatusIndicator (center of VoiceSessionHeader) | Communicates the live connection state and active mode; center placement ensures the MR always knows session health at a glance |
| HCP Table | Voice & Avatar column badges | New column added in this phase; draws attention to per-HCP digital persona configuration status |

Component Inventory

New Components

| Component | Location | Description |
|---|---|---|
| VoiceAvatarTab | frontend/src/components/admin/voice-avatar-tab.tsx | Form tab content for Voice & Avatar settings. Contains: voice name Select/Input with custom voice Switch toggle, avatar character Select with avatar style Select (linked -- style options filter by selected character using the AVATAR_VIDEO_CHARACTERS constant), custom avatar Switch toggle, conversation parameters (temperature Slider 0.0-1.0 step 0.1, turn detection Select with 4 options, boolean Switches for noise suppression / echo cancellation / EOU detection, recognition language Select). Uses UseFormReturn<HcpFormValues> from the parent form instance. |
| AgentTab | frontend/src/components/admin/agent-tab.tsx | Form tab content for Agent instructions and sync status. Contains: agent status Card (icon + status label + agent_id with Tooltip + Retry Sync Button + View in Azure Portal link), auto-generated instructions preview via buildPreviewInstructions() (disabled Textarea), editable override Textarea (agent_instructions_override form field). Uses the AGENT_STATUS_CONFIG constant for status icon/color/bg mapping. |
| ModeStatusIndicator | frontend/src/components/voice/mode-status-indicator.tsx | Persistent session mode badge replacing the center Badge in VoiceSessionHeader. Shows the current active mode label (from voice:modeBadge.* i18n keys) with a colored dot: green (bg-green-500) when at optimal mode (currentMode === initialMode), amber (bg-amber-500) when degraded (currentMode !== initialMode), red (bg-destructive) when disconnected/error. Uses Badge variant="outline" with role="status" aria-live="polite". Dot is size-2 shrink-0 rounded-full. Gap between dot and text: gap-2 (8px). |

Modified Components

| Component | Changes |
|---|---|
| hcp-profile-editor.tsx | Replaced single-page form layout with a 3-tab Tabs layout. Form wraps Tabs (not individual TabsContent) for cross-tab state persistence via a single useForm<HcpFormValues> instance. Profile tab wraps existing Identity/Personality/Knowledge/Interaction Cards. Voice & Avatar tab renders VoiceAvatarTab. Agent tab renders AgentTab. Zod schema extended with 13 voice/avatar fields. Header with Save button remains outside the tabs. |
| hcp-table.tsx | Added "Voice & Avatar" column after the "Agent Status" column. Renders voice name (via getVoiceLabel() helper) and avatar character-style as an inline Badge pair (variant="outline", text-xs), or text-muted-foreground "Not configured" text when defaults. Column is non-sortable. |
| voice-session.tsx | Removed mode prop from the external interface. Added hcpProfileId prop. Auto-resolves mode from the token broker response via the resolveMode(tokenData) function (D-10). Implements fallback chain (D-11): avatar fail -> voice-only -> text with toast.warning() notifications. Passes hcpProfileId to the useVoiceToken hook. Passes currentMode, initialMode, connectionState to ModeStatusIndicator. |
| voice-session-header.tsx | Replaced center static Badge with the ModeStatusIndicator component. Passes currentMode, initialMode, and connectionState as props. |
| mode-selector.tsx | File retained but component no longer rendered in voice session pages (D-10). Not deleted, to allow future developer-mode restoration. |
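The "13 voice/avatar fields" added to the editor schema can be sketched as a plain values slice with the D-04 smart defaults. This is a hypothetical shape, not the shipped Zod schema: the field names are assumptions mirroring the token-response fields listed in I-10.

```typescript
// Hypothetical voice/avatar slice of HcpFormValues, with D-04 defaults.
interface VoiceAvatarValues {
  voice_name: string;
  custom_voice: boolean;
  avatar_character: string;
  avatar_style: string;
  avatar_customized: boolean;
  voice_temperature: number;
  turn_detection_type: string;
  noise_suppression: boolean;
  echo_cancellation: boolean;
  eou_detection: boolean;
  recognition_language: string;
}

function voiceAvatarDefaults(): VoiceAvatarValues {
  return {
    voice_name: "en-US-AvaNeural", // D-04: voice "Ava"
    custom_voice: false,
    avatar_character: "lori",      // D-04: avatar "Lori-casual"
    avatar_style: "casual",
    avatar_customized: false,
    voice_temperature: 0.9,        // D-04: temp 0.9
    turn_detection_type: "server_vad",
    noise_suppression: false,
    echo_cancellation: false,
    eou_detection: false,
    recognition_language: "auto",  // stored sentinel for "Auto Detect"
  };
}
```

In the real editor these defaults would feed `useForm`'s `defaultValues` so a brand-new HCP opens with a valid digital persona configuration.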

Reused Components (no changes)

| Component | Usage in Phase 12 |
|---|---|
| Tabs / TabsList / TabsTrigger / TabsContent | HCP editor 3-tab layout |
| Select / SelectTrigger / SelectContent / SelectItem | Voice name, avatar character, avatar style, turn detection, recognition language dropdowns |
| Switch | Custom voice toggle, custom avatar toggle, noise suppression, echo cancellation, EOU detection |
| Slider | Temperature (0.0 - 1.0, step 0.1) |
| Input | Custom voice name text input (shown when custom voice toggle is on) |
| Badge | Voice+Avatar column in HCP table, mode status in session header |
| Card / CardHeader / CardTitle / CardContent | Form section containers within each tab |
| Tooltip / TooltipTrigger / TooltipContent | Agent ID display, agent status error details |
| Dialog | End session confirmation (existing) |
| toast (sonner) | Fallback notifications (D-12), save success/error, sync success/error |
| Form / FormField / FormItem / FormLabel / FormControl / FormMessage | All form fields in all three tabs |
| Textarea | Auto-generated instructions (disabled), override instructions (editable) |
| Button | Save Profile, Retry Sync, View in Azure Portal, back navigation |
| Label | Switch companion labels using htmlFor binding |

Interaction Contracts

I-01: HCP Editor Tab Navigation (D-05)

Trigger: Admin clicks a tab trigger (Profile / Voice & Avatar / Agent).

Behavior: Radix Tabs switches visible content instantly. All three TabsContent panels remain mounted in the DOM (Radix default behavior). Form state from react-hook-form persists across tab switches because a single useForm<HcpFormValues> instance wraps all tabs at the <Form> level above <Tabs>.

Visual: The active tab trigger shows bg-background with shadow (default TabsTrigger style from data-[state=active]). Inactive triggers show text-muted-foreground.

Constraint: Tab switching must NOT trigger form validation. Validation only runs on Save button click via form.handleSubmit().

I-02: Voice Name Selection (D-01, D-04)

Trigger: Admin interacts with the voice name field in the Voice & Avatar tab.

Behavior: When the "Custom voice" Switch is OFF (default), show a Select dropdown with preset voice options from the VOICE_NAME_OPTIONS constant (8 options: 4 English, 4 Chinese). When toggled ON, show a text Input for free-form voice name entry. Default value: "en-US-AvaNeural" per D-04.

Visual: Custom voice Switch at the top of the Voice Settings card with flex items-center justify-between layout. Select dropdown below, or Input when custom mode is enabled, with placeholder "e.g., en-US-Ava:DragonHDLatestNeural".

I-03: Avatar Character + Style Selection (D-03)

Trigger: Admin selects an avatar character in the Voice & Avatar tab.

Behavior: Two linked Select dropdowns. The character dropdown shows 6 video avatar characters from the AVATAR_VIDEO_CHARACTERS constant (harry, jeff, lisa, lori, max, meg). When the character changes, the style dropdown filters to show only valid styles for that character via useMemo. When the "Custom avatar" Switch is ON, character becomes a text Input. Default: character "lori", style "casual" per D-04.

Visual: Side-by-side Select dropdowns using grid grid-cols-2 gap-4. Character label in the left column, style label in the right. Custom avatar Switch below with the same flex items-center justify-between layout as custom voice.
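The character-to-style linkage can be sketched as a lookup plus a pure filter. The style lists below are illustrative placeholders, not the authoritative Azure set -- verify against the Azure standard-avatars doc cited in Sources; the real component reads the project's AVATAR_VIDEO_CHARACTERS constant.

```typescript
// Illustrative character -> styles map (placeholder style lists).
const AVATAR_VIDEO_CHARACTERS: Record<string, readonly string[]> = {
  harry: ["business", "casual", "youthful"],
  jeff: ["business", "formal"],
  lisa: ["casual-sitting"],
  lori: ["casual", "formal", "graceful"],
  max: ["business", "casual", "formal"],
  meg: ["formal", "casual", "business"],
};

// In the component this lookup is wrapped in useMemo keyed on the
// selected character, so the style Select re-renders only on change.
function stylesForCharacter(character: string): readonly string[] {
  return AVATAR_VIDEO_CHARACTERS[character] ?? [];
}
```

Returning an empty array for unknown characters (e.g. a custom avatar name) lets the style dropdown render disabled rather than crash.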

I-04: Conversation Parameters (D-01)

Trigger: Admin adjusts conversation parameters in the Voice & Avatar tab.

Behavior: Temperature uses a Slider (min 0.0, max 1.0, step 0.1, default 0.9). Turn detection uses a Select with 4 options from the TURN_DETECTION_TYPES constant (server_vad default). Noise suppression, echo cancellation, and EOU detection each use a Switch (all default OFF per D-04). Recognition language uses a Select with options from the RECOGNITION_LANGUAGES constant, including "Auto Detect" (default "auto").

Visual: Stacked form fields within a Conversation Parameters Card. Fields use space-y-4. Switch rows use flex items-center justify-between. Temperature shows the current numeric value to the right of the Slider.

I-05: Agent Instructions Override (D-02)

Trigger: Admin views the Agent tab.

Behavior: Auto-generated instructions text (built by buildPreviewInstructions() from current form values including name, specialty, personality, objections, expertise) is displayed in a disabled Textarea with a muted background. Below it, an editable Textarea (agent_instructions_override form field) lets the admin write custom instructions. If the override is non-empty, it takes priority when syncing to AI Foundry (checked in the backend build_agent_instructions).

Visual: Two Textareas stacked vertically within an Agent Instructions Card. Top: disabled with bg-muted/50 appearance, rows=6. Bottom: standard input style, rows=6, placeholder text from admin:hcp.overridePlaceholder.

I-06: HCP Table Voice+Avatar Column (D-06, D-07)

Trigger: Table renders with HCP profile data.

Behavior: New column after "Agent Status". Shows the voice name (shortened via the getVoiceLabel() helper -- e.g., "en-US-AvaNeural" becomes "Ava") and the avatar character+style combined, as two inline Badge elements. If the HCP has no voice_name or avatar_character, shows "Not configured" text.

Visual: Two Badge variant="outline" elements side by side with gap-1 (4px). Both badges use text-xs. The voice badge shows the short name (e.g., "Ava"). The avatar badge shows the combined character-style (e.g., "Lori-casual"). When not configured: plain text-xs text-muted-foreground text.
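A hypothetical implementation of the getVoiceLabel() helper described above, matching the "en-US-AvaNeural" -> "Ava" behavior. It assumes the Azure {locale}-{Name}Neural short-name convention; anything else (custom voice names, photo-avatar IDs) falls through unchanged.

```typescript
// Hypothetical helper: shorten an Azure voice short-name for badge display.
// "en-US-AvaNeural" -> "Ava"; unrecognized formats are returned as-is.
function getVoiceLabel(voiceName: string): string {
  const match = voiceName.match(/^[a-z]{2,3}-[A-Z]{2}-(.+?)(?:Neural)?$/);
  return match ? match[1] : voiceName;
}
```

For example, `getVoiceLabel("zh-CN-XiaoxiaoNeural")` yields "Xiaoxiao", while a free-form custom voice string is displayed verbatim.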

I-07: Auto-Mode Resolution (D-10)

Trigger: Voice session starts; the token broker response is received via the useVoiceToken hook.

Behavior: resolveMode(tokenData) determines the best available mode:

  1. If tokenData.avatar_enabled && tokenData.agent_id -> "digital_human_realtime_agent"
  2. If tokenData.avatar_enabled -> "digital_human_realtime_model"
  3. If tokenData.agent_id -> "voice_realtime_agent"
  4. Otherwise -> "voice_realtime_model"

The MR never sees a mode picker; the mode is auto-selected. initialMode is captured via useRef for degradation detection.

Visual: No ModeSelector rendered in the voice session. ModeStatusIndicator in the header shows the resolved mode label.
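The four-step chain above can be sketched directly. The mode strings come from the voice:modeBadge keys in this spec; the token field names (avatar_enabled, agent_id) match those listed under I-10.

```typescript
// Sketch of resolveMode(tokenData) per the I-07 decision chain.
type SessionMode =
  | "digital_human_realtime_agent"
  | "digital_human_realtime_model"
  | "voice_realtime_agent"
  | "voice_realtime_model";

interface TokenCapabilities {
  avatar_enabled: boolean;
  agent_id?: string | null;
}

function resolveMode(tokenData: TokenCapabilities): SessionMode {
  if (tokenData.avatar_enabled && tokenData.agent_id) return "digital_human_realtime_agent";
  if (tokenData.avatar_enabled) return "digital_human_realtime_model";
  if (tokenData.agent_id) return "voice_realtime_agent";
  return "voice_realtime_model";
}
```

Because the function is pure, the component can call it once on token arrival and store the result in both currentMode state and the initialMode ref.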

I-08: Fallback Chain (D-11, D-12)

Trigger: Avatar connection fails during the session, or the voice connection degrades.

Behavior: Three-level fallback: Digital Human Realtime Agent -> Voice-only Realtime -> Text mode. Each fallback triggers:

  1. toast.warning() notification via sonner with descriptive text from voice:error.avatarFallback or voice:error.voiceFallback
  2. currentMode state updates to new degraded mode
  3. ModeStatusIndicator updates automatically (dot turns amber when currentMode !== initialMode)

Visual: Toast uses sonner warning styling (amber tint). The ModeStatusIndicator badge text updates dynamically to reflect the new mode label from voice:modeBadge.*.
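The degradation step can be expressed as a pure function over the mode strings defined in this spec. This is a sketch only: the real handler also fires toast.warning() and tears down the failed transport; the observation that each digital-human mode has a voice-only sibling is an assumption drawn from the mode naming, not confirmed implementation.

```typescript
// Sketch of the I-08 fallback step: digital human -> voice-only sibling -> text.
function nextFallbackMode(current: string): string | null {
  if (current.startsWith("digital_human")) {
    // Avatar failed: drop to the voice-only sibling of the same mode.
    return current.replace("digital_human", "voice");
  }
  if (current.startsWith("voice")) {
    return "text"; // Voice failed: fall back to text mode.
  }
  return null; // Already at text; nowhere left to fall back.
}
```

Keeping the step pure makes the chain easy to unit-test and guarantees it terminates: each call strictly moves down the three-level hierarchy.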

I-09: Mode Status Indicator (D-12)

Trigger: Always visible during the voice session in the VoiceSessionHeader center.

Behavior: Renders as <Badge variant="outline"> with a prepended colored dot. Shows:

  • Mode label text from voice:modeBadge.{currentMode} i18n key (e.g., "Digital Human Agent", "Voice Agent", "Voice Realtime")
  • Status text from voice:modeStatus.* (Connected / Degraded / Disconnected)
  • Format: "{mode label} - {status text}"
  • Dot color logic: isDisconnected -> bg-destructive, isDegraded (currentMode !== initialMode) -> bg-amber-500, else -> bg-green-500

Visual: Badge with flex items-center gap-2 text-xs font-semibold. Dot is size-2 shrink-0 rounded-full. ARIA: role="status" aria-live="polite" for screen reader announcements.

I-10: Token Broker Per-HCP Wiring (D-08)

Trigger: MR starts a voice session for a scenario with an HCP.

Behavior: The frontend passes hcpProfileId to the useVoiceToken hook, which passes it to the POST /api/v1/voice-live/token endpoint. The backend reads the HCP profile's voice/avatar settings and returns them in VoiceLiveTokenResponse (voice_name, avatar_character, avatar_style, avatar_customized, voice_temperature, turn_detection_type, noise_suppression, echo_cancellation, eou_detection, recognition_language). Falls back to global defaults when there is no HCP profile or on exception. The useVoiceLive and useAvatarStream hooks consume per-HCP settings from the token response.

Visual: No visible UI change from the user's perspective. Settings are applied transparently -- the MR sees the correct avatar character and hears the correct voice for each HCP.

I-12: Avatar Display in Voice Session (NEW)

Trigger: Voice session starts and tokenData.avatar_enabled === true.

Behavior: The voice session page renders the unified avatar+chat layout (L-06). The Avatar Display Area shows:

  1. Azure AI Avatar video stream -- a <video> element connected to the avatar stream (via use-avatar-stream.ts). The avatar lip-syncs with agent TTS output. Video auto-plays, muted (audio comes from the TTS stream separately).
  2. Static image fallback -- if the avatar video stream fails but avatar_character is configured, show a static avatar image (<img> from the avatar character asset URL). ModeStatusIndicator shows the amber "Degraded" state.
  3. No avatar -- if avatar_enabled === false or avatar_character is not configured, render the standard voice-only layout without the avatar area.

The chat panel on the right shows the real-time conversation transcript alongside the avatar. Both update simultaneously -- the user sees the avatar speaking while reading the text.

Visual: Avatar centered in its container with neutral background. Smooth fade-in transition (transition-opacity duration-300) when avatar stream connects. Loading state shows skeleton pulse animation in the avatar area. Chat bubbles: AI messages use bg-card with left alignment, user messages use bg-primary/10 with right alignment.

Constraint: Avatar video must maintain aspect ratio (never stretch/distort). Use object-contain to fit within container bounds.

I-11: End Session with Flush (existing, unchanged)

Trigger: MR clicks the End Session button.

Behavior: Dialog confirmation. On confirm: flush pending transcripts via pendingFlushesRef with Promise.all, disconnect voice/avatar, call the endSession API, navigate to the scoring page.

Visual: Existing Dialog pattern from Phase 08. No changes in Phase 12.


Copywriting Contract

All copy externalized via react-i18next. English (en-US) and Chinese (zh-CN) values verified against actual implementation.

Admin Namespace (admin:hcp.*)

| Element | i18n Key | en-US Copy | zh-CN Copy |
|---|---|---|---|
| Tab: Profile | admin:hcp.tabProfile | Profile | 基本信息 |
| Tab: Voice & Avatar | admin:hcp.tabVoiceAvatar | Voice & Avatar | 语音和数字人 |
| Tab: Agent | admin:hcp.tabAgent | Agent | AI 代理 |
| Voice section title | admin:hcp.voiceSettings | Voice Settings | 语音设置 |
| Avatar section title | admin:hcp.avatarSettings | Avatar Settings | 数字人设置 |
| Conversation params title | admin:hcp.conversationParams | Conversation Parameters | 对话参数 |
| Custom voice toggle | admin:hcp.customVoice | Custom voice | 自定义语音 |
| Custom avatar toggle | admin:hcp.customAvatar | Custom avatar | 自定义数字人 |
| Voice name label | admin:hcp.voiceName | Voice Name | 语音名称 |
| Avatar character label | admin:hcp.avatarCharacter | Avatar Character | 数字人角色 |
| Avatar style label | admin:hcp.avatarStyle | Avatar Style | 数字人风格 |
| Temperature label | admin:hcp.temperature | Temperature | 对话温度 |
| Turn detection label | admin:hcp.turnDetection | Turn Detection | 轮次检测 |
| Noise suppression label | admin:hcp.noiseSuppression | Noise Suppression | 噪声抑制 |
| Echo cancellation label | admin:hcp.echoCancellation | Echo Cancellation | 回声消除 |
| EOU detection label | admin:hcp.eouDetection | End-of-Utterance Detection | 语音终止检测 |
| Recognition language label | admin:hcp.recognitionLanguage | Recognition Language | 识别语言 |
| Auto detect option | admin:hcp.autoDetect | Auto Detect | 自动检测 |
| Agent instructions (auto) | admin:hcp.autoInstructions | Auto-generated Instructions | 自动生成指令 |
| Agent instructions (override) | admin:hcp.overrideInstructions | Override Instructions | 自定义指令 |
| Override placeholder | admin:hcp.overridePlaceholder | Leave empty to use auto-generated instructions | 留空则使用自动生成的指令 |
| Table column header | admin:hcp.voiceAvatarCol | Voice & Avatar | 语音和数字人 |
| Not configured text | admin:hcp.notConfigured | Not configured | 未配置 |
| Primary CTA | admin:hcp.save | Save Profile | 保存配置 |
| Empty state heading | admin:hcp.emptyTitle | No HCP Profiles | 暂无 HCP 配置 |
| Empty state body | admin:hcp.emptyBody | Create your first HCP profile to start building training scenarios. | 创建第一个 HCP 配置以开始培训。 |
| Delete with agent | admin:hcp.deleteConfirmWithAgent | Delete HCP Profile: This will permanently remove this profile, delete its AI Foundry agent, and unassign it from all scenarios. This action cannot be undone. | 删除 HCP 配置:将永久删除此配置、其 AI Foundry Agent 以及所有关联场景分配。此操作不可撤销。 |
| Delete with agent (short) | admin:hcp.deleteConfirmAgent | Delete this HCP profile? This will also delete the linked AI Foundry Agent. | 确定删除此 HCP 配置?关联的 AI Foundry Agent 也将被删除。 |
| Error: save failed | admin:errors.hcpSaveFailed | Failed to save HCP profile. Please try again. | 保存 HCP 配置失败,请重试。 |

Voice Namespace (voice:*)

| Element | i18n Key | en-US Copy | zh-CN Copy |
|---|---|---|---|
| Fallback toast: avatar | voice:error.avatarFallback | Avatar unavailable, switching to voice mode | 数字人不可用,已切换为语音模式 |
| Fallback toast: voice | voice:error.voiceFallback | Voice unavailable, switching to text mode | 语音不可用,已切换为文字模式 |
| Mode: connected | voice:modeStatus.connected | Connected | 已连接 |
| Mode: degraded | voice:modeStatus.degraded | Degraded | 降级模式 |
| Mode: disconnected | voice:modeStatus.disconnected | Disconnected | 已断开 |
| Mode badge: text | voice:modeBadge.text | Text Mode | 文字模式 |
| Mode badge: voice pipeline | voice:modeBadge.voice_pipeline | Voice Pipeline | 语音管线 |
| Mode badge: DH pipeline | voice:modeBadge.digital_human_pipeline | Digital Human Pipeline | 数字人管线 |
| Mode badge: voice RT model | voice:modeBadge.voice_realtime_model | Voice Realtime | 语音实时 |
| Mode badge: DH RT model | voice:modeBadge.digital_human_realtime_model | Digital Human Realtime | 数字人实时 |
| Mode badge: voice RT agent | voice:modeBadge.voice_realtime_agent | Voice Agent | 语音代理 |
| Mode badge: DH RT agent | voice:modeBadge.digital_human_realtime_agent | Digital Human Agent | 数字人代理 |
| Avatar loading | voice:avatar.loading | Connecting to avatar... | 正在连接数字人... |
| Avatar failed | voice:avatar.failed | Avatar unavailable | 数字人不可用 |
| Transcript label | voice:transcript | Transcript | 对话记录 |
| Chat input placeholder | voice:chatPlaceholder | Type a message or use the mic... | 输入消息或使用麦克风... |

Layout Contracts

L-01: HCP Profile Editor (Tabbed, D-05)

Focal point: Save Profile button (top-right of header bar). Remains visible and accessible regardless of active tab.

+-----------------------------------------------------------+
| [<-] Create/Edit HCP Profile     [Test Chat] [Save]       |  <- Header bar (fixed, outside tabs)
+-----------------------------------------------------------+
| [Profile] [Voice & Avatar] [Agent]                         |  <- TabsList (h-9, bg-muted, rounded-lg)
+-----------------------------------------------------------+
|                                                            |
|  Tab content area (scrollable, max-w-4xl mx-auto)          |
|  Cards stacked vertically with space-y-6 (24px gap)        |
|                                                            |
+-----------------------------------------------------------+

Single <Form> wraps <Tabs>. Previous 3-column grid layout (2-col form + 1-col sidebar) replaced by full-width tabs. Agent status card and timestamps card moved into Agent tab content.

L-02: Voice & Avatar Tab Content

+-----------------------------------------------------------+
| Card: Voice Settings                                       |
|   Custom voice: [OFF ----]                     [Switch]    |
|   Voice Name: [Select dropdown  \/]                        |
|   (or Input if custom voice ON)                            |
+-----------------------------------------------------------+
|                         space-y-6                          |
+-----------------------------------------------------------+
| Card: Avatar Settings                                      |
|   Custom avatar: [OFF ----]                    [Switch]    |
|   Character: [Select \/]    Style: [Select \/]             |
|              (grid grid-cols-2 gap-4)                      |
+-----------------------------------------------------------+
|                         space-y-6                          |
+-----------------------------------------------------------+
| Card: Conversation Parameters                              |
|   Temperature: [=====O=====] 0.9      (Slider + value)    |
|   Turn Detection: [Select \/]                              |
|   Noise Suppression: [label]           [Switch]            |
|   Echo Cancellation: [label]           [Switch]            |
|   EOU Detection: [label]              [Switch]             |
|   Recognition Language: [Select \/]                        |
+-----------------------------------------------------------+

Each section is a Card. Fields within cards use space-y-4. Switch rows use flex items-center justify-between. Character and style dropdowns are grid grid-cols-2 gap-4.

L-03: Agent Tab Content

+-----------------------------------------------------------+
| Card: Agent Status (bg matches sync status config)         |
|   [Icon] Status: Synced / Pending / Failed / None          |
|   Agent ID: asst_xxxxx (Tooltip for full ID)               |
|   [Retry Sync button]  [View in Azure Portal link]         |
+-----------------------------------------------------------+
|                         space-y-6                          |
+-----------------------------------------------------------+
| Card: Agent Instructions                                   |
|   Auto-generated Instructions (label):                     |
|   [================================]                       |
|   [  You are Dr. Zhang, an...      ]  (disabled Textarea)  |
|   [================================]                       |
|                      space-y-4                             |
|   Override Instructions (label):                           |
|   [================================]                       |
|   [  (editable)                    ]  (active Textarea)    |
|   [================================]                       |
+-----------------------------------------------------------+
|                         space-y-6                          |
+-----------------------------------------------------------+
| Card: Metadata                                             |
|   Created: 2026-04-01 10:00                                |
|   Last Updated: 2026-04-02 14:30                           |
+-----------------------------------------------------------+

Agent status Card uses AGENT_STATUS_CONFIG for dynamic bg + border + icon + color per status value.

L-04: Voice Session Header with Mode Status (D-12)

Focal point: ModeStatusIndicator (center of header). Communicates live connection state.

+---[ Timer | Scenario Title ]---[ ModeStatusIndicator ]---[ ConnectionStatus | View | End ]---+
|  h-16 (64px)                                                                                  |

ModeStatusIndicator replaces the previous static Badge in center position. Format: [dot] {mode label} - {status}. Width auto-fits content. Dot size-2, text text-xs font-semibold, gap gap-2.

L-06: Voice Session with Avatar (Unified Page Layout)

Focal point: Avatar video/image (center-left). When HCP has avatar configured, avatar and agent conversation display on the same page.

Condition: Rendered when tokenData.avatar_enabled === true AND avatar connection is active. Falls back to L-04 voice-only layout when avatar is not configured or avatar connection fails.

+---[ Header: Timer | Scenario Title | ModeStatusIndicator | End ]---+  <- h-16
+--------------------------------------------------------------------+
|              |                              |                       |
|  Scenario    |    Avatar Display Area       |   Chat / Transcript   |
|  Panel       |    (center, flex-1)          |   Panel               |
|  (w-64,      |                              |   (w-[400px],         |
|   optional,  |  +----------------------+    |    flex flex-col)     |
|   collaps-   |  |                      |    |                       |
|   ible)      |  |  [Avatar Video/Img]  |    |   +---------------+   |
|              |  |  (aspect-[3/4] or    |    |   | Chat messages |   |
|              |  |   object-contain,    |    |   | (flex-1,      |   |
|              |  |   max-h-[70vh],      |    |   |  overflow-y-  |   |
|              |  |   mx-auto)           |    |   |  auto)        |   |
|              |  |                      |    |   +---------------+   |
|              |  +----------------------+    |   | [Input] [Mic] |   |
|              |                              |   +---------------+   |
+--------------------------------------------------------------------+

Avatar Display Area:

  • Container: flex items-center justify-center bg-neutral-50 dark:bg-neutral-900 rounded-lg overflow-hidden
  • Video element (when Azure AI Avatar streaming): <video> tag with autoPlay muted playsInline, sized to max-h-[70vh] w-auto mx-auto
  • Static image fallback (when avatar image configured but no video stream): <img> with object-contain max-h-[70vh] mx-auto
  • Empty state (avatar loading): Skeleton with pulsing animation, same aspect ratio
  • Background: subtle neutral to frame the avatar cleanly

Chat Panel (right side):

  • Shows real-time transcript messages (AI responses + user utterances)
  • Messages styled as chat bubbles: AI messages left-aligned (white/card bg), user messages right-aligned (primary/muted bg)
  • Text input at bottom with mic button for push-to-talk or toggle
  • Panel header: optional "Transcript" label or hidden for clean look

Interaction: Avatar lip-syncs or animates with agent speech. Chat transcript updates simultaneously with text-to-speech output. User can type or speak — both channels active.

Constraint: Avatar area must never overlap or obscure the chat panel. On narrow viewports, chat panel overlays avatar with semi-transparent background or stacks below.

L-05: HCP Table with Voice+Avatar Column (D-06)

| Name | Specialty | Personality | Comm Style | Agent Status | Voice & Avatar | Actions |
|------|-----------|-------------|------------|--------------|----------------|---------|
| Dr.Z | Oncology  | [friendly]  | 50 (Ind.)  | [Synced]     | [Ava][Lori-c]  | E R D   |
| Dr.L | Hematol.  | [skeptical] | 30 (Dir.)  | [Failed]     | Not configured  | E R D   |

New column positioned after Agent Status, before Actions. Column width: auto (content-driven). Badge pair uses gap-1 (4px). Both badges variant="outline" with text-xs.


State Management

| State | Type | Location | Purpose |
|-------|------|----------|---------|
| HCP form values (all tabs) | `useForm<HcpFormValues>` (react-hook-form + zod) | hcp-profile-editor.tsx | Single form instance across Profile / Voice & Avatar / Agent tabs. Zod schema includes all 13 voice/avatar fields. Prevents data loss on tab switch. |
| Active tab | Radix Tabs internal (`defaultValue="profile"`) | hcp-profile-editor.tsx | Uncontrolled. No external state needed. |
| Current session mode | `useState<SessionMode>` | voice-session.tsx | Auto-resolved from token broker via `resolveMode()`, updated when the fallback chain triggers. |
| Initial session mode | `useRef<SessionMode>` | voice-session.tsx | Captured at session start. Used by ModeStatusIndicator to detect degradation (initial vs current). |
| Token broker response | TanStack Query mutation via `useVoiceToken` | use-voice-token.ts | Extended to pass `hcpProfileId`. Returns all per-HCP voice/avatar/conversation params in `VoiceLiveToken`. |
| Avatar style options | `useMemo` derived from selected character | voice-avatar-tab.tsx | Filters `AVATAR_VIDEO_CHARACTERS` styles when the character selection changes. |
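The mode auto-resolution mentioned above can be sketched as a pure function over the token broker response. This is an illustrative sketch, not the actual voice-session.tsx code: the `SessionMode` value names are assumptions, while `avatar_enabled` and `agent_id` are the token fields the verification cites.

```typescript
// Hypothetical SessionMode values -- the real union lives in voice-session.tsx.
type SessionMode = "digital_human" | "voice_only" | "text";

// Minimal shape of the token broker response fields this sketch needs.
interface VoiceLiveTokenLike {
  avatar_enabled?: boolean;
  agent_id?: string | null;
}

function resolveMode(token: VoiceLiveTokenLike): SessionMode {
  // Digital Human requires both an avatar config and a synced agent.
  if (token.avatar_enabled && token.agent_id) return "digital_human";
  // A synced agent without an avatar still supports voice-only realtime.
  if (token.agent_id) return "voice_only";
  // No agent available: fall back to text chat.
  return "text";
}
```

The session keeps the result in `useState<SessionMode>`, so later fallback transitions replace it without re-running resolution.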

Accessibility

| Requirement | Implementation |
|-------------|----------------|
| Tab keyboard navigation | Radix Tabs handles Arrow-key navigation and Enter/Space activation automatically |
| Form labels | All voice/avatar fields use FormLabel via the react-hook-form FormField pattern. Switch labels use a companion `<Label htmlFor>` |
| Switch ARIA | Each Switch has adjacent label text and an aria-checked state (Radix default) |
| Mode status announcements | Badge uses `role="status"` and `aria-live="polite"` to announce mode changes to screen readers |
| Color not sole indicator | Mode status pairs a text label ("Connected" / "Degraded" / "Disconnected") with the colored dot. Agent status uses icon + text label alongside background color |
| Fallback toast | sonner toasts include descriptive text, not just color. Warning level provides distinct styling |
| Tooltip for truncated content | Agent ID shown in a Tooltip when truncated in the Agent tab |

Registry Safety

| Registry | Blocks Used | Safety Gate |
|----------|-------------|-------------|
| shadcn official | Not applicable (components already installed manually as Radix wrappers) | Not required |
| Third-party | None | Not applicable |

No new component installations needed. All required UI primitives (Tabs, Select, Switch, Slider, Badge, Card, Form, Input, Textarea, Dialog, Tooltip, Button, Label) are already present in frontend/src/components/ui/.


Responsive Behavior

| Breakpoint | HCP Editor | Voice Session (with Avatar) | Voice Session (no Avatar) | HCP Table |
|------------|------------|-----------------------------|---------------------------|-----------|
| Desktop (>=1024px) | Tabs full-width, max-w-4xl mx-auto, all cards visible | 3-panel: Scenario sidebar (w-64, collapsible) + Avatar center (flex-1) + Chat right (w-[400px]). Avatar video fills center with max-h-[70vh] | 3-panel layout (existing ScenarioPanel + center + HintsPanel) | All 7 columns visible |
| Tablet (768-1023px) | Same as desktop, narrower content area | 2-panel: Avatar top (50vh) + Chat bottom (50vh). Scenario sidebar hidden (accessible via hamburger). Avatar scales down proportionally | 3-panel stacks vertically (existing lg:flex-row pattern) | Hide Comm Style column; Voice & Avatar badges stack vertically |
| Mobile (<768px) | Full-width tabs, cards stack; grid-cols-2 for avatar character/style collapses to grid-cols-1 | Chat panel overlays avatar with semi-transparent bg and a drag handle to resize, or a tab toggle ([Avatar] / [Chat] tabs at bottom). Mic button always visible as a floating action | Single panel with collapsible side panels (existing pattern) | Horizontal scroll, or hide Voice & Avatar and Comm Style columns |

Checker Sign-Off

  • Dimension 1 Copywriting: PASS
  • Dimension 2 Visuals: PASS
  • Dimension 3 Color: PASS
  • Dimension 4 Typography: PASS (FLAG resolved — added text-xs to table)
  • Dimension 5 Spacing: PASS
  • Dimension 6 Registry Safety: PASS

Approval: APPROVED (2026-04-02)

Verification


Phase 12: Voice Realtime API & Agent Mode Integration Verification Report

Phase Goal: Each HCP profile becomes a complete "digital persona" with per-HCP voice, avatar, and conversation parameters. The token broker returns all settings in one response. MRs get automatic mode selection (Digital Human Realtime Agent as default) with graceful fallback to voice-only or text. Admin configures HCP digital personas via a tabbed editor.

Verified: 2026-04-02T14:15:00Z Status: passed Re-verification: Yes -- after gap closure (commit 8126313 fixed voice-session.test.tsx)

Goal Achievement

Observable Truths

| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 1 | Admin can configure per-HCP voice settings, avatar settings, and conversation parameters via tabbed HCP editor | VERIFIED | voice-avatar-tab.tsx (438 lines): 3 Cards (Voice Settings, Avatar Settings, Conversation Parameters) with Select dropdowns for voice name (8 options) and avatar character (6 options) with dynamic style filtering, temperature Slider, 3 Switch controls (noise suppression, echo cancellation, EOU detection), turn detection Select, recognition language Select. hcp-profile-editor.tsx imports and renders VoiceAvatarTab in TabsContent. |
| 2 | Token broker returns all per-HCP voice/avatar settings when `hcp_profile_id` is provided, falls back to global defaults when not | VERIFIED | voice_live_service.py lines 82-106: sources all 13 fields from profile.voice_name, profile.avatar_character, etc. when `hcp_profile_id` provided. Lines 65-79: initializes defaults before the if-block. Lines 108-130: returns all fields in VoiceLiveTokenResponse. |
| 3 | New HCPs get smart defaults (voice "Ava", avatar "Lori-casual", temp 0.9, Server VAD) without manual configuration | VERIFIED | hcp_profile.py model defaults: voice_name="en-US-AvaNeural", avatar_character="lori", avatar_style="casual", voice_temperature=0.9, turn_detection_type="server_vad". Migration i12b has matching server_default on all 13 columns. |
| 4 | MR does NOT see a mode picker -- system auto-selects best mode based on HCP config and service availability | VERIFIED | voice-session.tsx: `resolveMode(tokenData)` function at line 49 derives mode from avatar_enabled and agent_id. No ModeSelector import or render found. Props interface uses `hcpProfileId: string`, not `mode: SessionMode`. |
| 5 | Fallback chain works: Digital Human Realtime Agent -> Voice-only Realtime -> Text, with toast notification and persistent mode status indicator | VERIFIED | voice-session.tsx: avatar connect failure triggers `toast.warning(t("error.avatarFallback"))` (line 193) and falls back to voice-only. Voice connection failure triggers `toast.warning(t("error.voiceFallback"))` (lines 142, 210) and falls back to text. mode-status-indicator.tsx: green/amber/red dot with `role="status"` and `aria-live="polite"`. |
| 6 | HCP table shows Voice & Avatar column with badge pair showing per-HCP configuration | VERIFIED | hcp-table.tsx: column header `t("hcp.voiceAvatarCol")` at line 181. Cell renders two Badge elements with `getVoiceLabel(profile.voice_name)` and profile.avatar_character-profile.avatar_style. |
| 7 | Agent instructions support admin override via Agent tab (D-02) | VERIFIED | agent-tab.tsx: disabled Textarea showing the `buildPreviewInstructions()` auto-generated preview, editable Textarea for agent_instructions_override with i18n placeholder. Backend agent_sync_service.py: checks override first, returns trimmed text if non-empty. 5 dedicated override tests pass. |
| 8 | All new UI text externalized to i18n in both en-US and zh-CN | VERIFIED | admin.json (en-US): 21+ keys including tabProfile, tabVoiceAvatar, tabAgent, voiceSettings, avatarSettings, voiceAvatarCol, notConfigured. admin.json (zh-CN): matching keys with Chinese translations. voice.json (en-US): modeStatus.connected/degraded/disconnected, error.avatarFallback/voiceFallback. voice.json (zh-CN): matching Chinese translations. |

Score: 8/8 truths verified
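The fallback chain verified in truth 5 can be sketched as follows. This is a hedged simplification: the connect functions and the `warn` callback stand in for the real avatar/voice connectors and `toast.warning(t(key))`, and the synchronous shape abstracts over the async implementation in voice-session.tsx.

```typescript
// Hypothetical SessionMode values; only the i18n keys below come from the report.
type SessionMode = "digital_human" | "voice_only" | "text";

function startSession(
  connectAvatar: () => void, // throws when the avatar service is unavailable
  connectVoice: () => void,  // throws when the voice service is unavailable
  warn: (i18nKey: string) => void, // stand-in for toast.warning(t(key))
): SessionMode {
  try {
    connectAvatar();
    return "digital_human"; // optimal mode
  } catch {
    warn("error.avatarFallback"); // "Avatar unavailable, switching to voice mode"
  }
  try {
    connectVoice();
    return "voice_only";
  } catch {
    warn("error.voiceFallback"); // "Voice unavailable, switching to text mode"
  }
  return "text"; // last-resort mode, always available
}
```

Because the initial mode is captured in a `useRef` at session start, ModeStatusIndicator can compare it with the returned mode to show "Degraded" after any fallback.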

Required Artifacts

| Artifact | Expected | Status | Details |
|----------|----------|--------|---------|
| backend/alembic/versions/i12b_add_voice_avatar_fields_to_hcp_profile.py | Migration adding 13 columns | VERIFIED | 13 add_column calls with server_default on all, batch_alter_table for SQLite compat |
| backend/app/models/hcp_profile.py | ORM model with voice/avatar columns | VERIFIED | 13 new Mapped columns (voice_name, voice_type, voice_temperature, voice_custom, avatar_character, avatar_style, avatar_customized, turn_detection_type, noise_suppression, echo_cancellation, eou_detection, recognition_language, agent_instructions_override) |
| backend/app/schemas/hcp_profile.py | Extended Pydantic schemas | VERIFIED | HcpProfileCreate, HcpProfileUpdate, HcpProfileResponse all include 13 voice/avatar fields |
| backend/app/schemas/voice_live.py | VoiceLiveTokenResponse with per-HCP fields | VERIFIED | 11 per-HCP fields added |
| backend/app/services/voice_live_service.py | Token broker with per-HCP sourcing | VERIFIED | Sources all fields from profile when hcp_profile_id provided, falls back to defaults |
| backend/app/api/voice_live.py | Endpoint with hcp_profile_id query param | VERIFIED | `hcp_profile_id: str \| None` optional query parameter |
| backend/app/services/agent_sync_service.py | Agent instructions override (D-02) | VERIFIED | build_agent_instructions checks override first, returns trimmed text if non-empty |
| backend/app/api/hcp_profiles.py | HcpProfileOut with voice/avatar fields | VERIFIED | 13 voice/avatar fields added to HcpProfileOut response model |
| frontend/src/types/hcp.ts | Extended TypeScript types | VERIFIED | HcpProfile has 13 voice/avatar fields, HcpProfileCreate has all optional |
| frontend/src/types/voice-live.ts | VoiceLiveToken with per-HCP fields | VERIFIED | 11 per-HCP optional fields added |
| frontend/src/api/voice-live.ts | API client with hcpProfileId | VERIFIED | `fetchVoiceLiveToken(hcpProfileId?: string)` passes it as a query param |
| frontend/src/hooks/use-voice-token.ts | Mutation accepts hcpProfileId | VERIFIED | `useMutation<VoiceLiveToken, Error, string \| undefined>` |
| frontend/src/components/admin/voice-avatar-tab.tsx | Voice & Avatar tab component | VERIFIED | 438 lines, 3 Cards, all form fields wired to react-hook-form |
| frontend/src/components/admin/agent-tab.tsx | Agent tab component | VERIFIED | 281 lines, AGENT_STATUS_CONFIG, preview + override textareas, metadata card |
| frontend/src/pages/admin/hcp-profile-editor.tsx | Tabbed HCP editor | VERIFIED | 3 TabsTrigger values (profile, voice-avatar, agent), imports VoiceAvatarTab + AgentTab |
| frontend/src/components/admin/hcp-table.tsx | HCP table with Voice+Avatar column | VERIFIED | voiceAvatarCol header, Badge pair display |
| frontend/src/components/voice/mode-status-indicator.tsx | Mode status badge | VERIFIED | Green/amber/red dot, i18n labels, role="status", aria-live="polite" |
| frontend/src/components/voice/voice-session.tsx | Auto-mode + fallback chain | VERIFIED | resolveMode function, hcpProfileId prop (no mode prop), fallback with toast warnings |
| frontend/src/components/voice/voice-session-header.tsx | Header with ModeStatusIndicator | VERIFIED | currentMode/initialMode props, ModeStatusIndicator rendered |
| frontend/src/hooks/use-voice-live.ts | Per-HCP session config | VERIFIED | Uses tokenData.voice_temperature, turn_detection_type, noise_suppression, avatar_style |
| frontend/src/pages/user/voice-session.tsx | Page passes hcpProfileId | VERIFIED | `hcpProfileId={hcpProfileId}` from scenario |
| backend/tests/test_voice_live_per_hcp.py | Per-HCP token broker tests | VERIFIED | 8 tests passing |
| backend/tests/test_hcp_profile_voice.py | HCP CRUD voice field tests | VERIFIED | 10 tests passing |
| backend/tests/test_agent_sync_service.py | Agent instruction override tests | VERIFIED | 5 new override tests passing (27 total in file) |
| backend/scripts/seed_phase2.py | Seed data with voice/avatar configs | VERIFIED | 5 HCP profiles with distinct voice_name and avatar_character values |
| frontend/src/components/voice/voice-session.test.tsx | Updated test for new props | VERIFIED | Uses `hcpProfileId: "hcp-1"` prop (line 277). No stale mode prop references. tsc -b passes cleanly with 0 errors. |
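The override-first rule verified for agent_sync_service.py can be sketched as below. The real implementation is Python; this TypeScript version, with a hypothetical `generate` callback standing in for the personality-driven instruction builder, only illustrates the rule: a non-empty trimmed admin override wins, otherwise the instructions are auto-generated (D-02).

```typescript
// Minimal profile shape for this sketch; the real model has 13 voice/avatar fields.
interface HcpLike {
  agent_instructions_override?: string | null;
}

function buildAgentInstructions(
  hcp: HcpLike,
  generate: () => string, // stand-in for the auto-generation from personality fields
): string {
  const override = hcp.agent_instructions_override?.trim();
  // A non-empty admin override replaces the generated text entirely.
  if (override) return override;
  return generate();
}
```

Treating whitespace-only overrides as empty keeps an accidentally saved blank textarea from wiping the generated instructions.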

Key Link Verification

| From | To | Via | Status | Details |
|------|----|-----|--------|---------|
| voice_live.py (API) | voice_live_service.py | hcp_profile_id pass-through | WIRED | `hcp_profile_id=hcp_profile_id` |
| voice_live_service.py | hcp_profile.py (model) | Lazy import of hcp_profile_service | WIRED | `from app.services import hcp_profile_service; profile = await hcp_profile_service.get_hcp_profile(db, hcp_profile_id)` |
| hcp-profile-editor.tsx | voice-avatar-tab.tsx | Import and render in TabsContent | WIRED | Import + `<VoiceAvatarTab form={form} />` |
| hcp-profile-editor.tsx | agent-tab.tsx | Import and render in TabsContent | WIRED | Import + `<AgentTab ...>` |
| voice-live.ts (API) | Backend POST /voice-live/token | hcp_profile_id query param | WIRED | `params = hcpProfileId ? { hcp_profile_id: hcpProfileId } : {}` |
| voice-session-page.tsx | voice-session.tsx | hcpProfileId prop | WIRED | `hcpProfileId={hcpProfileId}` |
| voice-session.tsx | use-voice-token.ts | mutateAsync(hcpProfileId) | WIRED | `tokenMutation.mutateAsync(hcpProfileId)` |
| use-voice-live.ts | VoiceLiveToken per-HCP fields | Session config from tokenData | WIRED | tokenData.voice_temperature, tokenData.turn_detection_type, tokenData.noise_suppression, tokenData.avatar_style confirmed |

Data-Flow Trace (Level 4)

| Artifact | Data Variable | Source | Produces Real Data | Status |
|----------|---------------|--------|--------------------|--------|
| voice-avatar-tab.tsx | form (UseFormReturn) | Parent hcp-profile-editor.tsx react-hook-form | Yes - populated from HCP profile API response via useQuery | FLOWING |
| agent-tab.tsx | form + profile | Parent form + useQuery HCP profile | Yes - profile from API, form from react-hook-form | FLOWING |
| mode-status-indicator.tsx | currentMode, initialMode, connectionState | Props from voice-session.tsx state | Yes - derived from token broker response via resolveMode() | FLOWING |
| hcp-table.tsx | profile.voice_name, avatar_character | HCP profiles from useHcpProfiles query | Yes - DB-backed via API | FLOWING |
| voice-session.tsx | tokenData | tokenMutation.mutateAsync(hcpProfileId) | Yes - token broker API call | FLOWING |

Behavioral Spot-Checks

| Behavior | Command | Result | Status |
|----------|---------|--------|--------|
| Frontend tsc -b (gap fix) | `npx tsc -b --noEmit` | 0 errors, clean exit | PASS |
| Frontend Vite build | `npm run build` | Built in 4.46s, dist/ output generated | PASS |
| Backend tests (45 total) | `pytest tests/test_voice_live_per_hcp.py tests/test_hcp_profile_voice.py tests/test_agent_sync_service.py -x -v` | 45 passed in 34.50s | PASS |
| Test file uses hcpProfileId prop | grep for hcpProfileId in test | Line 277: `hcpProfileId: "hcp-1"` | PASS |
| Test file has no stale mode prop | grep for `mode:` in test | Only `mode: "f2f"` in mockScenarioData (Scenario type, not VoiceSessionProps) | PASS |

Requirements Coverage

| Requirement | Source Plan | Description | Status | Evidence |
|-------------|-------------|-------------|--------|----------|
| VOICE-12-01 | 12-01 | Per-HCP digital persona model (voice/avatar columns) | SATISFIED | 13 columns on HcpProfile model with ORM + Pydantic + migration |
| VOICE-12-02 | 12-01 | Token broker per-HCP wiring | SATISFIED | voice_live_service sources all fields from HCP profile |
| VOICE-12-03 | 12-02 | Admin tabbed HCP editor with Voice & Avatar tab | SATISFIED | 3-tab layout with VoiceAvatarTab and AgentTab components |
| VOICE-12-04 | 12-03 | Auto-mode resolution (no manual mode picker) | SATISFIED | resolveMode() function, hcpProfileId prop replaces mode |
| VOICE-12-05 | 12-02 | HCP table Voice+Avatar column, i18n | SATISFIED | Badge pair display, 21+ i18n keys in both locales |
| VOICE-12-06 | 12-03 | Fallback chain with toast notifications and ModeStatusIndicator | SATISFIED | 3-level fallback with toast.warning, green/amber/red indicator |

Note: VOICE-12-01 through VOICE-12-06 are referenced in ROADMAP.md but NOT formally defined in REQUIREMENTS.md. They are phase-specific IDs created for Phase 12. No orphaned requirements exist -- REQUIREMENTS.md maps no additional IDs to Phase 12.

Anti-Patterns Found

| File | Line | Pattern | Severity | Impact |
|------|------|---------|----------|--------|
| voice_live_service.py | 105-106 | `except Exception: pass` (silent fallback) | Info | Intentional design: falls back to defaults when HCP profile lookup fails, preventing a profile issue from becoming a service outage. |
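The defaults-then-overlay pattern behind this intentional silent fallback can be sketched as below. This is an illustrative TypeScript rendering of behavior implemented in Python; `resolveVoiceSettings` and `lookupProfile` are stand-ins, not the service's actual API, and only two of the 13 fields are shown.

```typescript
// Two representative fields; the real response carries 13 voice/avatar settings.
interface VoiceSettings {
  voice_name: string;
  voice_temperature: number;
}

// Smart defaults from D-04 (voice "Ava", temperature 0.9).
const DEFAULTS: VoiceSettings = { voice_name: "en-US-AvaNeural", voice_temperature: 0.9 };

function resolveVoiceSettings(
  hcpProfileId: string | undefined,
  lookupProfile: (id: string) => Partial<VoiceSettings>, // may throw
): VoiceSettings {
  // Initialize defaults first, mirroring the service's defaults-before-if-block order.
  let settings = { ...DEFAULTS };
  if (hcpProfileId) {
    try {
      settings = { ...settings, ...lookupProfile(hcpProfileId) };
    } catch {
      // Intentional silent fallback: a broken profile must not block token issuance.
    }
  }
  return settings;
}
```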

Human Verification Required

1. HCP Editor Tab Navigation

Test: Open HCP editor, fill in Profile tab fields, switch to Voice & Avatar tab, configure voice/avatar settings, switch to Agent tab, verify override textarea works, switch back to Profile tab. Expected: All form data persists across tab switches. No data loss. Why human: Cross-tab form state persistence requires interactive browser testing.

2. Avatar Character-Style Dynamic Filtering

Test: In Voice & Avatar tab, change avatar character dropdown from "lori" to "lisa". Check if style dropdown options update dynamically. Expected: Style options change to lisa-specific styles (casual-sitting, graceful-sitting, etc.). Previously selected style resets to first available. Why human: Dynamic dropdown filtering requires visual interaction.
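The expected filtering behavior can be sketched as below; the style lists are illustrative examples, not the actual AVATAR_VIDEO_CHARACTERS data.

```typescript
// Hypothetical character -> styles catalog standing in for AVATAR_VIDEO_CHARACTERS.
const AVATAR_STYLES: Record<string, string[]> = {
  lori: ["casual", "formal"],
  lisa: ["casual-sitting", "graceful-sitting"],
};

function stylesFor(character: string): string[] {
  return AVATAR_STYLES[character] ?? [];
}

// On character change: keep the current style if still valid for the new
// character, otherwise reset to the first available style.
function nextStyle(character: string, current: string): string | undefined {
  const styles = stylesFor(character);
  return styles.includes(current) ? current : styles[0];
}
```

In the component this derivation lives in a `useMemo` keyed on the selected character, so the Select options and the form value update together.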

3. ModeStatusIndicator Visual States

Test: Start a voice session, observe the ModeStatusIndicator badge color and text during connection, degradation (if simulated), and disconnection. Expected: Green dot + "Connected" when at optimal mode, amber dot + "Degraded" when fallen back, red dot + "Disconnected" on error. Why human: Real-time visual state changes during live WebSocket/Avatar connections.

4. Fallback Chain Toast Notifications

Test: Start a voice session where avatar service is unavailable but voice works. Then start one where voice is also unavailable. Expected: First scenario: toast warning "Avatar unavailable, switching to voice mode". Second: toast warning "Voice unavailable, switching to text mode". Why human: Requires simulating service unavailability with real Azure connections.

5. HCP Table Voice & Avatar Column

Test: View HCP list page with multiple profiles that have different voice/avatar configurations. Expected: Badge pairs show short voice label (e.g., "Ava", "Yunxi") and avatar character-style (e.g., "lori-casual"). Profiles without config show "Not configured". Why human: Visual layout, badge rendering, and label formatting need visual confirmation.

Re-verification: Gap Closure Details

Previous gap: voice-session.test.tsx referenced the removed mode prop from pre-Phase 12-03 VoiceSessionProps interface, producing 12 TypeScript TS2353 errors. tsc -b failed across the full frontend project.

Fix: Commit 8126313 ("fix(12): update voice-session.test.tsx for auto-mode props (mode -> hcpProfileId)") updated the test file to:

  • Replace mode: "voice_pipeline" prop with hcpProfileId: "hcp-1" in defaultProps (line 277)
  • Update mock VoiceSessionHeader to check the new props pattern
  • Remove all references to the removed mode prop on VoiceSessionProps

Verification of fix:

  • npx tsc -b --noEmit now completes with 0 errors
  • grep confirms no stale mode prop references in test (only mode: "f2f" in mockScenarioData which is the Scenario type field, not VoiceSessionProps)
  • npm run build succeeds in 4.46s

Regression check: All 8 previously-verified truths remain verified. All artifacts remain present and substantive. No regressions detected.


Verified: 2026-04-02T14:15:00Z Verifier: Claude (gsd-verifier)
