Phase 09: Integration Testing With Real Azure Services

Auto-generated from .planning/phases/09-integration-testing-with-real-azure-services
Last synced: 2026-04-02

Context & Decisions

Phase 09: Integration Testing with Real Azure Services - Context

Gathered: 2026-03-28 (updated: revised scope to include unimplemented config alignment)
Status: Ready for planning

## Phase Boundary

Implement the remaining Azure config alignment work (unified AI Foundry endpoint, 7 interaction modes, agent mode runtime) that was planned for Phase 07/08 but not implemented, then validate all Azure service integrations end-to-end with real credentials and polish the demo experience for BeiGene customer presentations.

Implementation scope (config alignment):

Unified AI Foundry config: single endpoint replacing 8 separate ServiceConfig rows
Expand from 3 modes (text/voice/avatar) to 7 interaction modes
Wire agent mode end-to-end (token broker, frontend hook, WebSocket path)
Admin UI redesign: single AI Foundry card with per-service toggles

Testing scope:

Pytest integration tests per Azure service with real credentials
Playwright E2E tests for full demo flow
Performance validation and manual smoke test checklist

Out of scope:

Azure AD token auth (DefaultAzureCredential) — deferred to future phase
Fallback chain (7→1 based on service availability) — deferred
New features or capabilities
CI/CD integration of Azure tests (local-only execution)

## Implementation Decisions

Unified AI Foundry Config

D-01: Single master ServiceConfig row with AI Foundry endpoint, region, and API key. Per-service rows become enable/disable toggles with service-specific fields (model/deployment name). Replaces current 8 separate rows with own endpoints/keys.
D-02: ServiceConfig schema changes: add ai_foundry_endpoint, ai_foundry_region, api_key_encrypted (master). Per-service rows keep service_name, enabled, model_or_deployment. Remove per-service endpoint/key fields.
D-03: API key auth only for now. Azure AD token auth (DefaultAzureCredential) is deferred to a future phase. The current resource may have disableLocalAuth: true, which rejects API keys; in that case the tester must enable API key (local) auth on the resource, because AD auth support is deferred.
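
To make the shape of D-01 and D-02 concrete, here is a minimal sketch of the unified config as plain dataclasses. It is not the ORM model: the class names, the `services` mapping, the service keys, and the example endpoint and deployment values are assumptions for illustration; only the field names come from D-02.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceToggle:
    """Per-service row: a toggle plus service-specific fields (D-02)."""
    service_name: str
    enabled: bool = False
    model_or_deployment: str = ""


@dataclass
class AIFoundryConfig:
    """Master row: one endpoint, region, and encrypted key shared by all services (D-01)."""
    ai_foundry_endpoint: str
    ai_foundry_region: str
    api_key_encrypted: str
    # Hypothetical container; the real schema keeps per-service rows in the same table.
    services: dict[str, ServiceToggle] = field(default_factory=dict)

    def enabled_services(self) -> set[str]:
        """Names of the services the admin has switched on."""
        return {name for name, svc in self.services.items() if svc.enabled}


# Placeholder values only; the service keys and deployment name are made up.
config = AIFoundryConfig(
    ai_foundry_endpoint="https://example.cognitiveservices.azure.com/",
    ai_foundry_region="eastus2",
    api_key_encrypted="<fernet-ciphertext>",
    services={
        "azure_openai": ServiceToggle("azure_openai", True, "example-deployment"),
        "azure_avatar": ServiceToggle("azure_avatar", False),
    },
)
```

The point of the split is that no per-service row carries an endpoint or key any more, so rotating the key touches exactly one row.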

Admin UI — Single AI Foundry Card

D-04: One "Azure AI Foundry" config card with endpoint URL, region, API key fields. Below it: toggle list for each service (enable/disable) with service-specific fields like model/deployment name.
D-05: Replaces current 8 separate ServiceConfigCard components in azure-config.tsx.

Seven Interaction Modes

D-06: Platform supports 7 interaction modes, admin-configurable:
1. Text — Text-only coaching (Azure OpenAI chat completions REST API)
2. Voice Pipeline — Voice without avatar (Azure Speech STT → Azure OpenAI chat → Azure Speech TTS)
3. Digital Human: Speech+Model — Avatar with pipeline voice (STT → LLM → TTS → Avatar rendering)
4. Voice Realtime Model — Voice without avatar via voice-live/realtime WebSocket (all-in-one realtime)
5. Digital Human: Realtime Model — Avatar with voice-live/realtime WebSocket + Avatar rendering
6. Voice Realtime Agent — Voice without avatar via voice-agent/realtime WebSocket (function calling enabled)
7. Digital Human: Realtime Agent — Avatar with voice-agent/realtime WebSocket + Avatar rendering (function calling enabled)
D-07: Session mode schema expands from Literal["text", "voice", "avatar"] to full 7-mode enum. Alembic migration required.
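
A sketch of what the expanded enum in D-07 could look like. The member values and the legacy mapping are assumptions: the source only fixes the seven mode names, and the right targets for the old "voice" and "avatar" values depend on which engine those sessions actually used.

```python
from enum import Enum


class SessionMode(str, Enum):
    """The seven interaction modes from D-06 (string values are hypothetical)."""
    TEXT = "text"
    VOICE_PIPELINE = "voice_pipeline"
    DIGITAL_HUMAN_PIPELINE = "digital_human_pipeline"
    VOICE_REALTIME_MODEL = "voice_realtime_model"
    DIGITAL_HUMAN_REALTIME_MODEL = "digital_human_realtime_model"
    VOICE_REALTIME_AGENT = "voice_realtime_agent"
    DIGITAL_HUMAN_REALTIME_AGENT = "digital_human_realtime_agent"


# Assumed mapping for rows written under the old 3-value Literal.
LEGACY_MODE_MAP = {
    "text": SessionMode.TEXT,
    "voice": SessionMode.VOICE_PIPELINE,
    "avatar": SessionMode.DIGITAL_HUMAN_PIPELINE,
}


def migrate_legacy_mode(value: str) -> SessionMode:
    """Translate an old mode value; pass through values that are already new-style."""
    if value in LEGACY_MODE_MAP:
        return LEGACY_MODE_MAP[value]
    return SessionMode(value)  # raises ValueError for unknown values
```

A data migration along these lines would run inside the Alembic revision, before any constraint on the new value set is applied.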

Mode Selector UI — Two-Level

D-08: Two-level selector: first pick communication type (Text, Voice-only, Digital Human), then pick engine (Pipeline, Realtime Model, Realtime Agent). Clearer for non-technical MR users.
D-09: Admin-configured default mode. Modes only shown if their required services are enabled in AI Foundry config.
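
The filtering rule in D-09 can be sketched as a lookup from (communication type, engine) to required services. The service keys below are hypothetical, and treating Text as having no engine choice is an assumption about how the two-level selector collapses for the single text mode.

```python
from typing import Optional

# (communication type, engine) -> services that must be enabled (keys are illustrative).
REQUIRED_SERVICES: dict[tuple[str, Optional[str]], set[str]] = {
    ("text", None): {"azure_openai"},
    ("voice", "pipeline"): {"azure_openai", "azure_speech_stt", "azure_speech_tts"},
    ("digital_human", "pipeline"): {
        "azure_openai", "azure_speech_stt", "azure_speech_tts", "azure_avatar",
    },
    ("voice", "realtime_model"): {"azure_voice_live"},
    ("digital_human", "realtime_model"): {"azure_voice_live", "azure_avatar"},
    ("voice", "realtime_agent"): {"azure_voice_agent"},
    ("digital_human", "realtime_agent"): {"azure_voice_agent", "azure_avatar"},
}


def available_modes(enabled: set[str]) -> dict[str, list[Optional[str]]]:
    """Group selectable engines by communication type, keeping only fully enabled modes."""
    result: dict[str, list[Optional[str]]] = {}
    for (comm_type, engine), needed in REQUIRED_SERVICES.items():
        if needed <= enabled:  # every required service is switched on
            result.setdefault(comm_type, []).append(engine)
    return result
```

With only OpenAI and Voice Live enabled, the first level would offer Text and Voice-only, and Voice-only would offer just the Realtime Model engine.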

Agent Mode Runtime

D-10: Token broker reads agent mode from ServiceConfig (via parse_voice_live_mode()), returns agent_id + project_name in VoiceLiveTokenResponse when agent mode is selected.
D-11: Frontend use-voice-live.ts hook uses voice-agent/realtime WebSocket path when agent mode, voice-live/realtime when model mode. Conditional connection logic based on token response.
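
The branch described in D-10 and D-11 is small enough to sketch. The real hook is TypeScript; this Python version only mirrors the decision logic. The `endpoint` and `token` fields and the URL construction are assumptions (the actual connection also needs query parameters that are not covered here); `agent_id`, `project_name`, and the two path names come from the decisions above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceLiveTokenResponse:
    """Illustrative stand-in for the token broker response."""
    endpoint: str                       # assumed field
    token: str                          # assumed field
    agent_id: Optional[str] = None      # set only in agent mode (D-10)
    project_name: Optional[str] = None  # set only in agent mode (D-10)


def realtime_path(resp: VoiceLiveTokenResponse) -> str:
    """Agent mode uses voice-agent/realtime, model mode uses voice-live/realtime (D-11)."""
    if resp.agent_id and resp.project_name:
        return "voice-agent/realtime"
    return "voice-live/realtime"


def websocket_url(resp: VoiceLiveTokenResponse) -> str:
    """Build the wss:// URL from the unified https:// endpoint (query string omitted)."""
    base = resp.endpoint.rstrip("/").replace("https://", "wss://", 1)
    return f"{base}/{realtime_path(resp)}"
```

Keying the decision off the token response keeps the mode choice on the server, so the frontend cannot drift from what the admin configured.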

Test Scope & Strategy

D-12: Implementation first, test after. Build unified config + 7 modes + agent runtime, then write integration tests and E2E tests to validate.
D-13: Two-layer test approach: Pytest integration tests per Azure service + Playwright E2E tests for full demo flow.
D-14: Pytest tests: one test module per service, all using the unified AI Foundry endpoint. Use @pytest.mark.integration with --run-integration CLI flag.
D-15: Playwright E2E tests: exercise the complete demo scenario from login → admin AI Foundry config → text/voice/avatar → scoring.
D-16: Manual smoke test checklist documented for pre-demo preparation.

Acceptance Criteria

D-17: AI response latency < 3 seconds for smooth conversation flow.
D-18: Avatar renders smoothly — lip-sync matches speech, no freezing.
D-19: Post-session scoring report generates correctly for all session modes.
D-20: Full pipeline demo works: Login → Admin configures AI Foundry → Text session → Switch to voice → Switch to avatar → Score report.

Claude's Discretion

Exact Playwright test structure and page object patterns
Performance measurement implementation
Test data fixtures and seed data
Skip markers for offline development
Smoke test checklist format
Alembic migration details for schema changes
How to structure plans (config alignment vs testing can be separate plans)

<canonical_refs>

Canonical References

Downstream agents MUST read these before planning or implementing.

Azure AI Foundry Resource

Resource: ai-foundary-qiah-east-us2 (kind: AIServices, region: eastus2)
Unified endpoint: https://ai-foundary-qiah-east-us2.cognitiveservices.azure.com/
Endpoints confirmed: OpenAI Realtime, Voice Agent Realtime, Voice Live Realtime, Speech STT/TTS, Avatar, Content Understanding

Current Config Architecture (Phase 07)

backend/app/services/config_service.py — Config CRUD with Fernet encryption
backend/app/api/azure_config.py — Admin REST API + dynamic adapter registration
backend/app/models/service_config.py — ServiceConfig ORM model (needs schema changes for unified config)
backend/app/config.py — Settings class with env vars
backend/app/utils/encryption.py — Fernet encrypt/decrypt
backend/app/services/connection_tester.py — Service-specific connection tests
backend/app/main.py — Startup lifespan with 2-phase adapter loading

Azure Service Adapters

backend/app/services/agents/adapters/azure_openai.py — Azure OpenAI LLM adapter
backend/app/services/agents/stt/azure.py — Azure Speech STT
backend/app/services/agents/tts/azure.py — Azure Speech TTS
backend/app/services/agents/avatar/azure.py — Azure Avatar adapter (stub)
backend/app/services/agents/adapters/azure_voice_live.py — Agent/Model mode parse/encode functions
backend/app/services/agents/registry.py — ServiceRegistry singleton

Voice Live & Avatar (Phase 08)

backend/app/services/voice_live_service.py — Token broker (needs agent mode + unified config)
backend/app/api/voice_live.py — Voice Live API routes
frontend/src/hooks/use-voice-live.ts — RTClient connection (needs Agent mode path)
frontend/src/hooks/use-avatar-stream.ts — WebRTC avatar stream

Admin Configuration UI

frontend/src/pages/admin/azure-config.tsx — Needs full redesign for single AI Foundry card
frontend/src/components/admin/service-config-card.tsx — Has agent mode toggle (partially done)
frontend/src/api/azure-config.ts — Frontend API client
frontend/src/types/azure-config.ts — TypeScript types (has VoiceLiveAgentConfig)

Mode Selection

frontend/src/components/voice/mode-selector.tsx — Current 3-mode selector (needs 7-mode two-level redesign)
frontend/src/types/voice-live.ts — SessionMode type (needs expansion)
backend/app/schemas/session.py — Session mode Literal (needs expansion)
backend/app/models/session.py — Session model mode field

Existing Tests

backend/tests/ — Existing test patterns and conftest.py
frontend/e2e/ — Existing Playwright E2E patterns

Project Requirements

docs/requirements.md — COACH-04, COACH-05, COACH-06, COACH-07, PLAT-03, PLAT-05

Reference Repository

User's Voice-Live-Agent-With-Avadar repo — Reference for Agent mode + Model mode pattern

</canonical_refs>

<code_context>

Existing Code Insights

Reusable Assets

Agent mode parse/encode: backend/app/services/agents/adapters/azure_voice_live.py — parse_voice_live_mode() and encode_voice_live_mode() already handle agent config serialization
Agent mode admin UI: frontend/src/components/admin/service-config-card.tsx — Agent/Model toggle with agent_id/project_name fields already implemented
Agent mode types: frontend/src/types/azure-config.ts — VoiceLiveAgentConfig and VoiceLiveModelConfig types exist
Region capabilities: backend/app/services/region_capabilities.py — VOICE_LIVE_AGENT_REGIONS defined
Connection tester: backend/app/services/connection_tester.py — needs adaptation for unified endpoint
Mock adapters: All services have mock adapters — baseline comparison for real behavior
Conftest fixtures: backend/tests/conftest.py — async test client, database fixtures, auth helpers
Playwright config: frontend/playwright.config.ts — existing E2E infrastructure

Established Patterns

Pytest-asyncio: All backend tests use async patterns with httpx AsyncClient
Adapter pattern: BaseAdapter → MockAdapter/RealAdapter pattern per service
Config service: CRUD with Fernet encryption for sensitive values
TanStack Query hooks: Per-domain hooks in frontend/src/hooks/

Integration Points

ServiceConfig model: Schema change for unified AI Foundry (Alembic migration)
register_adapter_from_config(): Must accept unified config and instantiate all adapters from single endpoint
Voice Live token broker: Must pass agent config and support unified endpoint
Frontend mode selector: Two-level UI replacing current 3-button selector
Session schema/model: Mode enum expansion from 3 to 7 values

Key Gaps (from codebase audit)

No Agent mode runtime — admin can store config, but token broker and frontend ignore it
Raw API key exposed to browser via token broker (no token-based approach)
STT/TTS configured as separate services with duplicate keys
Frontend SERVICE_KEY_MAP lists services not recognized by backend
register_adapter_from_config ignores azure_voice_live
Azure Avatar adapter is a stub (is_available() always False)
Session mode limited to 3 values (text/voice/avatar)

</code_context>

## Specific Ideas

Use a single AI Foundry service approach: users shouldn't have to configure so many things, and configuration becomes much simpler
Reference Voice-Live-Agent-With-Avadar repo for Agent mode + Model mode pattern
This is the main demo content for BeiGene — must be polished and reliable
Demo should showcase full pipeline: text → voice → avatar with smooth transitions
Performance matters — response latency must feel conversational (< 3 seconds)
Avatar must render without glitches — "wow" factor for demo
Two-level mode selector: communication type first, then engine — clearer for non-technical MR users
Performance, UI aesthetics, overall effect, and ease of use all need thorough testing

## Deferred Ideas

Azure AD token auth (DefaultAzureCredential) — Requires azure-identity SDK, more complex auth flow. Implement when production deployment requires it.
Fallback chain (7→6→5→4→3→2→1) — Graceful mode degradation based on service availability. Good for production resilience but not needed for controlled demo environment.
CI/CD integration of Azure tests — Avoid Azure costs in CI pipeline. Local-only execution for now.
Azure cost optimization — Future phase concern.

Phase: 09-integration-testing-with-real-azure-services
Context gathered: 2026-03-28 (updated: config alignment + testing scope)

Plans (5)

#	Plan File	Status
09-01	09-01-PLAN.md	Complete
09-02	09-02-PLAN.md	Complete
09-03	09-03-PLAN.md	Complete
09-04	09-04-PLAN.md	Complete
09-05	09-05-PLAN.md	Complete

Research


Phase 09: Integration Testing with Real Azure Services - Research

Researched: 2026-03-27
Domain: Integration testing, Azure service validation, E2E demo testing, performance measurement
Confidence: HIGH

Summary

Phase 9 is fundamentally different from prior phases: it builds no new features. Instead, it validates that the Azure services integrated in Phases 7 and 8 work correctly end-to-end with real credentials, polishes the demo experience for BeiGene customer presentations, and creates automated + manual test suites for ongoing validation. The codebase already has a comprehensive mock-based test suite (pytest with 60+ test files, Playwright with 30+ spec files) and all Azure adapter implementations are in place.

The testing strategy has two layers: (1) pytest integration tests that bypass mocks and call real Azure APIs (Azure OpenAI chat completion, Azure Speech STT/TTS, Voice Live token broker, Avatar config validation), and (2) Playwright E2E tests that exercise the full browser demo flow from admin config through coaching session modes and scoring. A third layer is a manual smoke test checklist for pre-demo preparation.

Primary recommendation: Use pytest markers (@pytest.mark.integration with --run-integration CLI flag) to separate Azure integration tests from the fast mock-based unit test suite. Playwright E2E tests for the demo flow should run against a live backend with real Azure credentials loaded via the admin UI. Performance assertions (< 3s latency) go directly in the integration test assertions. Do NOT modify existing mock tests or adapters; this phase is purely additive.

<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

D-01: Equal focus on technical validation AND demo polish -- this is the main content for customer demos
D-02: Test the full pipeline demo flow: admin configures Azure services -> user starts text session -> switches to voice-only -> switches to avatar mode, with scoring working on all modes
D-03: All Azure services tested: Azure OpenAI (LLM), Azure Speech (STT/TTS), Azure Voice Live API, Azure AI Avatar
D-04: Use existing pattern from Phase 7 -- credentials in backend .env for dev, configured via Admin Azure Config page for demo
D-05: No separate test config profiles -- use the same admin UI configuration flow that the demo itself uses
D-06: Tests are local + manual only -- no CI/CD integration with Azure credentials (avoids Azure costs)
D-07: Two-layer test approach: Pytest integration tests per Azure service + Playwright E2E tests for full demo flow
D-08: Pytest tests: one test module per Azure service adapter (test_azure_openai_integration.py, test_azure_speech_integration.py, test_voice_live_integration.py, test_avatar_integration.py)
D-09: Playwright E2E tests: exercise the complete demo scenario from login -> admin config -> start session -> text/voice/avatar interaction -> scoring report
D-10: Manual smoke test checklist documented for pre-demo preparation
D-11: AI response latency < 3 seconds for smooth conversation flow (text and voice modes)
D-12: Avatar renders smoothly -- lip-sync matches speech, no freezing or glitches
D-13: Graceful fallback chain works: Avatar unavailable -> voice-only with waveform, Voice unavailable -> text-only. Clear user feedback at each transition.
D-14: Post-session scoring report generates correctly for text, voice, and avatar sessions -- same dimensions, same quality

Claude's Discretion

Exact Playwright test structure and page object patterns
Performance measurement implementation (timing hooks, metrics collection)
Test data fixtures and seed data for integration tests
Skip markers for offline development (pytest markers to skip Azure tests without credentials)
Smoke test checklist format and detail level

Deferred Ideas (OUT OF SCOPE)

None -- discussion stayed within phase scope

</user_constraints>

Standard Stack

Core

Library	Version	Purpose	Why Standard
pytest	8.3.3	Backend integration test framework	Already installed and configured in pyproject.toml
pytest-asyncio	0.24.0+	Async test support	Already configured with asyncio_mode = "auto"
pytest-timeout	2.2.0+	Test timeout enforcement (60s default)	Already installed; integration tests need longer timeouts
Playwright	1.58.2	E2E browser testing	Already installed with auth setup and 30+ specs
httpx	0.27.0+	Async HTTP client for integration tests	Already a project dependency; used for real HTTP calls
openai	1.50.0+	Azure OpenAI SDK for real API calls	Already installed; integration tests import directly

Supporting

Library	Version	Purpose	When to Use
time (stdlib)	-	Performance timing in tests	Measure API response latency
os (stdlib)	-	Environment variable reading for credentials	Skip tests when credentials missing

Alternatives Considered

Instead of	Could Use	Tradeoff
Custom timing framework	pytest-benchmark	Overkill for simple latency checks; stdlib time.perf_counter() is sufficient
Page Object Model classes	Inline selectors	POM adds structure for the complex demo flow; worth it for E2E readability

No new installations needed. All required packages are already in pyproject.toml.

Architecture Patterns

Recommended Test File Structure

backend/tests/
├── conftest.py                          # Existing: in-memory SQLite fixtures
├── integration/
│   ├── conftest.py                      # Integration-specific: real DB, skip markers, credentials
│   ├── test_azure_openai_integration.py # D-08: Azure OpenAI streaming + latency
│   ├── test_azure_speech_integration.py # D-08: STT transcription + TTS synthesis
│   ├── test_voice_live_integration.py   # D-08: Voice Live token + endpoint reachability
│   └── test_avatar_integration.py       # D-08: Avatar config validation
├── ... (existing test files unchanged)

frontend/e2e/
├── playwright.config.ts                 # Existing: webServer config
├── auth.setup.ts                        # Existing: user + admin auth
├── coverage-helper.ts                   # Existing: Istanbul coverage
├── demo-flow.spec.ts                    # D-09: Full demo pipeline E2E test
├── ... (existing spec files unchanged)

docs/
├── SMOKE_TEST_CHECKLIST.md              # D-10: Pre-demo manual checklist

Pattern 1: Pytest Integration Skip Markers

What: Use custom pytest markers and environment variables to gate Azure integration tests
When to use: Every integration test file
Example:

# backend/tests/integration/conftest.py
import os
import pytest

# Custom marker for integration tests requiring Azure credentials
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "integration: mark test as requiring real Azure credentials"
    )

def has_azure_openai_credentials() -> bool:
    """Check if Azure OpenAI credentials are available in environment."""
    return bool(
        os.environ.get("AZURE_OPENAI_ENDPOINT")
        and os.environ.get("AZURE_OPENAI_API_KEY")
        and os.environ.get("AZURE_OPENAI_DEPLOYMENT")
    )

def has_azure_speech_credentials() -> bool:
    return bool(
        os.environ.get("AZURE_SPEECH_KEY")
        and os.environ.get("AZURE_SPEECH_REGION")
    )

skip_no_openai = pytest.mark.skipif(
    not has_azure_openai_credentials(),
    reason="Azure OpenAI credentials not set (AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_DEPLOYMENT)"
)

skip_no_speech = pytest.mark.skipif(
    not has_azure_speech_credentials(),
    reason="Azure Speech credentials not set (AZURE_SPEECH_KEY, AZURE_SPEECH_REGION)"
)
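
The conftest above registers the marker and skips on missing credentials, but it does not implement the --run-integration flag named in the primary recommendation. A minimal sketch of that gate follows, using the standard pytest hooks. One assumption to flag: `pytest_addoption` is only picked up reliably from a conftest.py at the tests root (or a plugin), so this would probably live in backend/tests/conftest.py rather than in the integration subdirectory.

```python
import pytest


def pytest_addoption(parser):
    """Register the opt-in flag for tests that call real Azure services."""
    parser.addoption(
        "--run-integration",
        action="store_true",
        default=False,
        help="run tests marked 'integration' (requires real Azure credentials)",
    )


def pytest_collection_modifyitems(config, items):
    """Skip every test marked 'integration' unless --run-integration was given."""
    if config.getoption("--run-integration"):
        return
    skip_integration = pytest.mark.skip(reason="need --run-integration option to run")
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip_integration)
```

With this in place, a plain `pytest` run reports the integration tests as skipped, and `pytest --run-integration` runs them subject to the credential checks.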

Pattern 2: Performance Assertion in Integration Tests

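What: Measure and assert response latency within integration tests themselves (D-11)
When to use: Every Azure OpenAI and Speech API test
Example:

import os
import time

import pytest

from app.services.agents.adapters.azure_openai import AzureOpenAIAdapter
from app.services.agents.base import CoachEventType, CoachRequest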

@pytest.mark.integration
async def test_azure_openai_response_latency():
    """Azure OpenAI must respond within 3 seconds for conversational flow."""
    adapter = AzureOpenAIAdapter(
        endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        deployment=os.environ["AZURE_OPENAI_DEPLOYMENT"],
    )
    request = CoachRequest(
        session_id="perf-test-1",
        message="Hello doctor, I'd like to discuss treatment options.",
        scenario_context="You are a cardiologist specializing in hypertension.",
    )

    start = time.perf_counter()
    first_token_time = None
    events = []
    async for event in adapter.execute(request):
        if first_token_time is None and event.type == CoachEventType.TEXT:
            first_token_time = time.perf_counter() - start
        events.append(event)
    total_time = time.perf_counter() - start

    assert first_token_time is not None, "Should receive at least one TEXT event"
    assert first_token_time < 3.0, f"First token latency {first_token_time:.2f}s exceeds 3s threshold"
    assert total_time < 15.0, f"Total response time {total_time:.2f}s is too long"

Pattern 3: Playwright Demo Flow with Page Object Pattern

What: Structure E2E tests with page helper functions for the multi-step demo flow (D-09)
When to use: The demo flow E2E spec
Example:

// Pattern: helper functions at top of spec file (lightweight, no full POM class)
import type { Page } from "@playwright/test";

async function configureAzureService(page: Page, serviceName: string, config: ServiceConfig) {
  await page.goto("/admin/azure-config");
  // Expand the service card
  await page.getByText(serviceName).first().click();
  await page.waitForTimeout(300);
  // Fill config fields
  if (config.endpoint) await page.getByPlaceholder("https://...").first().fill(config.endpoint);
  if (config.apiKey) await page.getByPlaceholder("Enter API key").first().fill(config.apiKey);
  // Save
  await page.getByRole("button", { name: /^save$/i }).first().click();
  await page.waitForTimeout(1000);
}

Anti-Patterns to Avoid

Modifying existing mock tests: Integration tests are additive. Do not change existing test files in backend/tests/ or frontend/e2e/.
Hardcoding credentials in test files: All credentials come from environment variables or the admin UI flow.
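Running integration tests in CI: D-06 explicitly excludes CI integration. Registering the marker does not exclude anything by itself: a plain pytest run still collects marked tests. The exclusion comes from the --run-integration gate (marked tests are skipped unless the flag is passed) or from deselecting with pytest -m "not integration".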
Testing with ASGITransport/in-memory DB for integration tests: Integration tests need the real app running with a real database to test config persistence and dynamic adapter registration.

Don't Hand-Roll

Problem	Don't Build	Use Instead	Why
Test skip logic	Custom test runner	`pytest.mark.skipif` + env var checks	Standard pytest pattern, well-documented
Latency measurement	Custom timing framework	`time.perf_counter()` in test assertions	Sub-millisecond precision, stdlib
Browser automation	Custom Selenium setup	Existing Playwright infrastructure	Already configured with auth setup
Audio test data	Generate audio files	Use Azure TTS to generate then feed to STT	Tests the real pipeline; no fake audio needed
Demo checklist tool	Custom checklist app	Markdown checklist document	Simple, version-controlled, no maintenance

Key insight: This phase creates zero new application code. Every test calls existing adapters, services, and UI pages. The only new files are test modules and a checklist document.

Common Pitfalls

Pitfall 1: Azure SDK Not Installed for Integration Tests

What goes wrong: azure-cognitiveservices-speech is in the optional [voice] dependency group, not the base install. Tests import it and fail with ImportError.
Why it happens: The STT/TTS adapters use conditional import inside methods. Integration tests need the real SDK.
How to avoid: Integration test conftest.py should check for the SDK and skip speech tests if not installed. Document: pip install -e ".[all]" is required for full integration testing.
Warning signs: ImportError: No module named 'azure.cognitiveservices.speech'

Pitfall 2: Stale Config After Admin UI Changes

What goes wrong: Playwright tests configure Azure services via admin UI, but the backend Settings singleton (LRU-cached) doesn't reflect the new config until register_adapter_from_config is called.
Why it happens: get_settings() uses @lru_cache -- environment variables are read once at startup. Dynamic config goes through the database and adapter registry, not Settings.
How to avoid: E2E tests should configure services via the admin UI (PUT endpoint), which triggers register_adapter_from_config. Do not rely on .env changes during E2E tests.
Warning signs: "Connection test succeeds but coaching session still uses mock adapter."
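
The caching behavior behind this pitfall is easy to reproduce in isolation. The sketch below uses a stand-in `get_settings()` and a made-up `DEMO_ENDPOINT` variable, not the project's Settings class; it shows only that an `lru_cache`-wrapped reader keeps returning the first value until the cache is cleared.

```python
import os
from functools import lru_cache


@lru_cache
def get_settings() -> dict:
    """Stand-in for a cached settings loader: reads the environment once."""
    return {"endpoint": os.environ.get("DEMO_ENDPOINT", "")}


os.environ["DEMO_ENDPOINT"] = "https://first.example"
first = get_settings()["endpoint"]      # populates the cache

os.environ["DEMO_ENDPOINT"] = "https://second.example"
stale = get_settings()["endpoint"]      # still the cached first value

get_settings.cache_clear()
fresh = get_settings()["endpoint"]      # re-read after clearing the cache
```

This is why editing .env mid-run has no effect, and why the admin PUT path, which bypasses Settings, is the one the E2E tests should use.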

Pitfall 3: Azure Speech SDK Blocks Event Loop

What goes wrong: STT/TTS integration tests hang or time out because the Azure Speech SDK is synchronous.
Why it happens: The SDK uses gRPC internally and blocks. The adapters wrap calls in asyncio.to_thread(), but if tests call the SDK directly (not through adapters), they'll block.
How to avoid: Always test through the adapter methods (transcribe(), synthesize()), which handle the thread offloading.
Warning signs: Test hangs indefinitely or hits the 60s timeout.
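
A small self-contained demonstration of the offloading the adapters rely on. `blocking_recognize` is a hypothetical stand-in for a synchronous SDK call; the heartbeat coroutine only makes progress if the event loop stays free while the blocking call runs in a worker thread.

```python
import asyncio
import time


def blocking_recognize(seconds: float) -> str:
    """Stand-in for a synchronous SDK call that blocks its thread."""
    time.sleep(seconds)
    return "transcript"


async def heartbeat(ticks: int, interval: float) -> int:
    """Counts ticks; it can only advance while the event loop is not blocked."""
    count = 0
    for _ in range(ticks):
        await asyncio.sleep(interval)
        count += 1
    return count


async def main() -> tuple[str, int]:
    # to_thread runs the blocking call in a worker thread, so heartbeat keeps ticking.
    text, beats = await asyncio.gather(
        asyncio.to_thread(blocking_recognize, 0.2),
        heartbeat(4, 0.02),
    )
    return text, beats


result = asyncio.run(main())
```

Calling `blocking_recognize` directly inside a coroutine would stall the loop for the full duration, which is the hang described above.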

Pitfall 4: Voice Live API Region Restrictions

What goes wrong: Voice Live tests fail with authentication or 404 errors.
Why it happens: Azure Voice Live API is only available in eastus2 and swedencentral (hardcoded in SUPPORTED_REGIONS). Using a different region silently fails.
How to avoid: Integration test conftest.py should validate that the configured region is in SUPPORTED_REGIONS before running Voice Live tests.
Warning signs: HTTP 404 or "Unsupported region" errors.
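
A sketch of the pre-flight region check. The two region names are the ones listed above; the local `SUPPORTED_REGIONS` constant stands in for the project's own, and the normalization step (lowercasing, dropping spaces) is an assumption about how regions might be typed into the admin UI.

```python
# Local copy for illustration; the project defines its own SUPPORTED_REGIONS.
SUPPORTED_REGIONS = frozenset({"eastus2", "swedencentral"})


def normalize_region(region: str) -> str:
    """Turn display-style input such as 'East US 2' into the short form 'eastus2'."""
    return region.strip().lower().replace(" ", "")


def voice_live_region_supported(region: str) -> bool:
    """True if Voice Live tests can reasonably be attempted in this region."""
    return normalize_region(region) in SUPPORTED_REGIONS
```

In conftest.py this would typically feed a `pytest.mark.skipif`, so an unsupported region shows up as a skip with a clear reason instead of a 404.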

Pitfall 5: Playwright Timeout on Voice/Avatar Rendering

What goes wrong: E2E tests for voice and avatar mode time out because WebRTC/WebSocket connections take time to establish.
Why it happens: Default Playwright timeout is 30s. Avatar rendering involves WebRTC negotiation, ICE candidates, and media streams, which can take 10-15s.
How to avoid: Use extended timeouts (60-90s) for voice/avatar E2E tests. Use page.waitForSelector with generous timeouts for connection status indicators.
Warning signs: TimeoutError: waiting for selector on avatar or voice connection elements.

Pitfall 6: Concurrent Test Interference with Adapter Registry

What goes wrong: Integration tests that register real Azure adapters interfere with unit tests that expect mock adapters.
Why it happens: ServiceRegistry is a singleton -- registering an adapter in one test affects all subsequent tests.
How to avoid: Integration tests are in a separate directory (tests/integration/) and run with a separate pytest invocation. They stay out of a plain pytest run only if the --run-integration gate is implemented in conftest.py; registering the marker alone does not deselect them.
Warning signs: Existing unit tests start failing with real API errors after running integration tests.

Code Examples

Integration Test: Azure OpenAI Streaming

# backend/tests/integration/test_azure_openai_integration.py
import os
import time

import pytest

from app.services.agents.adapters.azure_openai import AzureOpenAIAdapter
from app.services.agents.base import CoachEventType, CoachRequest

pytestmark = [pytest.mark.integration]

skip_no_credentials = pytest.mark.skipif(
    not all(os.environ.get(k) for k in [
        "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY", "AZURE_OPENAI_DEPLOYMENT"
    ]),
    reason="Azure OpenAI credentials not configured",
)


@skip_no_credentials
async def test_streaming_response():
    """Real Azure OpenAI returns streaming TEXT events followed by DONE."""
    adapter = AzureOpenAIAdapter(
        endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        deployment=os.environ["AZURE_OPENAI_DEPLOYMENT"],
    )
    assert await adapter.is_available()

    request = CoachRequest(
        session_id="integration-test",
        message="Hello, I am a medical representative.",
        scenario_context="You are Dr. Wang, a cardiologist. Respond in 2-3 sentences.",
    )

    events = []
    async for event in adapter.execute(request):
        events.append(event)

    text_events = [e for e in events if e.type == CoachEventType.TEXT]
    assert len(text_events) > 0, "Should receive at least one TEXT event"
    assert events[-1].type == CoachEventType.DONE

    full_text = "".join(e.content for e in text_events)
    assert len(full_text) > 10, f"Response too short: {full_text}"

Integration Test: Azure Speech STT + TTS Round-Trip

# backend/tests/integration/test_azure_speech_integration.py
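import os

import pytest

from app.services.agents.stt.azure import AzureSTTAdapter
from app.services.agents.tts.azure import AzureTTSAdapter


def _speech_sdk_available() -> bool:
    """Return True if the optional Azure Speech SDK can be imported."""
    try:
        import azure.cognitiveservices.speech  # noqa: F401
    except ImportError:
        return False
    return True


pytestmark = [pytest.mark.integration]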

skip_no_credentials = pytest.mark.skipif(
    not all(os.environ.get(k) for k in ["AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION"]),
    reason="Azure Speech credentials not configured",
)

skip_no_sdk = pytest.mark.skipif(
    not _speech_sdk_available(),
    reason="azure-cognitiveservices-speech package not installed",
)


@skip_no_credentials
@skip_no_sdk
async def test_tts_then_stt_round_trip():
    """Synthesize speech with TTS, then transcribe with STT. Text should be similar."""
    key = os.environ["AZURE_SPEECH_KEY"]
    region = os.environ["AZURE_SPEECH_REGION"]

    tts = AzureTTSAdapter(key, region)
    stt = AzureSTTAdapter(key, region)

    original_text = "Hello, I would like to discuss treatment options."
    audio_bytes = await tts.synthesize(original_text, language="en-US")
    assert len(audio_bytes) > 1000, "TTS should produce substantial audio data"

    transcribed = await stt.transcribe(audio_bytes, language="en-US")
    assert len(transcribed) > 0, "STT should produce non-empty transcription"
    # Fuzzy match: key words should appear
    assert "treatment" in transcribed.lower() or "discuss" in transcribed.lower()
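
The final assertion above passes if either of two keywords survives the round trip. If a stricter check is wanted, one option is a word-overlap ratio; the helper below is a hypothetical addition, not part of the adapters, and any threshold applied to it would need tuning against real STT output.

```python
import re


def word_overlap(expected: str, actual: str) -> float:
    """Fraction of distinct words in `expected` that also appear in `actual`."""
    def tokenize(text: str) -> set[str]:
        # Lowercase and keep word-like runs so punctuation does not count.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    expected_words = tokenize(expected)
    if not expected_words:
        return 0.0
    return len(expected_words & tokenize(actual)) / len(expected_words)
```

In the round-trip test this could replace the keyword check with something like `assert word_overlap(original_text, transcribed) >= 0.7`, where 0.7 is an arbitrary starting point.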

Playwright E2E: Admin Config + Text Session Demo Flow

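// frontend/e2e/demo-flow.spec.ts (pattern sketch)
import { join } from "node:path";
import { expect, test } from "@playwright/test";

// authDir is assumed to point at the storage-state directory written by auth.setup.ts
test.describe("Full Demo Pipeline", () => {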
  test.use({ storageState: join(authDir, "admin.json") });
  test.setTimeout(120_000); // 2 minute timeout for full flow

  test("admin configures Azure OpenAI and tests connection", async ({ page }) => {
    await page.goto("/admin/azure-config");
    // Expand Azure OpenAI card
    await page.getByText("Azure OpenAI", { exact: true }).click();
    await page.waitForTimeout(300);
    // Test connection button
    await page.getByRole("button", { name: /test connection/i }).first().click();
    // Wait for result
    await expect(page.locator("[data-sonner-toaster]")).toBeVisible({ timeout: 15000 });
  });
});

Smoke Test Checklist Pattern

## Pre-Demo Smoke Test Checklist

### Azure Service Health
- [ ] Azure OpenAI: admin config page -> Test Connection -> green checkmark
- [ ] Azure Speech STT: admin config page -> Test Connection -> green checkmark
- [ ] Azure Speech TTS: admin config page -> Test Connection -> green checkmark
- [ ] Azure Voice Live: admin config page -> Test Connection -> green checkmark

### Text Mode Demo
- [ ] Select scenario, start text session
- [ ] Send 3+ messages, verify AI responds in character (< 3s latency)
- [ ] End session, verify scoring report renders with all dimensions

### Voice Mode Demo
- [ ] Start voice session, verify connection status shows "connected"
- [ ] Speak into mic, verify transcript appears
- [ ] Verify AI response is spoken back
- [ ] End session, verify scoring works

### Avatar Mode Demo
- [ ] Start avatar session, verify avatar renders (no freeze/glitch)
- [ ] Verify lip-sync matches speech output
- [ ] Verify avatar fallback to voice-only if avatar service unavailable

State of the Art

Old Approach	Current Approach	When Changed	Impact
All tests use mocks	Integration tests use real Azure APIs	Phase 9	Validates real-world behavior
Manual Azure config validation	Automated connection testing per service	Phase 7	Faster pre-demo verification
No performance assertions	Latency thresholds in test assertions	Phase 9	Catches regressions before demo

Deprecated/outdated:

None. All existing patterns are current and should be preserved.

Open Questions

1. Audio format compatibility for STT integration test
- What we know: Azure Speech STT expects specific audio formats (WAV PCM 16kHz). TTS output format may not match STT input format directly.
- What's unclear: Whether the TTS output from synthesize() can be fed directly to transcribe() without format conversion.
- Recommendation: Test the round-trip first. If format mismatch occurs, the TTS adapter may need an explicit output format parameter, or the integration test should use a known WAV file.

2. Avatar rendering validation in automated tests
- What we know: D-12 requires avatar lip-sync validation. Avatar uses WebRTC which Playwright can observe (DOM elements, connection state) but cannot easily verify visual quality.
- What's unclear: How to programmatically verify "lip-sync matches speech" and "no freezing."
- Recommendation: E2E tests verify avatar DOM elements appear and connection state is "connected." Visual quality validation (D-12) is inherently manual -- include in smoke test checklist.

3. Fallback chain test reliability
- What we know: D-13 requires testing avatar -> voice -> text fallback. The fallback logic is in the frontend components.
- What's unclear: How to reliably trigger fallback in E2E tests (would need to temporarily break avatar config, then voice config).
- Recommendation: Use the admin UI to disable services one at a time and verify the fallback behavior in the user session.
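
For the audio format question, one low-cost option is to inspect the WAV header of the synthesized bytes before handing them to STT. The sketch below uses only the standard-library wave module. The 16 kHz rate comes from the note above; mono and 16-bit samples are assumptions, and describe_wav, is_stt_compatible, and make_silent_wav are hypothetical helper names, not part of the existing adapters.

```python
import io
import wave

# 16 kHz is taken from the open question; mono / 16-bit are assumptions.
EXPECTED_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH_BYTES = 2


def describe_wav(audio: bytes) -> dict[str, int]:
    """Read the WAV header and return the basic PCM parameters."""
    with wave.open(io.BytesIO(audio), "rb") as wav:
        return {
            "rate_hz": wav.getframerate(),
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "frames": wav.getnframes(),
        }


def is_stt_compatible(audio: bytes) -> bool:
    """True only if the bytes parse as WAV PCM with the expected parameters."""
    try:
        info = describe_wav(audio)
    except (wave.Error, EOFError):
        # Not a parseable PCM WAV container (e.g. MP3 or truncated bytes).
        return False
    return (
        info["rate_hz"] == EXPECTED_RATE_HZ
        and info["channels"] == EXPECTED_CHANNELS
        and info["sample_width_bytes"] == EXPECTED_SAMPLE_WIDTH_BYTES
    )


def make_silent_wav(rate_hz: int = EXPECTED_RATE_HZ, frames: int = 1600) -> bytes:
    """Build a short silent mono 16-bit WAV, useful as a known-good fixture."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(EXPECTED_CHANNELS)
        wav.setsampwidth(EXPECTED_SAMPLE_WIDTH_BYTES)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * frames)
    return buffer.getvalue()
```

A round-trip test could call is_stt_compatible() on the TTS output first and fall back to a known WAV fixture when it returns False, which keeps a format mismatch from being misreported as an STT failure.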

Environment Availability

Dependency	Required By	Available	Version	Fallback
Python 3.11+	Backend tests	Yes	3.11.9	--
Node.js 20+	Playwright E2E	Yes	23.11.0	--
pytest	Integration tests	Yes	8.3.3	--
Playwright	E2E tests	Yes	1.58.2	--
openai SDK	Azure OpenAI tests	Yes	1.50.0+ (in pyproject.toml)	--
azure-cognitiveservices-speech	STT/TTS tests	Conditional	Requires `pip install -e ".[voice]"`	Skip speech tests if not installed
Azure OpenAI credentials	D-03, D-08	User-provided	--	Skip tests with marker
Azure Speech credentials	D-03, D-08	User-provided	--	Skip tests with marker
Azure Voice Live credentials	D-03, D-08	User-provided	--	Skip tests with marker

Missing dependencies with no fallback:

None (all are available or have skip markers)

Missing dependencies with fallback:

azure-cognitiveservices-speech: Optional install (pip install -e ".[voice]"). Speech integration tests skip if not installed.
Azure credentials: All integration tests skip gracefully when credentials are not in environment.
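
A minimal sketch of how the optional Speech SDK could be detected before speech tests run. The module path azure.cognitiveservices.speech is the import name of the azure-cognitiveservices-speech package; module_available and speech_skip_reason are hypothetical helper names, and the pytest wiring in the trailing comments is one possible arrangement rather than the project's actual conftest.

```python
import importlib
from typing import Optional

# Import name of the azure-cognitiveservices-speech package.
SPEECH_SDK_MODULE = "azure.cognitiveservices.speech"


def module_available(name: str) -> bool:
    """Return True if the module imports cleanly, False if it is not installed."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def speech_skip_reason() -> Optional[str]:
    """Return a skip reason when the optional SDK is missing, else None."""
    if module_available(SPEECH_SDK_MODULE):
        return None
    return 'azure-cognitiveservices-speech not installed; run: pip install -e ".[voice]"'


# Possible use in a test module (assumes pytest is installed):
#
#   reason = speech_skip_reason()
#   pytestmark = pytest.mark.skipif(reason is not None, reason=reason or "")
```

Keeping the check in a plain function means the same reason string can be reused by both the STT and TTS test modules.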

Project Constraints (from CLAUDE.md)

The following directives from CLAUDE.md apply to this phase:

Async everywhere: All backend test code must use async def with await
pytest-asyncio with asyncio_mode = "auto": No need for explicit @pytest.mark.asyncio
Ruff lint/format: All new test files must pass ruff check . and ruff format --check .
Double quotes: Ruff format enforces double quotes in Python
Line length 100: All Python code must stay within 100 character lines
TypeScript strict mode: E2E test TypeScript must pass npx tsc -b
No raw SQL: Tests must use SQLAlchemy ORM (relevant if integration tests interact with DB)
Conventional commits: Commits should use test: prefix
Pre-commit checklist: ruff check ., ruff format --check ., pytest -v must pass
Service layer holds business logic: Integration tests should test through adapter/service methods, not bypass them
Existing patterns: Follow the mock test patterns established in test_azure_openai_adapter.py and test_stt_tts_adapters.py for structure
E2E config: Playwright needs --config=e2e/playwright.config.ts flag (Gotcha #5)

Sources

Primary (HIGH confidence)

backend/tests/conftest.py -- Existing test fixture patterns, verified by reading source
backend/tests/test_azure_openai_adapter.py -- Existing Azure OpenAI mock test patterns
backend/tests/test_stt_tts_adapters.py -- Existing STT/TTS mock test patterns
backend/tests/test_voice_live.py -- Existing Voice Live test patterns
backend/app/services/agents/adapters/azure_openai.py -- Azure OpenAI adapter implementation
backend/app/services/agents/stt/azure.py -- Azure STT adapter implementation
backend/app/services/agents/tts/azure.py -- Azure TTS adapter implementation
backend/app/services/agents/avatar/azure.py -- Azure Avatar adapter (stub)
backend/app/services/connection_tester.py -- Connection testing patterns for all Azure services
backend/app/services/voice_live_service.py -- Voice Live service with SUPPORTED_REGIONS
backend/app/api/azure_config.py -- Admin config API with dynamic adapter registration
frontend/e2e/playwright.config.ts -- Playwright config with webServer setup
frontend/e2e/auth.setup.ts -- Authentication setup patterns (user + admin)
frontend/e2e/coaching-session.spec.ts -- Coaching session E2E test patterns
frontend/e2e/admin-azure-config.spec.ts -- Admin Azure config E2E test patterns
backend/pyproject.toml -- pytest config, dependency groups including [voice] optional

Secondary (MEDIUM confidence)

backend/.env.example -- All required environment variables documented
backend/app/config.py -- Settings class with all config fields and defaults

Tertiary (LOW confidence)

None. All findings are based on direct codebase analysis.

Metadata

Confidence breakdown:

Standard stack: HIGH -- All tools already installed and configured in the project
Architecture: HIGH -- Test patterns directly follow existing codebase conventions
Pitfalls: HIGH -- Identified from actual adapter code (conditional imports, asyncio.to_thread, SUPPORTED_REGIONS)
Performance: MEDIUM -- 3s threshold from D-11 is clear; actual Azure latency depends on region and load

Research date: 2026-03-27
Valid until: 2026-04-27 (stable -- testing patterns don't change rapidly)

UI Specification

Phase 09 -- UI Design Contract

Visual and interaction contract for Phase 09: Integration Testing with Real Azure Services. This phase creates NO new UI components, pages, or visual changes. It validates that the existing UI (built in Phases 01-08) works correctly with real Azure services end-to-end. The contract below documents the inherited design system state and defines the visual acceptance criteria that Playwright E2E tests and manual smoke tests must verify against. Generated by gsd-ui-researcher.

Phase Scope: Testing Only

Phase 09 is fundamentally different from prior phases. Per CONTEXT.md:

In scope: Pytest integration tests, Playwright E2E demo-flow tests, manual smoke test checklist, performance validation, fallback chain verification, UI polish verification
Out of scope: New features, new UI pages, new components, CI/CD integration of Azure tests
New files created: Test modules (.py, .spec.ts) and a markdown checklist (SMOKE_TEST_CHECKLIST.md). Zero frontend application code.

The UI-SPEC below therefore serves as the visual acceptance reference for test assertions rather than a build contract.

Design System

Property	Value
Tool	manual shadcn/ui adaptation (no `components.json`) -- inherited from Phase 01, unchanged
Preset	not applicable -- components adapted from local Figma Make exports
Component library	Radix UI primitives via shadcn/ui wrappers -- 22 base components in `frontend/src/components/ui/`
Icon library	lucide-react ^0.460.0
Font	Inter (EN) + Noto Sans SC (CN), loaded via Google Fonts

Source: Phase 01 UI-SPEC; Phase 08 UI-SPEC. No changes for Phase 09.

Spacing Scale

Inherited from Phase 01. No modifications. Declared values (multiples of 4):

Token	Value	Usage
xs	4px	Icon gaps, inline padding
sm	8px	Compact element spacing
md	16px	Default element spacing
lg	24px	Section padding
xl	32px	Layout gaps
2xl	48px	Major section breaks
3xl	64px	Page-level spacing, session headers

Exceptions: none new. Phase 08 exceptions (avatar area 280px, waveform 120px, mic button 56px, etc.) remain in effect.

E2E test relevance: Playwright tests should NOT assert pixel-level spacing. Layout structure assertions (element visibility, ordering, containment) are sufficient.

Typography

Inherited from Phase 01. No modifications.

Role	Size	Weight	Line Height
Body	16px (1rem)	400 (normal)	1.5
Label	16px (1rem)	500 (medium)	1.5
Heading (h3)	18px (1.125rem)	500 (medium)	1.5
Display (h1)	24px (1.5rem)	500 (medium)	1.5

Font stack: 'Inter', 'Noto Sans SC', sans-serif

E2E test relevance: Playwright tests verify text content presence and i18n key rendering. Typography values are not directly asserted in E2E tests.

Color

Inherited from Phase 01 + Phase 08 additions. No modifications.

Role	Value	Usage
Dominant (60%)	`#FFFFFF`	Background, surfaces
Secondary (30%)	`#F8FAFC` / `#ECECF0`	Cards, sidebar, nav, panels
Accent (10%)	`#1E40AF`	Primary CTA buttons, active mode indicator, connect button
Destructive	`#EF4444`	End session button, disconnect, error states

Accent reserved for: primary CTA buttons, active navigation items, active mode selector pill, mic button idle state, waveform bars, connection success dot, chart primary series.

E2E Test Color Verification

Playwright E2E tests should verify semantic states through CSS classes or data-* attributes, not hex color values. Specifically:

Visual State	How E2E Tests Should Verify
Connection status: connected	Check for green dot element visibility + "Connected" text
Connection status: error	Check for red dot element visibility + error text
Mode selector: active	Check `aria-pressed="true"` or active CSS class
Mic button state	Check `aria-label` text (changes per state)
Toast notifications	Check `[data-sonner-toaster]` visibility + toast text content

Copywriting Contract

Phase 09 creates NO new user-facing copy. All copy was defined in prior phases. The contract below documents the copy that E2E tests and smoke tests must verify is rendered correctly with real Azure services.

Demo Flow Copy Verification (Playwright E2E)

Element	Namespace	Key	Expected English Text
Admin config: Azure OpenAI card	admin	`admin.azureOpenai.name`	Azure OpenAI
Admin config: test connection success	common	(toast text)	Connection successful
Admin config: Voice Live region warning	admin	`admin.voiceLive.regionWarning`	Voice Live API is only available in East US 2 and Sweden Central regions.
Scenario selection: start button	coach	`coach.startSession`	Start Session
Text session: chat input placeholder	training	(placeholder text)	Type your message...
Voice session: connection status	voice	`voice.status.connected`	Connected
Voice session: end session dialog title	voice	`voice.endSessionTitle`	End Voice Session
Scoring report: heading	scoring	(heading text)	Scoring Report

Error State Copy Verification (Smoke Test)

Scenario	Expected Copy	Verified By
Azure OpenAI not configured	Mock adapter active; coaching works with mock responses	Manual: observe response quality difference
Avatar unavailable, voice available	"Avatar connection failed. Switching to voice-only mode." toast	Manual: disable avatar config, start avatar session
Voice unavailable, text available	"Voice connection failed. Switching to text mode." toast	Manual: disable voice config, start voice session
Wrong region configured	"Voice Live API is not available in the configured region..." warning	Manual: set non-supported region in admin config

Destructive Action Copy (Unchanged)

Action	Confirmation	Copy
End Session (all modes)	Dialog	"Are you sure you want to end this [text/voice] coaching session?"

Source: Phase 02 UI-SPEC (coach copy), Phase 08 UI-SPEC (voice copy). No new copy for Phase 09.

Visual Acceptance Criteria for E2E Tests

These are the visual/interaction contracts that Playwright E2E tests (D-09) and manual smoke tests (D-10) must verify.

Full Demo Pipeline Flow (D-02)

Step 1: Admin Login
  - Verify: Login form renders, admin credentials accepted, redirect to admin dashboard

Step 2: Admin Configures Azure OpenAI
  - Verify: Azure Config page loads, Azure OpenAI card expandable
  - Verify: Endpoint/key/deployment fields accept input
  - Verify: "Test Connection" button triggers real connection test
  - Verify: Success toast appears within 15 seconds

Step 3: Admin Configures Azure Speech (optional)
  - Verify: STT and TTS cards expandable
  - Verify: Key/region fields accept input
  - Verify: Test Connection succeeds

Step 4: User Login + Scenario Selection
  - Verify: Switch to user login, dashboard renders
  - Verify: Scenario selection page shows available scenarios
  - Verify: Mode selector shows text (always), voice (if configured), avatar (if configured)

Step 5: Text Coaching Session
  - Verify: Session starts, chat area renders
  - Verify: User sends message, AI responds (real Azure OpenAI, not mock)
  - Verify: Response appears within 3 seconds (D-11)
  - Verify: Key message checklist updates in hints panel
  - Verify: End session -> scoring report renders with all dimensions

Step 6: Voice Coaching Session (if Voice Live configured)
  - Verify: Connection status transitions: connecting -> connected
  - Verify: Mic button appears and is interactive
  - Verify: Transcript area shows real-time transcription
  - Verify: End session -> scoring report renders (same quality as text)

Step 7: Fallback Chain (D-13)
  - Verify: With avatar disabled, voice session shows waveform (not avatar video)
  - Verify: With voice disabled, session falls back to text-only with ChatInput

Performance Thresholds (D-11)

Metric	Threshold	How Measured
First AI text response token	< 3 seconds	Pytest integration test: `time.perf_counter()` around adapter.execute()
Total AI text response	< 15 seconds	Pytest integration test: total stream duration
Speech STT transcription	< 5 seconds	Pytest integration test: round-trip TTS -> STT
Admin config test connection	< 15 seconds	Playwright: wait for toast with timeout
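
The first-token and total-duration thresholds can be measured with a single pass over the response stream. The sketch below is an illustration under assumptions: fake_stream stands in for whatever async iterator the real adapter returns, and measure_stream is a hypothetical helper name. Only time.perf_counter() and the 3 s / 15 s budgets come from the table above.

```python
import asyncio
import time
from collections.abc import AsyncIterator

FIRST_TOKEN_BUDGET_S = 3.0   # D-11: first AI text response token
TOTAL_BUDGET_S = 15.0        # total AI text response


async def fake_stream(delay_s: float = 0.01) -> AsyncIterator[str]:
    """Stand-in for a real streaming adapter call (assumption, not the real API)."""
    for chunk in ("Hello", ", ", "doctor"):
        await asyncio.sleep(delay_s)
        yield chunk


async def measure_stream(stream: AsyncIterator[str]) -> tuple[float, float, str]:
    """Return (seconds to first chunk, total seconds, concatenated text)."""
    start = time.perf_counter()
    first_chunk_s = None
    parts: list[str] = []
    async for chunk in stream:
        if first_chunk_s is None:
            first_chunk_s = time.perf_counter() - start
        parts.append(chunk)
    total_s = time.perf_counter() - start
    if first_chunk_s is None:
        raise AssertionError("stream produced no chunks")
    return first_chunk_s, total_s, "".join(parts)


if __name__ == "__main__":
    first, total, text = asyncio.run(measure_stream(fake_stream()))
    assert first < FIRST_TOKEN_BUDGET_S
    assert total < TOTAL_BUDGET_S
    print(f"first chunk after {first:.3f}s, total {total:.3f}s, text={text!r}")
```

Starting the clock before iteration begins means connection setup counts toward the first-token budget, which matches what a demo audience actually experiences.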

Scoring Report Verification (D-14)

E2E tests must verify that scoring reports contain the same elements for text, voice, and avatar sessions:

Element	Present in Report
Overall score (numeric)	Yes -- visible heading with score value
Per-dimension scores	Yes -- radar chart or dimension cards
Strengths section	Yes -- green-highlighted text blocks
Weaknesses section	Yes -- orange-highlighted text blocks
Improvement suggestions	Yes -- purple-highlighted text blocks
Conversation quotes	Yes -- quoted excerpts from transcript

Registry Safety

Registry	Blocks Used	Safety Gate
shadcn official (manual adaptation)	No new blocks in Phase 09	not required
Third-party registries	none	not applicable

Phase 09 adds zero new UI dependencies. All test infrastructure uses existing project dependencies (pytest, Playwright, httpx).

Component Inventory (Phase 09)

New Components

None. Phase 09 creates no new frontend components.

Modified Components

None. Phase 09 modifies no existing frontend components.

Components Under Test (Reference)

The following existing components are exercised by E2E tests but are NOT modified:

Component	E2E Coverage
Admin Azure Config page (`pages/admin/azure-config.tsx`)	Config entry, test connection, save
Scenario Selection page	Browse scenarios, select mode, start session
F2F Training Session page (`pages/user/training-session.tsx`)	Text coaching full flow
Voice Session page (`pages/user/voice-session.tsx`)	Voice coaching full flow
Scoring Report section	Post-session report rendering
Login page	Auth flow for admin and user

i18n Contract

No new i18n namespaces or keys. All existing namespaces are verified by E2E tests:

Namespace	Status	E2E Coverage
`auth`	Unchanged	Login flow
`admin`	Unchanged	Azure Config page
`coach`	Unchanged	Session flow
`voice`	Unchanged	Voice session flow
`scoring`	Unchanged	Scoring report
`common`	Unchanged	Shared UI elements

Smoke Test Checklist Contract (D-10)

The smoke test checklist document (docs/SMOKE_TEST_CHECKLIST.md) must cover these categories with checkbox format:

Category	Items
Azure Service Health	Test Connection for each configured Azure service (OpenAI, Speech STT, Speech TTS, Voice Live, Avatar)
Text Mode Demo	Start session, send 3+ messages, verify AI responds in character, verify < 3s latency, end session, verify scoring report
Voice Mode Demo	Start session, verify connection, speak into mic, verify transcript, verify AI spoken response, end session, verify scoring
Avatar Mode Demo	Start session, verify avatar renders, verify lip-sync (manual visual), verify fallback if avatar unavailable
Fallback Chain	Disable avatar -> verify voice-only fallback, disable voice -> verify text-only fallback
Scoring Parity	Compare scoring report quality across text, voice, and avatar sessions
Performance	Response latency feels conversational (< 3s), avatar renders without glitches, transitions are smooth

Source: CONTEXT.md D-10, D-11, D-12, D-13, D-14.

Checker Sign-Off

Approval: pending

Verification

Phase 09: Integration Testing with Real Azure Services Verification Report

Phase Goal: Implement unified AI Foundry config (replacing 8 separate ServiceConfig rows), expand to 7 interaction modes, wire agent mode runtime end-to-end, redesign admin UI with single AI Foundry card, then validate all Azure service integrations with real credentials and polish demo experience for BeiGene customer presentations
Verified: 2026-03-28T14:21:55Z
Status: gaps_found
Re-verification: No -- initial verification

Goal Achievement

Observable Truths

#	Truth	Status	Evidence
1	Admin configures a single AI Foundry endpoint/region/API key -- all 7 services derive from this unified config	VERIFIED	ServiceConfig.is_master flag exists. config_service.get_master_config/get_effective_key/get_effective_endpoint implement master-fallback pattern. API has GET/PUT /azure-config/ai-foundry. Lifespan loads master config first (main.py lines 79-110). Admin page uses useAIFoundryConfig/useUpdateAIFoundry hooks with endpoint/region/apiKey form fields. connection_tester accepts master_endpoint/master_key/master_region params.
2	Platform supports all 7 interaction modes	VERIFIED	Backend SessionCreate.mode is Literal with all 7 values. Frontend SessionMode type has all 7 values. mode-selector.tsx has MODE_MAP covering all 7 modes. Session model column widened to String(40).
3	Agent mode works end-to-end: token broker returns agent_id/project_name, frontend connects via voice-agent/realtime WebSocket	PARTIAL	Backend: voice_live_service.py parses mode via parse_voice_live_mode, returns agent_id/project_name in VoiceLiveTokenResponse (lines 51-65). Frontend: use-voice-live.ts constructs voice-agent/realtime WebSocket URL when agent_id is present (lines 102-104). GAP: training.tsx line 70 hardcodes agentAvailable=false, making agent mode unreachable from the UI mode selector.
4	Two-level mode selector UI: communication type first, then engine	VERIFIED	mode-selector.tsx implements Level 1 (text/voice/digital_human) and Level 2 (pipeline/realtime_model/realtime_agent) with MODE_MAP. Props include pipelineAvailable and agentAvailable. data-testid="engine-{engine}" for E2E testing. training.tsx passes pipelineAvailable and agentAvailable to ModeSelector.
5	Integration tests validate each Azure service with real credentials	VERIFIED	4 test modules with 18 total tests: OpenAI (6 tests incl. first-token latency < 3s), Speech (5 tests incl. TTS->STT round-trip), Voice Live (4 tests incl. region validation), Avatar (3 tests incl. ICE token retrieval). All use pytestmark=[pytest.mark.integration]. pyproject.toml addopts="-m 'not integration'" auto-deselects. Confirmed: 921 passed, 14 skipped, 18 deselected.
6	E2E demo flow works: Login -> Admin config -> Text session -> Voice/Avatar session -> Score report	VERIFIED	demo-flow.spec.ts has 4 test cases with 120s timeout, admin/user auth via storageState, screenshot capture. SMOKE_TEST_CHECKLIST.md has 84 checkbox items across 8 sections covering all demo stages.

Score: 5/6 truths verified (1 partial)
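
To make the 7-mode count concrete, here is a sketch of how the two selector levels combine. The Level 1 and Level 2 values and the names voice_pipeline, voice_realtime_model, voice_realtime_agent, and digital_human_realtime_agent appear in the evidence above; digital_human_pipeline and digital_human_realtime_model are inferred from the same naming pattern and should be treated as assumptions. The Python dict only mirrors the idea of MODE_MAP; the real map lives in mode-selector.tsx.

```python
from itertools import product
from typing import Optional

ENGINES = ("pipeline", "realtime_model", "realtime_agent")


def build_mode_map() -> dict[tuple[str, Optional[str]], str]:
    """Map (communication type, engine) to a session mode string."""
    # Text has no engine level, so it contributes exactly one mode.
    modes: dict[tuple[str, Optional[str]], str] = {("text", None): "text"}
    for comm, engine in product(("voice", "digital_human"), ENGINES):
        modes[(comm, engine)] = f"{comm}_{engine}"
    return modes


def selectable_modes(pipeline_available: bool, agent_available: bool) -> list[str]:
    """Filter modes the way the two availability props gate the engine buttons."""
    allowed = []
    for (_comm, engine), mode in build_mode_map().items():
        if engine == "pipeline" and not pipeline_available:
            continue
        if engine == "realtime_agent" and not agent_available:
            continue
        allowed.append(mode)
    return allowed
```

With agent_available fixed to False, both *_realtime_agent modes drop out of selectable_modes(), which is the effect of the hardcoded flag described in truth 3.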

Required Artifacts

Artifact	Expected	Status	Details
`backend/app/models/service_config.py`	is_master field	VERIFIED	Line 20: `is_master: Mapped[bool] = mapped_column(Boolean, default=False)`
`backend/app/schemas/session.py`	7-mode Literal	VERIFIED	Lines 15-20: all 7 mode values present
`backend/app/schemas/voice_live.py`	agent_id/project_name	VERIFIED	Lines 16-17: `agent_id: str
`backend/app/schemas/azure_config.py`	AIFoundryConfigUpdate	VERIFIED	Line 17: `class AIFoundryConfigUpdate(BaseModel)`
`backend/app/services/config_service.py`	Master config functions	VERIFIED	get_master_config (line 53), upsert_master_config (line 61), get_effective_key (line 148), get_effective_endpoint (line 163)
`backend/app/services/voice_live_service.py`	Token broker with agent fields	VERIFIED	parse_voice_live_mode (line 51), agent_id in response (line 64)
`backend/app/api/azure_config.py`	AI Foundry endpoints	VERIFIED	GET /ai-foundry (line 126), PUT /ai-foundry (line 160)
`backend/alembic/versions/f09a_unified_ai_foundry_config.py`	Migration	VERIFIED	File exists
`backend/app/main.py`	Lifespan loads master config	VERIFIED	Lines 79-110: loads master config first, passes master_endpoint/master_key/master_region to per-service registration
`frontend/src/types/voice-live.ts`	7-mode SessionMode	VERIFIED	Lines 7-8: voice_realtime_agent, digital_human_realtime_agent present
`frontend/src/types/azure-config.ts`	AIFoundryConfig type	VERIFIED	Lines 20, 28: AIFoundryConfig and AIFoundryConfigUpdate interfaces
`frontend/src/api/azure-config.ts`	API client functions	VERIFIED	getAIFoundryConfig (line 47), updateAIFoundryConfig (line 54)
`frontend/src/hooks/use-azure-config.ts`	TanStack Query hooks	VERIFIED	useAIFoundryConfig (line 43), useUpdateAIFoundry (line 50)
`frontend/src/pages/admin/azure-config.tsx`	AI Foundry master card	VERIFIED	Uses useAIFoundryConfig/useUpdateAIFoundry, renders AI Foundry card with endpoint/region/apiKey fields, per-service toggles below
`frontend/src/components/voice/mode-selector.tsx`	Two-level selector	VERIFIED	Imports SessionMode, has MODE_MAP for 7 modes, pipelineAvailable/agentAvailable props, data-testid="engine-{engine}"
`frontend/src/hooks/use-voice-live.ts`	Agent WebSocket path	VERIFIED	voice-agent/realtime (line 104), openai/realtime (line 111), agent_id check (line 102)
`backend/tests/integration/conftest.py`	Skip markers	VERIFIED	4 credential helpers, 4 skip markers, marker registration
`backend/tests/integration/test_azure_openai_integration.py`	OpenAI tests	VERIFIED	6 tests incl. first_token_time < 3.0 assertion
`backend/tests/integration/test_azure_speech_integration.py`	Speech tests	VERIFIED	5 tests incl. TTS->STT round-trip
`backend/tests/integration/test_voice_live_integration.py`	Voice Live tests	VERIFIED	4 tests with region validation and endpoint reachability
`backend/tests/integration/test_avatar_integration.py`	Avatar tests	VERIFIED	3 tests with ICE relay token retrieval
`frontend/e2e/demo-flow.spec.ts`	Demo E2E test	VERIFIED	4 test cases, 120s timeout, admin+user auth, screenshots
`docs/SMOKE_TEST_CHECKLIST.md`	Smoke test checklist	VERIFIED	84 checkbox items, 8 sections, quick recovery table

Key Link Verification

From	To	Via	Status	Details
voice_live_service.py	config_service.py	get_master_config	WIRED	Line 41: `master = await config_service.get_master_config(db)`
azure_config.py API	config_service.py	register_adapter_from_config	WIRED	Lines 182, 257: called with master fallback params
main.py lifespan	ServiceConfig model	is_master query	WIRED	Lines 83-84: queries ServiceConfig.is_master == True
azure-config.tsx	use-azure-config.ts	useAIFoundryConfig/useUpdateAIFoundry	WIRED	Lines 113, 116: hooks called and data used for form state
mode-selector.tsx	types/voice-live.ts	SessionMode import	WIRED	Line 6: `import type { SessionMode } from "@/types/voice-live"`
use-voice-live.ts	types/voice-live.ts	agent_id/voice-agent	WIRED	Lines 102-111: conditional WebSocket URL based on agent_id
demo-flow.spec.ts	auth.setup.ts	storageState admin.json	WIRED	Lines 77, 139, 239, 293: storageState references
demo-flow.spec.ts	azure-config page	Navigation	WIRED	Line 35: goto("/admin/azure-config")
connection_tester.py	master config	master_endpoint/master_key/master_region params	WIRED	Lines 189-191: params accepted, lines 199-218: used as fallback
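
The master-fallback wiring referenced for get_effective_key, get_effective_endpoint, and connection_tester can be pictured as follows. This is a sketch under assumptions, not the project's code: ConfigRow is a hypothetical stand-in for the ServiceConfig model, and the rule that a per-service value wins when present (with the master row filling the gap otherwise) is inferred from the word "fallback" in the evidence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConfigRow:
    """Hypothetical, simplified stand-in for a ServiceConfig row."""
    service_name: str
    endpoint: str = ""
    api_key: str = ""
    is_master: bool = False


def effective_endpoint(service: ConfigRow, master: Optional[ConfigRow]) -> str:
    """Per-service endpoint if set, otherwise the master endpoint, otherwise empty."""
    if service.endpoint:
        return service.endpoint
    return master.endpoint if master else ""


def effective_key(service: ConfigRow, master: Optional[ConfigRow]) -> str:
    """Per-service key if set, otherwise the master key, otherwise empty."""
    if service.api_key:
        return service.api_key
    return master.api_key if master else ""


if __name__ == "__main__":
    master = ConfigRow("ai_foundry", "https://example.invalid", "master-key", True)
    speech = ConfigRow("azure_speech_stt")  # toggle-only row, no own endpoint/key
    print(effective_endpoint(speech, master), effective_key(speech, master))
```

An empty string as the "not configured" result lets callers treat a missing master row and a blank master row the same way.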

Data-Flow Trace (Level 4)

Artifact	Data Variable	Source	Produces Real Data	Status
azure-config.tsx	aiFoundryData	useAIFoundryConfig -> GET /azure-config/ai-foundry	Yes (DB query via get_master_config)	FLOWING
mode-selector.tsx	value (SessionMode)	props from training.tsx	Yes (derived from user selection)	FLOWING
use-voice-live.ts	tokenData	/api/v1/voice-live/token broker	Yes (DB query + parse_voice_live_mode)	FLOWING

Behavioral Spot-Checks

Behavior	Command	Result	Status
ServiceConfig has is_master	python3 -c "from app.models.service_config import ServiceConfig; assert hasattr(ServiceConfig, 'is_master')"	OK	PASS
7-mode schema accepts voice_realtime_agent	python3 -c "from app.schemas.session import SessionCreate; s = SessionCreate(scenario_id='x', mode='voice_realtime_agent')"	OK	PASS
VoiceLiveTokenResponse has agent_id	python3 -c "from app.schemas.voice_live import VoiceLiveTokenResponse; r = VoiceLiveTokenResponse(..., agent_id='a1', project_name='p1'); assert r.agent_id == 'a1'"	OK	PASS
Master config functions importable	python3 -c "from app.services.config_service import get_master_config, upsert_master_config, get_effective_key, get_effective_endpoint"	OK	PASS
Backend tests pass (921) with integration deselected (18)	pytest tests/ -x -q	921 passed, 14 skipped, 18 deselected	PASS
Integration tests properly deselected	pytest tests/integration/ -v --collect-only	18 items collected, all deselected	PASS
Frontend TypeScript compiles	npx tsc -b --noEmit	Clean (no output)	PASS

Requirements Coverage

Requirement	Source Plan	Description	Status	Evidence
COACH-04	09-01, 09-03, 09-04, 09-05	Voice input via Azure Speech STT	SATISFIED	7-mode schema includes voice_pipeline; Speech integration tests cover STT; smoke checklist covers voice mode
COACH-05	09-01, 09-03, 09-04, 09-05	AI HCP responses via Azure Speech TTS	SATISFIED	TTS integration tests with round-trip; voice pipeline mode in schema
COACH-06	09-01, 09-02, 09-03, 09-04, 09-05	GPT Realtime API WebSocket	SATISFIED	voice_realtime_model and voice_realtime_agent modes; use-voice-live hook with agent/model WebSocket paths; Voice Live integration tests
COACH-07	09-01, 09-03, 09-04, 09-05	Azure AI Avatar digital human	SATISFIED	digital_human_* modes in schema; Avatar integration tests with ICE token; smoke checklist covers avatar mode
PLAT-03	09-01, 09-02, 09-04, 09-05	Admin configures Azure service connections from web UI	SATISFIED	AI Foundry admin card with endpoint/region/key; per-service toggles; connection testing
PLAT-05	09-01, 09-02, 09-03, 09-05	Voice mode configurable per deployment/session	SATISFIED	7 session modes selectable; two-level mode selector; per-service enable/disable

No orphaned requirements found (REQUIREMENTS.md traceability table does not map any additional IDs to Phase 09 beyond what the plans claim).

Anti-Patterns Found

File	Line	Pattern	Severity	Impact
frontend/src/pages/user/training.tsx	70	`agentAvailable = false` hardcoded	Warning	Agent mode engine button is permanently disabled in the mode selector. Backend agent path is fully wired but unreachable from UI.

No TODO/FIXME/PLACEHOLDER patterns found in key phase files. No empty implementations. No console.log-only handlers. All other files clean.

Human Verification Required

1. AI Foundry Admin Config Visual Layout

Test: Login as admin, navigate to Admin > Azure Config. Verify AI Foundry master card appears at top with endpoint/region/API key fields, followed by per-service toggle list below.
Expected: Single prominent card for AI Foundry, simplified per-service rows with toggle switches and model name fields (no per-service endpoint/key fields).
Why human: Visual layout and UX quality cannot be verified programmatically.

2. Two-Level Mode Selector Interaction

Test: Login as user, select a scenario. Verify mode selector shows Level 1 (Text/Voice/Digital Human) and Level 2 (Pipeline/Realtime/Agent) when non-text mode is selected.
Expected: Level 2 row appears/hides based on Level 1 selection. Disabled engines show as grayed out. Text mode hides Level 2 entirely.
Why human: Multi-step UI interaction and visual state transitions.

3. End-to-End Demo with Real Azure Credentials

Test: Configure real Azure AI Foundry credentials via admin UI. Run through full demo: text session with AI response, voice session with realtime WebSocket, avatar session with digital human rendering, scoring report.
Expected: All modes produce real AI responses (not mock), voice has sub-3s latency, avatar renders with lip sync, scoring report has meaningful content.
Why human: Requires real Azure credentials, microphone, WebRTC-capable browser, and subjective quality assessment.

4. Integration Tests with Real Credentials

Test: Set AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_DEPLOYMENT, AZURE_SPEECH_KEY, AZURE_SPEECH_REGION, AZURE_VOICE_LIVE_ENDPOINT, AZURE_VOICE_LIVE_API_KEY, AZURE_VOICE_LIVE_REGION env vars. Run pytest -m integration.
Expected: All 18 integration tests pass including first-token latency < 3s, TTS->STT round-trip, ICE token retrieval.
Why human: Requires Azure credentials not available in automated verification environment.
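
Before running pytest -m integration, a quick preflight can report which of the eight variables are unset. The variable names are the ones listed in this item; grouping them by service prefix and the helper name missing_env are assumptions made for illustration. Variables for the Avatar tests are not listed here, so they are left out rather than guessed.

```python
import os
from collections.abc import Mapping

# Names copied from the checklist item; the per-service grouping is an assumption.
REQUIRED_ENV: dict[str, tuple[str, ...]] = {
    "openai": (
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_DEPLOYMENT",
    ),
    "speech": ("AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION"),
    "voice_live": (
        "AZURE_VOICE_LIVE_ENDPOINT",
        "AZURE_VOICE_LIVE_API_KEY",
        "AZURE_VOICE_LIVE_REGION",
    ),
}


def missing_env(env: Mapping[str, str] = os.environ) -> dict[str, list[str]]:
    """Return {service: [unset or empty variable names]} for incomplete services."""
    report: dict[str, list[str]] = {}
    for service, names in REQUIRED_ENV.items():
        missing = [name for name in names if not env.get(name)]
        if missing:
            report[service] = missing
    return report


if __name__ == "__main__":
    for service, names in missing_env().items():
        print(f"{service}: missing {', '.join(names)}")
```

An empty result means every listed variable is set; any service that appears in the report would have its tests skipped rather than failed.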

Gaps Summary

One gap was found: agent mode UI discoverability. The backend token broker correctly returns agent_id/project_name when voice_live is configured in agent mode. The frontend use-voice-live hook correctly constructs the voice-agent/realtime WebSocket URL when agent_id is present. However, agentAvailable is hardcoded to false in frontend/src/pages/user/training.tsx (line 70), which means the "Agent" engine button in the two-level mode selector is permanently disabled. Users cannot reach agent mode through the UI.

The fix is straightforward: derive agentAvailable from the VoiceLiveConfigStatus endpoint or from the voice live service response that indicates agent mode is configured. On the frontend this is a single-line change in training.tsx; it also needs a corresponding API enhancement to expose agent availability.
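
The derivation itself is small. The sketch below expresses it in Python for consistency with the backend tests, although the actual change would land in the TypeScript frontend. The field names agent_id and project_name and the two WebSocket paths come from the verification evidence; requiring both fields to be non-empty, and the helper names, are assumptions.

```python
from collections.abc import Mapping
from typing import Any


def agent_available(status: Mapping[str, Any]) -> bool:
    """Agent mode is offered only when the backend reports both agent fields."""
    return bool(status.get("agent_id")) and bool(status.get("project_name"))


def realtime_path(status: Mapping[str, Any]) -> str:
    """Pick the WebSocket path the same way the hook branches on agent_id."""
    return "voice-agent/realtime" if status.get("agent_id") else "openai/realtime"
```

For example, a response with agent_id "a1" and project_name "p1" yields agent_available(...) == True and the voice-agent/realtime path, while a response without agent_id keeps the Agent engine button disabled and uses openai/realtime.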

All other 5 success criteria are fully verified with codebase evidence and behavioral spot-checks.

Verified: 2026-03-28T14:21:55Z Verifier: Claude (gsd-verifier)

Planning Phase 09 - huqianghui/AI-Coach-vibe-coding GitHub Wiki
