Chat UI Component ‐ Speech to text - IgniteUI/igniteui-webcomponents GitHub Wiki

Chat UI Component - Speech to text Specification

Contents

  1. Overview
  2. User Stories
  3. Functionality
  4. Test Scenarios
  5. Accessibility
  6. Assumptions and Limitations
  7. References

Owned by

CodeX Team

Ivan Petrov

Designer Name

Requires approval from

  • Peer Developer Name | Date:
  • Design Manager Name | Date:

Signed off by

  • Product Owner Name | Date:
  • Platform Architect Name | Date:

Revision History

| Version | Users | Date | Notes |
|---|---|---|---|
| 1 | Ivan Petrov | 14.10.2025 | Initial specification |

1. Overview

Objectives

  • Add speech-to-text (STT) functionality to the Chat UI Component, allowing users to dictate messages using their voice.
  • The feature supports two STT modes:
  1. Backend Transcription Mode – Audio is streamed via WebSocket/SignalR to a backend service that integrates with a third-party transcription service (Google Speech-to-Text, Vertex AI, etc.).
    This backend service is provided as a standalone project (a candidate for a NuGet package). Repository: igniteui-speech-to-text-server
  2. Frontend (Web Speech API) Mode – Browser-native transcription handled entirely in the frontend (no server dependency).
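The frontend mode above relies on the browser's Web Speech API. The sketch below is illustrative, not the component's implementation: it feature-detects `SpeechRecognition` (prefixed as `webkitSpeechRecognition` in Chromium/Safari), and the `buildLiveTranscript` helper name is hypothetical.

```typescript
// Merge finalized results with the current interim hypothesis for live display.
function buildLiveTranscript(finalParts: string[], interim: string): string {
  return [...finalParts, interim].filter((p) => p.length > 0).join(' ');
}

// Browser-only API: feature-detect so this sketch degrades gracefully.
const SpeechRecognitionCtor =
  (globalThis as any).SpeechRecognition ??
  (globalThis as any).webkitSpeechRecognition;

if (SpeechRecognitionCtor) {
  const recognition = new SpeechRecognitionCtor();
  recognition.lang = 'en-US';
  recognition.interimResults = true; // stream partial hypotheses
  recognition.continuous = true;

  const finalParts: string[] = [];
  recognition.onresult = (event: any) => {
    let interim = '';
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      if (result.isFinal) finalParts.push(result[0].transcript);
      else interim += result[0].transcript;
    }
    // In the component this would update the chat input field instead.
    console.log(buildLiveTranscript(finalParts, interim));
  };
  recognition.start();
} else {
  console.log('Web Speech API not available in this environment.');
}
```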

PoC: https://github.com/IgniteUI/igniteui-webcomponents/pull/1893

Complementary Backend project: igniteui-speech-to-text-server

Acceptance criteria

Must-haves before the feature can be considered a sprint candidate:

  1. Users can record and transcribe voice messages directly in the chat input in real-time.
  2. Developers can configure which STT provider is used.
  3. Transcription output appears in the message input field in real-time.
  4. The system automatically stops on silence timeout.
  5. Works across Chrome, Edge, Safari (Web Speech fallback).

2. User Stories


Developer stories:

  • Story 1: As a developer, I want to enable STT via component options so that I don’t need to write custom integration code.
  • Story 2: As a developer, I want to choose between backend or frontend transcription providers.

End-user stories:

  • Story 1: As an end-user, I want to dictate a message in the chat box using my microphone.
  • Story 2: As an end-user, I want visual feedback (mic pulse and silence countdown) during recording.
  • Story 3: As an end-user, I want the transcription to stop automatically when I stop speaking and the message to be auto-submitted.
  • Story 4: As an end-user, I want the ability to manually stop the transcription. This should not trigger auto-submission, so the message remains available for further editing.

3. Functionality


3.1. End-User Experience

  • A microphone icon is displayed next to the message input field.
  • Clicking the icon starts recording. The microphone icon is replaced by a stop icon.
  • Visual feedback begins when voice is detected - pulsing stop icon.
  • Live transcription text appears in the message input field.
  • When silence is detected, a timeout animation (countdown circle) is presented. If voice is detected again during the countdown, the countdown resets.
  • When silence timeout ends or the user clicks stop, recording stops and transcription is finalized.
  • When transcription finishes due to silence timeout, the message is auto-submitted.

3.2. Developer Experience

Frontend setup:

  • Add speech-to-text options to the chat component options:

```typescript
speakPlaceholder: 'Speak...',
// ...
speechToText: {
  enable: true,
  lang: 'en-US',
  serviceProvider: 'webspeech', // 'webspeech' | 'backend'
  serviceUri: 'https://localhost:5000/sttHub',
},
```
| Name | Description | Type | Default | Valid values |
|---|---|---|---|---|
| speakPlaceholder | Chat input placeholder while speech-to-text is active | String | null | e.g. "Speak..." |
| enable | Enables speech-to-text | Boolean | false | true / false |
| lang | Language for transcription | String | null | e.g. "en-US", "de-DE" |
| serviceProvider | Transcription provider to use; "backend" requires serviceUri | String | null | "backend" / "webspeech" |
| serviceUri | Backend hub endpoint (SignalR) | String | null | URL |
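The constraint in the table above (the "backend" provider needs a serviceUri) could be enforced with a small validation step. This is a hedged sketch: `SpeechToTextOptions` and `validateSttOptions` are illustrative names, not the shipped API.

```typescript
interface SpeechToTextOptions {
  enable: boolean;
  lang?: string;                             // e.g. 'en-US', 'de-DE'
  serviceProvider?: 'webspeech' | 'backend';
  serviceUri?: string;                       // required for 'backend'
}

// Returns a list of human-readable configuration errors (empty when valid).
function validateSttOptions(opts: SpeechToTextOptions): string[] {
  const errors: string[] = [];
  if (!opts.enable) return errors; // nothing to validate when disabled
  if (opts.serviceProvider === 'backend' && !opts.serviceUri) {
    errors.push("serviceProvider 'backend' requires a serviceUri (SignalR hub endpoint)");
  }
  return errors;
}

console.log(validateSttOptions({ enable: true, serviceProvider: 'backend' }));
```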

Backend setup (if serviceProvider: 'backend')

  • Run the complementary project igniteui-speech-to-text-server.
  • Make sure you provide the proper third-party credentials and configuration.
  • Make sure the CORS policy aligns with the frontend project (port setting).

3.3. Globalization/Localization

The lang option controls the transcription locale.

3.4. Keyboard Navigation

| Keys | Description |
|---|---|

3.5. API

Options

| Name | Description | Type | Default value | Valid values |
|---|---|---|---|---|
| SILENCE_TIMEOUT_MS | Silence duration before automatic stop, in ms | Number | 4000 | Any integer ≥ 0 |
| SILENCE_GRACE_PERIOD | Silence duration before the countdown animation starts, in ms | Number | 1000 | Any integer < SILENCE_TIMEOUT_MS |

Methods

| Name | Description | Return type | Parameters |
|---|---|---|---|
| start() | Begin recording and transcription | Promise | language?: string |
| stop() | Stop recording and finalize transcription | void | — |
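A minimal mock honoring the method signatures above (a Promise-returning start(language?) and a void stop()). The class name and internals are illustrative only; the real component would request microphone access and stream audio.

```typescript
class MockSttController {
  private active = false;
  lastFinish: 'auto' | 'manual' | null = null;

  // Begin recording and transcription; resolves once recording has started.
  async start(language?: string): Promise<void> {
    void language; // would configure the transcription locale
    this.active = true;
  }

  // Manual stop: finalize the transcript but do NOT auto-submit the message.
  stop(): void {
    if (!this.active) return;
    this.active = false;
    this.lastFinish = 'manual';
  }
}

const ctl = new MockSttController();
void ctl.start('en-US'); // state is set synchronously in this mock
ctl.stop();
console.log(ctl.lastFinish); // "manual"
```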

Events

| Name | Description | Cancelable | Parameters |
|---|---|---|---|
| onPulseSignal | Fired when STT detects voice (for simplicity, emitted when a transcription of that voice is received) | No | — |
| onStartCountdown | Fired when the silence countdown animation should start | No | { ms: number \| null } |
| onTranscript | Fired when the transcription text updates | No | { text: string } |
| onStopInProgress | Fired when the user clicks stop while the service is awaiting the final transcription result | No | — |
| onFinishedTranscribing | Fired when transcription completes | No | { finish: 'auto' \| 'manual' } |
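The event payloads above can be modeled as a discriminated union, which also makes a typical event sequence easy to illustrate. The type names and the `ms` value are illustrative assumptions, not the component's published types.

```typescript
type SttEvent =
  | { type: 'onPulseSignal' }
  | { type: 'onStartCountdown'; detail: { ms: number | null } }
  | { type: 'onTranscript'; detail: { text: string } }
  | { type: 'onStopInProgress' }
  | { type: 'onFinishedTranscribing'; detail: { finish: 'auto' | 'manual' } };

// A typical dictation that ends via the silence timeout (auto-submit):
const session: SttEvent[] = [
  { type: 'onPulseSignal' },
  { type: 'onTranscript', detail: { text: 'Hello' } },
  { type: 'onTranscript', detail: { text: 'Hello world' } },
  { type: 'onStartCountdown', detail: { ms: 3000 } }, // remaining ms is illustrative
  { type: 'onFinishedTranscribing', detail: { finish: 'auto' } },
];
console.log(session.map((e) => e.type).join(' -> '));
```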

4. Test Scenarios

Automation

  • Scenario 1: Enable STT and verify that transcription events fire correctly.
  • Scenario 2: Simulate silence and verify auto-stop triggers onFinishedTranscribing.
  • Scenario 3: Validate switching between backend and Web Speech providers.
  • Scenario 4: Check that microphone icon is replaced with stop icon during active transcription.
  • Scenario 5: Verify error propagation from backend to frontend.

5. Accessibility

ARIA Support

  • The microphone button exposes an aria-pressed state and an aria-label.

RTL Support

6. Assumptions and Limitations

| Assumption | Limitation | Notes |
|---|---|---|
| Users grant microphone permissions | Browsers may block mic access on non-HTTPS origins | |
| Backend supports SignalR | Only HTTP/HTTPS connections are supported, no raw WebSockets | |
| Browser supports the Web Speech API | Safari and Firefox offer only partial support | |

7. References

Google Cloud Speech-to-Text V2 Docs

Web Speech API Spec (MDN)

SignalR Documentation