Chat UI Component - Speech to text Specification
Contents
- Overview
- User Stories
- Functionality
- Test Scenarios
- Accessibility
- Assumptions and Limitations
- References
Owned by
CodeX Team
Ivan Petrov
Designer Name
Requires approval from
- Peer Developer Name | Date:
- Design Manager Name | Date:
Signed off by
- Product Owner Name | Date:
- Platform Architect Name | Date:
Revision History
Version | Author | Date | Notes |
---|---|---|---|
1 | Ivan Petrov | 14.10.2025 | Initial specification |
1. Overview
Objectives
- Add speech-to-text (STT) functionality to the Chat UI Component, allowing users to dictate messages using their voice.
- The feature supports two STT modes:
  - Backend Transcription Mode – Audio is streamed via WebSocket/SignalR to a backend service that integrates with a 3rd-party transcription service (Google Speech-to-Text, Vertex AI, etc.). This backend service is provided as a standalone project (candidate for a NuGet package). Repository: igniteui-speech-to-text-server
  - Frontend (Web Speech API) Mode – Browser-native transcription handled entirely in the frontend (no server dependency).
PoC: https://github.com/IgniteUI/igniteui-webcomponents/pull/1893
Complementary Backend project: igniteui-speech-to-text-server
Acceptance criteria
Must-haves before the feature can be considered a sprint candidate:
- Users can record and transcribe voice messages directly in the chat input in real-time.
- Developers can configure which STT provider is used.
- Transcription output appears in the message input field in real-time.
- The system automatically stops on silence timeout.
- Works across Chrome, Edge, and Safari (Web Speech API fallback).
2. User Stories
Elaborate more on the multi-faceted use cases
Developer stories:
- Story 1: As a developer, I want to enable STT via component options so I don’t need to write custom integration code.
- Story 2: As a developer, I want to choose between backend or frontend transcription providers.
End-user stories:
- Story 1: As an end-user, I want to dictate a message in the chat box using my microphone.
- Story 2: As an end-user, I want visual feedback (mic pulse and silence countdown) during recording.
- Story 3: As an end-user, I want the transcription to stop automatically when I stop speaking and auto-submit the message.
- Story 4: As an end-user, I want the ability to manually stop the transcription. This should not trigger auto-submitting the message, so that it remains available for further editing.
3. Functionality
Describe the behavior, design, and look and feel of the implemented feature. Always include a visual mock-up
3.1. End-User Experience
- A microphone icon is displayed next to the message input field.
- Clicking the icon starts recording. The microphone icon is replaced by a stop icon.
- Visual feedback begins when voice is detected: the stop icon pulses.
- Live transcription text appears in the message input field.
- When silence is detected, a timeout animation (countdown circle) is presented. If voice is detected again during the countdown, the countdown resets.
- When silence timeout ends or the user clicks stop, recording stops and transcription is finalized.
- When transcription finishes due to silence timeout, the message is auto-submitted (the full flow is sketched below).
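The flow above can be summarized as a simple state progression. The following sketch is purely illustrative; the state names are hypothetical and not the component's internal implementation:

```ts
// Illustrative state model of the end-user flow (names are hypothetical).
type SttState =
  | 'idle'       // mic icon shown, nothing recording
  | 'recording'  // stop icon shown; pulses while voice is detected
  | 'countdown'  // silence detected; countdown circle animating
  | 'stopping'   // user clicked stop; awaiting the final transcript
  | 'finished';  // transcript finalized (auto-submitted only on timeout)

// Voice detected during the countdown returns to 'recording', i.e. the
// countdown resets whenever speech resumes.
function onVoice(state: SttState): SttState {
  return state === 'countdown' ? 'recording' : state;
}
```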
3.2. Developer Experience
Frontend setup:
- Add speech-to-text options to the chat component options, e.g.:
```ts
speakPlaceholder: 'Speak...',
// ...
speechToText: {
  enable: true,
  lang: 'en-US',
  serviceProvider: 'webspeech', // 'webspeech' | 'backend'
  serviceUri: 'https://localhost:5000/sttHub',
},
```
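For illustration, a minimal sketch of applying these options from script; the `igc-chat` tag name and the `options` property are assumptions, not confirmed API:

```ts
// Hypothetical wiring: the tag name and `options` property are assumptions.
interface ChatSttOptions {
  speakPlaceholder?: string;
  speechToText?: {
    enable: boolean;
    lang?: string;
    serviceProvider?: 'webspeech' | 'backend';
    serviceUri?: string;
  };
}

const chat = document.querySelector<HTMLElement & { options?: ChatSttOptions }>('igc-chat');
if (chat) {
  chat.options = {
    speakPlaceholder: 'Speak...',
    speechToText: { enable: true, lang: 'en-US', serviceProvider: 'webspeech' },
  };
}
```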
Name | Description | Type | Default | Valid values |
---|---|---|---|---|
speakPlaceholder | Chat input placeholder while speech-to-text is active | String | null | e.g. "Speak..." |
enable | Enables speech-to-text | Boolean | false | true / false |
lang | Language for transcription | String | null | e.g. "en-US", "de-DE" |
serviceProvider | Which transcription provider to use; "backend" requires serviceUri | String | null | "backend" / "webspeech" |
serviceUri | Backend hub endpoint (SignalR) | String | null | URL |
Backend setup (if serviceProvider: 'backend')
- Run the complementary project igniteui-speech-to-text-server
- Make sure you provide the proper 3rd-party credentials and configuration
- Make sure the CORS policy aligns with the frontend project (port setting)
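In backend mode the component talks to the SignalR hub at serviceUri. Below is a minimal client-side connection sketch using the standard @microsoft/signalr package; the 'Transcript' message name is hypothetical, and only the endpoint URL comes from the options above:

```ts
import { HubConnectionBuilder, LogLevel } from '@microsoft/signalr';

// Connect to the hub configured via `serviceUri`.
const connection = new HubConnectionBuilder()
  .withUrl('https://localhost:5000/sttHub')
  .configureLogging(LogLevel.Information)
  .build();

// 'Transcript' is a hypothetical server-to-client message name.
connection.on('Transcript', (text: string) => {
  console.log('Live transcript:', text);
});

await connection.start();
```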
3.3. Globalization/Localization
The lang option controls the transcription locale.
3.4. Keyboard Navigation
Keys | Description |
---|---|
3.5. API
Options
Name | Description | Type | Default value | Valid values |
---|---|---|---|---|
SILENCE_TIMEOUT_MS | Silence duration before automatic stop, in ms | Number | 4000 | Any integer ≥ 0 |
SILENCE_GRACE_PERIOD | Silence duration before the countdown animation starts, in ms | Number | 1000 | Any integer < SILENCE_TIMEOUT_MS |
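With the defaults above, silence triggers the countdown animation after 1000 ms and stops recording at 4000 ms, so the visible countdown lasts 3000 ms. A minimal timer sketch, illustrative only and not the component's internal implementation:

```ts
const SILENCE_TIMEOUT_MS = 4000;
const SILENCE_GRACE_PERIOD = 1000;

let graceTimer: ReturnType<typeof setTimeout> | undefined;
let stopTimer: ReturnType<typeof setTimeout> | undefined;

// Call on every voice detection: both timers reset, so the countdown
// restarts whenever speech resumes (see section 3.1).
function onVoiceDetected(startCountdown: () => void, autoStop: () => void) {
  clearTimeout(graceTimer);
  clearTimeout(stopTimer);
  graceTimer = setTimeout(startCountdown, SILENCE_GRACE_PERIOD);
  stopTimer = setTimeout(autoStop, SILENCE_TIMEOUT_MS);
}
```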
Methods
Name | Description | Return type | Parameters |
---|---|---|---|
start() | Begin recording and transcription | Promise | language?: string |
stop() | Stop recording and finalize transcription | void | – |
Events
Name | Description | Cancelable | Parameters |
---|---|---|---|
onPulseSignal | Fired when STT detects voice (for simplicity, fired when a transcription of that voice is received) | No | — |
onStartCountdown | Fired when the silence countdown animation should start | No | { ms: number \| null } |
onTranscript | Fired when the transcription text updates | No | { text: string } |
onStopInProgress | Fired when the user clicks stop but the service is still awaiting the final transcription result | No | — |
onFinishedTranscribing | Fired when transcription completes | No | { finish: 'auto' \| 'manual' } |
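A hedged usage sketch tying the methods and events together. The method and event names come from the tables above; the `igc-chat` tag name and the assumption that these members live directly on the chat element are hypothetical:

```ts
const chat: any = document.querySelector('igc-chat'); // tag name assumed

chat.addEventListener('onTranscript', (e: CustomEvent<{ text: string }>) => {
  console.log('Partial transcript:', e.detail.text);
});

chat.addEventListener(
  'onFinishedTranscribing',
  (e: CustomEvent<{ finish: 'auto' | 'manual' }>) => {
    // 'auto' (silence timeout) auto-submits; 'manual' keeps the text editable.
    console.log('Finished:', e.detail.finish);
  }
);

await chat.start('en-US'); // optional language parameter
// ...later, to stop without auto-submitting:
chat.stop();
```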
4. Test Scenarios
Automation
- Scenario 1: Enable STT and verify that transcription events fire correctly.
- Scenario 2: Simulate silence and verify auto-stop triggers onFinishedTranscribing (see the sketch after this list).
- Scenario 3: Validate switching between backend and Web Speech providers.
- Scenario 4: Check that microphone icon is replaced with stop icon during active transcription.
- Scenario 5: Verify error propagation from backend to frontend.
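As a sketch, Scenario 2 could be automated roughly as follows, assuming @open-wc/testing and sinon fake timers; the tag name and the way start() is invoked are assumptions:

```ts
import { expect, fixture, html, oneEvent } from '@open-wc/testing';
import sinon from 'sinon';

it('auto-stops after the silence timeout', async () => {
  const chat: any = await fixture(html`<igc-chat></igc-chat>`);
  const clock = sinon.useFakeTimers();

  const finished = oneEvent(chat, 'onFinishedTranscribing');
  await chat.start(); // begin recording; no voice follows
  clock.tick(4000);   // advance past SILENCE_TIMEOUT_MS

  const event = await finished;
  expect(event.detail.finish).to.equal('auto');
  clock.restore();
});
```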
5. Accessibility
ARIA Support
- Microphone button includes aria-pressed and aria-label states
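Illustrative only: a sketch of keeping the mic button's ARIA state in sync with recording (attribute names from the bullet above; the label text is hypothetical):

```ts
function setMicAriaState(micButton: HTMLButtonElement, recording: boolean) {
  micButton.setAttribute('aria-pressed', String(recording));
  micButton.setAttribute(
    'aria-label',
    recording ? 'Stop voice input' : 'Start voice input'
  );
}
```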
RTL Support
6. Assumptions and Limitations
Assumptions | Limitation Notes |
---|---|
Users have microphone permissions | Browsers may block microphone access on non-HTTPS origins |
Backend supports SignalR | Only HTTP/HTTPS endpoints are supported, not raw WebSocket (ws://) connections |
Browser supports Web Speech API | Safari and Firefox offer only partial support |