Chat UI Component - Speech to text Specification
Contents
- Overview
- User Stories
- Functionality
- Test Scenarios
- Accessibility
- Assumptions and Limitations
- References
Owned by
CodeX Team
Ivan Petrov
Designer Name
Requires approval from
- Peer Developer Name | Date:
- Design Manager Name | Date:
Signed off by
- Product Owner Name | Date:
- Platform Architect Name | Date:
Revision History
Version | Author | Date | Notes |
---|---|---|---|
1 | Ivan Petrov | 14.10.2025 | Initial specification |
1. Overview
Objectives
- Add speech-to-text (STT) functionality to the Chat UI Component, allowing users to dictate messages using their voice.
- The feature supports two STT modes:
  - Backend Transcription Mode – Audio is streamed via WebSocket/SignalR to a backend service that integrates with a 3rd-party transcription service (Google Speech-to-Text, Vertex AI, etc.). This backend service is provided as a standalone project (candidate for a NuGet package). Repository: igniteui-speech-to-text-server
  - Frontend (Web Speech API) Mode – Browser-native transcription handled entirely in the frontend (no server dependency).
PoC: https://github.com/IgniteUI/igniteui-webcomponents/pull/1893
Complementary Backend project: igniteui-speech-to-text-server
Acceptance criteria
Must-haves before the feature can be considered a sprint candidate:
- Users can record and transcribe voice messages directly in the chat input in real-time.
- Developers can configure which STT provider is used.
- Transcription output appears in the message input field in real-time.
- The system automatically stops on silence timeout.
- Works across Chrome, Edge, and Safari (Web Speech API fallback).
2. User Stories
Elaborate more on the multi-faceted use cases
Developer stories:
- Story 1: As a developer, I want to enable STT via component options so I don’t need to write custom integration code.
- Story 2: As a developer, I want to choose between backend or frontend transcription providers.
End-user stories:
- Story 1: As an end-user, I want to dictate a message in the chat box using my microphone.
- Story 2: As an end-user, I want visual feedback (mic pulse and silence countdown) during recording.
- Story 3: As an end-user, I want the transcription to stop automatically when I stop speaking and auto-submit the message.
- Story 4: As an end-user, I want the ability to manually stop the transcription. This should not trigger auto-submitting the message, so that it remains available for further editing.
3. Functionality
Describe the behavior, design, and look and feel of the implemented feature. Always include a visual mock-up
3.1. End-User Experience
- A microphone icon is displayed next to the message input field.
- Clicking the icon starts recording. The microphone icon is replaced by a stop icon.
- Visual feedback begins when voice is detected: the stop icon pulses.
- Live transcription text appears in the message input field.
- When silence is detected, a timeout animation (countdown circle) is presented. If voice is detected again during the countdown, the countdown resets.
- When silence timeout ends or the user clicks stop, recording stops and transcription is finalized.
- When transcription finishes due to silence timeout, the message is auto-submitted (the full flow is sketched below).
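The flow above can be summarized as a simple state progression. The following sketch is purely illustrative; the state names are hypothetical and not the component's internal implementation:

```ts
// Illustrative state model of the end-user flow (names are hypothetical).
type SttState =
  | 'idle'       // mic icon shown, nothing recording
  | 'recording'  // stop icon shown; pulses while voice is detected
  | 'countdown'  // silence detected; countdown circle animating
  | 'stopping'   // user clicked stop; awaiting the final transcript
  | 'finished';  // transcript finalized (auto-submitted only on timeout)

// Voice detected during the countdown returns to 'recording', i.e. the
// countdown resets whenever speech resumes.
function onVoice(state: SttState): SttState {
  return state === 'countdown' ? 'recording' : state;
}
```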
3.2. Developer Experience
Frontend setup:
- Add speech-to-text options to the chat component options, e.g.:
```ts
speakPlaceholder: 'Speak...',
// ...
speechToText: {
  enable: true,
  lang: 'en-US',
  serviceProvider: 'webspeech', // 'webspeech' | 'backend'
  serviceUri: 'https://localhost:5000/sttHub',
},
```
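For illustration, a minimal sketch of applying these options from script; the `igc-chat` tag name and the `options` property are assumptions, not confirmed API:

```ts
// Hypothetical wiring: the tag name and `options` property are assumptions.
interface ChatSttOptions {
  speakPlaceholder?: string;
  speechToText?: {
    enable: boolean;
    lang?: string;
    serviceProvider?: 'webspeech' | 'backend';
    serviceUri?: string;
  };
}

const chat = document.querySelector<HTMLElement & { options?: ChatSttOptions }>('igc-chat');
if (chat) {
  chat.options = {
    speakPlaceholder: 'Speak...',
    speechToText: { enable: true, lang: 'en-US', serviceProvider: 'webspeech' },
  };
}
```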
Name | Description | Type | Default | Valid values |
---|---|---|---|---|
speakPlaceholder | Chat input placeholder while speech-to-text is active | String | null | e.g. "Speak..." |
enable | Enables speech-to-text | Boolean | false | true / false |
lang | Language for transcription | String | null | e.g. "en-US", "de-DE" |
serviceProvider | Which transcription provider to use; "backend" requires serviceUri | String | null | "backend" / "webspeech" |
serviceUri | Backend hub endpoint (SignalR) | String | null | URL |
Backend setup (if serviceProvider: 'backend')
- Run the complementary project igniteui-speech-to-text-server
- Make sure you provide the proper 3rd-party credentials and configuration
- Make sure the CORS policy aligns with the frontend project (port setting)
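In backend mode the component talks to the SignalR hub at serviceUri. Below is a minimal client-side connection sketch using the standard @microsoft/signalr package; the 'Transcript' message name is hypothetical, and only the endpoint URL comes from the options above:

```ts
import { HubConnectionBuilder, LogLevel } from '@microsoft/signalr';

// Connect to the hub configured via `serviceUri`.
const connection = new HubConnectionBuilder()
  .withUrl('https://localhost:5000/sttHub')
  .configureLogging(LogLevel.Information)
  .build();

// 'Transcript' is a hypothetical server-to-client message name.
connection.on('Transcript', (text: string) => {
  console.log('Live transcript:', text);
});

await connection.start();
```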
3.3. Globalization/Localization
The lang option controls the transcription locale.
3.4. Keyboard Navigation
Keys | Description |
---|---|
3.5. API
Options
Name | Description | Type | Default value | Valid values |
---|---|---|---|---|
SILENCE_TIMEOUT_MS | Silence duration before automatic stop, in ms | Number | 4000 | Any integer ≥ 0 |
SILENCE_GRACE_PERIOD | Silence duration before the countdown animation starts, in ms | Number | 1000 | Any integer < SILENCE_TIMEOUT_MS |
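With the defaults above, silence triggers the countdown animation after 1000 ms and stops recording at 4000 ms, so the visible countdown lasts 3000 ms. A minimal timer sketch, illustrative only and not the component's internal implementation:

```ts
const SILENCE_TIMEOUT_MS = 4000;
const SILENCE_GRACE_PERIOD = 1000;

let graceTimer: ReturnType<typeof setTimeout> | undefined;
let stopTimer: ReturnType<typeof setTimeout> | undefined;

// Call on every voice detection: both timers reset, so the countdown
// restarts whenever speech resumes (see section 3.1).
function onVoiceDetected(startCountdown: () => void, autoStop: () => void) {
  clearTimeout(graceTimer);
  clearTimeout(stopTimer);
  graceTimer = setTimeout(startCountdown, SILENCE_GRACE_PERIOD);
  stopTimer = setTimeout(autoStop, SILENCE_TIMEOUT_MS);
}
```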
Methods
Name | Description | Return type | Parameters |
---|---|---|---|
start() | Begin recording and transcription | Promise | language?: string |
stop() | Stop recording and finalize transcription | void | – |
Events
Name | Description | Cancelable | Parameters |
---|---|---|---|
onPulseSignal | Fired when STT detects voice (for simplicity, fired when a transcription of that voice is received) | No | — |
onStartCountdown | Fired when the silence countdown animation should start | No | { ms: number \| null } |
onTranscript | Fired when the transcription text updates | No | { text: string } |
onStopInProgress | Fired when the user clicks stop but the service is still awaiting the final transcription result | No | — |
onFinishedTranscribing | Fired when transcription completes | No | { finish: 'auto' \| 'manual' } |
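A hedged usage sketch tying the methods and events together. The method and event names come from the tables above; the `igc-chat` tag name and the assumption that these members live directly on the chat element are hypothetical:

```ts
const chat: any = document.querySelector('igc-chat'); // tag name assumed

chat.addEventListener('onTranscript', (e: CustomEvent<{ text: string }>) => {
  console.log('Partial transcript:', e.detail.text);
});

chat.addEventListener(
  'onFinishedTranscribing',
  (e: CustomEvent<{ finish: 'auto' | 'manual' }>) => {
    // 'auto' (silence timeout) auto-submits; 'manual' keeps the text editable.
    console.log('Finished:', e.detail.finish);
  }
);

await chat.start('en-US'); // optional language parameter
// ...later, to stop without auto-submitting:
chat.stop();
```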
4. Test Scenarios
Automation
- Scenario 1: Enable STT and verify that transcription events fire correctly.
- Scenario 2: Simulate silence and verify auto-stop triggers onFinishedTranscribing (see the sketch after this list).
- Scenario 3: Validate switching between backend and Web Speech providers.
- Scenario 4: Check that microphone icon is replaced with stop icon during active transcription.
- Scenario 5: Verify error propagation from backend to frontend.
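As a sketch, Scenario 2 could be automated roughly as follows, assuming @open-wc/testing and sinon fake timers; the tag name and the way start() is invoked are assumptions:

```ts
import { expect, fixture, html, oneEvent } from '@open-wc/testing';
import sinon from 'sinon';

it('auto-stops after the silence timeout', async () => {
  const chat: any = await fixture(html`<igc-chat></igc-chat>`);
  const clock = sinon.useFakeTimers();

  const finished = oneEvent(chat, 'onFinishedTranscribing');
  await chat.start(); // begin recording; no voice follows
  clock.tick(4000);   // advance past SILENCE_TIMEOUT_MS

  const event = await finished;
  expect(event.detail.finish).to.equal('auto');
  clock.restore();
});
```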
5. Accessibility
ARIA Support
- Microphone button includes aria-pressed and aria-label states
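Illustrative only: a sketch of keeping the mic button's ARIA state in sync with recording (attribute names from the bullet above; the label text is hypothetical):

```ts
function setMicAriaState(micButton: HTMLButtonElement, recording: boolean) {
  micButton.setAttribute('aria-pressed', String(recording));
  micButton.setAttribute(
    'aria-label',
    recording ? 'Stop voice input' : 'Start voice input'
  );
}
```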
RTL Support
6. Assumptions and Limitations
Assumptions | Limitation Notes |
---|---|
Users have microphone permissions | Browsers may block microphone access on non-HTTPS origins |
Backend supports SignalR | Only HTTP/HTTPS endpoints are supported, not raw WebSocket (ws://) connections |
Browser supports Web Speech API | Safari and Firefox offer only partial support |