Multimodal - aplpolaris/promptfx GitHub Wiki

Audio Views

The Speech-to-Text view transcribes audio using OpenAI's Whisper model. It is also available under the API tab.

To use the view, click "Record" and speak. Click "Stop" to save the recording as an audio file, then click "Run" to send that file to Whisper for transcription.
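
For readers who want to see roughly what the "Run" step amounts to outside the UI, here is a minimal Python sketch of a transcription request against OpenAI's public audio transcription endpoint. It is not PromptFx's own implementation. The helper names `build_transcription_request` and `transcribe` are hypothetical, and the sketch assumes the recording was saved to a local file and that the `whisper-1` model is available to your API key.

```python
import json
import os
import urllib.request
import uuid

# Public OpenAI endpoint for audio transcription.
TRANSCRIPTION_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, filename, api_key, model="whisper-1"):
    """Build (but do not send) a multipart/form-data transcription request.

    Hypothetical helper for illustration only.
    """
    boundary = uuid.uuid4().hex
    # The form has two fields: the model name and the audio file itself.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        TRANSCRIPTION_URL,
        data=head + audio_bytes + tail,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def transcribe(audio_path, api_key):
    """Send a saved recording for transcription and return the text."""
    with open(audio_path, "rb") as f:
        request = build_transcription_request(
            f.read(), os.path.basename(audio_path), api_key
        )
    with urllib.request.urlopen(request) as response:
        # The default JSON response carries the transcript in a "text" field.
        return json.load(response)["text"]
```

Calling `transcribe("recording.wav", os.environ["OPENAI_API_KEY"])` would perform the network call; the file name and environment variable here are placeholders.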

Image Views

The Text-to-Image view generates images using OpenAI's DALL-E models. It is also available under the API tab.
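
As a rough illustration of the kind of request this view corresponds to, the Python sketch below builds a call to OpenAI's public image generation endpoint. It is not taken from PromptFx. The helper names `build_image_request` and `generate_image_url` are hypothetical, and the `dall-e-3` model and `1024x1024` size are assumed defaults that may not match the view's settings.

```python
import json
import urllib.request

# Public OpenAI endpoint for image generation.
IMAGE_URL = "https://api.openai.com/v1/images/generations"


def build_image_request(prompt, api_key, model="dall-e-3", size="1024x1024"):
    """Build (but do not send) a JSON image generation request.

    Hypothetical helper for illustration only.
    """
    payload = {"model": model, "prompt": prompt, "n": 1, "size": size}
    return urllib.request.Request(
        IMAGE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_image_url(prompt, api_key):
    """Send the request and return the URL of the first generated image."""
    with urllib.request.urlopen(build_image_request(prompt, api_key)) as response:
        # Assumes the default response format, which returns image URLs.
        return json.load(response)["data"][0]["url"]
```

Only `generate_image_url` touches the network; `build_image_request` can be inspected on its own to check the payload before sending anything.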