Multimodal - aplpolaris/promptfx GitHub Wiki
Audio Views
The Speech-to-Text view provides audio transcription using OpenAI's Whisper model. This view is also provided under the API Tab.
To use the view, click "Record" and speak. Then click "Stop" to save the recording as an audio file, and click "Run" to send that file to Whisper for transcription.
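For readers curious about what the "Run" step amounts to at the API level, the following is a minimal sketch of a request to OpenAI's audio transcription endpoint using only the Python standard library. It is not PromptFx's own implementation. The function names `build_transcription_request` and `transcribe`, the `whisper-1` default, and the generic `application/octet-stream` content type are assumptions made for illustration.

```python
import json
import uuid
import urllib.request
from pathlib import Path

# OpenAI's audio transcription endpoint.
TRANSCRIPTION_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(audio_path, api_key, model="whisper-1"):
    """Build (but do not send) a multipart POST carrying a saved audio file.

    Hypothetical helper for illustration; not part of PromptFx.
    """
    audio = Path(audio_path)
    boundary = uuid.uuid4().hex
    # Two form fields: the model name and the audio file contents.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{audio.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + audio.read_bytes() + tail
    return urllib.request.Request(
        TRANSCRIPTION_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(audio_path, api_key):
    """Send the request and return the transcribed text (needs network access)."""
    request = build_transcription_request(audio_path, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

Calling `transcribe("recording.wav", api_key)` with a valid key would return the transcript as a string. Building the request separately from sending it makes the payload easy to inspect without using any API quota.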
Image Views
The Text-to-Image view generates images using OpenAI's DALL-E models. This view is also provided under the API Tab.
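As a rough illustration of the kind of call that sits behind a text-to-image view, the sketch below builds a request to OpenAI's image generation endpoint with the Python standard library. It is not taken from PromptFx. The helper names `build_image_request` and `generate_image_url`, and the default model and size, are assumptions chosen for the example.

```python
import json
import urllib.request

# OpenAI's image generation endpoint.
IMAGE_URL = "https://api.openai.com/v1/images/generations"


def build_image_request(prompt, api_key, model="dall-e-3", size="1024x1024"):
    """Build (but do not send) a JSON POST asking for one generated image.

    Hypothetical helper for illustration; not part of PromptFx.
    """
    payload = {"model": model, "prompt": prompt, "n": 1, "size": size}
    return urllib.request.Request(
        IMAGE_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def generate_image_url(prompt, api_key):
    """Send the request and return the first image URL (needs network access)."""
    with urllib.request.urlopen(build_image_request(prompt, api_key)) as response:
        return json.load(response)["data"][0]["url"]
```

With a valid key, `generate_image_url("a lighthouse at dusk", api_key)` would return a URL for the generated image, assuming the default URL response format.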