OpenAI Whisper Transcriber Sample

OpenAI Whisper is a speech-to-text transcription library built on the OpenAI Whisper models.

Welcome to the OpenAI Whisper Transcriber Sample. This sample demonstrates how to use the openai-whisper library to transcribe audio files.

Follow the deployment and run instructions in the right-hand navigation panel to deploy the sample.

What is OpenAI Whisper?

The OpenAI Whisper model is an open-source speech-to-text transcription model trained on 680,000 hours of multilingual and multitask supervised data collected from the web.

OpenAI describes Whisper as an encoder-decoder transformer, a type of neural network that can use context gleaned from input data to learn associations that can then be translated into the model's output.

Quotes from the OpenAI Whisper webpage:

We’ve trained and are open-sourcing a neural net called Whisper that approaches human level robustness and accuracy on English speech recognition.

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English. We are open-sourcing models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing.

Running OpenAI Whisper Sample

The Whisper model runs best on an NVIDIA GPU from WSL2 or Linux. The sample code will also run on a CPU (both Intel and Apple Silicon are supported), but transcription will be slower. If you are running the model on a CPU, it's recommended to use one of the smaller Whisper models for transcription.
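For example, the openai-whisper library lets you pick the model size and device when loading. The following is a minimal sketch; the model name, device, and audio filename are illustrative, not taken from the sample:

```python
# Minimal sketch: transcribe with a smaller Whisper model on a CPU.
# The model name ("base") and the audio filename are illustrative.
import whisper

# Smaller models ("tiny", "base", "small") trade accuracy for speed on a CPU.
model = whisper.load_model("base", device="cpu")

result = model.transcribe("sample.wav")
print(result["text"])
```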

Solution Architecture

The solution is divided into two parts:

  1. A Whisper service that wraps the openai-whisper library, loads the Whisper model, and exposes the model as a REST API.
  2. A Whisper client that calls the Whisper service to transcribe audio files. There are two clients:
    1. A GUI client that runs on Windows, macOS, and Linux.
    2. A Web client.

The advantage of this architecture is that the model is loaded once by the Whisper service, a relatively time-consuming process, and can then be called many times by the Whisper clients, as sketched below.
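The following is a minimal sketch of that split, assuming a Flask-based service; the endpoint name, port, and model size are hypothetical and the actual sample may differ:

```python
# Hypothetical sketch of the Whisper service side of this architecture.
# The model is loaded once at startup, then reused for every request.
import tempfile

import whisper
from flask import Flask, jsonify, request

app = Flask(__name__)

# Loading the model is the slow step, so do it exactly once at startup.
model = whisper.load_model("base")

@app.route("/transcribe", methods=["POST"])
def transcribe():
    # Save the uploaded audio to a temporary file for whisper to read.
    audio = request.files["file"]
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        audio.save(tmp.name)
        result = model.transcribe(tmp.name)
    return jsonify({"text": result["text"]})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5500)
```

A client would then POST an audio file to the service, paying only the per-request transcription cost (again illustrative, assuming the service above is running locally):

```python
# Hypothetical client call against the sketch service above.
import requests

with open("sample.wav", "rb") as f:
    response = requests.post("http://localhost:5500/transcribe", files={"file": f})
print(response.json()["text"])
```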

Next, deploy the Whisper server

Deploy the Whisper server