21 Whisper Server WSL - gloveboxes/OpenAI-Whisper-Transcriber-Sample GitHub Wiki

Windows WSL 2 with an NVidia GPU

The recommended configuration for running the OpenAI Whisper sample on Windows is WSL 2 with an NVidia GPU. This configuration is popular and provides the best performance: OpenAI Whisper speech-to-text transcription runs consistently faster on WSL 2 than natively on Windows.

Ideally, your system should have:

  1. Windows 11 with WSL 2 and Ubuntu 20.04 LTS.
  2. A modern CPU with 16 GB of RAM.
  3. An NVidia GPU with 10 to 12 GB of VRAM, although smaller Whisper models run on GPUs with less VRAM.
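
As a rough illustration of how VRAM limits the choice of model, the sketch below picks the largest multilingual model that fits a given amount of VRAM. The figures are the approximate requirements listed in the openai/whisper README at the time of writing. The function name `largest_model_for` is hypothetical and not part of the sample. Treat the numbers as guides only, since real usage varies with audio length and settings.

```python
from typing import Optional

# Approximate VRAM needs in GB, as listed in the openai/whisper README at the
# time of writing. These are rough guides, not guarantees.
APPROX_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}


def largest_model_for(vram_gb: float) -> Optional[str]:
    """Return the largest model whose approximate requirement fits, else None."""
    best = None
    # The dict is ordered smallest to largest, so the last fit is the largest.
    for name, needed_gb in APPROX_VRAM_GB.items():
        if needed_gb <= vram_gb:
            best = name
    return best


if __name__ == "__main__":
    for vram in (4, 8, 12):
        print(f"{vram} GB VRAM -> {largest_model_for(vram)}")
```

For example, a GPU with 8 GB of VRAM maps to the medium model under these figures, while a 12 GB GPU has room for large.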

Update the NVidia drivers

Ensure the NVidia drivers are up to date. The NVidia drivers are installed in Windows. WSL includes a GPU driver that allows WSL to access the GPU, so don't install the NVidia drivers in WSL.
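
One way to confirm that WSL can see the GPU through the Windows driver is to query `nvidia-smi` from a WSL terminal. The sketch below is a hedged example, not part of the sample. It assumes `nvidia-smi` is on the `PATH` inside WSL, and the helper names `parse_gpu_csv` and `query_gpus` are hypothetical. It returns an empty list rather than failing when the tool is missing.

```python
import shutil
import subprocess
from typing import List, Tuple


def parse_gpu_csv(output: str) -> List[Tuple[str, int]]:
    """Parse 'name, memory' CSV lines into (name, total memory in MiB) pairs."""
    gpus = []
    for line in output.splitlines():
        # Split on the last comma so GPU names containing commas stay intact.
        name, sep, mem = line.rpartition(",")
        if sep and mem.strip().isdigit():
            gpus.append((name.strip(), int(mem.strip())))
    return gpus


def query_gpus() -> List[Tuple[str, int]]:
    """Return the GPUs reported by nvidia-smi, or [] if it is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("No GPU reported. Check the Windows NVidia driver and WSL version.")
    for name, mem_mib in gpus:
        print(f"{name}: {mem_mib / 1024:.1f} GiB VRAM")
```

An empty result usually points to an outdated Windows driver or a WSL 1 distribution. It doesn't mean a driver should be installed inside WSL.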

Install WSL 2

  1. Follow the instructions to install WSL.
  2. This sample was tested with Ubuntu 20.04 LTS running in WSL 2. You can download Ubuntu 20.04 LTS from the Microsoft Store.

Install software dependencies

  1. Update the Ubuntu system.
    1. From a WSL terminal.
    2. Run
      sudo apt update && sudo apt upgrade
    3. Restart WSL if necessary. From PowerShell, run wsl --shutdown.
  2. Install FFmpeg and pip3
    1. From a WSL terminal.
    2. Install FFmpeg and pip3. Run
      sudo apt install ffmpeg python3-pip
    3. Test FFmpeg. Run ffmpeg -version. The command should return the FFmpeg version.
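
The dependency checks above can also be scripted. The following is a minimal sketch, assuming only that the tools are expected on the `PATH` of the WSL terminal. The name `missing_tools` is hypothetical and not part of the sample.

```python
import shutil
from typing import Iterable, List

# Command-line tools installed in the steps above.
REQUIRED_TOOLS = ("ffmpeg", "pip3")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

If a tool is reported missing, rerun the corresponding `sudo apt install` command and check again.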

Start the Whisper Transcriber Service

  1. From a WSL terminal.

  2. Clone the Whisper Transcriber Sample to your preferred repo folder.

    git clone https://github.com/gloveboxes/OpenAI-Whisper-Transcriber-Sample.git
  3. Navigate to the server folder.

    cd OpenAI-Whisper-Transcriber-Sample/server
  4. Install the required Python libraries.

    1. From a terminal window.

    2. Install the required Python libraries. Run

      pip3 install -r requirements.txt
  5. Test that CUDA/GPU is available to PyTorch.

    1. From a WSL terminal.

    2. Run the following command. If CUDA is available, the command prints True.

      python3 -c "import torch; print(torch.cuda.is_available())"
  6. Review the chart on the OpenAI Whisper Project Description page and select a model that fits in the VRAM of your GPU. At the time of writing, the Whisper multilingual models include tiny, base, small, medium, and large, and the English-only models include tiny.en, base.en, small.en, and medium.en.

  7. Update the server/config.json file to set your desired Whisper model. For example, to use the medium model, set the model property to medium.

    { "model": "medium" }
  8. Start the Whisper Transcriber Service. The first time you run the service, it downloads the selected model. Depending on your internet speed, the download can take a few minutes, so a worker timeout of 300 seconds (-t 300) is recommended.

    gunicorn --bind 0.0.0.0:5500 wsgi:app -t 300
  9. Once the Whisper Transcriber Service starts, you should see output similar to the following.

    [2023-06-04 18:53:46.194411] Whisper API Key: 17ce01e9-ac65-49c8-9cc9-18d8deb78197
    [2023-06-04 18:53:50.375244] Model: medium loaded.
    [2023-06-04 18:53:50.375565] Ready to transcribe audio files.
    
  10. Now, restart the Whisper Transcriber Service without the timeout interval. The service starts much faster because the model is already downloaded.

    gunicorn --bind 0.0.0.0:5500 wsgi:app
  11. The Whisper API Key is also displayed when the service starts. Save the key somewhere safe; you'll need it to configure the Whisper client.

    Whisper API Key: <key>
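
The model setting in server/config.json can also be changed from a script instead of by hand. The sketch below is a minimal example. It assumes the file holds a JSON object with a `model` property, as shown in the steps above. The helper `set_model` and the `KNOWN_MODELS` set are hypothetical. The set lists the model names published by the Whisper project at the time of writing.

```python
import json
from pathlib import Path
from typing import Union

# Model names published by the Whisper project at the time of writing.
KNOWN_MODELS = {
    "tiny", "tiny.en", "base", "base.en",
    "small", "small.en", "medium", "medium.en", "large",
}


def set_model(config_path: Union[str, Path], model: str) -> dict:
    """Set the "model" property in the config file and return the new config."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"Unknown Whisper model: {model!r}")
    path = Path(config_path)
    # Keep any other properties already present in the file.
    config = json.loads(path.read_text()) if path.exists() else {}
    config["model"] = model
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    # Run from the server folder of the cloned repository.
    print(set_model("config.json", "medium"))
```

After changing the model, restart the service with the 300 second timeout so that a model that hasn't been downloaded yet has time to download.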
    

Next steps

Deploy the Whisper client
