Speech Requirements - RoBorregos/robocup-home GitHub Wiki

Speech: Requirements

Installation Requirements

General

audio_common (ROS)
portaudio19-dev
pyaudio, pynput

RNNoise: The specific version in use is linked here. CMake automatically downloads it, builds it, and adds it to the code.

Linux
autoconf
ar
make

DeepSpeech2: The implementation used is Baidu's, built on PaddlePaddle. A specific forked version (linked here) was copied into the repository. Because of compatibility problems, a specific version of PaddlePaddle is required, and TensorFlow, if installed, must not be later than the version listed below. An example of installing the dependencies is here.

python2
paddlepaddle==1.2.1
pkg-config, libflac-dev, libogg-dev, libvorbis-dev, libboost-dev, swig
scipy, resampy, SoundFile, python_speech_features
swig_decoders (a library built by DeepSpeech during its installation; remember to clean the directory once installation finishes)
tensorflow==1.12 (not required, but note that this setup is incompatible with later TF versions)
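
Because the setup depends on exact versions, it can help to compare what is installed against the pins before launching anything. The following is a minimal sketch, not part of the repository: the helper name `find_pin_mismatches` and the `installed` mapping (for example, built by hand from `pip freeze` output) are assumptions, and only the two pinned versions come from the list above.

```python
# Pins copied from the requirements list above; everything else here
# is a hypothetical helper for illustration.
PINS = {"paddlepaddle": "1.2.1", "tensorflow": "1.12"}


def _normalize(version):
    """Split a plain dotted version and drop trailing ".0" components,
    so that "1.12.0" compares equal to "1.12"."""
    parts = version.split(".")
    while len(parts) > 1 and parts[-1] == "0":
        parts.pop()
    return parts


def find_pin_mismatches(installed, pins=PINS, optional=("tensorflow",)):
    """Return (name, wanted, found) for every package that violates a pin.

    `installed` maps package names to version strings. A missing
    optional package is not reported; a missing required one is
    reported with `found` set to None.
    """
    problems = []
    for name, wanted in sorted(pins.items()):
        found = installed.get(name)
        if found is None:
            if name not in optional:
                problems.append((name, wanted, None))
            continue
        if _normalize(found) != _normalize(wanted):
            problems.append((name, wanted, found))
    return problems
```

An empty list means the pins are satisfied. The comparison only handles plain dotted numeric versions, which is enough for the two pins listed here.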

Models and Data

DeepSpeech2: Two models are needed, the speech model and the language model, plus some warm-up data. Both models can be downloaded from the fork's releases and should be placed inside the DeepSpeech/models/ folder. Instructions for downloading the warm-up data, and for where to put it and the models, are in ros_server.py.
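
A quick check that the models folder is populated can catch a missed download early. This is a rough sketch under stated assumptions: `models_dir_status` is a hypothetical helper, `repo_root` stands for whichever directory contains DeepSpeech/, and the check only confirms that the folder holds something, since the exact model file names are documented in ros_server.py rather than here.

```python
import os


def models_dir_status(repo_root):
    """Return (ok, message) describing the DeepSpeech/models/ folder.

    `repo_root` is assumed to be the directory that contains DeepSpeech/.
    """
    models_dir = os.path.join(repo_root, "DeepSpeech", "models")
    if not os.path.isdir(models_dir):
        return False, "missing folder: " + models_dir
    entries = sorted(os.listdir(models_dir))
    if not entries:
        return False, "folder is empty: " + models_dir
    # Only reports what is present; it does not validate the files.
    return True, "found: " + ", ".join(entries)
```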

Azure SpeechToText API: The node reads the API key and the Azure region from a file. The file should be at action_selectors/src/GLOBAL.txt; an example is at action_selectors/src/_GLOBAL_.txt.
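
Since the node cannot work without this file, a startup check with a clear error message may save some debugging. The sketch below is an assumption, not the node's actual code: `find_azure_config` is a hypothetical helper, and it deliberately does not parse the file, because the expected layout is defined by the example file rather than by this page.

```python
import os


def find_azure_config(src_dir):
    """Return the path to GLOBAL.txt inside `src_dir`.

    Raises IOError pointing at the example file if GLOBAL.txt is absent.
    `src_dir` is expected to be the action_selectors/src directory.
    """
    config = os.path.join(src_dir, "GLOBAL.txt")
    example = os.path.join(src_dir, "_GLOBAL_.txt")
    if os.path.isfile(config):
        return config
    raise IOError("%s not found; use %s as a template" % (config, example))
```

Calling `find_azure_config("action_selectors/src")` from the repository root would then either return the path or fail with a message that names the template to copy.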