speech_understanding_wiki - IRS-group/isr_tiago_docs GitHub Wiki

Whisper

Install

sudo apt install python3-pyaudio
pip3 install pvporcupine
pip3 install SpeechRecognition
pip3 install -U openai-whisper
pip3 install soundfile

Create an access key for Porcupine by creating an account at https://console.picovoice.ai and save it in a file at isr_tiago/speech_recognition/mbot_speech_recognition/src/mbot_speech_recognition_ros/access_key.

Porcupine should work offline, but it needs internet connectivity every few weeks to verify authentication.

Launch and testing

Launch the speech recognition node with:

roslaunch mbot_speech_recognition mbot_speech_recognition.launch

To start listening for the keyword:

rostopic pub /mbot_speech_recognition/event_in -1 std_msgs/String "data: 'e_start'"

When you do this, the node first records a 2-second sample to set a silence threshold.
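The calibration step presumably amounts to measuring the energy of the ambient-noise sample and scaling it: anything louder than the ambient level times a safety margin is treated as speech. A minimal pure-Python sketch (the margin factor, frame values, and function names are illustrative assumptions, not the node's actual implementation):

```python
import math

def rms(frame):
    """Root-mean-square energy of a list of PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def calibrate_silence_threshold(ambient_samples, margin=1.5):
    """Estimate a silence threshold from a short ambient-noise recording.

    `margin` is an illustrative safety factor: frames louder than
    margin * ambient RMS are treated as speech.
    """
    return margin * rms(ambient_samples)

# Fake 16-bit PCM values: quiet room noise vs. a much louder speech-like frame.
ambient = [10, -12, 8, -9, 11, -10]
threshold = calibrate_silence_threshold(ambient)
speech_frame = [400, -380, 420, -410]
print(rms(speech_frame) > threshold)  # the loud frame exceeds the threshold
```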

You can also start listening with a specific ASR backend (the default is Whisper):

rostopic pub /mbot_speech_recognition/event_in -1 std_msgs/String "data: 'e_start_google'"

To stop listening use:

rostopic pub /mbot_speech_recognition/event_in -1 std_msgs/String "data: 'e_stop'"

If the node is listening, it will ignore all e_start messages until it receives an e_stop message. If you want to recalibrate the silence threshold, send an e_stop message followed by an e_start message.
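The start/stop behaviour described above is a small state machine: e_start messages are ignored while already listening, and the silence threshold is only recalibrated on the stopped-to-listening transition. A hedged sketch (class and method names are illustrative, not taken from the node's source):

```python
class SpeechRecognitionEvents:
    """Minimal model of the e_start / e_stop behaviour described above."""

    def __init__(self):
        self.listening = False
        self.calibrations = 0

    def handle_event(self, event):
        if event.startswith("e_start"):
            if self.listening:
                return "ignored"       # already listening: e_start is a no-op
            self.listening = True
            self.calibrations += 1     # silence threshold recalibrated here
            return "started"
        if event == "e_stop":
            self.listening = False
            return "stopped"
        return "unknown"

node = SpeechRecognitionEvents()
print(node.handle_event("e_start"))         # started (calibrates)
print(node.handle_event("e_start_google"))  # ignored until e_stop
print(node.handle_event("e_stop"))          # stopped
print(node.handle_event("e_start"))         # started (recalibrates)
```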

The ASR results are published on this topic:

rostopic echo /mbot_speech_recognition/transcript

Semantic Similarity

Install

pip3 install sentence-transformers nltk torch torchvision torchaudio bllipparser
python3 -m nltk.downloader wordnet
python3 -m nltk.downloader bllip_wsj_no_aux
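The packages above are used to score how close a transcript is to a set of expected sentences. The core operation behind sentence-transformers is cosine similarity between sentence embeddings; a pure-Python sketch with made-up 3-dimensional vectors (real embeddings have several hundred dimensions, and the variable names are illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: in practice these come from a SentenceTransformer model.
emb_go_kitchen = [0.9, 0.1, 0.2]
emb_move_kitchen = [0.8, 0.2, 0.25]
emb_play_music = [0.1, 0.9, 0.1]

print(cosine_similarity(emb_go_kitchen, emb_move_kitchen))  # high: similar commands
print(cosine_similarity(emb_go_kitchen, emb_play_music))    # low: unrelated commands
```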

Sound sample recognition

Install

pip3 install numpy
pip3 install scipy

Sound sample creation

You can use any program to record a .wav sample and place it in speech_recognition/doorbell_recognition/src/doorbell_recognition_ros. The sample should be as short as possible for faster detection.
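If you prefer to generate a test sample programmatically instead of recording one, the standard-library wave module is enough. A sketch that writes a short mono 16-bit sine tone (the filename, frequency, and duration are illustrative, not values the node expects):

```python
import math
import struct
import wave

def write_tone_wav(path, freq_hz=880.0, duration_s=0.25, rate=16000):
    """Write a short mono 16-bit PCM sine tone.

    Keep duration_s small: shorter samples are detected faster.
    """
    n_frames = round(duration_s * rate)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(rate)
        for i in range(n_frames):
            sample = int(20000 * math.sin(2 * math.pi * freq_hz * i / rate))
            wf.writeframes(struct.pack("<h", sample))

write_tone_wav("doorbell_sample.wav")

with wave.open("doorbell_sample.wav", "rb") as wf:
    print(wf.getnframes() / wf.getframerate())  # duration in seconds
```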

Launch and testing

Launch the sound sample recognition node with:

roslaunch sound_sample_recognition sound_sample_recognition.launch

To start listening for the sound sample:

rostopic pub /sound_sample_recognition/event_in -1 std_msgs/String "data: 'e_start'"

To stop listening use:

rostopic pub /sound_sample_recognition/event_in -1 std_msgs/String "data: 'e_stop'"

The confidence level of sound sample detection will be published here:

rostopic echo /sound_sample_recognition/sound_sample_confidence

0.75 is a good threshold for detection without false positives.
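A confidence score of this kind is typically a normalized cross-correlation between the stored sample and a window of incoming audio, which yields a value in [-1, 1] regardless of volume. A pure-Python sketch of the scoring idea (the node's actual computation may differ, and the signal values are made up):

```python
import math

def normalized_correlation(template, window):
    """Normalized cross-correlation of two equal-length signals.

    1.0 means a perfect (amplitude-scaled) match; near 0 means unrelated audio.
    """
    dot = sum(t * w for t, w in zip(template, window))
    norm = math.sqrt(sum(t * t for t in template)) * math.sqrt(sum(w * w for w in window))
    return dot / norm if norm else 0.0

template = [0, 120, 200, 120, 0, -120, -200, -120]  # stored doorbell sample
match = [0, 60, 100, 60, 0, -60, -100, -60]         # same shape, half amplitude
noise = [37, -5, 80, -44, 12, 66, -91, 3]           # unrelated audio

THRESHOLD = 0.75  # the value suggested above
print(normalized_correlation(template, match) > THRESHOLD)  # True: detection
print(normalized_correlation(template, noise) > THRESHOLD)  # False: no detection
```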





Install (OLD)

pip install tensorflow==1.13.1

pip install SpeechRecognition

pip install nltk==3.4.5

pip install scikit-learn

sudo apt install portaudio19-dev

pip install pyaudio

pip install gensim

pip install torch==1.5.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html

pip install torch==1.4.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html

pip install spacy

pip install ftfy==4.4.3

python -m spacy download en

pip install tensorflow-gpu==1.14.0

pip install bert-tensorflow==1.0.1

pip install bert-for-tf2

pip install --upgrade google-cloud-storage

If spacy and ftfy do not work, try: pip install pytorch-pretrained-bert

Setup (OLD)

import nltk

nltk.download('stopwords')


Go to https://console.cloud.google.com/apis/api/speech.googleapis.com/ and download a JSON credentials file

Add this JSON file to the folder: isr_tiago/speech_recognition/google_speech_recognition/credentials

Edit isr_tiago/speech_recognition/google_speech_recognition/ros/config/config_google_speech_recognition.yaml to point to your JSON credentials file


Replace the folder

isr_tiago/speech_recognition/mbot_dialogue_system/mbot_natural_language_understanding/mbot_nlu_pytorch/common/src/model

with the model folder inside model_nlu.zip, which is in the SocRobData Google Drive shared folder

Launch and Test (OLD)

For GPSR, instead of mbot_nlu_pytorch.launch, run:

roslaunch mbot_nlu_pytorch mbot_nlu_pytorch_gpsr.launch

Sentence Split Node and NLU to Planning KnowledgeBase Interface

roslaunch gpsr_speech_processing gpsr_speech_processing.launch

Individual testing:

Google ASR

roslaunch mbot_speech_recognition mbot_speech_recognition.launch

Natural language understanding (NLU)

roslaunch mbot_nlu_pytorch mbot_nlu_pytorch.launch

The intention and facts are published here:

rostopic echo /nlu/dialogue_acts

To trigger the ASR with a pre-recorded audio file:

rosservice call /mbot_speech_recognition/recognize_speech "{audio_path: '/home/pal/SocRobData/AudioFiles/request_apple.wav', asr_method: 'google', time: 10.0, delete: false, timeout: 100.0}"

To trigger the ASR without a pre-recorded sound:

rosservice call /mbot_speech_recognition/recognize_speech "{audio_path: '', asr_method: 'google', time: 10.0, delete: true, timeout: 100.0}"