speech_understanding_wiki - IRS-group/isr_tiago_docs GitHub Wiki
Whisper
Install
sudo apt install python3-pyaudio
pip3 install pvporcupine
pip3 install SpeechRecognition
pip3 install -U openai-whisper
pip3 install soundfile
Create a Porcupine access key by registering at https://console.picovoice.ai and save it in a file at the path isr_tiago/speech_recognition/mbot_speech_recognition/src/mbot_speech_recognition_ros/access_key.
Porcupine should work offline but needs connectivity every few weeks to verify authentication.
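As a rough sketch of how the node might use the saved access key, the loop below streams microphone frames into a Porcupine detector. This is an illustration, not the node's actual code: the keyword "porcupine" is a built-in demo keyword, and the real node may use a custom keyword file.

```python
# Minimal wake-word detection sketch with pvporcupine + pyaudio,
# assuming the access key was saved as described above.
import struct

def frames(pcm_bytes, frame_length):
    """Split raw 16-bit little-endian PCM into fixed-size sample frames,
    dropping any trailing partial frame (Porcupine requires full frames)."""
    samples = struct.unpack(f"<{len(pcm_bytes) // 2}h", pcm_bytes)
    return [samples[i:i + frame_length]
            for i in range(0, len(samples) - frame_length + 1, frame_length)]

def listen(access_key_path="access_key"):
    import pvporcupine, pyaudio  # external deps from the install steps above
    with open(access_key_path) as f:
        porcupine = pvporcupine.create(access_key=f.read().strip(),
                                       keywords=["porcupine"])
    pa = pyaudio.PyAudio()
    stream = pa.open(rate=porcupine.sample_rate, channels=1,
                     format=pyaudio.paInt16, input=True,
                     frames_per_buffer=porcupine.frame_length)
    try:
        while True:
            pcm = stream.read(porcupine.frame_length)
            for frame in frames(pcm, porcupine.frame_length):
                if porcupine.process(frame) >= 0:  # index of detected keyword
                    print("keyword detected")
                    return
    finally:
        stream.close()
        pa.terminate()
        porcupine.delete()
```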
Launch and testing
Launch the speech recognition node with:
roslaunch mbot_speech_recognition mbot_speech_recognition.launch
To start listening for the keyword:
rostopic pub /mbot_speech_recognition/event_in -1 std_msgs/String "data: 'e_start'"
When you do this, the node records a 2 second sample to set a silence threshold.
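Silence-threshold calibration of this kind typically averages the energy of the ambient recording and adds a margin. The sketch below is an illustration of that idea, not the node's actual code; the margin factor is an assumption.

```python
# Sketch of a silence-threshold calibration, assuming 16-bit little-endian
# mono PCM chunks as produced by pyaudio. The margin factor is a guess.
import math
import struct

def rms(pcm_bytes):
    """Root-mean-square energy of a chunk of 16-bit PCM audio."""
    samples = struct.unpack(f"<{len(pcm_bytes) // 2}h", pcm_bytes)
    return math.sqrt(sum(s * s for s in samples) / max(len(samples), 1))

def silence_threshold(chunks, margin=1.5):
    """Average RMS over ~2 s of ambient audio chunks, scaled by a margin.
    Audio louder than this is treated as speech; quieter, as silence."""
    return margin * sum(rms(c) for c in chunks) / len(chunks)
```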
You can also specify the ASR engine when starting to listen (the default is Whisper):
rostopic pub /mbot_speech_recognition/event_in -1 std_msgs/String "data: 'e_start_google'"
To stop listening use:
rostopic pub /mbot_speech_recognition/event_in -1 std_msgs/String "data: 'e_stop'"
While the node is listening it will ignore all e_start messages until it receives an e_stop message. To recalibrate the silence threshold, send an e_stop message followed by an e_start message.
Results of ASR will be published here:
rostopic echo /mbot_speech_recognition/transcript
Semantic Similarity
Install
pip3 install sentence-transformers nltk torch torchvision torchaudio bllipparser
python3 -m nltk.downloader wordnet
python3 -m nltk.downloader bllip_wsj_no_aux
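With sentence-transformers, each sentence is encoded as an embedding vector (e.g. via SentenceTransformer(...).encode(...)) and similarity is scored as the cosine of the angle between two embeddings. The toy example below illustrates only the cosine scoring step, with hand-made vectors standing in for real model output.

```python
# Cosine similarity, the score sentence-transformers uses to compare
# sentence embeddings. The 3-d vectors here are made up for illustration;
# real embeddings come from the model and have hundreds of dimensions.
import math

def cosine_similarity(a, b):
    """1.0 for identical direction, 0.0 for orthogonal (unrelated) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hand-made "embeddings" for three commands.
bring_apple = [0.9, 0.1, 0.0]
fetch_apple = [0.8, 0.2, 0.1]
play_music  = [0.0, 0.1, 0.9]

# Paraphrases score high, unrelated commands score low.
print(cosine_similarity(bring_apple, fetch_apple))
print(cosine_similarity(bring_apple, play_music))
```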
Sound sample recognition
Install
pip3 install numpy
pip3 install scipy
Sound sample creation
You can use any program to record a .wav sample and place it in speech_recognition/doorbell_recognition/src/doorbell_recognition_ros. The sample should be as short as possible for faster detection.
Launch and testing
Launch the sound sample recognition node with:
roslaunch sound_sample_recognition sound_sample_recognition.launch
To start listening for the sound sample:
rostopic pub /sound_sample_recognition/event_in -1 std_msgs/String "data: 'e_start'"
To stop listening use:
rostopic pub /sound_sample_recognition/event_in -1 std_msgs/String "data: 'e_stop'"
The confidence level of sound sample detection will be published here:
rostopic echo /sound_sample_recognition/sound_sample_confidence
A threshold of 0.75 gives good detection without false positives.
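One plausible way such a confidence can be computed (the node's actual method may differ) is peak normalized cross-correlation between the incoming audio and the stored sample, which is close to 1.0 when the sample occurs in the stream. The self-contained sketch below builds a synthetic "doorbell" burst, embeds it in noise, and scores both the matching and the noise-only signal.

```python
# Sketch of confidence scoring via normalized cross-correlation (numpy).
# The synthetic two-tone "doorbell" is made up for the demonstration.
import numpy as np

def detection_confidence(signal, template):
    """Peak normalized cross-correlation of the template against every
    window of the signal; near 1.0 for a clean match, near 0 for noise."""
    tmpl = template - template.mean()
    tmpl = tmpl / (np.linalg.norm(tmpl) + 1e-12)
    best = 0.0
    for i in range(len(signal) - len(template) + 1):
        win = signal[i:i + len(template)]
        win = win - win.mean()
        norm = np.linalg.norm(win)
        if norm > 0:
            best = max(best, float(np.dot(win, tmpl) / norm))
    return best

rng = np.random.default_rng(0)
sr = 8000
ts = np.arange(0, 0.1, 1.0 / sr)                 # 0.1 s burst
doorbell = np.sin(2 * np.pi * 880 * ts) + np.sin(2 * np.pi * 660 * ts)
signal = 0.05 * rng.standard_normal(sr)          # 1 s of background noise
signal[3000:3000 + len(doorbell)] += doorbell    # embed the sample
noise_only = 0.05 * rng.standard_normal(sr)

print(detection_confidence(signal, doorbell))     # close to 1.0
print(detection_confidence(noise_only, doorbell)) # far below 0.75
```

With this scoring, the 0.75 threshold mentioned above separates the two cases by a wide margin.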
Install (OLD)
pip install tensorflow==1.13.1
pip install SpeechRecognition
pip install nltk==3.4.5
pip install scikit-learn
sudo apt install portaudio19-dev
pip install pyaudio
pip install gensim
pip install torch==1.5.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html
(or, for CUDA 10.0: pip install torch==1.4.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html)
pip install spacy
pip install ftfy==4.4.3
python -m spacy download en
pip install tensorflow-gpu==1.14.0
pip install bert-tensorflow==1.0.1
pip install bert-for-tf2
pip install --upgrade google-cloud-storage
If spacy and ftfy do not work, try: pip install pytorch-pretrained-bert
Setup (OLD)
In a Python shell, download the NLTK stopwords:
import nltk
nltk.download('stopwords')
Go to https://console.cloud.google.com/apis/api/speech.googleapis.com/ and download a JSON credentials file for the Speech API.
Add this JSON file to the folder: isr_tiago/speech_recognition/google_speech_recognition/credentials
Edit isr_tiago/speech_recognition/google_speech_recognition/ros/config/config_google_speech_recognition.yaml to point to your JSON credentials file.
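The exact key name in that YAML file depends on the package; a hypothetical entry might look like:

```yaml
# config_google_speech_recognition.yaml (illustrative; the real key name
# may differ - check the existing file for the expected field)
credentials_file: my_project_credentials.json
```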
Replace the folder isr_tiago/speech_recognition/mbot_dialogue_system/mbot_natural_language_understanding/mbot_nlu_pytorch/common/src/model with the model folder inside model_nlu.zip, which is in the SocRobData Google Drive shared folder.
Launch and Test (OLD)
For GPSR, run mbot_nlu_pytorch_gpsr.launch instead of mbot_nlu_pytorch.launch:
roslaunch mbot_nlu_pytorch mbot_nlu_pytorch_gpsr.launch
Sentence Split Node and NLU to Planning KnowledgeBase Interface
roslaunch gpsr_speech_processing gpsr_speech_processing.launch
Individual testing:
Google ASR
roslaunch mbot_speech_recognition mbot_speech_recognition.launch
Natural language understanding (NLU)
roslaunch mbot_nlu_pytorch mbot_nlu_pytorch.launch
The intention and facts are published here:
rostopic echo /nlu/dialogue_acts
To trigger the ASR with a pre-recorded sound file:
rosservice call /mbot_speech_recognition/recognize_speech "{audio_path: '/home/pal/SocRobData/AudioFiles/request_apple.wav', asr_method: 'google', time: 10.0, delete: false, timeout: 100.0}"
To trigger the ASR without a pre-recorded sound:
rosservice call /mbot_speech_recognition/recognize_speech "{audio_path: '', asr_method: 'google', time: 10.0, delete: true, timeout: 100.0}"