Turn Management
This module controls turn-taking behavior based on audio-visual signals from both the user and the agent. Two sub-modules are implemented: one based on a voice activity detection (VAD) model and one based on a voice activity projection (VAP) model.
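To illustrate what the VAD-based sub-module does conceptually, here is a minimal sketch that classifies short audio frames as speech or silence with the `webrtcvad` package. The package, frame size, and sample rate are assumptions made for this illustration; the module itself uses its own VAD/VAP models.

```python
# Minimal VAD sketch (illustration only): classify 30 ms PCM frames as
# speech / non-speech. The real module uses its own VAD/VAP models; the
# library, sample rate, and frame size here are assumptions.
import webrtcvad

SAMPLE_RATE = 16000          # assumed: 16 kHz, 16-bit mono PCM
FRAME_MS = 30                # webrtcvad accepts 10, 20, or 30 ms frames
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 2 bytes per sample

vad = webrtcvad.Vad(2)       # aggressiveness from 0 (lenient) to 3 (strict)

def user_is_speaking(frame: bytes) -> bool:
    """Return True if this 30 ms frame contains speech."""
    assert len(frame) == FRAME_BYTES
    return vad.is_speech(frame, SAMPLE_RATE)

# Example with a silent frame (all zeros): should report no speech.
print(user_is_speaking(b"\x00" * FRAME_BYTES))
```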
Requirements
- We strongly recommend an NVIDIA GPU. The models may run without one, but we expect them to be extremely slow on CPU (see the quick check after this list).
- You need to install and set up the LLM and ASR modules (e.g. API keys).
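As a quick check that the GPU is usable, the snippet below queries CUDA through PyTorch; the assumption that the VAD/VAP models are PyTorch-based is ours, so adapt it to whatever backend your installation actually uses.

```python
# Quick GPU sanity check. Assumes PyTorch is installed in the conda
# environment used for this module; adjust if the models use another backend.
import torch

if torch.cuda.is_available():
    print("CUDA GPU found:", torch.cuda.get_device_name(0))
else:
    print("No CUDA GPU found; the models will fall back to CPU and be very slow.")
```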
Common installation
- Install Conda or Anaconda from https://www.anaconda.com/
- Install Python 3 (it usually comes with Anaconda, but it may not be usable in some cases, e.g. when the path to "python.exe" is not set globally; see the check after this list).
- You can test the installation by loading Greta - Microphone - backchannel.xml in Modular.jar. If everything is installed correctly, Greta will nod in response to your utterances.
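If you are unsure which Python interpreter is actually being picked up (e.g. because of the PATH issue mentioned above), the small check below prints the active interpreter; whether it should point into your conda environment depends on your setup.

```python
# Print the interpreter and version in use. If the path does not point into
# your conda/Anaconda environment, the PATH is likely not set up globally.
import sys

print("Interpreter:", sys.executable)
print("Version:", sys.version)
```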
Default server setup
- Microphone stream server: TCP at port 9000 on localhost
- Feedback server from Greta: TCP at port 5960
- Main management server from Greta: TCP at port 5961
- You can change the microphone port number by entering a port of your choice in the Microphone module in Modular.jar and pressing the update button
- When you change the port number of the microphone streaming server in Modular.jar, you also need to update it in this module (a quick connectivity check for the default ports is sketched below)
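To verify that the default endpoints above are reachable, a minimal check such as the following can be used. It assumes all three servers listen on localhost and that opening a plain TCP connection is enough to test reachability; the actual wire protocol of each server is not covered here.

```python
# Minimal reachability check for the default Turn Management endpoints.
# Assumes all three servers run on localhost; the wire protocol of each
# server is out of scope here.
import socket

DEFAULT_PORTS = {
    "microphone stream server": 9000,
    "feedback server from Greta": 5960,
    "main management server from Greta": 5961,
}

for name, port in DEFAULT_PORTS.items():
    try:
        with socket.create_connection(("localhost", port), timeout=2):
            print(f"{name}: reachable on port {port}")
    except OSError as exc:
        print(f"{name}: NOT reachable on port {port} ({exc})")
```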