Turn Management - isir/greta GitHub Wiki

This module controls turn-taking behaviors based on audio-visual signals from both the user and the agent. These behaviors are implemented with a voice activity detection (VAD) model and a voice activity projection (VAP) model.

Two sub-modules are implemented in this module:

Requirements

  • We strongly recommend an NVIDIA GPU. The models may run without one, but we expect them to be extremely slow.
  • You need to install and set up the LLM and ASR modules (e.g. API keys).
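
To confirm the GPU recommendation is met before launching the models, a quick check along these lines may help. This is a minimal sketch, not part of Greta: it assumes the NVIDIA driver's `nvidia-smi` command-line tool is on the PATH, and the function name `has_nvidia_gpu` is hypothetical.

```python
import shutil
import subprocess


def has_nvidia_gpu(which=shutil.which):
    """Return True if `nvidia-smi -L` runs and lists at least one GPU.

    `which` is injectable so the lookup can be replaced in tests.
    """
    exe = which("nvidia-smi")
    if exe is None:
        return False  # NVIDIA driver tools are not on PATH
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # `nvidia-smi -L` prints one line per device, each starting with "GPU".
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    if has_nvidia_gpu():
        print("NVIDIA GPU detected")
    else:
        print("No NVIDIA GPU detected; expect slow inference")
```

A negative result only means that `nvidia-smi` was not found or reported no device. It does not verify that the Python environment used by this module can actually access the GPU.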

Common installation

  • Install conda or anaconda from https://www.anaconda.com/
  • Install Python 3 (it is usually installed with Anaconda, but may not be available in some cases, e.g. when the path to "python.exe" is not set globally)
  • To test the installation, load Greta - Microphone - backchannel.xml from Modular.jar. If everything is installed correctly, Greta nods in response to your utterances.
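
If the test configuration does not respond, one thing worth ruling out is the PATH problem mentioned above. The following sketch, using only the Python standard library, reports which of the expected executables can be found. The helper name `find_tools` and the list of tool names are assumptions chosen for illustration.

```python
import shutil


def find_tools(names=("conda", "python", "python3"), which=shutil.which):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        status = path if path is not None else "NOT FOUND on PATH"
        print(f"{name}: {status}")
```

On Windows, `shutil.which("python")` also resolves `python.exe`, so a `None` result there suggests the interpreter path is not set globally.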

Default server setup

  • Microphone stream server: TCP at port 9000 of localhost
  • Feedback server from Greta: TCP at port 5960
  • Main management server from Greta: TCP at port 5961
  • To change the microphone port, enter a new port number in the Microphone module in Modular.jar and press the update button
  • If you change the port of the microphone stream server in Modular.jar, you must also update it in this module
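
When the ports on the two sides do not match, the connection simply fails, so it can be useful to probe the default ports listed above. The snippet below is a rough sketch under stated assumptions: the host `localhost` for the feedback and management servers is assumed (only the microphone server's host is given above), and `DEFAULT_SERVERS` and `is_port_open` are hypothetical names, not part of Greta.

```python
import socket

# Default ports from the list above. The host for the feedback and
# management servers is an assumption; only the microphone host is documented.
DEFAULT_SERVERS = {
    "microphone": ("localhost", 9000),
    "feedback": ("localhost", 5960),
    "management": ("localhost", 5961),
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    for name, (host, port) in DEFAULT_SERVERS.items():
        state = "listening" if is_port_open(host, port) else "not reachable"
        print(f"{name} server at {host}:{port}: {state}")
```

The probe opens and immediately closes a connection, so it is best run while no session is in progress. If you changed the microphone port in Modular.jar, update the corresponding entry in `DEFAULT_SERVERS` before running it.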