Turn Management - isir/greta GitHub Wiki

This module controls turn-taking behaviors based on audio-visual signals from both the user and the agent. The behaviors are implemented with a voice activity detection (VAD) model and a voice activity projection (VAP) model.

Two sub-modules are implemented in this module:

Requirements

We strongly recommend using an NVIDIA GPU. You might be able to run the models without one, but we expect that to be extremely slow.
You need to install and set up the LLM and ASR modules (e.g. API keys).

ASR: https://github.com/isir/greta/wiki/DeepGram

LLM: https://github.com/isir/greta/wiki/Mistral-incremental

Common installation

Install conda or Anaconda from https://www.anaconda.com/
Install Python 3 (it is usually installed with Anaconda, but in some cases it is not usable, e.g. when the path to "python.exe" is not set globally)
You can test the installation by loading "Greta - Microphone - backchannel.xml" from Modular.jar. If everything is installed correctly, Greta will nod in response to your utterances.
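
Before loading the configuration in Modular.jar, it can help to confirm that the required tools are visible on your PATH. The following is a minimal sketch, not part of Greta: the helper name find_tools and the list of executable names are assumptions for illustration, and nvidia-smi is checked only as a hint that an NVIDIA driver is installed.

```python
import shutil

# Executable names are assumptions; adjust them to your own setup.
TOOLS = ("conda", "python", "nvidia-smi")


def find_tools(names=TOOLS):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If "python" is reported as not found even though Anaconda is installed, the path to "python.exe" is probably not set globally, which is the case mentioned above.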

Default server setup

Microphone stream server: TCP at port 9000 of localhost
Feedback server from Greta: TCP at port 5960
Main management server from Greta: TCP at port 5961
You can change the microphone port number by entering a new value in the Microphone module in Modular.jar and pressing the update button
When you change the port number of the microphone stream server in Modular.jar, you also need to update it in this module
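
To see whether the servers listed above are actually listening, a quick TCP probe can be used. This is a minimal sketch under stated assumptions: the helper names is_port_open and check_servers are hypothetical, "localhost" is assumed as the host for the two Greta-side servers (the page only states it for the microphone stream), and it is assumed that the servers tolerate a connection that is opened and closed immediately.

```python
import socket

# Default ports from the list above. The host for the feedback and
# main management servers is an assumption (only the microphone
# stream server is documented as running on localhost).
DEFAULT_SERVERS = {
    "microphone stream": ("localhost", 9000),
    "feedback": ("localhost", 5960),
    "main management": ("localhost", 5961),
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


def check_servers(servers=DEFAULT_SERVERS):
    """Probe every server and map its name to True (open) or False."""
    return {name: is_port_open(host, port) for name, (host, port) in servers.items()}


if __name__ == "__main__":
    for name, ok in check_servers().items():
        host, port = DEFAULT_SERVERS[name]
        status = "listening" if ok else "not reachable"
        print(f"{name} ({host}:{port}): {status}")
```

If you changed the microphone port in Modular.jar, update the 9000 entry in DEFAULT_SERVERS to the same value before running the probe.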