Greta 2.0: LLM, ASR, TurnManager, Facial expression generator
Configuration example with ASR, LLM, VAP-based turn management, and MODIFF facial expression generator.
If the following screenshots are difficult to view, please try opening them in a new tab by right-clicking on them.
Installation
Please complete the installation steps for the components.
Prerequisite
- You should complete the Quick start beforehand
- Two webcams for the VAP-based turn-management module and the MODIFF module
- One microphone shared by the VAP-based turn-management module and the DeepGramContinuous ASR module
- Get API keys and place each one in the corresponding api_key.txt file: DeepGram for the ASR module, Mistral for the LLM module, and Mistral for the MODIFF module (a check is sketched after this list)
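As a quick sanity check before launching, you can verify that each key file exists and is non-empty. This is a minimal sketch; the paths below are placeholders, and you should adjust them to where your installation keeps each module's api_key.txt.

```python
# Minimal sketch: verify that each api_key.txt exists and is non-empty before launching.
# The paths below are placeholders -- adjust them to your Greta installation.
from pathlib import Path

KEY_FILES = [
    Path("path/to/DeepGramContinuous/api_key.txt"),        # DeepGram key (placeholder path)
    Path("path/to/MICounselorIncrementalDA/api_key.txt"),   # Mistral LLM key (placeholder path)
    Path("path/to/MODIFF/api_key.txt"),                     # Mistral MODIFF key (placeholder path)
]

for key_file in KEY_FILES:
    if not key_file.is_file() or not key_file.read_text().strip():
        print(f"Missing or empty API key: {key_file}")
    else:
        print(f"OK: {key_file}")
```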
Usage
Case 1: Use the existing configuration file
Step 1: Greta platform: common
- Launch Modular.jar
- Go to "File" tab, "Open" and choose "Greta_2_0.xml" configuration file (bin/Configurations/Greta/Advanced/Greta_2_0.xml).
- You may need to wait several minutes for the first launch, as the Python conda environments are being created (a check is sketched after this list)
- Once the system start-up is complete, you'll see the cockpit shown in the screenshot
- It is recommended to use the largest screen possible.
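If you want to confirm that the conda environments were created during the first launch, a minimal sketch (assuming the conda command is on your PATH) is to list them; the environment names are installation-specific, so this simply prints whatever exists.

```python
# Minimal sketch: list the conda environments created during the first launch.
# Assumes "conda" is on your PATH; environment names are installation-specific.
import subprocess

result = subprocess.run(["conda", "env", "list"], capture_output=True, text=True)
print(result.stdout)
```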
Step 2: OpenFace 2.0
- Open OpenFaceOfflineZeroMQ.exe
- Go to "Record" tab, enable "Broadcast with ZeroMQ (port 5000)"
- Go to the "File" tab, open the webcam
- Once you see the analyzed facial features on the screen, you are ready to connect to the Greta platform.
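Before connecting from the Greta platform, you can confirm that something is actually being broadcast on port 5000. This is a minimal sketch assuming OpenFace publishes on a ZeroMQ PUB socket (as the "Broadcast with ZeroMQ" option suggests); the exact message format is not shown here.

```python
# Minimal sketch: check that OpenFace is broadcasting on port 5000 before connecting Greta.
# Assumes a ZeroMQ PUB socket on the OpenFace side; the message format is not parsed here.
import zmq

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://localhost:5000")
socket.setsockopt_string(zmq.SUBSCRIBE, "")  # subscribe to every topic
socket.RCVTIMEO = 5000                       # give up after 5 seconds

try:
    message = socket.recv()
    print("Received data from OpenFace:", message[:80], "...")
except zmq.error.Again:
    print("No data received -- check that the ZeroMQ broadcast is enabled in OpenFace.")
```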
Step 3: Greta platform: MODIFF
- In the "OpenFace2 OutputStream Reader" window, in the "Stream to read" section, select the "ZeroMQ" tab and click the "Connect" button (Host: localhost, Port: 5000).
- Once you receive the AU features from OpenFace, you can see the received feature list in the bottom section.
- Click the "Select all" button, and then "Set".
- In the "MODIFF" tab, enable the "Launch" checkbox and wait until the prompt screen displays that the model is waiting for connections.
- Enable the "Connect" checkbox.
- Check the "Enable" checkbox in the "Socket parameters" section in the "MODIFF" window.
- Enable the "Perform" checkbox at the bottom.
Step 4: Greta platform: DeepGramContinuous
- Check the "Enable" checkbox
- Click the "Listen" button
Step 5: Greta platform: MICounselorIncrementalDA (Motivational Interviewing Counselor, Incremental, with Dialog Acts)
- Select one of the themes from the candidates.
- In the "Socket Parameters" section, select "Online" for the "Model" parameter
- Check the "Enable" checkbox
Step 6: (Optional) Microsaccade / Body Head Noise / and Blink
You can add these modules to realize human-like unconscious behaviors.
- Microsaccade: check the "enable" checkbox
- Body Head Noise: check the "Enable" checkboxes (you can modify the intensity values if you want)
- Blink: check here
For more information, please check Idle behavior
Step 7: Greta platform: TurnManagementContinuous
- Select one of the models from the candidates at the bottom
- If you want to use the "audio/face embed VAP" model, follow the steps below.
- In addition to the microphone, the "audio/face embed VAP" model requires one webcam
- In the "Capture Controller", enable the "Real Time" checkbox and the "fixed index" checkbox
- Click the "Video" button
- Enable the "Activate" checkbox in the "TurnManagementContnuous" window.
- Wait for a while, until you observe "[TurnManagement Greta] Generator started" in the output log in Netbeans.
- If you select the "audio/faceEmbed VAP" model, you can also observe clipped faces of the user and the agent for inspection.
- You are now ready to speak with the agent. (You can start first.)
- Since the model is a VAP-based neural network, its predictions are not always perfect; you may sometimes need to repeat yourself.
- The VAP model based on audio and face image embeddings requires more time to start up, so allow extra time (an illustration of the decision logic is sketched after this list)
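For intuition about why predictions can misfire, the sketch below illustrates (it is not the module's actual code) how a VAP-style model's output can be turned into a turn-taking decision: the model predicts near-future voice activity for each participant, and a simple threshold rule decides whether the agent should take the turn. The function name and threshold are illustrative assumptions.

```python
# Illustrative sketch only (not the module's actual decision logic): turning VAP-style
# next-speaker probabilities into a coarse turn-taking decision via a threshold.
def decide_turn(p_next_user: float, p_next_agent: float, threshold: float = 0.6) -> str:
    """Return a coarse decision from predicted near-future voice activity probabilities."""
    if p_next_agent > threshold and p_next_agent > p_next_user:
        return "agent_takes_turn"       # the agent is predicted to speak next
    if p_next_user > threshold:
        return "agent_keeps_listening"  # the user is predicted to continue
    return "hold"                       # no confident prediction yet

print(decide_turn(p_next_user=0.2, p_next_agent=0.75))  # -> agent_takes_turn
```

Because the decision comes from thresholded probabilities rather than fixed rules, occasional missed or delayed turn changes are expected.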
Case 2: Create the configuration by yourself
Step 1: Launch Modular.jar
Open Modular.jar from NetBeans.
Step 2: Add the following modules to Modular.jar
- Common
- ActiveMQ Broker from [Add -> Network Connections -> ActiveMQ -> Broker]
- Feedbacks module from [Add -> Feedbacks -> Feedbacks]
- Microphone module from [Add -> Input -> Microphone]
- (Optional) Microsaccade module from [Add -> Inputs -> Microsaccade]
- (Optional) Body Head Noise module from [Add -> Inputs -> Body Head Noise]
- DeepGramContinuous ASR
- DeepGramContinuous module from [Add -> Dialogue -> DeepASR -> DeepGramContinuous]
- MICounselorIncrementalDA
- MICounselorIncrementalDA module from [Add -> Dialogue -> LLM -> MICounselorIncrementalDA]
- TurnManagement / TurnManagementContinuous
- Capture Controller module from [Add -> Player -> Capture Controller]
- TurnManagementContinuous module from [Add -> Dialogue -> TurnManagement -> TurnManagementContinuous]
- TurnManagement module (for VAD-based backchannel) from [Add -> Dialogue -> TurnManagement -> TurnManagement]
- MODIFF
- MODIFF module from [Add -> Inputs -> MODIFF]
- OpenFace2 Output Stream Reader from [Add -> Inputs -> OpenFace -> (OpenFace v2) Output Stream Reader]
- Simple AU Performer from [Add -> AU Performers -> Simple AU Performer]
- Face Blender from [Add -> Animation filters -> Face -> Face Blender]
- Lip Blender from [Add -> Animation filters -> Face -> Lip Blender]
Step 3: Create the following connections in Modular.jar
You may skip some connections if you don't need all the functionalities.
- Common
- Behavior Realizer -> Feedbacks
- (Optional) Microsaccade -> Behavior Realizer
- (Optional) Body Head Noise -> MPEG4 Animatable
- TurnManagement / TurnManagementContinuous / DeepGramContinuous / MICounselorIncrementalDA
- Capture Controller -> Ogre Player
- Feedbacks -> TurnManagement
- Feedbacks -> TurnManagementContinuous
- Feedbacks -> DeepGramContinuous
- DeepGramContinuous -> TurnManagementContinuous
- TurnManagement -> BehaviorPlanner
- TurnManagementContinuous -> MICounselorIncrementalDA
- MODIFF
- MODIFF -> Simple AU Performer
- Simple AU Performer -> Face Blender (FAPFrameEmitterToFAPFramePerformer)
- AU to FAP (this could be another "Simple AU Performer" module) -> Face Blender (FaceSourceToFaceBlender)
- Face Blender -> Lip Blender (FAPFrameEmitterToFAPFramePerformer)
- Lip Model -> Lip Blender (LipSourceForLipBlender)
- Lip Blender -> MPEG4 Animatable
Step 4: Launch the system
Please follow the instructions from Case 1, Steps 2 through 7.
License for pretrained models for the turn management module
- The pre-trained CPC model located at encoders/cpc/60k_epoch4-d0f474de.pt comes from the original CPC project; please follow its specific license. Refer to the original repository (https://github.com/facebookresearch/CPC_audio) for more details.
- The pre-trained Former-DFER model located at encoders/FormerDFER/DFER_encoder_weight_only.pt is a simplified version of the original pre-trained model from the Former-DFER project (derived from model_set_1.pt, with the temporal transformer and linear layer removed); please follow its specific license. Refer to the original repository (https://github.com/zengqunzhao/Former-DFER) for more details.
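If you need to confirm that the checkpoints are in place, a minimal sketch (assuming they are ordinary PyTorch files under the paths listed above, relative to the turn-management module's working directory) is to load and inspect them:

```python
# Minimal sketch: check that the two pretrained encoder checkpoints load as PyTorch files.
# Paths are relative to the turn-management module's working directory (an assumption).
import torch

for path in ["encoders/cpc/60k_epoch4-d0f474de.pt",
             "encoders/FormerDFER/DFER_encoder_weight_only.pt"]:
    checkpoint = torch.load(path, map_location="cpu")
    if isinstance(checkpoint, dict):
        print(path, "->", len(checkpoint), "top-level entries")
    else:
        print(path, "->", type(checkpoint).__name__)
```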