Greta 2.0: LLM, ASR, TurnManager, Facial expression generator - isir/greta GitHub Wiki
Configuration example with ASR, LLM, VAP-based turn management, and MODIFF facial expression generator.
If the following screenshots are difficult to view, please try opening them in a new tab by right-clicking on them.
Installation
Please complete the installation steps for the components.
Prerequisite
- You should complete the Quick start beforehand
 - Two webcams: one for the VAP-based turn-taking module and one for the MODIFF module
 - One shared microphone for the VAP-based turn-management module and the DeepGramContinuous ASR module
 - Get API keys and set them in the corresponding api_key.txt files: DeepGram for the ASR module, Mistral for the LLM module, and Mistral for the MODIFF module (a quick file check is sketched below)
 
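Before launching, you can confirm that each api_key.txt is in place and non-empty. The sketch below is a minimal check under assumed file locations; the paths are placeholders, so adjust them to wherever the DeepGramContinuous, MICounselorIncrementalDA, and MODIFF modules expect their key files in your checkout.

```python
# Minimal sanity check that the API key files exist and are non-empty.
# The paths are placeholders: adjust them to the actual locations expected by
# the DeepGramContinuous, MICounselorIncrementalDA, and MODIFF modules.
from pathlib import Path

key_files = [
    Path("path/to/deepgram/api_key.txt"),     # DeepGram key (placeholder path)
    Path("path/to/mistral_llm/api_key.txt"),  # Mistral key for the LLM module (placeholder path)
    Path("path/to/modiff/api_key.txt"),       # Mistral key for the MODIFF module (placeholder path)
]

for key_file in key_files:
    if key_file.is_file() and key_file.read_text().strip():
        print(f"OK: {key_file}")
    else:
        print(f"Missing or empty: {key_file}")
```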
Usage
Case 1: Use the existing configuration file
Step 1: Greta platform: common
- Launch Modular.jar
 - Go to "File" tab, "Open" and choose "Greta_2_0.xml" configuration file (bin/Configurations/Greta/Advanced/Greta_2_0.xml).
 
- You may need to wait several minutes for the first launch, as Python conda environments are being created (you can list the created environments with the sketch below).
 
- Once the system start-up is complete, you will see the cockpit as shown in the screenshot.
 
- It is recommended to use the largest screen possible.
 
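If you want to confirm that the conda environments were actually created during the first launch, you can list them from Python (or simply run `conda env list` in a terminal). The exact environment names used by the Greta modules are not listed here, so compare the output with the names shown in the module start-up logs.

```python
# List the existing conda environments to confirm the first-launch setup completed.
# Requires conda to be on PATH; the expected environment names depend on your Greta
# build, so compare the output with the names shown in the module start-up logs.
import subprocess

result = subprocess.run(["conda", "env", "list"], capture_output=True, text=True, check=True)
print(result.stdout)
```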
Step 2: OpenFace 2.0
- Open OpenFaceOfflineZeroMQ.exe
 - Go to "Record" tab, enable "Broadcast with ZeroMQ (port 5000)"
 - Go to the "File" tab, open the webcam
 - Once you see the analyzed facial features on the screen, you are ready to connect to the Greta platform (a quick check of the ZeroMQ stream is sketched below).
 
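Before connecting the Greta side, you can verify that OpenFace is actually broadcasting on port 5000. The sketch below is a minimal pyzmq subscriber; it assumes the broadcast is a ZeroMQ PUB socket and simply prints whatever arrives, since the exact message format depends on the OpenFaceOfflineZeroMQ build.

```python
# Minimal check of the OpenFace ZeroMQ broadcast (assumed to be a PUB socket on port 5000).
# Prints the first few raw messages; the format is whatever OpenFaceOfflineZeroMQ sends.
import zmq

context = zmq.Context()
subscriber = context.socket(zmq.SUB)
subscriber.connect("tcp://localhost:5000")
subscriber.setsockopt_string(zmq.SUBSCRIBE, "")  # subscribe to every topic

for _ in range(10):
    print(subscriber.recv_string())
```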
Step 3: Greta platform: MODIFF
- In the "OpenFace2 OutputStream Reader" window, in the "Stream to read" section, select the "ZeroMQ" tab and click the "Connect" button (Host: localhost, Port: 5000).
 - Once you receive the AU features from OpenFace, you can see the received feature list in the bottom section.
 - Click the "Select all" button, and then "Set".
 - In the "MODIFF" tab, enable the "Launch" checkbox and wait until the prompt screen displays that the model is waiting for connections.
 - Enable the "Connect" checkbox.
 - Check the "Enable" checkbox in the "Socket parameters" section in the "MODIFF" window.
 - Enable the "Perform" checkbox at the bottom.
 
Step 4: Greta platform: DeepGramContinuous
- Check the "Enable" checkbox
 - Click the "Listen" button (you can verify the DeepGram API key with the sketch below)
 
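If the ASR does not react, it is worth checking the DeepGram API key outside Greta. The sketch below sends a short local WAV file to DeepGram's pre-recorded transcription REST endpoint; the key-file path and the WAV path are placeholders, and this is only a key/network check, not what the DeepGramContinuous module does internally (it streams audio continuously).

```python
# Stand-alone check of the DeepGram API key via the pre-recorded transcription endpoint.
# "api_key.txt" and "sample.wav" are placeholder paths; use the module's key file and any short WAV.
import requests

api_key = open("api_key.txt").read().strip()
audio = open("sample.wav", "rb").read()

response = requests.post(
    "https://api.deepgram.com/v1/listen",
    headers={"Authorization": f"Token {api_key}", "Content-Type": "audio/wav"},
    data=audio,
)
print(response.status_code)
print(response.json())  # the transcript appears under results -> channels -> alternatives
```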
Step 5: Greta platform: MICounselorIncrementalDA (Motivational Interview Counselor Incremental with Dialog Act)
- Select one of the themes from the candidates.
 - In the "Socket Parameters" section, select "Online" for the "Model" parameter
 - Check the "Enable" checkbox (you can verify the Mistral API key with the sketch below)
 
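Similarly, you can sanity-check the Mistral API key used by the LLM module outside Greta. The sketch below calls Mistral's chat completions REST endpoint directly; the key-file path and the model name are assumptions made for this check and are not necessarily what MICounselorIncrementalDA uses internally.

```python
# Stand-alone check of the Mistral API key via the chat completions endpoint.
# "api_key.txt" is a placeholder path; the model name is just one that accepts simple requests.
import requests

api_key = open("api_key.txt").read().strip()

response = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "mistral-small-latest",
        "messages": [{"role": "user", "content": "Reply with one short greeting."}],
    },
)
print(response.status_code)
print(response.json()["choices"][0]["message"]["content"])
```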
Step 6: (Optional) Microsaccade, Body Head Noise, and Blink
You can add these modules to realize human-like unconscious behaviors.
- Microsaccade: check the "enable" checkbox
 - Body Head Noise: check the "Enable" checkboxes (you can modify the intensity values if you want)
 - Blink: check here
 
For more information, please check Idle behavior
Step 7: Greta platform: TurnManagementContinuous
- Select one of the models from the candidates at the bottom
 - If you want to use the "audio/face embed VAP" model, follow the steps below.
 
- In addition to one microphone, you need one webcam if you want to use the "audio/face embed VAP" model
 - In the "Capture Controller", enable the "Real Time" checkbox and the "fixed index" checkbox
 - Click the "Video" button
 
- Enable the "Activate" checkbox in the "TurnManagementContinuous" window.
 - Wait until you observe "[TurnManagement Greta] Generator started" in the output log in NetBeans.
 
- If you select the "audio/face embed VAP" model, you can also observe the cropped face images of the user and the agent for inspection.
 
- You are now ready to speak with the agent. (You can start first.)
 
- Since the model is a VAP-based neural network, its predictions are not always perfect; you may sometimes need to repeat yourself (see the illustrative sketch below).
 - The VAP model based on audio and face image embeddings takes longer to start up, so please wait a while.
 
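To give an intuition for why single predictions can be missed, the sketch below shows one generic way a stream of VAP-style turn-shift probabilities could be smoothed and thresholded before the agent takes the turn. This is purely illustrative and is not the actual decision logic of the TurnManagementContinuous module; the threshold and window size are made-up values.

```python
# Illustrative only: smoothing + thresholding of frame-by-frame turn-shift probabilities.
# Not the TurnManagementContinuous module's real decision rule; threshold/window are made up.
from collections import deque

SHIFT_THRESHOLD = 0.6   # assumed probability threshold for taking the turn
WINDOW = 5              # assumed number of frames to average over

recent = deque(maxlen=WINDOW)

def agent_should_take_turn(shift_probability: float) -> bool:
    """Take the turn only when the averaged probability stays above the threshold."""
    recent.append(shift_probability)
    return len(recent) == WINDOW and sum(recent) / WINDOW > SHIFT_THRESHOLD

# Noisy per-frame predictions: a single high value is not enough to trigger a turn shift.
for p in [0.2, 0.8, 0.3, 0.7, 0.75, 0.8, 0.85, 0.9]:
    print(f"p={p:.2f} -> take turn: {agent_should_take_turn(p)}")
```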
Case 2: Create the configuration by yourself
Step 1: Launch Modular.jar
Open Modular.jar from NetBeans.
Step 2: Add the following modules to Modular.jar
- Common
 
- ActiveMQ Broker from [Add -> Network Connections -> ActiveMQ -> Broker]
 - Feedbacks module from [Add -> Feedbacks -> Feedbacks]
 - Microphone module from [Add -> Input -> Microphone]
 - (Optional) Microsaccade module from [Add -> Inputs -> Microsaccade]
 - (Optional) Body Head Noise module from [Add -> Inputs -> Body Head Noise]
 
- DeepGramContinuous ASR
 
- DeepGramContinuous module from [Add -> Dialogue -> DeepASR -> DeepGramContinuous]
 
- MICounselorIncrementalDA
 
- MICounselorIncrementalDA module from [Add -> Dialogue -> LLM -> MICounselorIncrementalDA]
 
- TurnManagement / TurnManagementContinuous
 
- Capture Controller module from [Add -> Player -> Capture Controller]
 - TurnManagementContinuous module from [Add -> Dialogue -> TurnManagement -> TurnManagementContinuous]
 - TurnManagement module (for VAD-based backchannel) from [Add -> Dialogue -> TurnManagement -> TurnManagement]
 
- MODIFF
 
- MODIFF module from [Add -> Inputs -> MODIFF]
 - OpenFace2 Output Stream Reader from [Add -> Inputs -> OpenFace -> (OpenFace v2) Output Stream Reader]
 - Simple AU Performer from [Add -> AU Performers -> Simple AU Performer]
 - Face Blender from [Add -> Animation filters -> Face -> Face Blender]
 - Lip Blender from [Add -> Animation filters -> Face -> Lip Blender]
 
Step 3: Create the following connections in Modular.jar
You may skip some connections if you don't need all the functionalities.
- Common
 
- Behavior Realizer -> Feedbacks
 - (Optional) Microsaccade -> Behavior Realizer
 - (Optional) Body Head Noise -> MPEG4 Animatable
 
- TurnManagement / TurnManagementContinuous / DeepGramContinuous / MICounselorIncrementalDA
 
- Capture Controller -> Ogre Player
 - Feedbacks -> TurnManagement
 - Feedbacks -> TurnManagementContinuous
 - Feedbacks -> DeepGramContinuous
 - DeepGramContinuous -> TurnManagementContinuous
 - TurnManagement -> BehaviorPlanner
 - TurnManagementContinuous -> MICounselorIncrementalDA
 
- MODIFF
 
- MODIFF -> Simple AU Performer
 - Simple AU Performer -> Face Blender (FAPFrameEmitterToFAPFramePerformer)
 - AU to FAP (this could be another "Simple AU Performer" module) -> Face Blender (FaceSourceToFaceBlender)
 - Face Blender -> Lip Blender (FAPFrameEmitterToFAPFramePerformer)
 - Lip Model -> Lip Blender (LipSourceForLipBlender)
 - Lip Blender -> MPEG4 Animatable
 
Step 4: Launch the system
Please follow the instructions from Case 1, Steps 2 through 7.
License for pretrained models for the turn management module
- The pre-trained CPC model located at encoders/cpc/60k_epoch4-d0f474de.pt comes from the original CPC project; please follow its specific license. Refer to the original repository (https://github.com/facebookresearch/CPC_audio) for more details.
 - The pre-trained Former-DFER model located at encoders/FormerDFER/DFER_encoder_weight_only.pt is a simplified version of the original pre-trained model from the Former-DFER project (derived from model_set_1.pt by removing the temporal transformer and the linear layer); please follow its specific license. Refer to the original repository (https://github.com/zengqunzhao/Former-DFER) for more details.
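If you want to confirm that these checkpoint files are present and loadable, a minimal PyTorch inspection is sketched below, assuming both files load as standard torch checkpoints (a state_dict or a wrapper dictionary).

```python
# Inspect the bundled pretrained checkpoints; assumes standard torch serialization.
import torch

for path in [
    "encoders/cpc/60k_epoch4-d0f474de.pt",
    "encoders/FormerDFER/DFER_encoder_weight_only.pt",
]:
    checkpoint = torch.load(path, map_location="cpu")
    if isinstance(checkpoint, dict):
        print(path, "->", list(checkpoint.keys())[:5], "...")
    else:
        print(path, "->", type(checkpoint).__name__)
```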