Greta 2.0: LLM, ASR, TurnManager, Facial expression generator

Configuration example with ASR, LLM, VAP-based turn management, and MODIFF facial expression generator.

If the following screenshots are difficult to view, please try opening them in a new tab by right-clicking on them.

Installation

Please complete the installation steps for the components.

Prerequisite

  • You should complete the Quick start beforehand
  • Two webcams: one for the VAP-based turn-management module and one for the MODIFF module
  • One shared microphone for the VAP-based turn-management module and the DeepGramContinuous ASR module
  • Obtain API keys and put each one in the corresponding api_key.txt: DeepGram for the ASR module, Mistral for the LLM module, and Mistral for the MODIFF module
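
If you are unsure whether the key files are set up correctly, a quick check like the following can save a launch cycle. This is a minimal sketch only: the file locations shown are placeholders, so substitute the actual api_key.txt paths of your DeepGram, LLM, and MODIFF modules.

```python
# Minimal sanity check: verify that each api_key.txt exists and is non-empty.
# The paths below are placeholders, not the actual module locations.
from pathlib import Path

key_files = {
    "DeepGram (ASR)":   Path("path/to/asr/api_key.txt"),
    "Mistral (LLM)":    Path("path/to/llm/api_key.txt"),
    "Mistral (MODIFF)": Path("path/to/modiff/api_key.txt"),
}

for name, path in key_files.items():
    ok = path.is_file() and path.read_text().strip() != ""
    print(f"{name}: {'OK' if ok else 'missing or empty'} ({path})")
```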

Usage

Case 1: Use the existing configuration file

Step 1: Greta platform: common

  1. Launch Modular.jar
  2. Go to the "File" tab, click "Open", and choose the "Greta_2_0.xml" configuration file (bin/Configurations/Greta/Advanced/Greta_2_0.xml).
  • You may need to wait several minutes on the first launch, as the Python conda environments are being created.
  3. Once the system start-up is complete, you will see the cockpit as shown in the screenshot.
  • It is recommended to use the largest screen possible.

Step 2: OpenFace 2.0

  1. Open OpenFaceOfflineZeroMQ.exe
  2. Go to the "Record" tab and enable "Broadcast with ZeroMQ (port 5000)".
  3. Go to the "File" tab and open the webcam.
  4. Once you see the analyzed facial features on the screen, you are ready to connect to the Greta platform.
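
If the Greta reader later fails to connect, you can first confirm that OpenFace is actually publishing on port 5000. The sketch below (assuming pyzmq is installed and that OpenFace publishes string frames over a ZeroMQ PUB socket) simply prints a few raw messages without parsing them.

```python
# Quick check that OpenFace's ZeroMQ broadcast is reachable on localhost:5000.
import zmq

ctx = zmq.Context()
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://localhost:5000")
sub.setsockopt_string(zmq.SUBSCRIBE, "")  # subscribe to every topic

for _ in range(5):
    print(sub.recv_string())  # blocks until OpenFace sends a frame
```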

Step 3: Greta platform: MODIFF

  1. In the "OpenFace2 OutputStream Reader" window, in the "Stream to read" section, select the "ZeroMQ" tab and click the "Connect" button (Host: localhost, Port: 5000).
  2. Once you receive the AU features from OpenFace, you can see the received feature list in the bottom section.
  3. Click the "Select all" button, and then "Set".
  4. In the "MODIFF" tab, enable the "Launch" checkbox and wait until the prompt screen displays that the model is waiting for connections.
  5. Enable the "Connect" checkbox.
  6. Check the "Enable" checkbox in the "Socket parameters" section in the "MODIFF" window.
  7. Enable the "Perform" checkbox at the bottom.
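
If the "Connect" or "Enable" checkboxes seem to have no effect, you can check whether the MODIFF model process is actually listening before retrying. This sketch assumes the module exchanges data over a plain TCP socket; the port shown is a placeholder, so use the value displayed in the MODIFF window's socket parameters.

```python
# Check whether the MODIFF model process accepts TCP connections yet.
import socket

HOST, PORT = "localhost", 5555  # placeholder port; use the value from the MODIFF window

try:
    with socket.create_connection((HOST, PORT), timeout=3):
        print("MODIFF socket is accepting connections")
except OSError as exc:
    print(f"Not reachable yet: {exc}")
```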

Step 4: Greta platform: DeepGramContinuous

  1. Check the "Enable" checkbox
  2. Click the "Listen" button
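
If no transcripts appear after clicking "Listen", the DeepGram key itself can be verified outside Greta. The sketch below is only a key check, under the assumptions that the key sits in api_key.txt and that a short sample.wav is available; it calls Deepgram's prerecorded REST endpoint once, whereas the Greta module streams audio continuously.

```python
# One-off DeepGram request to confirm the API key works (not how the module streams).
import requests

key = open("api_key.txt").read().strip()        # placeholder path for the key file

with open("sample.wav", "rb") as audio:         # any short WAV recording
    resp = requests.post(
        "https://api.deepgram.com/v1/listen",
        headers={"Authorization": f"Token {key}", "Content-Type": "audio/wav"},
        data=audio,
    )
resp.raise_for_status()
print(resp.json()["results"]["channels"][0]["alternatives"][0]["transcript"])
```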

Step 5: Greta platform: MICounselorIncrementalDA (Motivational Interviewing Counselor, Incremental, with Dialog Act)

  1. Select one of the themes from the candidates.
  2. In the "Socket Parameters" section, select "Online" for the "Model" parameter
  3. Check the "Enable" checkbox
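
For reference, "Online" means the counselor's replies come from a hosted Mistral model rather than a local one. The sketch below shows what such a call looks like; the model name, prompt, and key-file path are illustrative, not the module's actual configuration.

```python
# Illustrative call to Mistral's chat completions API (not the module's real code).
import requests

key = open("api_key.txt").read().strip()  # placeholder path for the Mistral key

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {key}"},
    json={
        "model": "mistral-small-latest",  # illustrative model choice
        "messages": [
            {"role": "system", "content": "You are a motivational interviewing counselor."},
            {"role": "user", "content": "I want to exercise more, but I keep postponing it."},
        ],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```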

Step 6: (Optional) Microsaccade, Body Head Noise, and Blink

You can add these modules to give the agent human-like unconscious behaviors.

  • Microsaccade: check the "enable" checkbox
  • Body Head Noise: check the "Enable" checkboxes (you can modify the intensity values if you want)
  • Blink: check the corresponding checkbox

For more information, please see the Idle behavior page.
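
For intuition, these idle behaviors boil down to adding small, slowly varying perturbations to the agent's posture and gaze. The sketch below is purely conceptual (the amplitude and smoothing constants are made up, not Greta's parameters): it generates low-amplitude, low-frequency noise for one head-rotation axis.

```python
# Conceptual idle "head noise": smoothed Gaussian noise for one rotation axis.
import random

def head_noise(n_frames, amplitude=0.5, smoothing=0.95):
    """Return n_frames of slowly drifting rotation offsets (in degrees)."""
    value, frames = 0.0, []
    for _ in range(n_frames):
        # Exponential smoothing keeps the motion continuous instead of jittery.
        value = smoothing * value + (1.0 - smoothing) * random.gauss(0.0, amplitude)
        frames.append(value)
    return frames

print(head_noise(10))
```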

Step 7: Greta platform: TurnManagementContinuous

  1. Select one of the models from the candidates at the bottom.
  2. If you want to use the "audio/face embed VAP" model, follow these steps:
  • In addition to the microphone, the "audio/face embed VAP" model needs one webcam
  • In the "Capture Controller", enable the "Real Time" checkbox and the "fixed index" checkbox
  • Click the "Video" button
  3. Enable the "Activate" checkbox in the "TurnManagementContinuous" window.
  4. Wait until you observe "[TurnManagement Greta] Generator started" in the output log in NetBeans.
  • If you select the "audio/faceEmbed VAP" model, you can also observe the clipped faces of the user and the agent for inspection.
  5. You are now ready to speak with the agent. (You can start first.)
  • Since the model is a VAP-based neural network, its predictions are not always correct, so you may sometimes need to repeat yourself.
  • The VAP model based on audio and face image embeddings takes longer to start up, so please wait a while.
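
As background, a VAP (Voice Activity Projection) model predicts who is likely to be speaking in the near future, and the turn manager acts on those probabilities rather than on fixed silence timeouts. The sketch below is a conceptual illustration only, not the module's actual logic; the threshold value is arbitrary.

```python
# Conceptual turn decision driven by VAP-style predictions (illustrative only).
def decide_turn(p_agent_future, p_user_future, user_speaking, threshold=0.6):
    """p_*_future: predicted probability that the agent / user speaks in the near future."""
    if user_speaking:
        return "HOLD"        # do not interrupt ongoing user speech
    if p_agent_future > threshold and p_agent_future > p_user_future:
        return "TAKE_TURN"   # the silence is predicted to be a turn shift
    return "WAIT"            # the silence is predicted to be a pause within the user's turn

print(decide_turn(0.72, 0.21, user_speaking=False))  # -> TAKE_TURN
```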

Case 2: Create the configuration by yourself

Step 1: Launch Modular.jar

Open Modular.jar from NetBeans.

Step 2: Add the following modules to Modular.jar

  1. Common
  • ActiveMQ Broker from [Add -> Network Connections -> ActiveMQ -> Broker]
  • Feedbacks module from [Add -> Feedbacks -> Feedbacks]
  • Microphone module from [Add -> Input -> Microphone]
  • (Optional) Microsaccade module from [Add -> Inputs -> Microsaccade]
  • (Optional) Body Head Noise module from [Add -> Inputs -> Body Head Noise]
  2. DeepGramContinuous ASR
  • DeepGramContinuous module from [Add -> Dialogue -> DeepASR -> DeepGramContinuous]
  3. MICounselorIncrementalDA
  • MICounselorIncrementalDA module from [Add -> Dialogue -> LLM -> MICounselorIncrementalDA]
  4. TurnManagement / TurnManagementContinuous
  • Capture Controller module from [Add -> Player -> Capture Controller]
  • TurnManagementContinuous module from [Add -> Dialogue -> TurnManagement -> TurnManagementContinuous]
  • TurnManagement module (for VAD-based backchannel) from [Add -> Dialogue -> TurnManagement -> TurnManagement]
  5. MODIFF
  • MODIFF module from [Add -> Inputs -> MODIFF]
  • OpenFace2 Output Stream Reader from [Add -> Inputs -> OpenFace -> (OpenFace v2) Output Stream Reader]
  • Simple AU Performer from [Add -> AU Performers -> Simple AU Performer]
  • Face Blender from [Add -> Animation filters -> Face -> Face Blender]
  • Lip Blender from [Add -> Animation filters -> Face -> Lip Blender]

Step 3: Create the following connections in Modular.jar

You may skip some connections if you don't need all the functionalities.

  1. Common
  • Behavior Realizer -> Feedbacks
  • (Optional) Microsaccade -> Behavior Realizer
  • (Optional) Body Head Noise -> MPEG4 Animatable
  2. TurnManagement / TurnManagementContinuous / DeepGramContinuous / MICounselorIncrementalDA
  • Capture Controller -> Ogre Player
  • Feedbacks -> TurnManagement
  • Feedbacks -> TurnManagementContinuous
  • Feedbacks -> DeepGramContinuous
  • DeepGramContinuous -> TurnManagementContinuous
  • TurnManagement -> BehaviorPlanner
  • TurnManagementContinuous -> MICounselorIncrementalDA
  3. MODIFF
  • MODIFF -> Simple AU Performer
  • Simple AU Performer -> Face Blender (FAPFrameEmitterToFAPFramePerformer)
  • AU to FAP (this could be another "Simple AU Performer" module) -> Face Blender (FaceSourceToFaceBlender)
  • Face Blender -> Lip Blender (FAPFrameEmitterToFAPFramePerformer)
  • Lip Model -> Lip Blender (LipSourceForLipBlender)
  • Lip Blender -> MPEG4 Animatable

Step 4: Launch the system

Please follow the instructions from Case 1, Steps 2 through 7.

License for pretrained models for the turn management module

  • A pre-trained CPC model, located at encoders/cpc/60k_epoch4-d0f474de.pt, comes from the original CPC project; please follow its specific license. Refer to the original repository (https://github.com/facebookresearch/CPC_audio) for more details.
  • A pre-trained Former-DFER model, located at encoders/FormerDFER/DFER_encoder_weight_only.pt, is a simplified version of the original pre-trained model from the Former-DFER project (derived from model_set_1.pt by removing the temporal transformer and the linear layer); please follow its specific license. Refer to the original repository (https://github.com/zengqunzhao/Former-DFER) for more details.