Greta 2.0: LLM, ASR, TurnManager, Facial expression generator
Configuration example with ASR, LLM, VAP-based turn management, and MODIFF facial expression generator.
If the following screenshots are difficult to view, please try opening them in a new tab by right-clicking on them.
Installation
Please complete the installation steps for the components.
Prerequisite
- You should complete the Quick start beforehand
- Two webcams for the VAP-based turn-management module and the MODIFF module
- One microphone shared by the VAP-based turn-management module and the DeepGramContinuous ASR module
- Get API keys and place each one in the corresponding api_key.txt file: DeepGram for the ASR module, Mistral for the LLM module, and Mistral for the MODIFF module (a check is sketched after this list)
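As a quick sanity check before launching, you can verify that each key file exists and is non-empty. This is a minimal sketch; the paths below are placeholders, and you should adjust them to where your installation keeps each module's api_key.txt.

```python
# Minimal sketch: verify that each api_key.txt exists and is non-empty before launching.
# The paths below are placeholders -- adjust them to your Greta installation.
from pathlib import Path

KEY_FILES = [
    Path("path/to/DeepGramContinuous/api_key.txt"),        # DeepGram key (placeholder path)
    Path("path/to/MICounselorIncrementalDA/api_key.txt"),   # Mistral LLM key (placeholder path)
    Path("path/to/MODIFF/api_key.txt"),                     # Mistral MODIFF key (placeholder path)
]

for key_file in KEY_FILES:
    if not key_file.is_file() or not key_file.read_text().strip():
        print(f"Missing or empty API key: {key_file}")
    else:
        print(f"OK: {key_file}")
```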
Usage
Case 1: Use the existing configuration file
Step 1: Greta platform: common
- Launch Modular.jar
- Go to "File" tab, "Open" and choose "Greta_2_0.xml" configuration file (bin/Configurations/Greta/Advanced/Greta_2_0.xml).
- You may need to wait several minutes for the first launch, as the Python conda environments are being created (a check is sketched after this list)
- Once the system start-up is complete, you'll see the cockpit shown in the screenshot
- It is recommended to use the largest screen possible.
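If you want to confirm that the conda environments were created during the first launch, a minimal sketch (assuming the conda command is on your PATH) is to list them; the environment names are installation-specific, so this simply prints whatever exists.

```python
# Minimal sketch: list the conda environments created during the first launch.
# Assumes "conda" is on your PATH; environment names are installation-specific.
import subprocess

result = subprocess.run(["conda", "env", "list"], capture_output=True, text=True)
print(result.stdout)
```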
Step 2: OpenFace 2.0
- Open OpenFaceOfflineZeroMQ.exe
- Go to "Record" tab, enable "Broadcast with ZeroMQ (port 5000)"
- Go to the "File" tab, open the webcam
- Once you see the analyzed facial features on the screen, you are ready to connect to the Greta platform.
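Before connecting from the Greta platform, you can confirm that something is actually being broadcast on port 5000. This is a minimal sketch assuming OpenFace publishes on a ZeroMQ PUB socket (as the "Broadcast with ZeroMQ" option suggests); the exact message format is not shown here.

```python
# Minimal sketch: check that OpenFace is broadcasting on port 5000 before connecting Greta.
# Assumes a ZeroMQ PUB socket on the OpenFace side; the message format is not parsed here.
import zmq

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://localhost:5000")
socket.setsockopt_string(zmq.SUBSCRIBE, "")  # subscribe to every topic
socket.RCVTIMEO = 5000                       # give up after 5 seconds

try:
    message = socket.recv()
    print("Received data from OpenFace:", message[:80], "...")
except zmq.error.Again:
    print("No data received -- check that the ZeroMQ broadcast is enabled in OpenFace.")
```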
Step 3: Greta platform: MODIFF
- In the "OpenFace2 OutputStream Reader" window, in the "Stream to read" section, select the "ZeroMQ" tab and click the "Connect" button (Host: localhost, Port: 5000).
- Once you receive the AU features from OpenFace, you can see the received feature list in the bottom section.
- Click the "Select all" button, and then "Set".
- In the "MODIFF" tab, enable the "Launch" checkbox and wait until the prompt screen displays that the model is waiting for connections.
- Enable the "Connect" checkbox.
- Check the "Enable" checkbox in the "Socket parameters" section in the "MODIFF" window.
- Enable the "Perform" checkbox at the bottom.
Step 4: Greta platform: DeepGramContinuous
- Check the "Enable" checkbox
- Click the "Listen" button
Step 5: Greta platform: MICounselorIncrementalDA (Motivational Interviewing Counselor, Incremental, with Dialog Acts)
- Select one of the themes from the candidates.
- In the "Socket Parameters" section, select "Online" for the "Model" parameter
- Check the "Enable" checkbox
Step 6: (Optional) Microsaccade / Body Head Noise / and Blink
You can add these modules to realize human-like unconscious behaviors.
- Microsaccade: check the "enable" checkbox
- Body Head Noise: check the "Enable" checkboxes (you can modify the intensity values if you want)
- Blink: check here
For more information, please check Idle behavior
Step 7: Greta platform: TurnManagementContinuous
- Select one of the models from the candidates at the bottom
- If you want to use the "audio/face embed VAP" model, follow the steps below.
- In addition to the microphone, the "audio/face embed VAP" model requires one webcam
- In the "Capture Controller", enable the "Real Time" checkbox and the "fixed index" checkbox
- Click the "Video" button
- Enable the "Activate" checkbox in the "TurnManagementContnuous" window.
- Wait for a while, until you observe "[TurnManagement Greta] Generator started" in the output log in Netbeans.
- If you select the "audio/faceEmbed VAP" model, you can also observe clipped faces of the user and the agent for inspection.
- You are now ready to speak with the agent. (You can start first.)
- Since the model is a VAP-based neural network, its predictions are not always perfect; you may sometimes need to repeat yourself.
- The VAP model based on audio and face image embeddings requires more time to start up, so allow extra time (an illustration of the decision logic is sketched after this list)
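For intuition about why predictions can misfire, the sketch below illustrates (it is not the module's actual code) how a VAP-style model's output can be turned into a turn-taking decision: the model predicts near-future voice activity for each participant, and a simple threshold rule decides whether the agent should take the turn. The function name and threshold are illustrative assumptions.

```python
# Illustrative sketch only (not the module's actual decision logic): turning VAP-style
# next-speaker probabilities into a coarse turn-taking decision via a threshold.
def decide_turn(p_next_user: float, p_next_agent: float, threshold: float = 0.6) -> str:
    """Return a coarse decision from predicted near-future voice activity probabilities."""
    if p_next_agent > threshold and p_next_agent > p_next_user:
        return "agent_takes_turn"       # the agent is predicted to speak next
    if p_next_user > threshold:
        return "agent_keeps_listening"  # the user is predicted to continue
    return "hold"                       # no confident prediction yet

print(decide_turn(p_next_user=0.2, p_next_agent=0.75))  # -> agent_takes_turn
```

Because the decision comes from thresholded probabilities rather than fixed rules, occasional missed or delayed turn changes are expected.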
Case 2: Create the configuration by yourself
Step 1: Launch Modular.jar
Open Modular.jar from NetBeans.
Step 2: Add the following modules to Modular.jar
- Common
- ActiveMQ Broker from [Add -> Network Connections -> ActiveMQ -> Broker]
- Feedbacks module from [Add -> Feedbacks -> Feedbacks]
- Microphone module from [Add -> Input -> Microphone]
- (Optional) Microsaccade module from [Add -> Inputs -> Microsaccade]
- (Optional) Body Head Noise module from [Add -> Inputs -> Body Head Noise]
- DeepGramContinuous ASR
- DeepGramContinuous module from [Add -> Dialogue -> DeepASR -> DeepGramContinuous]
- MICounselorIncrementalDA
- MICounselorIncrementalDA module from [Add -> Dialogue -> LLM -> MICounselorIncrementalDA]
- TurnManagement / TurnManagementContinuous
- Capture Controller module from [Add -> Player -> Capture Controller]
- TurnManagementContinuous module from [Add -> Dialogue -> TurnManagement -> TurnManagementContinuous]
- TurnManagement module (for VAD-based backchannel) from [Add -> Dialogue -> TurnManagement -> TurnManagement]
- MODIFF
- MODIFF module from [Add -> Inputs -> MODIFF]
- OpenFace2 Output Stream Reader from [Add -> Inputs -> OpenFace -> (OpenFace v2) Output Stream Reader]
- Simple AU Performer from [Add -> AU Performers -> Simple AU Performer]
- Face Blender from [Add -> Animation filters -> Face -> Face Blender]
- Lip Blender from [Add -> Animation filters -> Face -> Lip Blender]
Step 3: Create the following connections in Modular.jar
You may skip some connections if you don't need all the functionalities.
- Common
- Behavior Realizer -> Feedbacks
- (Optional) Microsaccade -> Behavior Realizer
- (Optional) Body Head Noise -> MPEG4 Animatable
- TurnManagement / TurnManagementContinuous / DeepGramContinuous / MICounselorIncrementalDA
- Capture Controller -> Ogre Player
- Feedbacks -> TurnManagement
- Feedbacks -> TurnManagementContinuous
- Feedbacks -> DeepGramContinuous
- DeepGramContinuous -> TurnManagementContinuous
- TurnManagement -> BehaviorPlanner
- TurnManagementContinuous -> MICounselorIncrementalDA
- MODIFF
- MODIFF -> Simple AU Performer
- Simple AU Performer -> Face Blender (FAPFrameEmitterToFAPFramePerformer)
- AU to FAP (this could be another "Simple AU Performer" module) -> Face Blender (FaceSourceToFaceBlender)
- Face Blender -> Lip Blender (FAPFrameEmitterToFAPFramePerformer)
- Lip Model -> Lip Blender (LipSourceForLipBlender)
- Lip Blender -> MPEG4 Animatable
Step 4: Launch the system
Please follow the instructions from Case 1, Steps 2 through 7.
License for pretrained models for the turn management module
- The pre-trained CPC model located at encoders/cpc/60k_epoch4-d0f474de.pt comes from the original CPC project; please follow its specific license. Refer to the original repository (https://github.com/facebookresearch/CPC_audio) for more details.
- The pre-trained Former-DFER model located at encoders/FormerDFER/DFER_encoder_weight_only.pt is a simplified version of the original pre-trained model from the Former-DFER project (derived from model_set_1.pt, with the temporal transformer and linear layer removed); please follow its specific license. Refer to the original repository (https://github.com/zengqunzhao/Former-DFER) for more details.
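If you need to confirm that the checkpoints are in place, a minimal sketch (assuming they are ordinary PyTorch files under the paths listed above, relative to the turn-management module's working directory) is to load and inspect them:

```python
# Minimal sketch: check that the two pretrained encoder checkpoints load as PyTorch files.
# Paths are relative to the turn-management module's working directory (an assumption).
import torch

for path in ["encoders/cpc/60k_epoch4-d0f474de.pt",
             "encoders/FormerDFER/DFER_encoder_weight_only.pt"]:
    checkpoint = torch.load(path, map_location="cpu")
    if isinstance(checkpoint, dict):
        print(path, "->", len(checkpoint), "top-level entries")
    else:
        print(path, "->", type(checkpoint).__name__)
```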