APIs - thuiar/MMSA-FET GitHub Wiki
1. Customize Configs
The get_default_config function returns a default config as a Python dictionary. It takes a single argument: the name of the extraction method.
Supported methods are: 'bert', 'glove', 'librosa', 'mediapipe', 'openface', 'opensmile', 'roberta', 'wav2vec', 'aligned'.
Example code:
from MSA_FET import get_default_config, FeatureExtractionTool
# Get default config for OpenFace & alter
config_v = get_default_config('openface')
# Enable Active Speaker Detection
config_v['video']['multiFace']['enable'] = True
# Get default config for openSMILE & alter
config_a = get_default_config('opensmile')
# Use LLD features
config_a['audio']['args']['feature_level'] = 'LowLevelDescriptors'
# Get default config for bert & alter
config_t = get_default_config('bert')
# Switch to Chinese
config_t['text']['pretrained'] = 'bert-base-chinese'
# Combine the three modalities
config = {**config_a, **config_v, **config_t}
# Initialize main class
fet = FeatureExtractionTool(config)
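Because the merged config is a plain dictionary, it can also be saved to JSON and passed back in later through the config argument. A minimal sketch, where the dictionary contents are simplified stand-ins rather than the full default configs:

```python
import json

# Illustrative config dict; in practice this comes from get_default_config().
# The nested structure shown here is a simplified assumption.
config = {
    "audio": {"tool": "opensmile", "args": {"feature_level": "LowLevelDescriptors"}},
    "video": {"tool": "openface", "multiFace": {"enable": True}},
    "text": {"model": "bert", "pretrained": "bert-base-chinese"},
}

# Save for reuse; the saved path can later be passed as the config argument
with open("custom_config.json", "w") as f:
    json.dump(config, f, indent=2)

# Load it back to verify the round trip
with open("custom_config.json") as f:
    loaded = json.load(f)
print(loaded == config)  # True
```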
2. Initialize Main Class
The FeatureExtractionTool class is the main class of this toolkit. The initialization function takes five arguments:
- config (Required): Python dictionary, path to a JSON file, or name of an example config.
- dataset_root_dir: Path to the datasets' parent directory. Used when extracting dataset features with dataset_name.
- tmp_dir: Temporary directory path. Default: '~/.MSA-FET/tmp'.
- log_dir: Log directory path. Default: '~/.MSA-FET/log'.
- verbose: Verbosity level of stdout. 0 for error, 1 for info, 2 for debug. Default: 1.
Example code:
from MSA_FET import FeatureExtractionTool
# Initialize with example config & change temp dir
fet = FeatureExtractionTool(config="librosa", tmp_dir="/tmp")
# Initialize with custom_config.json & suppress output
fet = FeatureExtractionTool(config="custom_config.json", verbose=0)
3. Extract Features for Video Files
The FeatureExtractionTool.run_single() function extracts features from a single video file. It takes four arguments:
- in_file (Required): Path to the input video file.
- out_file: Path to the output file. If omitted, no output file is created.
- text_file: Path to the text transcript file. Required when extracting text features.
- return_type: 'pt' for a PyTorch tensor, 'np' for a NumPy array. Default: 'np'.
Example code:
from MSA_FET import FeatureExtractionTool
# Extract visual feature with default openface config from input.mp4
fet = FeatureExtractionTool("openface")
feature = fet.run_single("input.mp4")
print(feature)
# Extract multimodal features with a custom config file and save them to feature.pkl
# the parameter 'text_file' is required if text features are to be extracted
fet = FeatureExtractionTool("custom_config.json")
fet.run_single(in_file="input.mp4", out_file="feature.pkl", text_file="input.txt")
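The saved output is a pickle file holding the extracted features. A hedged sketch of reading one back for inspection; the dict below is a stand-in with made-up keys and values, since the actual keys depend on which modalities your config enables:

```python
import pickle

# Stand-in feature dict, pickled the way an out_file would be read back.
# Key names and values here are illustrative assumptions, not the toolkit's
# guaranteed output format; inspect your own file to see what it contains.
features = {"audio": [[0.1, 0.2], [0.3, 0.4]], "vision": [[1.0], [2.0]]}
with open("feature.pkl", "wb") as f:
    pickle.dump(features, f)

# Load a saved feature file and inspect its contents
with open("feature.pkl", "rb") as f:
    loaded = pickle.load(f)
print(sorted(loaded.keys()))  # ['audio', 'vision']
```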
4. Extract Features for Datasets
Note: To extract features for datasets, the datasets must be organized in a specific file structure, and a label.csv file is required. See Dataset and Structure for details. Raw video files and label files for MOSI, MOSEI and CH-SIMS can be downloaded here.
Note: As of version v_0.4.0, the run_dataset function has been rewritten to support multiprocessing. This required restructuring the code, so the function is no longer a method of the FeatureExtractionTool class; it is a standalone function that must be imported directly. See the examples below for reference.
The run_dataset() function extracts features from a dataset folder arranged in the expected structure. It takes the following arguments:
- config: Python dictionary, path to a JSON file, or name of an example config.
- dataset_dir: Path to the dataset directory. If specified, overrides 'dataset_name'.
- out_file: Output feature file. If not specified, features are saved under the dataset directory as 'feature.pkl'.
- return_type: 'pt' for a PyTorch tensor, 'np' for a NumPy array. Default: 'np'.
- num_workers: Number of workers for parallel processing. Default: 4.
- padding_value: Padding value for sequence padding, 'zero' or 'norm'. Default: 'zero'.
- padding_location: Padding location for sequence padding, 'end' or 'start'. Default: 'end'.
- face_detection_failure: Action to take when face detection fails: 'skip' the frame or 'pad' with zeros. Default: 'skip'.
- tmp_dir: Directory for temporary files. Default: '~/.MSA-FET/tmp'.
- log_dir: Log directory. Default: '~/.MSA-FET/log'.
- log_level: Verbosity level of stdout. Default: logging.INFO.
- progress_q: Reserved for the M-SENA platform. Multiprocessing queue used for reporting progress.
- task_id: Reserved for the M-SENA platform. Task ID.
Example Code:
from MSA_FET import run_dataset
# Extract audio features for MOSI using default aligned feature config
run_dataset(
config = 'aligned',
dataset_dir = 'MSA-Datasets/MOSI',
out_file = './feature_aligned.pkl',
num_workers = 8
)