APIs - thuiar/MMSA-FET GitHub Wiki

1. Customize Configs

The get_default_config function returns a config as a Python dictionary. It takes a single argument: the name of the extraction method. Supported methods are: 'bert', 'glove', 'librosa', 'mediapipe', 'openface', 'opensmile', 'roberta', 'wav2vec', 'aligned'.

Example code:

from MSA_FET import get_default_config, FeatureExtractionTool

# Get default config for OpenFace & alter
config_v = get_default_config('openface')
# Enable Active Speaker Detection
config_v['video']['multiFace']['enable'] = True 

# Get default config for openSMILE & alter
config_a = get_default_config('opensmile')
# Use LLD features
config_a['audio']['args']['feature_level'] = 'LowLevelDescriptors'

# Get default config for bert & alter
config_t = get_default_config('bert')
# Switch to Chinese
config_t['text']['pretrained'] = 'bert-base-chinese'

# Combine the three modalities
config = {**config_a, **config_v, **config_t}

# Initialize main class
fet = FeatureExtractionTool(config)
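The dictionary unpacking in `{**config_a, **config_v, **config_t}` works because each default config is keyed by its modality ('audio', 'video', 'text'), which is how the example indexes them. If two configs shared a top-level key, the later one would silently replace the earlier one. The sketch below is a hypothetical helper, `merge_modality_configs` (not part of MSA_FET), that performs the same merge but raises an error on a collision. The config fragments are placeholders, not the real defaults.

```python
from typing import Any, Dict


def merge_modality_configs(*configs: Dict[str, Any]) -> Dict[str, Any]:
    """Merge per-modality configs, refusing to overwrite a top-level key.

    Hypothetical helper: gives the same result as {**a, **b, **c}
    when the top-level keys are disjoint, and raises otherwise.
    """
    merged: Dict[str, Any] = {}
    for cfg in configs:
        overlap = merged.keys() & cfg.keys()
        if overlap:
            raise ValueError(f"duplicate top-level keys: {sorted(overlap)}")
        merged.update(cfg)
    return merged


# Placeholder fragments that reuse only the keys altered in the example above;
# the real default configs contain many more settings.
config_a = {"audio": {"args": {"feature_level": "LowLevelDescriptors"}}}
config_v = {"video": {"multiFace": {"enable": True}}}
config_t = {"text": {"pretrained": "bert-base-chinese"}}

config = merge_modality_configs(config_a, config_v, config_t)
print(sorted(config))  # ['audio', 'text', 'video']
```

Passing the same config twice, e.g. `merge_modality_configs(config_a, config_a)`, raises `ValueError` instead of silently keeping only the last copy.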

2. Initialize Main Class

The FeatureExtractionTool class is the main class of this toolkit. Its constructor takes 5 arguments:

config (required): Python dictionary, path to a JSON file, or name of an example config.
dataset_root_dir: Path to the parent directory of the datasets. Used when extracting dataset features with dataset_name.
tmp_dir: Temporary directory path. Default: '~/.MSA-FET/tmp'.
log_dir: Log directory path. Default: '~/.MSA-FET/log'.
verbose: Verbosity level of stdout: 0 for errors, 1 for info, 2 for debug. Default: 1.
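FeatureExtractionTool takes an integer `verbose`, while the run_dataset function described later takes a `log_level` from Python's `logging` module. To use the same verbosity in both places, a small mapping is enough. The helper `verbose_to_log_level` is hypothetical; its mapping simply follows the levels listed above (0 error, 1 info, 2 debug).

```python
import logging

# Mapping taken from the documented verbose levels:
# 0 -> errors, 1 -> info, 2 -> debug.
_VERBOSE_TO_LEVEL = {0: logging.ERROR, 1: logging.INFO, 2: logging.DEBUG}


def verbose_to_log_level(verbose: int) -> int:
    """Translate a FeatureExtractionTool-style verbose value into a logging level.

    Hypothetical helper, e.g. for passing the same verbosity to run_dataset(log_level=...).
    """
    try:
        return _VERBOSE_TO_LEVEL[verbose]
    except KeyError:
        raise ValueError(f"verbose must be 0, 1 or 2, got {verbose!r}") from None


print(logging.getLevelName(verbose_to_log_level(1)))  # INFO
```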

Example code:

from MSA_FET import FeatureExtractionTool

# Initialize with example config & change temp dir
fet = FeatureExtractionTool(config="librosa", tmp_dir="/tmp")

# Initialize with custom_config.json & suppress output
fet = FeatureExtractionTool(config="custom_config.json", verbose=0)
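Because `config` also accepts a path to a JSON file, a dictionary customized with get_default_config can be written to disk once and reused as something like `custom_config.json`. The sketch below uses only the standard `json` module. The helpers `save_config` and `load_config` are hypothetical, and the sketch assumes the config holds only JSON-serializable values (strings, numbers, booleans, lists, dicts).

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def save_config(config: Dict[str, Any], path: Union[str, Path]) -> Path:
    """Write a config dictionary to a JSON file and return its path (hypothetical helper)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False writes any non-ASCII strings as-is instead of \u escapes
        json.dump(config, f, indent=2, ensure_ascii=False)
    return path


def load_config(path: Union[str, Path]) -> Dict[str, Any]:
    """Read a saved config back, e.g. to inspect it before use (hypothetical helper)."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


# Illustrative fragment; a real config would come from get_default_config(...)
config = {"text": {"pretrained": "bert-base-chinese"}}
saved = save_config(config, "custom_config.json")
assert load_config(saved) == config
# The file path can then be passed as: FeatureExtractionTool(config=str(saved))
```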

3. Extract Features for Video Files

The FeatureExtractionTool.run_single() method extracts features from a single video file. It takes 4 arguments:

in_file (required): Path to the input video file.
out_file: Path to the output file. If omitted, no output file is created.
text_file: Path to the text transcript file. Required when extracting text features.
return_type: 'pt' for PyTorch tensor, 'np' for NumPy array. Default: 'np'.

Example code:

from MSA_FET import FeatureExtractionTool

# Extract visual feature with default openface config from input.mp4
fet = FeatureExtractionTool("openface")
feature = fet.run_single("input.mp4")
print(feature)

# Extract multimodal features with a custom config file and save them to feature.pkl
# The 'text_file' parameter is required if text features are to be extracted
fet = FeatureExtractionTool("custom_config.json")
fet.run_single(in_file="input.mp4", out_file="feature.pkl", text_file="input.txt")
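The example writes its output to `feature.pkl`, which suggests a Python pickle file. This page does not document the layout of that file, so the sketch below assumes nothing beyond "it unpickles". The hypothetical helper `summarize_features` walks dicts, lists and arrays and reports each array's shape, which is a quick way to see what was extracted. The demo pickles a synthetic dictionary with made-up keys and shapes, not real MSA-FET output. Only unpickle files you created or trust.

```python
import pickle
from typing import Any, Dict

import numpy as np


def summarize_features(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Map each leaf path to an array shape (for arrays) or a type name (otherwise).

    Hypothetical helper; it does not assume any particular key layout.
    """
    summary: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            summary.update(summarize_features(value, f"{prefix}/{key}"))
    elif isinstance(obj, (list, tuple)):
        for i, value in enumerate(obj):
            summary.update(summarize_features(value, f"{prefix}[{i}]"))
    elif hasattr(obj, "shape"):  # NumPy arrays, or torch tensors if return_type='pt'
        summary[prefix or "/"] = tuple(obj.shape)
    else:
        summary[prefix or "/"] = type(obj).__name__
    return summary


def load_features(path: str) -> Any:
    """Unpickle a saved feature file (trusted files only)."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Synthetic stand-in for a saved feature file; keys and shapes are illustrative only.
demo = {"audio": np.zeros((50, 25)), "video": np.zeros((30, 10)), "id": "clip_01"}
with open("demo_feature.pkl", "wb") as f:
    pickle.dump(demo, f)

for path, info in summarize_features(load_features("demo_feature.pkl")).items():
    print(path, info)
```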

4. Extract Features for Datasets

Note: To extract features for datasets, the datasets need to be organized in a specific file structure, and a label.csv file is needed. See Dataset and Structure for details. Raw video files and label files for MOSI, MOSEI and CH-SIMS can be downloaded here.

Note: Since version v_0.4.0, the run_dataset function has been rewritten to support multiprocessing. This required restructuring the code, so run_dataset is no longer a method of the FeatureExtractionTool class; it is a standalone function that must be imported directly. See the example below for reference.

The run_dataset() function extracts features from a dataset folder arranged in the required structure. It takes the following arguments:

config: Python dictionary, path to a JSON file, or name of an example config.
dataset_dir: Path to the dataset directory. If specified, overrides 'dataset_name'.
out_file: Output feature file. If not specified, features are saved in the dataset directory as 'feature.pkl'.
return_type: 'pt' for PyTorch tensor, 'np' for NumPy array. Default: 'np'.
num_workers: Number of workers for parallel processing. Default: 4.
padding_value: Value used for sequence padding, 'zero' or 'norm'. Default: 'zero'.
padding_location: Where sequence padding is added, 'end' or 'start'. Default: 'end'.
face_detection_failure: Action to take when face detection fails: 'skip' the frame or 'pad' it with zeros. Default: 'skip'.
tmp_dir: Directory for temporary files. Default: '~/.MSA-FET/tmp'.
log_dir: Log directory. Default: '~/.MSA-FET/log'.
log_level: Verbosity level of stdout. Default: logging.INFO.
progress_q: Reserved for the M-SENA platform. Multiprocessing queue used for reporting progress.
task_id: Reserved for the M-SENA platform. Task ID.
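`padding_value` and `padding_location` control how variable-length feature sequences are brought to a common length. The NumPy sketch below illustrates only the documented 'zero' value with both locations; it is not MSA-FET's implementation, and the 'norm' option is not reproduced because its exact behaviour is not described here. The function name `pad_sequences` and padding to the longest sequence in the batch are assumptions of this sketch.

```python
from typing import List

import numpy as np


def pad_sequences(seqs: List[np.ndarray], location: str = "end") -> np.ndarray:
    """Zero-pad a list of (T_i, D) arrays into one (N, T_max, D) array.

    Illustrative sketch of padding_value='zero' with padding_location='end' or 'start'.
    Padding to the longest sequence is an assumption, not documented behaviour.
    """
    if location not in ("end", "start"):
        raise ValueError("location must be 'end' or 'start'")
    max_len = max(s.shape[0] for s in seqs)
    feat_dim = seqs[0].shape[1]
    out = np.zeros((len(seqs), max_len, feat_dim), dtype=seqs[0].dtype)
    for i, s in enumerate(seqs):
        n = s.shape[0]
        if location == "end":
            out[i, :n] = s  # real frames first, zeros after
        else:
            out[i, max_len - n:] = s  # zeros first, real frames last
    return out


a = np.ones((2, 3))
b = np.full((4, 3), 2.0)
print(pad_sequences([a, b], "end")[0, :, 0])    # [1. 1. 0. 0.]
print(pad_sequences([a, b], "start")[0, :, 0])  # [0. 0. 1. 1.]
```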

Example code:

from MSA_FET import run_dataset

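# Extract features for MOSI using the default 'aligned' config
# run_dataset uses multiprocessing, so keep the call under a __main__ guard;
# this is required where worker processes are started with "spawn" (e.g. Windows, macOS)
if __name__ == "__main__":
    run_dataset(
        config="aligned",
        dataset_dir="MSA-Datasets/MOSI",
        out_file="./feature_aligned.pkl",
        num_workers=8,
    )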