Mockingjay Preprocessing Instructions - andi611/Mockingjay-Speech-Representation Wiki

Original URL: https://github.com/andi611/Mockingjay-Speech-Representation/wiki/Mockingjay-Preprocessing-Instructions

LibriSpeech Preprocessing for Mockingjay

Preprocessing scripts may be executed directly if the LibriSpeech dataset is placed under data/. The extracted data, which is ready for training, will be stored under the same data/ directory by default.

# Default
python3 preprocess.py --feature_type=mel
# To train on different input / output target features, options are:
python3 preprocess.py --feature_type=linear
python3 preprocess.py --feature_type=mfcc
python3 preprocess.py --feature_type=fbank
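
If features of several types are needed, the commands above can be generated rather than typed one by one. The following is a minimal sketch, not part of the repository: the helper name `build_preprocess_commands` is hypothetical, and it relies only on the `--feature_type` values listed above. It prints the commands instead of running them, since the script asks for dataset indices interactively.

```python
# Feature types listed in this section; "mel" is the default.
FEATURE_TYPES = ("mel", "linear", "mfcc", "fbank")


def build_preprocess_commands(feature_types=FEATURE_TYPES, script="preprocess.py"):
    """Return one argv list per requested feature type (hypothetical helper)."""
    commands = []
    for feature_type in feature_types:
        if feature_type not in FEATURE_TYPES:
            raise ValueError(f"unknown feature type: {feature_type}")
        commands.append(["python3", script, f"--feature_type={feature_type}"])
    return commands


if __name__ == "__main__":
    # Print the commands so they can be reviewed or pasted into a shell.
    for command in build_preprocess_commands():
        print(" ".join(command))
```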

The datasets that need to be processed are 0, 1, and 5:

0 : train-clean-100
1 : train-clean-360
2 : train-other-500
3 : dev-clean
4 : dev-other
5 : test-clean
6 : test-other

Once the preprocessing script is running, enter the indices separated by spaces: 0 1 5.
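
To make the index selection concrete, the sketch below maps an answer such as `0 1 5` to the split names in the table above. It only illustrates the mapping; the function name `parse_split_selection` is hypothetical and this is not necessarily how `preprocess.py` parses its input.

```python
# Index-to-split table, copied from the list in this section.
LIBRISPEECH_SPLITS = {
    0: "train-clean-100",
    1: "train-clean-360",
    2: "train-other-500",
    3: "dev-clean",
    4: "dev-other",
    5: "test-clean",
    6: "test-other",
}


def parse_split_selection(answer):
    """Turn a whitespace-separated answer such as "0 1 5" into split names."""
    names = []
    for token in answer.split():
        index = int(token)  # raises ValueError on non-numeric input
        if index not in LIBRISPEECH_SPLITS:
            raise ValueError(f"index out of range: {index}")
        names.append(LIBRISPEECH_SPLITS[index])
    return names


print(parse_split_selection("0 1 5"))
# → ['train-clean-100', 'train-clean-360', 'test-clean']
```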

To read LibriSpeech from a different input directory, run preprocessing with the following command:

python3 preprocess.py --data_path=<path to LibriSpeech on your computer> 
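
Before starting a long preprocessing run on a custom path, it can help to confirm that the required splits are actually there. This is a minimal sketch under one assumption: that the split folders (`train-clean-100` and so on) sit directly under the directory passed to `--data_path`. The helper `missing_splits` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Splits 0, 1, and 5 from the table in this section.
REQUIRED_SPLITS = ("train-clean-100", "train-clean-360", "test-clean")


def missing_splits(data_path, splits=REQUIRED_SPLITS):
    """Return the split names that are not directories under data_path.

    Assumes the split folders are direct children of data_path.
    """
    root = Path(data_path)
    return [name for name in splits if not (root / name).is_dir()]
```

An empty list means every required split was found; otherwise the returned names show which downloads are missing or misplaced.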

Warning: If you encounter a subprocess-related error, install sentencepiece to resolve it. To check the parameter types and default values of any script, run it with the --help option.

Downstream Task Preprocessing - Phone Classification

Download the phone alignment dataset for LibriSpeech, generated using the Montreal Forced Aligner. The downloaded libri_alignment.zip should be placed under the data/ directory and unzipped:

cd data
unzip libri_alignment.zip
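
On systems without the `unzip` command, the same extraction can be done with Python's standard `zipfile` module. This is a sketch of an equivalent step, not something the repository provides; the helper name `unzip_in_place` is hypothetical.

```python
import zipfile
from pathlib import Path


def unzip_in_place(archive_path):
    """Extract an archive into the directory that contains it.

    Mirrors `cd data && unzip libri_alignment.zip` and returns the
    list of member names that were extracted.
    """
    archive = Path(archive_path)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(archive.parent)
        return zf.namelist()


if __name__ == "__main__":
    # Path taken from the instructions above.
    target = Path("data/libri_alignment.zip")
    if target.is_file():
        print(f"extracted {len(unzip_in_place(target))} entries")
```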

Use the following command to run LibriSpeech phone alignment preprocessing:

python3 preprocess_alignment.py

To use the CPC phone alignment data, unzip it with the following commands:

cd data/cpc_phone
unzip converted_aligned_phones.zip

Warning: These phone alignments must be used with Kaldi-extracted features, which provide one feature/label every 10 ms.
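
The 10 ms spacing determines how many labels to expect per utterance, which is a useful sanity check when pairing features with alignments. The arithmetic can be sketched as follows; the function names are hypothetical, and the frame count is an approximation that ignores window-edge effects, so a real extractor may differ by a frame or two.

```python
FRAME_SHIFT_MS = 10  # one feature/label every 10 ms, as stated above


def samples_per_frame_shift(sample_rate_hz):
    """Number of audio samples between consecutive frames."""
    return sample_rate_hz * FRAME_SHIFT_MS // 1000


def approx_num_frames(num_samples, sample_rate_hz):
    """Approximate frame count for an utterance, ignoring window-edge effects."""
    return num_samples // samples_per_frame_shift(sample_rate_hz)


# LibriSpeech audio is sampled at 16 kHz.
print(samples_per_frame_shift(16000))   # → 160
print(approx_num_frames(32000, 16000))  # → 200 (a 2-second utterance)
```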

Downstream Task Preprocessing - Sentiment Classification

The audio files can be downloaded from here. After unzipping, the target audio files will be located at Raw/Audio/Full/WAV_16000. Use the following commands to run preprocessing:

python3 preprocess_mosei/segment_mosei.py --data_path=<MOSEI_ROOT_DIR>/Raw/Audio/Full/WAV_16000
python3 preprocess_mosei/extract_mosei.py
python3 preprocess_mosei/length_mosei.py
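
The three steps can also be chained from a small driver that stops at the first failure. This is a sketch only: it assumes the steps must run in the order listed above, and the helpers `mosei_commands` and `run_all` are hypothetical names, not part of the repository.

```python
import subprocess


def mosei_commands(mosei_root):
    """Build the three commands above for a given MOSEI root directory."""
    audio_dir = f"{mosei_root.rstrip('/')}/Raw/Audio/Full/WAV_16000"
    return [
        ["python3", "preprocess_mosei/segment_mosei.py", f"--data_path={audio_dir}"],
        ["python3", "preprocess_mosei/extract_mosei.py"],
        ["python3", "preprocess_mosei/length_mosei.py"],
    ]


def run_all(commands):
    """Run each command in order; a non-zero exit raises CalledProcessError."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling `run_all(mosei_commands("<MOSEI_ROOT_DIR>"))` from the repository root, with the placeholder replaced by the real path, would run the same sequence as the commands above.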

TIMIT Preprocessing for Mockingjay

Preprocessing scripts may be executed directly if the TIMIT dataset is placed under data/. The extracted data, which is ready for training, will be stored under the same data/ directory by default.

python3 preprocess_timit.py --feature_type=mel

To read TIMIT from a different input directory, run preprocessing with the following command:

python3 preprocess_timit.py --data_path=<path to TIMIT on your computer>