# Metrics

Metrics data (in JSON format, described in the Data Requirements section below) can either be uploaded from a local machine or, if the data is already in S3, transferred to the database from S3. These scripts first require that the config file be filled out with the correct S3 and API server credentials.
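The exact fields in config.py depend on the repository version; a minimal sketch of the kind of values it holds might look like the following (all names here are assumptions, not the script's actual variable names):

```python
# config.py -- hypothetical sketch; field names are assumptions,
# check the repository's actual config template for the real ones.

# S3 credentials and bucket holding the metrics files
AWS_ACCESS_KEY_ID = "YOUR_ACCESS_KEY"
AWS_SECRET_ACCESS_KEY = "YOUR_SECRET_KEY"
S3_BUCKET = "your-metrics-bucket"

# API server entry point that receives metric entries
API_ENDPOINT = "https://api.example.com/metrics"
```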

The API server automatically rejects duplicate metric items. So say you upload all metric .json files from S3 one day, add 3 more at a later date, and then run the entire script again. The script will attempt to upload even the metric files you have already uploaded, and you will see a lot of "ERROR - Metric.." prints. You can ignore these; it is just the API rejecting the items that have already been uploaded.

## Data Requirements

The scripts require that each filename be in the format `<mission>_<recorder>_<channel>_<start MET>`. This format is necessary to obtain all the information needed in the database. For example, `A11_HR1U_CH10_00000175000.json` is a valid metrics JSON name.
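For illustration, parsing the required fields out of such a filename could look like the following (a minimal sketch; the function name is hypothetical and the actual scripts may do this differently):

```python
import os

def parse_metrics_filename(path):
    """Split <mission>_<recorder>_<channel>_<start MET>.json into its parts."""
    name, _ = os.path.splitext(os.path.basename(path))
    mission, recorder, channel, met_start = name.split("_")
    return mission, recorder, channel, int(met_start)

# e.g. ("A11", "HR1U", "CH10", 175000)
print(parse_metrics_filename("A11_HR1U_CH10_00000175000.json"))
```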

As for the content of the JSON file, lists of `start_time` and `end_time` values are required. If these are not supplied, the scripts will catch this error and will not use the file to upload metrics. The `start_time` and `end_time` values in each position are in seconds (they need to be converted to ms for the DB), relative to the `met_start` time of the given file. The `start_time` and `end_time` in position 0 in the lists correspond to the metrics in position 0 in the other lists. Example (say this file is `A11_HR1U_CH10_00000175000.json`):
```json
{
  "start_time": ["0", "65.35", "130.619", "207.392", "284.933", "358.552"],
  "end_time": ["65.35", "130.619", "207.392", "284.933", "358.552", "398.185"],
  "word_count": ["10", "1", "13", "9", "12", "1"],
  "speakers": ["SPK2", "SPK1", "SPK1", "SPK3"]
}
```

To analyze this, let's first take position 0 in all the lists. This means that from time (175000 + (0 * 1000)) to (175000 + (65.35 * 1000)), the word_count to be represented on the graphs is 10. This is an individual entry in the database.
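A minimal sketch of how such per-position entries could be derived from a metrics file (function and field names here are illustrative, not the script's actual ones):

```python
import json

def build_entries(json_path, met_start_ms):
    """Turn each list position into one DB entry, converting seconds to ms."""
    with open(json_path) as f:
        data = json.load(f)
    entries = []
    for start_s, end_s, words in zip(
        data["start_time"], data["end_time"], data["word_count"]
    ):
        entries.append({
            "start": met_start_ms + float(start_s) * 1000,
            "end": met_start_ms + float(end_s) * 1000,
            "word_count": int(words),
        })
    return entries

# For A11_HR1U_CH10_00000175000.json, entry 0 spans 175000..240350 ms
# with word_count 10.
```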

The entries for "speakers" and "interaction_matrix" do not behave this way, however. These values are all uploaded together as a list instead of individual entries into the database, using the met_start of the file as the start time and the maximum "end_time" value as the end time. These are used to populate the chord diagram for a given interval.
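For example, the interval covered by such a list-valued metric could be computed like this (a sketch under the same assumptions as above):

```python
# Interval for a list-valued metric such as "speakers":
# start = the file's MET start, end = MET start + max end_time (in ms).
met_start_ms = 175000
end_times_s = [65.35, 130.619, 207.392, 284.933, 358.552, 398.185]

interval_start = met_start_ms                          # 175000 ms
interval_end = met_start_ms + max(end_times_s) * 1000  # 573185.0 ms
```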

## Running

When uploading from a local machine, run the following from the directory `src/`:

```
python MetricsUpload.py <local base folder> <S3 base folder>
```

where `<local base folder>` is the base folder containing all the files to be uploaded, and `<S3 base folder>` is the folder under which the items will be stored in S3. For example, under `src/` is a subfolder named `A11_HR1U_CH10_AIR2GND`. Running

```
python MetricsUpload.py A11_HR1U_CH10_AIR2GND test
```

uploads all the .json files under `src/A11_HR1U_CH10_AIR2GND` to S3 under the folder `test`. The file `A11_HR1U_CH10_AIR2GND/A11_HR1U_CH10_00000175000.json` will be located at `test/A11_HR1U_CH10_00000175000.json`. Note that any files in subfolders will also be copied, so a file `A11_HR1U_CH10_AIR2GND/subfolder/A11_HR1U_CH10_0000000000.json` would end up at `test/subfolder/A11_HR1U_CH10_0000000000.json` in S3. After the files are uploaded to S3 from the local machine, they will be uploaded to the database via the API entry point specified in config.py.
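The local-path-to-S3-key mapping described above could be sketched as follows, using boto3 (this illustrates the mapping only, not the script's actual upload code):

```python
import os
import boto3

def upload_tree(local_base, s3_base, bucket):
    """Mirror every .json file under local_base into bucket under s3_base."""
    s3 = boto3.client("s3")
    for root, _, files in os.walk(local_base):
        for name in files:
            if not name.endswith(".json"):
                continue
            local_path = os.path.join(root, name)
            # Preserve the path relative to local_base, e.g.
            # A11_HR1U_CH10_AIR2GND/subfolder/x.json -> test/subfolder/x.json
            rel = os.path.relpath(local_path, local_base)
            key = "/".join([s3_base] + rel.split(os.sep))
            s3.upload_file(local_path, bucket, key)
```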

When moving files only from S3 to the database, run the following from the directory `src/`:

```
python TransferS3Metrics.py
```

This will transfer ALL the files in S3, which will take a long time! When only a subset of the data in S3 is needed, you can specify the channel ID and a range of METs to transfer:

```
python TransferS3Metrics.py <channel> <met start> <met end>
```

Note that since the script only looks at the MET start given in the filenames to determine whether a file should be imported, some data may be missed at the beginning of the range: files that begin at an earlier MET start but end after the given MET start will be skipped. To avoid this, run the script with a lower MET start than needed.
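This caveat comes from a filename-based filter along these lines (an illustrative sketch, not the script's actual logic):

```python
def should_transfer(filename, met_start, met_end):
    """Keep a file only if the MET start in its name lies in [met_start, met_end].

    A file whose named MET start is just below met_start is skipped even if
    its data runs past met_start -- hence the advice to pass a lower bound.
    """
    file_met = int(filename.rsplit("_", 1)[1].split(".")[0])
    return met_start <= file_met <= met_end
```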
