Algorithms Pipeline - PARC-projects/video-query-home GitHub Wiki

After reading this description of the pipeline, go to Compute Video Features to compute deep-net embedded features for a video and load them into the API database.

Computing Features

We use Temporal Segment Networks (TSN) code to compute features for video clips.

Environment variables

  • TSN_ROOT - the main TSN directory
  • TSN_ENVIRON - the name of the conda environment used for TSN calculations, if a conda environment is being used. Otherwise, the virtual environment for the TSN code needs to be set up before running this pipeline. Our TSN conda environment is documented here (conda TSN environment).
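As a minimal sketch, a script in this pipeline might validate these environment variables up front before doing any work. The function below is illustrative only (it is not part of the repository's code); the variable names match the list above.

```python
import os

# Illustrative sketch: check the TSN environment variables before
# launching the pipeline. TSN_ENVIRON is optional (conda only).
def check_tsn_env(env=os.environ):
    tsn_root = env.get("TSN_ROOT")        # main TSN directory
    tsn_environ = env.get("TSN_ENVIRON")  # conda environment name, if any
    problems = []
    if not tsn_root:
        problems.append("TSN_ROOT is not set")
    return tsn_root, tsn_environ, problems
```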

build_wof_clips.py

Builds all the RGB and warped optical flow JPEG files and sorts them into clips in a specified directory structure. The code is written to run on a GPU compute server.
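The per-clip layout can be sketched as follows. The exact directory and file naming here (`clip_NNNN`, `img_`, `flow_x_`, `flow_y_` prefixes) is an assumption for illustration, following common TSN frame-naming conventions, not necessarily what build_wof_clips.py emits.

```python
import os

# Hypothetical sketch of the clip directory layout:
#   <out_root>/<video>/clip_NNNN/img_00001.jpg      (RGB frames)
#   <out_root>/<video>/clip_NNNN/flow_x_00001.jpg   (warped flow, x)
#   <out_root>/<video>/clip_NNNN/flow_y_00001.jpg   (warped flow, y)
def clip_frame_paths(out_root, video, clip_id, n_frames):
    clip_dir = os.path.join(out_root, video, "clip_{:04d}".format(clip_id))
    rgb = [os.path.join(clip_dir, "img_{:05d}.jpg".format(i + 1))
           for i in range(n_frames)]
    flow_x = [os.path.join(clip_dir, "flow_x_{:05d}.jpg".format(i + 1))
              for i in range(n_frames)]
    flow_y = [os.path.join(clip_dir, "flow_y_{:05d}.jpg".format(i + 1))
              for i in range(n_frames)]
    return rgb, flow_x, flow_y
```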

calcSig_wOF_ensemble.sh

Computes features for all clips. The features are stored in CSV files within a directory tree that branches first by video and then by the CNN model used to compute the features. This script calls calcSig_wOF.py, which uses the Temporal Segment Networks code to compute global_pool features for CNNs defined in Caffe. If other options are desired, call calcSig_wOF.py directly, using calls like those in calcSig_wOF_ensemble.sh.
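The video-then-model branching of the feature tree can be sketched with a small path helper. This is a hypothetical illustration of the layout described above; the file-naming (one CSV per stream) is an assumption, not the script's documented output.

```python
import os

# Illustrative sketch of the feature directory tree:
#   <feature_root>/<video>/<model>/<stream>.csv
# where stream is e.g. "rgb" or "warped_flow" (names are assumptions).
def feature_csv_path(feature_root, video, model, stream):
    return os.path.join(feature_root, video, model, stream + ".csv")
```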

load_db.py

Loads the "features" table in the API database, given the CSV feature files produced by the two preceding steps.
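In outline, the load amounts to reading each feature CSV and inserting one row per clip. The sketch below is illustrative only: it uses SQLite and an assumed schema (video, clip, model, stream, vector), whereas the real load_db.py targets the API database with its own schema.

```python
import csv
import sqlite3

# Illustrative sketch: load one CSV of clip features into a "features"
# table. Schema and CSV layout (clip id, then feature values) are assumed.
def load_features(conn, csv_path, video, model, stream):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features "
        "(video TEXT, clip TEXT, model TEXT, stream TEXT, vector TEXT)"
    )
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            clip, vector = row[0], ",".join(row[1:])
            conn.execute(
                "INSERT INTO features VALUES (?, ?, ?, ?, ?)",
                (video, clip, model, stream, vector),
            )
    conn.commit()
```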


Computing Matches

Matches are determined from scores based on a weighted average of the dot-product similarities of embedded feature vectors. Similarities are computed for all specified streams (currently RGB and warped optical flow). For each stream, the similarity is an ensemble average over multiple deep neural networks trained for that stream.
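The scoring scheme above can be sketched in a few lines of plain Python: a per-stream similarity is the ensemble average of dot products over that stream's models, and the final score is a weighted average over streams. Function and stream names here are illustrative, not the names used in compute_matches.py.

```python
# Illustrative sketch of the scoring described above.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ensemble_similarity(ref_feats, cand_feats):
    # One feature vector per model in the stream's ensemble.
    sims = [dot(r, c) for r, c in zip(ref_feats, cand_feats)]
    return sum(sims) / len(sims)

def match_score(per_stream_sims, weights):
    # Both dicts keyed by stream name, e.g. "rgb", "warped_flow".
    total_w = sum(weights.values())
    return sum(weights[s] * per_stream_sims[s] for s in per_stream_sims) / total_w
```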

compute_matches.py

For each round, computes matches for a reference video clip and loads them into the API database. Incorporates user feedback for all rounds after the initial one. Uses the TargetClip, Ticket, and Hyperparameter classes.

Steps:

  • compute similarities: Computes the ensemble-averaged similarity between a given reference clip and every other clip. A separate ensemble-averaged similarity is computed for each TSN stream (e.g., RGB and warped optical flow).
  • optimize weights: When ensemble-scoring possible matches, the similarities for different TSN streams are weighted differently. This function adjusts the weighting based on user feedback. It also adjusts the threshold score that separates matches from non-matches.
  • compute score: Computes a score for each candidate match from its per-stream similarities and the scoring weight for each TSN stream.
  • select matches: Randomly selects matches and near matches for user review, keeping the total at or below the maximum number specified by the user in the Video Query UI.
  • finalize: If the user selected "Finalize" on the Existing Query screen, a final report is prepared as a CSV file.