Labelling v3 - RodentDataAnalytics/mwm-ml-gen GitHub Wiki

The Labelling process requires a segmentation to be selected as the default.

Contents

  1. Labelling Overview
  2. The Labelling Panel
  3. Browse Trajectories
  4. Labelling Quality

Labelling Overview

As described in the publication of Gehring, Tiago V., et al., the segmentation process generates a large number of segments to be classified, so labelling all of them manually becomes intractable. For this reason, a semi-supervised clustering algorithm is used to classify all the segments; it requires only a small number of manually labelled segments as input in order to classify the rest. There is no strict rule for how many labels need to be provided, but as a general guideline 10%-12% of the segments should be labelled and several examples of each strategy should be provided.
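As a rough illustration of the 10%-12% guideline, the sketch below computes how many segments that range corresponds to for a given segmentation. The function name and its parameters are hypothetical and not part of the program; integer percentages are used to avoid floating-point rounding.

```python
def labelling_budget(n_segments, low_pct=10, high_pct=12):
    """Return the (minimum, maximum) number of segments to label.

    The 10%-12% defaults come from the guideline in the text;
    the function itself is only an illustrative helper.
    """
    if n_segments < 0:
        raise ValueError("n_segments must be non-negative")

    def ceil_div(a, b):
        # Integer ceiling division, so partial segments round up.
        return -(-a // b)

    return (ceil_div(n_segments * low_pct, 100),
            ceil_div(n_segments * high_pct, 100))
```

For a segmentation with 10000 segments, `labelling_budget(10000)` gives a range of 1000 to 1200 labels. The guideline also asks for several examples of each strategy, which this count alone does not check.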

The Labelling Panel

labelling overview

  • Browse Trajectories opens a new window in which the user can visualize the segments in order to label them (for more information refer to Browse Trajectories).

  • Default Labels lists all the labellings of this project. If a labelling is not shown, press the Refresh button. The default labelling specifies which labelling the user wants to proceed with; its name encodes the number of entered labels and the segmentation to which it applies (for example, labels_1301_250_09-19OCT2016.mat means that 1301 labels were given to the segments of the segmentation with a segment length of 250 cm and an overlap of 90%, while the rest of the name, following the '-', is a custom note).

  • Labelling Quality runs the classification procedure using 10-fold cross-validation ten times and generates three graphs which indicate the labelling quality (for more information refer to Labelling Quality).
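The naming scheme of the labelling files described under Default Labels can be read back with a short script. The following is a minimal sketch, not the program's own parser: the function and field names are hypothetical, and it assumes that the overlap field stores the decimal digits without the point (so "09" means 0.9, i.e. 90%) and that the note after the '-' is optional.

```python
import re
from typing import NamedTuple


class LabelFileInfo(NamedTuple):
    n_labels: int           # number of labels entered
    segment_length_cm: int  # segment length of the segmentation
    overlap: float          # segment overlap as a fraction (0.9 = 90%)
    note: str               # custom note, empty if none was given


# labels_<count>_<segment length>_<overlap digits>[-<note>].mat
_PATTERN = re.compile(r"^labels_(\d+)_(\d+)_(\d+)(?:-(.*))?\.mat$")


def parse_label_filename(name):
    """Split a labelling file name into its documented parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a labelling file name: {name!r}")
    n_labels, seg_len, overlap_digits, note = match.groups()
    # Assumption: the first digit is the integer part and the remaining
    # digits are the fractional part, so "09" -> 0.9.
    overlap = int(overlap_digits) / 10 ** (len(overlap_digits) - 1)
    return LabelFileInfo(int(n_labels), int(seg_len), overlap, note or "")
```

Calling `parse_label_filename("labels_1301_250_09-19OCT2016.mat")` yields 1301 labels, a segment length of 250, an overlap of 0.9 and the note "19OCT2016", matching the example above.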

Browse Trajectories

To activate Browse Trajectories, a segmentation first needs to be loaded by pressing the Load Configuration File button and navigating to the desired segmentation file (see the picture below). By default the navigation window starts from the current project folder; the segmentation files should be inside the folder 'segmentation'.

browse deactivated

When the segmentation file loads, the window fills with information and the labelling procedure can start.

browse activated

  1. Plotter: The plotter visualizes whole trajectories or their segments. The buttons <= and => move to the previous or the next trajectory. Alternatively, the trajectory ID number can be entered in the field; pressing the OK button then shows the specified trajectory, provided the ID is valid. The red • marks the starting point of the trajectory and the red ○ the ending point.

  2. Tables: The Trajectory table shows information about the selected trajectory, such as its numeric ID, the animal ID and the group to which the trajectory belongs, etc. The Segment table shows information about the selected segment of the selected trajectory, such as its numeric ID, its starting point (offset) and a list of all its features (the complete list of features can be found in the appendix section, List of Features).

  3. Labelling: The list contains all the segments of the selected trajectory. Selecting a segment on the list visualizes it on the plotter; selecting 'trajectory' visualizes the whole trajectory. To label the selected segment, choose the desired strategy from the drop-down list and press the + button; the label will appear in the gray field (UD is a special label that marks the segment to be discarded from further analysis). The - button removes the selected strategy from the selected segment. Multiple labels can be placed on each segment, but labels cannot be placed on whole trajectories. The Export button exports the current plot to a JPEG image file inside the results folder of the project. The Export All button exports the plots of all the segments that have the label selected in the drop-down list (if 'UD' is selected, all the segments that have this label or multiple labels are exported). Finally, the table of this panel shows how many labels (per strategy and in total) have been provided. Segments with multiple labels are treated as 'UD'.
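The rule that segments with multiple labels count as 'UD' can be sketched as follows. This is an illustration only: the function names are hypothetical, the strategy names "S1" and "S2" in the usage note are placeholders rather than the program's strategy abbreviations, and the summary counts one effective class per segment, which may differ from how the table in the panel counts labels.

```python
from collections import Counter

UNDEFINED = "UD"  # special label: discard the segment from further analysis


def effective_label(labels):
    """Return the single class used for a segment, or None if unlabelled."""
    unique = set(labels)
    if not unique:
        return None
    # A segment carrying several different labels is treated as 'UD'.
    if len(unique) > 1 or UNDEFINED in unique:
        return UNDEFINED
    return next(iter(unique))


def label_summary(segment_labels):
    """Count effective labels per class and in total.

    segment_labels maps a segment ID to the list of labels placed on it.
    """
    counts = Counter()
    for labels in segment_labels.values():
        label = effective_label(labels)
        if label is not None:
            counts[label] += 1
    return dict(counts), sum(counts.values())
```

For example, with `{1: ["S1"], 2: ["S1", "S2"], 3: []}` segment 1 counts as "S1", segment 2 as "UD", and segment 3 is left out as unlabelled.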

When enough segments are labelled, the labels can be saved by pressing the save button. An input dialog box then appears asking whether the user wants to add a note to the name of the generated labelling files (allowed characters are 0-9, A-Z and a-z; spaces and special characters are ignored). The labels are saved inside the 'labels' folder of the project. The load button can be used to load previous labels, but any existing labels will be lost.
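The character rule for the note can be expressed in one line. The sketch below assumes that "ignored" means the disallowed characters are simply dropped from the note; the function name is hypothetical.

```python
import re


def sanitize_note(note):
    """Keep only the characters 0-9, A-Z and a-z; drop everything else."""
    return re.sub(r"[^0-9A-Za-z]", "", note)
```

Under this assumption a note typed as "19 OCT-2016!" would end up in the file name as "19OCT2016".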

browse activated

Labelling Quality

In the publication of Gehring, Tiago V., et al., the algorithm used to classify the segments needed a predefined number of clusters. To find a number that would yield optimal results, the classification procedure was run 8 times, with the number of clusters going from 30 to 100 in increments of 10. For each number of clusters, 10-fold cross-validation was used to generate three metrics for that classification:

  1. Classification Error (%): The percentage of classification error.
  2. Undefined Segments (%): The percentage of segments belonging to clusters that could not be mapped to a single class.
  3. Trajectory Coverage (%): The percentage of the full swimming paths that are covered by at least one segment of a known class.
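The repeated 10-fold cross-validation behind these metrics can be sketched as below. This is a generic illustration of the technique, not the program's implementation: it assumes the labelled segments are the items being split, and the function name, the seed and the way folds are drawn are all choices made for the example.

```python
import random


def repeated_kfold(n_items, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) tuples."""
    if n_items < k:
        raise ValueError("need at least k items to build k folds")
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    for repeat in range(repeats):
        indices = list(range(n_items))
        rng.shuffle(indices)  # a fresh shuffle for every repetition
        for fold in range(k):
            test = indices[fold::k]  # every k-th shuffled item forms one fold
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield repeat, fold, train, test
```

Each repetition holds out every item exactly once. A classifier would be trained on `train`, evaluated on `test`, and the metrics averaged over the folds.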

quality metrics

Afterwards, the number of clusters was chosen based on the classification with the lowest error, the fewest undefined segments and the maximum coverage.
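The sweep over cluster counts and the selection step can be sketched as follows. How the three criteria are traded off against each other is not stated here, so the ordering used below (error first, then undefined segments, then coverage) is an assumption, and all names are hypothetical.

```python
from typing import NamedTuple


class Quality(NamedTuple):
    n_clusters: int
    error_pct: float      # Classification Error (%)
    undefined_pct: float  # Undefined Segments (%)
    coverage_pct: float   # Trajectory Coverage (%)


def cluster_candidates(start=30, stop=100, step=10):
    """Cluster counts to try: 30, 40, ..., 100 (8 values)."""
    return list(range(start, stop + 1, step))


def pick_best(results):
    """Pick the result with the lowest error, then the fewest undefined
    segments, then the highest coverage (assumed tie-breaking order)."""
    return min(
        results,
        key=lambda q: (q.error_pct, q.undefined_pct, -q.coverage_pct),
    )
```

In use, one `Quality` entry would be produced per value in `cluster_candidates()` from the cross-validation results, and `pick_best` applied to the resulting list.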

In this version of the program, this procedure is used as an indication of the labelling quality: if the error is high and the coverage is low, the user may consider providing more labels.