Labelling - RodentDataAnalytics/mwm-ml-gen GitHub Wiki

The Labelling process requires a segmentation to be selected as the default.

Contents

  1. Labelling Overview
  2. The Labelling Panel
  3. Browse Trajectories
  4. Labelling Quality

Labelling Overview

As described in the publication of Gehring, Tiago V., et al., the segmentation process generates a large number of segments to be classified, so labelling all of them manually becomes intractable. For this reason, a semi-supervised clustering algorithm is used, which requires only a small number of manually labelled segments as input in order to classify the rest. In general, the algorithm works as follows: the computed features are used to find patterns in the data and create clusters; the labels are then used to merge the clusters and form the behavioural classes. As a rule of thumb, 10%-12% of the segments need to be manually labelled, and several examples of each strategy need to be provided.
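The second step above (using the labels to merge clusters into classes) can be illustrated with a minimal sketch. This is not the toolbox's implementation: the majority-vote rule, the function names and the class names `class_A` / `class_B` are assumptions made for illustration, and the clustering itself is taken as already done.

```python
from collections import Counter

UNDEFINED = None  # hypothetical marker for clusters that cannot be mapped


def map_clusters_to_classes(cluster_ids, labels):
    """Map each cluster to the class of its labelled members.

    cluster_ids: list with one cluster ID per segment.
    labels: dict {segment index: class name} for the few labelled segments.
    Assumption: a cluster takes the majority class of its labelled members;
    clusters with no labels, or with a tie, stay undefined.
    """
    votes = {}
    for segment, cls in labels.items():
        votes.setdefault(cluster_ids[segment], Counter())[cls] += 1

    mapping = {}
    for cluster in set(cluster_ids):
        counts = votes.get(cluster)
        if not counts:
            mapping[cluster] = UNDEFINED  # no labelled segment in this cluster
            continue
        ranked = counts.most_common(2)
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            mapping[cluster] = UNDEFINED  # ambiguous: two classes tie
        else:
            mapping[cluster] = ranked[0][0]
    return mapping


def classify(cluster_ids, labels):
    """Propagate the cluster-level classes back to every segment."""
    mapping = map_clusters_to_classes(cluster_ids, labels)
    return [mapping[c] for c in cluster_ids]


# Eight segments in four clusters, only three of them labelled by hand.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 3]
labels = {0: "class_A", 3: "class_B", 5: "class_A"}
result = classify(cluster_ids, labels)
```

In this example clusters 0 and 2 both receive `class_A` and are therefore merged into one behavioural class, while cluster 3 contains no labelled segment and stays undefined.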

The Labelling Panel

labelling overview

  • Browse Trajectories opens a new window in which the user can visualize the segments in order to label them (for more information refer to Browse Trajectories).

  • Default Labels lists all the labellings of this project. If a labelling is not shown, press the Refresh button. The default labelling specifies which labelling the user wants to proceed with. Its file name encodes the number of labels entered and the segmentation to which they apply (for example, labels_1261_250_07-19OCT2016.mat means that 1261 labels were given to the segments of the segmentation with a segment length of 250 cm and an overlap of 70%, while the part of the name that follows the '-' is a custom note).

  • Labelling Quality tests different numbers of clusters in order to detect 'strong' classifiers (classification tunings). This is done using a 10-fold cross validation procedure, where a subset of the labels is used for testing and the rest for training. This step is optional; if it is skipped, it is executed automatically for the default classification procedure. Finally, this process generates various statistics and graphs; for more information refer to Labelling Quality.
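The naming scheme described under Default Labels can be decoded programmatically. The sketch below is based only on the single example given above; the regular expression, the function name and the way the overlap digits are read ("07" taken as 0.7) are assumptions, so other file names produced by the program may not match.

```python
import re

# Assumed pattern: labels_<number of labels>_<segment length>_<overlap>-<note>.mat
LABELS_NAME = re.compile(
    r"^labels_(?P<n_labels>\d+)_(?P<length>\d+)_(?P<overlap>\d+)"
    r"(?:-(?P<note>[0-9A-Za-z]*))?\.mat$"
)


def parse_labels_name(filename):
    """Split a labelling file name into its parts (hypothetical helper)."""
    match = LABELS_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected labelling file name: {filename!r}")
    digits = match.group("overlap")
    # Assumption: the digits are a decimal fraction without the point, "07" -> 0.7
    overlap = float(digits[0] + "." + digits[1:])
    return {
        "n_labels": int(match.group("n_labels")),
        "segment_length_cm": int(match.group("length")),
        "overlap": overlap,
        "note": match.group("note") or "",
    }


info = parse_labels_name("labels_1261_250_07-19OCT2016.mat")
# info["n_labels"] is 1261, info["segment_length_cm"] is 250,
# info["overlap"] is 0.7 and info["note"] is "19OCT2016"
```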

Browse Trajectories

This GUI is used as a visualisation tool for the manual labelling of the trajectories or the segments. To activate it, a segmentation must first be loaded by pressing the Load Configuration File button and navigating to the desired segmentation file (see the picture below). By default the navigation window starts from the current project folder, and the segmentation files should be inside the folder 'segmentation'. Alternatively, the my_trajectories.mat file (inside the settings folder) can be loaded in order to provide labels for the whole swimming paths of the animals.

browse deactivated

Once the segmentation file has loaded, the window is filled with information and the labelling procedure can start.

browse activated

  1. Plotter: The plotter visualizes whole trajectories or their segments. The buttons <= and => move to the previous or the next trajectory. Alternatively, a trajectory ID number can be entered in the field; pressing the OK button then displays that trajectory, provided the ID is valid. The red dot (•) marks the starting point of the trajectory and the red cross (×) its ending point.

  2. Tables: The Trajectory table shows information about the selected trajectory, such as its numeric ID, the animal ID and the group to which the trajectory belongs. The Segment table shows information about the selected segment of the selected trajectory, such as its numeric ID, its starting point (offset) and a list of all its features (the complete list of features can be found in the appendix section, List of Features).

  3. Labelling: The list contains all the segments of the selected trajectory. Selecting a segment in the list visualises that segment on the plotter; selecting 'trajectory' visualises the whole trajectory. To label a selected segment, choose the desired strategy from the drop-down list and press the + button; the label then appears in the gray field (UD is a special label that marks the segment to be discarded from further analysis). The - button removes the selected strategy from the selected trajectory. Multiple labels can be placed on each segment, and labels can be placed on whole trajectories only if the my_trajectories.mat file has been loaded. The Export button exports the current plot to an image file inside the results subfolder of the project folder. The Export All button exports the plots of all the segments that have the label selected in the drop-down list (if 'MULTI' is selected, all the segments that have multiple labels are exported). Finally, the table of this panel shows how many labels (per strategy and in total) have been provided. 'MULTI' is a special-case tag and cannot be used as a label.

  4. Export...: Extra exporting options:

    • Export Undefined: Exports all the unlabelled segments.
    • Export Everything: Exports all the labelled segments into separate folders.

When enough segments have been labelled, the labels can be saved by pressing the save button. Pressing this button opens an input dialog box asking whether the user wants to add a note to the name of the generated labelling file (allowed characters are 0-9, A-Z and a-z; spaces and special characters are ignored). The labels are saved inside the 'labels' folder of the project. The load button can be used to load previous labels, but any existing labels will be lost.
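The character rule for the note can be expressed as a one-line filter. This is a sketch of the rule as stated above, not the program's own code; the function name is hypothetical.

```python
import re


def sanitise_note(note):
    """Keep only 0-9, A-Z and a-z; drop spaces and special characters."""
    return re.sub(r"[^0-9A-Za-z]", "", note)


clean = sanitise_note("pilot run #2!")
# clean is "pilotrun2"
```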

browse activated

Labelling Quality

Classification quality can be assessed by a number of factors, including the cross validation error, the number of unclassified segments, etc. By default, the cross validation error is used to select strong classifiers.
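For readers unfamiliar with 10-fold cross validation, the sketch below shows how a set of labelled segments can be split so that each segment is used for testing exactly once. The shuffling, the seed and the function name are assumptions; the program may partition its labels differently.

```python
import random


def k_fold_indices(n_items, n_folds=10, seed=0):
    """Yield (train, test) index lists; every index is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatable splits
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 25 labelled segments split into 10 folds of 2 or 3 test segments each.
splits = list(k_fold_indices(25))
```

In each round the classifier would be built from the `train` labels and its predictions compared against the held-out `test` labels; the errors over all rounds give the cross validation error.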

This button generates four different statistics for different numbers of clusters K (by default, the tested values of K start from the number of classes that were used and increase in steps of 2 up to 100). These statistics, along with figures, are generated inside the results subfolder of the project and include:

  1. Error(%): The percentage of classification error based on cross validation excluding the unclassified segments.
  2. Undefined(%): The percentage of segments belonging to clusters that could not be mapped to a single class.
  3. Coverage(%): The percentage of the full swimming paths that are covered by at least one segment of a known class.
  4. Validation Error(%): The true cross validation error, including the unclassified segments in the error calculation. This is the default factor for assessing the quality of the classification.
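The difference between Error(%), Undefined(%) and Validation Error(%) is easiest to see on a small example. The sketch below is an interpretation of the definitions in the list above, computed over one set of test segments; the function name and the use of `None` for an unclassified segment are assumptions. Coverage(%) is left out because it needs the full swimming paths.

```python
def quality_metrics(true_labels, predicted):
    """Compute three of the statistics above for one set of test segments.

    predicted entries are class names, or None for unclassified segments.
    """
    total = len(true_labels)
    classified = [(t, p) for t, p in zip(true_labels, predicted) if p is not None]
    wrong = sum(1 for t, p in classified if t != p)
    undefined = total - len(classified)
    return {
        # errors among classified segments only
        "error_pct": 100 * wrong / len(classified) if classified else 0.0,
        # segments that received no class
        "undefined_pct": 100 * undefined / total,
        # unclassified segments are counted as errors
        "validation_error_pct": 100 * (wrong + undefined) / total,
    }


true_labels = ["A", "A", "B", "B", "A", "B", "A", "B", "A", "B"]
predicted = ["A", "B", "B", "B", None, "B", "A", None, "A", "B"]
metrics = quality_metrics(true_labels, predicted)
# error_pct is 12.5, undefined_pct is 20.0, validation_error_pct is 30.0
```

With one wrong prediction among eight classified segments and two unclassified segments, the Error is low while the Validation Error is more than twice as high, which is why the latter gives a stricter picture of the classification.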

quality metrics

The user may select the number of clusters K based on these statistics when tuning the classification manually. By default, the program uses this process to find strong classifiers, assessing them by the Validation Error statistic.
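As a sketch of this selection, the candidate values of K and the choice of the value with the lowest Validation Error could look as follows. The function names, the tie-breaking rule (smallest K wins) and the error values are hypothetical; only the default range of K is taken from the description above.

```python
def candidate_ks(n_classes, step=2, k_max=100):
    """Default K values: from the number of classes, in steps of 2, up to 100."""
    return list(range(n_classes, k_max + 1, step))


def select_k(validation_error_by_k):
    """Return the K with the lowest Validation Error(%).

    Assumption: ties are broken in favour of the smallest K.
    """
    return min(sorted(validation_error_by_k), key=validation_error_by_k.get)


ks = candidate_ks(8)  # 8, 10, 12, ..., 100
# Made-up Validation Error values for three of the candidates.
best_k = select_k({10: 12.0, 12: 9.5, 14: 9.5})
# best_k is 12
```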