Labelling v3 - RodentDataAnalytics/mwm-ml-gen GitHub Wiki
The Labelling process requires a segmentation to be selected as the default.
Labelling Overview
As described in the publication of Gehring, Tiago V., et al., the segmentation process generates a large number of segments to be classified, so labelling all of them manually becomes intractable. For this reason, a semi-supervised clustering algorithm is used, which requires only a small number of manually labelled segments as input in order to classify the rest. There is no strict rule on how many labels need to be provided, but as a general guideline 10%-12% of the segments should be labelled, and several examples of each strategy should be provided.
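The 10%-12% guideline can be turned into a target label count with simple arithmetic. The sketch below is only an illustration: the function name `suggested_label_range` and the segment count of 12000 are hypothetical and are not taken from the program.

```python
def suggested_label_range(n_segments, low_pct=10, high_pct=12):
    """Return (minimum, maximum) number of segments to label.

    Uses integer ceiling division so that the result is never rounded
    down below the requested percentage.
    """
    low = -(-n_segments * low_pct // 100)
    high = -(-n_segments * high_pct // 100)
    return low, high


# Hypothetical segmentation that produced 12000 segments.
print(suggested_label_range(12000))
```

The guideline is approximate; the Labelling Quality graphs remain the better indicator of whether enough labels were provided.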
The Labelling Panel
- Browse Trajectories opens a new window in which the user can visualize the segments in order to label them (for more information refer to Browse Trajectories).
- Default Labels lists all the labellings of this project. If a labelling is not shown, press the Refresh button. The default labels specify which labelling the user wants to proceed with. The file name encodes the number of entered labels and the segmentation to which they apply; for example, labels_1301_250_09-19OCT2016.mat means that 1301 labels were given to the segments of the segmentation with a segment length of 250 cm and an overlap of 90%, while the part of the name after the '-' is a custom note.
- Labelling Quality runs the classification procedure using 10-fold cross validation ten times and generates three graphs which indicate the labelling quality (for more information refer to Labelling Quality).
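The naming scheme of the labels file can be decoded programmatically. The following is a minimal sketch in Python, not part of the program itself. It assumes the pattern `labels_<count>_<segment length>_<overlap>[-<note>].mat` inferred from the single example above, and it assumes that the overlap token is a decimal fraction written without the decimal point (so `09` stands for 0.9, i.e. 90%). The function name `parse_labels_name` and the returned keys are hypothetical.

```python
import re

# Assumed pattern, inferred from the example labels_1301_250_09-19OCT2016.mat
LABELS_NAME = re.compile(
    r"^labels_(?P<count>\d+)_(?P<length>\d+)_(?P<overlap>\d+)"
    r"(?:-(?P<note>.*))?\.mat$"
)


def parse_labels_name(filename):
    """Split a labels file name into its components."""
    match = LABELS_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected labels file name: {filename!r}")
    digits = match.group("overlap")
    # Assumption: "09" encodes 0.9 with the decimal point dropped.
    overlap_percent = int(digits) * 100 // 10 ** (len(digits) - 1)
    return {
        "n_labels": int(match.group("count")),
        "segment_length_cm": int(match.group("length")),
        "overlap_percent": overlap_percent,
        "note": match.group("note"),  # None when no note was given
    }


print(parse_labels_name("labels_1301_250_09-19OCT2016.mat"))
```

If the program encodes the overlap differently for other values, only the `overlap_percent` line would need to change.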
Browse Trajectories
To activate Browse Trajectories, a segmentation first needs to be loaded by pressing the Load Configuration File button and navigating to the desired segmentation file (see the picture below). By default the navigation window starts from the current project folder, and the segmentation files should be inside the folder 'segmentation'.
When the segmentation file has loaded, the window fills with various information and the labelling procedure can start.
- Plotter: The plotter visualizes whole trajectories or their segments. The buttons <= and => can be used to move to the previous or the next trajectory. Alternatively, the trajectory ID number can be entered in the field; after pressing the OK button the specified trajectory will appear, provided the ID is valid. The red • marks the starting point of the trajectory and the red ○ the ending point.
- Tables: The Trajectory table shows information about the selected trajectory, such as its numeric ID, the animal ID and the group to which the trajectory belongs. The Segment table shows information about the selected segment of the selected trajectory, such as its numeric ID, its starting point (offset) and a list of all its features (the complete list of features can be found in the appendix section, List of Features).
- Labelling: The list contains all the segments of the selected trajectory. Selecting a segment in the list visualizes it on the plotter; selecting 'trajectory' visualizes the whole trajectory. To label the selected segment, choose the desired strategy from the drop-down list and press the + button; the label will then appear in the gray field (UD is a special label that marks the segment to be discarded from further analysis). The - button removes the selected strategy from the selected segment. Multiple labels can be placed on each segment, but labels cannot be placed on whole trajectories. The Export button exports the current plot to a JPEG image file inside the results folder of the project. The Export All button exports the plots of all the segments that have the label selected in the drop-down list (if 'UD' is selected, all the segments that have this label or multiple labels will be exported). Finally, the table of this panel shows how many labels (per strategy and in total) have been provided. Segments with multiple labels will be considered as 'UD'.
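The rule that segments with multiple labels are considered as 'UD' can be sketched as follows. This Python snippet is only an illustration of the rule as stated above, not the program's own code; the strategy tags `TT` and `IC`, the segment IDs and the function names are hypothetical placeholders.

```python
from collections import Counter


def effective_label(labels):
    """Return the label a segment ends up with, or None if it is unlabelled."""
    if not labels:
        return None
    if len(labels) > 1:
        # Segments carrying more than one label are treated as 'UD'.
        return "UD"
    return labels[0]


def count_effective_labels(segment_labels):
    """Count labelled segments per effective label (hypothetical summary)."""
    counts = Counter()
    for labels in segment_labels.values():
        label = effective_label(labels)
        if label is not None:
            counts[label] += 1
    return counts


# Hypothetical labelling of four segments, keyed by segment ID.
example = {1: ["TT"], 2: ["TT", "IC"], 3: [], 4: ["UD"]}
print(count_effective_labels(example))
```

Whether the table in the panel counts a multi-labelled segment once as 'UD' or once per entered label is not stated here, so the counting above should be read as one possible interpretation.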
When enough segments are labelled, the labels can be saved by pressing the save button. Upon pressing this button, an input dialog box appears asking whether the user wants to add a note to the name of the generated labelling files (allowed characters are 0-9, A-Z and a-z; spaces and special characters are ignored). The labels are saved inside the 'labels' folder of the project. The load button can be used to load previously saved labels, but any existing labels will be lost.
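The filtering of the note can be illustrated with a short Python sketch. It assumes that "ignored" means the disallowed characters are simply dropped from the note; the function name `sanitize_note` is hypothetical.

```python
import re


def sanitize_note(note):
    """Keep only the characters 0-9, A-Z and a-z; drop everything else."""
    return re.sub(r"[^0-9A-Za-z]", "", note)


print(sanitize_note("19 OCT-2016!"))
```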
Labelling Quality
In the publication of Gehring, Tiago V., et al., the algorithm used to classify the segments required a pre-defined number of clusters. To find a number that would yield optimal results, the classification procedure was run 8 times, with the number of clusters going from 30 to 100 in increments of 10. For each number of clusters, 10-fold cross validation was used to generate three metrics for that specific classification.
- Classification Error (%): The percentage of classification error.
- Undefined Segments (%): The percentage of segments belonging to clusters that could not be mapped to a single class.
- Trajectory Coverage (%): The percentage of the full swimming paths that are covered by at least one segment of a known class.
Afterwards, the number of clusters was chosen based on the classification with the lowest error, the fewest undefined segments and the maximum coverage.
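The sweep over cluster counts and the final choice can be sketched in Python as below. How the three criteria are weighed against each other is not specified here, so the combined score (error plus undefined segments minus coverage) is purely an assumption for illustration, as are the function name `pick_number_of_clusters` and the metric values, which are made up.

```python
# The 8 cluster counts described above: 30, 40, ..., 100.
cluster_counts = list(range(30, 101, 10))


def pick_number_of_clusters(results):
    """Pick a cluster count from {n_clusters: (error, undefined, coverage)}.

    All three metrics are percentages. Lower error and fewer undefined
    segments are better, higher coverage is better. The single score
    used here is an assumed way of combining them.
    """
    def score(item):
        n_clusters, (error, undefined, coverage) = item
        # Ties are broken in favour of the smaller number of clusters.
        return (error + undefined - coverage, n_clusters)

    return min(results.items(), key=score)[0]


# Made-up metrics for three of the cluster counts.
example = {
    30: (20.0, 15.0, 70.0),
    40: (15.0, 10.0, 80.0),
    50: (18.0, 12.0, 78.0),
}
print(cluster_counts)
print(pick_number_of_clusters(example))
```

In practice the three graphs are inspected together, so a single score like this is only a rough stand-in for that judgement.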
In this version of the program, this procedure is used as an indication of the labelling quality: if the error is high and the coverage is low, the user may consider providing more labels.