Labelling v2 - RodentDataAnalytics/mwm-ml-gen GitHub Wiki
The Labelling process requires a segmentation object created by the Segmentation process. Most of the work takes place in a secondary graphical user interface, the Browse Trajectories GUI, which is opened with the Browse Trajectories button.
Labelling Overview
As described in the publication, the segmentation process generates a large number of segments to be classified, so labelling all of them manually becomes intractable. For this reason, a semi-supervised clustering algorithm is used to classify all the segments; it requires only a small number of manually labelled segments.
Manual labelling is done in the Browse Trajectories GUI, which provides a clear visualization of both the trajectories and the segments.
The Classification Panel
- Browse Trajectories (Button): Opens the Browse Trajectories GUI for labelling the segments.
- Labelling Data Path: Specifies the path of the labelling data CSV file created with the Browse Trajectories GUI. The path must be provided manually by the user and is required to run the Num of Clusters functionality (see below) as well as the Classification process.
- Num of Clusters (Button): Because the program uses a clustering algorithm for classification, a suitable number of clusters needs to be defined. This button opens a GUI that shows how the number of clusters affects clustering performance and helps the user find such a number. For more information about how to use this GUI, refer to the section below.
Finding an ideal Number of Clusters
The main purpose of this GUI is to run the clustering process several times with different numbers of clusters and to propose an ideal number of clusters for the classification process. The number of runs is defined by Min number of Clusters, Max number of Clusters and Step: the clustering process runs for Min number of Clusters and is repeated, increasing the number of clusters by Step each time, until Max number of Clusters is reached. Note that Step can be increased or decreased in increments of 5, while Min number of Clusters and Max number of Clusters can be increased or decreased in increments of Step. The minimum number of clusters cannot exceed the maximum number of clusters.
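The sweep described above can be sketched in a few lines of Python. This is only an illustration and not the program's own code: the function name `cluster_counts` is hypothetical, and treating Max number of Clusters as inclusive is an assumption based on the wording "until you reach".

```python
def cluster_counts(min_clusters: int, max_clusters: int, step: int) -> list[int]:
    """Return the cluster counts a sweep would try (hypothetical helper).

    Assumption: the maximum is inclusive, so it is tried whenever
    it is reachable from the minimum in whole steps.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if min_clusters > max_clusters:
        raise ValueError("min_clusters cannot exceed max_clusters")
    return list(range(min_clusters, max_clusters + 1, step))


# Default settings mentioned in the note at the end of this page.
counts = cluster_counts(10, 100, 10)
print(counts)       # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(len(counts))  # 10 clustering runs
```

With the default settings the clustering process would run ten times under these assumptions, which helps explain why a small Step combined with a wide range can take a long time.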
After the clustering process has been executed the number of times defined, the table on the left side of the GUI will be filled with proposed ideal numbers of clusters. Each row of the table will contain:
- Proposed: The proposed number of clusters.
- Errors(%): The percentage of classification errors.
- Undefined(%): The percentage of segments belonging to clusters that could not be mapped to a single class.
- Coverage(%): The percentage of the full swimming paths that are covered by at least one segment of a known class.
- Check: A checkbox which is used to select the preferred row.
The program will attempt to list only the numbers of clusters that produce satisfactory results (low percentages of Errors and Undefined and a high percentage of Coverage). If no such numbers can be found, the table will display the results from all the clustering runs. In that case it is likely that the partial labelling is poor and that more (or more accurate) labels are needed.
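The filtering behaviour described above could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the `SweepResult` type, the `select_rows` function and the threshold values are hypothetical, since this page does not state what the program considers "satisfactory". The example rows are made-up numbers, not real results.

```python
from dataclasses import dataclass


@dataclass
class SweepResult:
    clusters: int         # number of clusters used for this run
    errors_pct: float     # Errors(%)
    undefined_pct: float  # Undefined(%)
    coverage_pct: float   # Coverage(%)


def select_rows(results, max_errors=5.0, max_undefined=10.0, min_coverage=80.0):
    """Keep satisfactory rows; fall back to all rows if none qualify.

    The default thresholds are hypothetical placeholders.
    """
    good = [
        r for r in results
        if r.errors_pct <= max_errors
        and r.undefined_pct <= max_undefined
        and r.coverage_pct >= min_coverage
    ]
    return good if good else list(results)


# Made-up values, for illustration only.
rows = [
    SweepResult(10, 12.0, 30.0, 60.0),
    SweepResult(50, 3.0, 8.0, 90.0),
]
print([r.clusters for r in select_rows(rows)])  # [50]
```

If no row passes the thresholds, the function returns every row, mirroring the fallback in which the table shows the results of all clustering runs.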
The rest of the GUI:
- Run: Executes the clustering process as many times as defined by Min number of Clusters, Step and Max number of Clusters.
- Show Graphs: Generates a series of graphs showing (a) the percentage of classification errors; (b) the percentage of segments belonging to clusters that could not be mapped to a single class (continuous lines: two-stage clustering, dashed lines: single-stage clustering); (c) the percentage of the full swimming paths that are covered by at least one segment of a known class. From these graphs the user can determine the ideal number of clusters to use for the Classification process.
- OK: Activated only if a checkbox from the table is checked. Closes the GUI and returns the selected Number of Clusters to the Main GUI.
- Cancel: Closes the GUI and returns the user to the Main GUI.
Note: The repeated execution of the clustering process can take a long time, depending on the number of iterations specified by Min number of Clusters, Max number of Clusters and Step. The user is advised to first run this procedure with the default values (Min = 10, Max = 100, Step = 10) and then adjust the settings according to the outcome (the results in the table and the graphs generated with the Show Graphs button). If both the table and the graphs indicate poor results, consider adding to or correcting the labelling data.