Classification - RodentDataAnalytics/mwm-ml-gen GitHub Wiki
The classification process requires a default segmentation and a default labelling to be selected.
Classification Overview
The classification process described in the publication of Gehring, Tiago V., et al. was based on finding an 'optimal' classifier that would successfully classify the segments. A 10-fold cross-validation was run on classifiers with different numbers of clusters, and classification performance was assessed on three variables: the classification error, the percentage of the trajectories that was classified (coverage), and the percentage of undefined segments. The last variable was of minor importance, because with high coverage and low error the undefined segments were not taken into account.
The new classification is based on classification boosting with majority voting: several classifiers are generated and work together as an ensemble to complete the classification. The reason for this approach is that some real-world problems are so complex that a single classifier cannot achieve high performance. In addition, assigning trajectory segments to behavioural strategies can be subjective and is prone to human error because the labelling is manual. Multiple classifiers can therefore be used to create a more robust classification and to provide a degree of confidence in conclusions drawn about the dataset under investigation.
Only strong classifiers are considered throughout the classification. Their quality is assessed by the cross-validation error: if the error is equal to or greater than 25% (default), the classifier is considered weak. Furthermore, apart from forming an ensemble, the classification results of the individual classifiers are used to support the classification outcome of the ensemble.
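The strong/weak rule above can be sketched in a few lines of Python. This is only an illustration and not the tool's own code: the function name, the dictionary layout (number of clusters mapped to cross-validation error in percent) and the error values are all assumptions made for the example.

```python
def select_strong_classifiers(cv_errors, max_error_pct=25.0):
    """Return only the classifiers whose CV error is below the cut-off.

    cv_errors maps a number of clusters to a cross-validation error in
    percent (hypothetical layout). An error equal to or above the
    cut-off marks the classifier as weak, so the comparison is strict.
    """
    return {k: err for k, err in cv_errors.items() if err < max_error_pct}


# Made-up error values, for illustration only.
example_errors = {10: 24.9, 12: 25.0, 14: 31.2}
strong = select_strong_classifiers(example_errors)
print(strong)  # {10: 24.9}
```

A classifier sitting exactly at 25% is dropped, which matches the "equal to or greater than" wording.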
The Classification Panel
- Default: If a segmentation and its labels are selected, the default classification automatically runs the Labelling Quality process for numbers of clusters starting at the number of different labels the user has provided and increasing in steps of 2 up to 100, in order to detect and generate strong classifiers. Only classifiers with a validation error lower than 25% are generated and used to form an ensemble. Inside the ensemble a simple majority vote takes place: for each segment, votes are collected on which class the segment belongs to, and the strategy with the most votes wins. In case of a tie the segment is marked as unclassified.
- Advanced: The advanced classification allows users to generate their own specific classifiers, merge a portion of them or create multiple ensembles, and customise the majority voting by specifying a threshold. For more information refer to Advanced Classification.
- Default Classification: Lists all the classifications of this project. If a classification is not shown, press the Refresh button. The selected classification will be used for the results generation. Its name specifies the number of entered labels and the segmentation configuration (as in the case of the standalone classifiers), followed by the number of merged classifiers, the number of created ensembles (default is 1) and an indication of the majority voting threshold if specified (default is mr0, meaning no threshold). Example: class_1261_10388_250_07_65_1_mr0 means a classification using 1261 labels on a segmentation consisting of 10388 segments with a segment length of 250 cm and an overlap of 70%; 65 classifiers were merged to form 1 ensemble and no threshold was used during majority voting.
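To make the naming scheme concrete, the example name can be taken apart with a short Python sketch. The field order follows the example above; the type name, the function name, and in particular the reading of the overlap field (`07` taken as 0.7, i.e. a decimal fraction with the point dropped) are assumptions for illustration, and other name variants the tool may produce are not handled.

```python
import re
from typing import NamedTuple


class ClassificationName(NamedTuple):
    n_labels: int
    n_segments: int
    segment_length_cm: int
    overlap: float          # fraction, e.g. 0.7 for 70%
    n_classifiers: int
    n_ensembles: int
    vote_threshold: int     # 0 means no threshold (mr0)


# Pattern for names shaped like the documented example only.
_NAME = re.compile(r"^class_(\d+)_(\d+)_(\d+)_(\d+)_(\d+)_(\d+)_mr(\d+)$")


def parse_classification_name(name):
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised classification name: {name!r}")
    labels, segments, length, overlap, n_cls, n_ens, thr = match.groups()
    # Assumption: '07' encodes 0.7 with the decimal point removed.
    overlap_fraction = float(overlap[0] + "." + overlap[1:])
    return ClassificationName(
        int(labels), int(segments), int(length),
        overlap_fraction, int(n_cls), int(n_ens), int(thr),
    )


parsed = parse_classification_name("class_1261_10388_250_07_65_1_mr0")
print(parsed.n_labels, parsed.n_segments, parsed.overlap, parsed.n_classifiers)
```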
Advanced Classification
The advanced classification brings forth another window which is split into two parts.
The classifiers panel is used to specify which classification tunings (numbers of clusters Ks) will be used on the specified segmentation using the selected labels.
- Segmentation and Labels: One segmentation and its corresponding labels need to be chosen from the two drop-down boxes.
- Cluster: Each classifier is described by its number of clusters. The user can define specific numbers of clusters, and one classifier will be created for each. If two numbers are separated by ':', for example 15:17, then 3 classifiers will be created with 15, 16 and 17 clusters.
- Generate Classifiers: Pressing this button generates the classifiers according to the options specified.
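The range notation of the Cluster field can be illustrated with a small Python sketch. Only the `15:17` form comes from the description above; how several entries are separated in the actual field is not stated, so the comma or whitespace separator used here, like the function name, is an assumption.

```python
import re


def parse_cluster_spec(spec):
    """Expand a cluster specification into a sorted list of cluster counts.

    '15:17' expands to 15, 16, 17 (inclusive range). Entries are assumed
    to be separated by commas or whitespace (hypothetical convention).
    """
    # Tolerate spaces around the colon, e.g. '15 : 17'.
    spec = re.sub(r"\s*:\s*", ":", spec)
    counts = set()
    for token in spec.replace(",", " ").split():
        if ":" in token:
            start, end = (int(part) for part in token.split(":"))
            if start > end:
                raise ValueError(f"range start exceeds end: {token!r}")
            counts.update(range(start, end + 1))
        else:
            counts.add(int(token))
    return sorted(counts)


print(parse_cluster_spec("15:17"))  # [15, 16, 17]
```

Each number in the returned list would correspond to one classifier to generate.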
The merging panel is used to set up the ensemble procedure. Once the classifiers have been generated, the box on the left side of the window lists the available classifier pools (each segmentation has its own classifiers).
- Classifiers per group: Specifies how many classifiers from the pool will be used for the final classification of the segments. If more classifiers are specified than are available, an error message pops up; if fewer are specified, they are picked at random from the pool.
- Iterations: Specifies how many ensembles will be created. Because the classifiers are selected at random when only a sample of the pool is requested, each ensemble will generally be different.
- Merging Rule: Currently only one rule is available, majority voting. By pressing the Rule Options button the user can set a threshold for the winning strategy. For example, if for a specific segment the Thigmotaxis strategy has 4 votes out of 10 and the Incursion strategy has 5 out of 10, Incursion wins; but with a threshold of 80 (meaning 80%) there is no winning strategy, so the segment is marked as unclassified. In case of a draw the segment is again marked as unclassified. The rule options are reset to their defaults if another pool is selected.
- Specified: Lists all the classifiers of the pool; only the selected ones take part in forming the ensemble(s). If specific classifiers are selected, this button turns red.
- Merge: Pressing this button executes the merging procedure with the defined settings.
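The merging options above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: the function names and the `unclassified` marker are hypothetical, the threshold is assumed to be compared against the share of all votes cast for the segment, and a winner is assumed to need at least that share. The label `Other` stands in for the one remaining vote in the 4-versus-5-out-of-10 example.

```python
import random
from collections import Counter

UNCLASSIFIED = "unclassified"  # hypothetical marker


def sample_groups(pool, per_group, iterations, rng=None):
    """Draw `iterations` random groups of `per_group` classifiers."""
    if per_group > len(pool):
        raise ValueError("more classifiers requested than available in the pool")
    rng = rng or random.Random()
    return [rng.sample(pool, per_group) for _ in range(iterations)]


def majority_vote(votes, threshold_pct=0.0):
    """Return the winning strategy for one segment, or UNCLASSIFIED.

    votes holds one strategy label per classifier in the ensemble.
    A draw, or a winner below the threshold share, yields UNCLASSIFIED.
    """
    if not votes:
        return UNCLASSIFIED
    ranked = Counter(votes).most_common(2)
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return UNCLASSIFIED  # draw between the two leading strategies
    if 100.0 * top_count / len(votes) < threshold_pct:
        return UNCLASSIFIED  # winner does not reach the required share
    return top_label


# The example from the text: 4 votes vs 5 votes out of 10.
votes = ["Thigmotaxis"] * 4 + ["Incursion"] * 5 + ["Other"]
print(majority_vote(votes))      # Incursion
print(majority_vote(votes, 80))  # unclassified
```

With the default threshold of 0 the function reduces to the simple majority vote of the default classification.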
The Refresh button refreshes the classifier pool list in case some pools do not appear. When the classification is finished the user can close this window and return to the main menu.