Classification v3 - RodentDataAnalytics/mwm-ml-gen GitHub Wiki

The classification process requires a default segmentation and default labels to be selected.

Contents

  1. Classification Overview
  2. The Classification Panel
  3. Advanced Classification
  4. Similarity Check

Classification Overview

The classification process described in the publication of Gehring, Tiago V., et al. was based on finding an 'optimal' classifier that would successfully classify the segments. A 10-fold cross-validation was run on classifiers with different numbers of clusters, each tasked with classifying the segments. Classification performance was assessed using three variables: the classification error, the percentage of the trajectories that was classified (coverage), and the percentage of undefined segments. The last variable was of minor importance because, when coverage was high and the error low, the undefined segments were not taken into account.

The new classification is based on the automatic generation of various classifiers without assessing their quality. Afterwards, a simple majority rule is applied to each segment to determine which strategy it belongs to. In this way the number of clusters no longer has to be chosen as a parameter, and the classification procedure becomes simpler. Moreover, the results are more robust, since with the previous method different 'good' classifiers could in many cases yield different results.

The Classification Panel

[Figure: classification overview]

  • Default Labels If a segmentation and its labels are selected, the default classification automatically generates 30 classifiers and 'merges' their results using simple majority voting. For each segment, the votes for the strategy it belongs to are collected, and the strategy with the most votes is assigned to the segment. This process is repeated 10 times, each time randomly selecting classifiers from the classifier pool and merging them with majority voting.

  • Advanced The advanced classification allows the user to generate their own specific classifiers, merge a portion of them, and vary the majority voting by specifying a threshold. For more information refer to Advanced Classification.

  • Similarity Check Similarity check shows the similarity between two classifiers or classifications by showing the number of segments for each strategy for both of them. For more information refer to Similarity Check.
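
The default procedure described above (sample classifiers from the pool, merge them by majority vote, repeat) can be sketched in Python as follows. This is only an illustration, not the tool's own code: the function names, the `per_group` parameter, the "undefined" marker and the representation of a classifier as a plain list of strategy names are all assumptions made for the example.

```python
import random
from collections import Counter

UNDEFINED = "undefined"  # hypothetical marker for a segment without a winner


def majority_vote(votes):
    """Return the strategy with the most votes, or UNDEFINED on a draw."""
    ranked = Counter(votes).most_common()
    if not ranked:
        return UNDEFINED
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return UNDEFINED
    return ranked[0][0]


def merge_random_groups(pool, per_group, iterations, seed=0):
    """Repeatedly sample classifiers from the pool and merge them by majority vote.

    pool: list of classifier outputs, each a list holding one strategy per segment.
    Returns one merged classification (a list of strategies) per iteration.
    """
    rng = random.Random(seed)
    merged = []
    for _ in range(iterations):
        group = rng.sample(pool, per_group)
        # zip(*group) yields, for each segment, the votes of all sampled classifiers
        merged.append([majority_vote(segment_votes) for segment_votes in zip(*group)])
    return merged
```

With a pool of 30 classifier outputs, `merge_random_groups(pool, per_group=11, iterations=10)` would return 10 merged classifications; the group size of 11 is an arbitrary choice for the example, since the text does not state how many classifiers are drawn per iteration.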

Advanced Classification

The advanced classification brings forth another window which is split into two parts.

The classifiers panel is used to specify which classifiers, and how many, will be created for a given segmentation and its labels.

[Figure: generate classifiers]

  • Segmentation and Labels: One segmentation and its corresponding labels need to be chosen from the two drop-down boxes.

  • Cluster: Each classifier is described by its number of clusters. The user can define specific numbers of clusters, and a classifier will be created for each one. If two numbers are separated by ' : ', for example 15:17, then 3 classifiers will be created with 15, 16 and 17 clusters. Leaving this field empty will result in the generation of 30 random classifiers.

  • Generate Classifiers Pressing this button will result in the generation of classifiers depending on the options specified.
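
To make the range notation of the Cluster field concrete, here is a small Python sketch that expands such a specification into a list of cluster counts. It is not taken from the tool: the function name is hypothetical, and the assumption that several entries are separated by commas or spaces is made for the example only, since the text does not specify the separator.

```python
import re


def parse_cluster_field(text):
    """Expand a cluster specification into a list of cluster counts.

    An empty field returns an empty list, which a caller would treat as
    'fall back to the default set of classifiers'.
    """
    # Allow optional spaces around the range separator: '15 : 17' -> '15:17'
    normalized = re.sub(r"\s*:\s*", ":", text.strip())
    counts = []
    # Assumption: individual entries are separated by commas and/or whitespace
    for token in re.split(r"[,\s]+", normalized):
        if not token:
            continue
        if ":" in token:
            start, end = (int(part) for part in token.split(":"))
            counts.extend(range(start, end + 1))  # the range is inclusive
        else:
            counts.append(int(token))
    return counts


print(parse_cluster_field("15:17"))  # [15, 16, 17]
```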

The merging panel is used to set up the merging procedure. Once the classifiers have been generated, the box on the left side of the window lists the available classifier pools (each segmentation has its own classifiers).

[Figure: merge classifiers]

  • Classifiers per group: Specifies how many classifiers will be used from the pool for the final classification of the segments.

  • Iterations: The classifiers are selected randomly, so the final classification process can be run multiple times, each time drawing a different sample of classifiers from the selected pool. This field defines how many times the final classification process is run.

  • Merging Rule: Currently only one rule is available, majority voting. By pressing the Rule Options button the user can set a threshold for the winning strategy. For example, if for a specific segment the Thigmotaxis strategy has 4 votes out of 10 and the Incursion strategy has 5 out of 10, Incursion wins; with a threshold of 80 (meaning 80%), however, there is no winning strategy and the segment is marked as undefined. In case of a draw the segment is likewise marked as undefined.

  • Merge: Pressing this button executes the merging procedure with the defined settings.
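
The effect of the threshold in the merging rule can be illustrated with a short Python sketch that reproduces the Thigmotaxis/Incursion example. The function name and the "undefined" marker are hypothetical, and the sketch assumes that the winner must reach at least the given percentage of all votes; whether the tool compares with "at least" or "more than" is not stated here.

```python
from collections import Counter

UNDEFINED = "undefined"  # hypothetical marker for a segment without a winner


def vote_with_threshold(votes, threshold_pct=0.0):
    """Majority vote for one segment with an optional winning threshold."""
    total = len(votes)
    if total == 0:
        return UNDEFINED
    ranked = Counter(votes).most_common()
    winner, top = ranked[0]
    # A draw between the two leading strategies leaves the segment undefined
    if len(ranked) > 1 and ranked[1][1] == top:
        return UNDEFINED
    # Assumption: the winner needs at least threshold_pct percent of all votes
    if 100.0 * top / total < threshold_pct:
        return UNDEFINED
    return winner


# 4 votes for Thigmotaxis, 5 for Incursion; the tenth vote goes to a placeholder strategy
votes = ["Thigmotaxis"] * 4 + ["Incursion"] * 5 + ["Other"]
print(vote_with_threshold(votes))      # Incursion
print(vote_with_threshold(votes, 80))  # undefined
```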

As an advanced merging technique, multiple classifier pools can also be selected from the list. In this case an equal number of classifiers is taken from each pool and the majority rule is applied only to the matched segments. Each classifier pool applies to a different segmentation; if the segmentations have the same segment length but different overlap, some segments will be the same in both segmentations.
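
A rough Python sketch of merging across several pools is given below. It is an illustration under stated assumptions rather than the tool's implementation: each classifier is modelled as a dictionary from a segment key to a strategy, the key is assumed to be a (trajectory id, start offset) pair so that identical segments in different segmentations share a key, and the function name and "undefined" marker are hypothetical.

```python
import random
from collections import Counter

UNDEFINED = "undefined"  # hypothetical marker for a segment without a winner


def merge_pools(pools, per_pool, seed=0):
    """Majority vote over the segments shared by classifiers from several pools.

    pools: list of pools; each pool is a list of classifiers, and each
    classifier is a dict mapping a segment key to a strategy.
    """
    rng = random.Random(seed)
    # Take the same number of classifiers from every pool
    chosen = [clf for pool in pools for clf in rng.sample(pool, per_pool)]
    # Matched segments are the keys present in every chosen classifier
    matched = set(chosen[0])
    for clf in chosen[1:]:
        matched &= set(clf)
    result = {}
    for key in sorted(matched):
        ranked = Counter(clf[key] for clf in chosen).most_common()
        draw = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
        result[key] = UNDEFINED if draw else ranked[0][0]
    return result
```

Segments that appear in only one of the segmentations are simply left out of the result in this sketch.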

When the classification is finished the user can close this window and return to the main menu.

Similarity Check

Similarity check can be useful for comparing the classification results from two different classifier pools. Similarity can be computed between 'files', which applies to two classifiers or two merged classifiers (the product of performing majority voting), or between 'folders', which applies to two classifier pools or two merged classifier pools (the product of performing majority voting more than once).

[Figure: similarity check]

Upon pressing the Refresh button, the table shows, for each classifier or classification, how many segments have been detected for each strategy, together with their difference. A large difference between two merged classifications may reveal that a specific classifier pool is unable to distinguish between certain classes (usually because not enough labels have been provided), which in turn may make the results inconsistent.
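
The kind of table described here can be approximated with a few lines of Python. This is a sketch only: the function name is hypothetical, each classification is assumed to be a list with one strategy per segment, and the difference is taken as an absolute value, which may not match what the tool displays.

```python
from collections import Counter


def similarity_table(labels_a, labels_b):
    """Count segments per strategy in two classifications and their difference.

    Returns rows of (strategy, count in A, count in B, absolute difference).
    """
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    rows = []
    for strategy in sorted(set(count_a) | set(count_b)):
        a, b = count_a[strategy], count_b[strategy]
        rows.append((strategy, a, b, abs(a - b)))
    return rows


first = ["Thigmotaxis", "Incursion", "Incursion"]
second = ["Thigmotaxis", "Thigmotaxis", "Incursion"]
for row in similarity_table(first, second):
    print(row)
```

A strategy whose counts differ widely between the two classifications is a candidate for closer inspection of the labels.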