Classifying experimental patterns - duaneloh/Dragonfly GitHub Wiki

Pattern Classification

An important part of analyzing experimental single-particle diffraction data is identifying and rejecting patterns that should not be oriented and averaged, such as those from multi-particle clusters, water droplets and other extraneous sources. This identification can be done using low-resolution data, where the signal is high enough for a pattern-by-pattern analysis. classifier.py is an experimental GUI that combines machine learning techniques with user interaction to identify the single-particle patterns in a data set.

This page demonstrates the GUI on the same data set used in the experimental data quick start (Scientific Data 4, 170079 (2017)).

Setup

The Classifier GUI has its own section in the configuration file:

[classifier]
in_photons_list = amo86615_PR772_all.txt
in_detector_file = data/det_pnccd_back.dat
output_folder = data/

The detector file is the same as in the quick start.
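As a quick sanity check of the section layout, the options can be read back with Python's standard configparser module. This is only an illustrative sketch: the helper name read_classifier_section is hypothetical, and the classifier itself may parse its configuration differently.

```python
import configparser

# Same [classifier] section as shown above, inlined so the sketch is self-contained
CONFIG_TEXT = """
[classifier]
in_photons_list = amo86615_PR772_all.txt
in_detector_file = data/det_pnccd_back.dat
output_folder = data/
"""

def read_classifier_section(text):
    """Return the [classifier] options as a plain dict (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["classifier"])

options = read_classifier_section(CONFIG_TEXT)
for key, value in options.items():
    print(f"{key} = {value}")
```

A missing [classifier] section raises a KeyError here, which makes a misnamed section easy to spot before launching the GUI.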

GUI layout

The classifier is launched using the command

$ ./classifier.py -M

where the -M flag specifies that the mask is applied to the data. The GUI opens in the "Display" panel, with the display area on the left showing the same view as frameviewer.py.

https://github.com/duaneloh/Dragonfly/blob/master/images/classifier_display.png

Manual Classification

The "Manual" panel can be used to assign each pattern to a class labelled by a lowercase letter [a-z]. To do this, check the "Classify" box and press the relevant key. The program applies that class to the current pattern and moves to the next pattern automatically. In this way, one can quickly sort many tens of patterns into up to 26 classes.

The screenshot below shows a "b" (Bad) pattern. Using the radio buttons below the "Classification Summary", one can browse through the patterns of a single class and also view the virtual powder pattern of that class. The number to the right specifies how many threads to use when calculating the class powder.

https://github.com/duaneloh/Dragonfly/blob/master/images/classifier_manual.png
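A virtual powder pattern is simply the sum of all patterns in a set. The sketch below shows the idea for one class using NumPy on a small synthetic stack of frames. The function name class_powder and the dense-array layout are assumptions for illustration; the GUI works on its own photon-list format and computes the sum with multiple threads.

```python
import numpy as np

def class_powder(frames, labels, wanted):
    """Sum all frames whose class label equals `wanted` (hypothetical helper)."""
    frames = np.asarray(frames)
    mask = np.asarray(labels) == wanted
    return frames[mask].sum(axis=0)

# Synthetic stand-in data: six sparse 4x4 photon-count frames with manual labels
rng = np.random.default_rng(0)
frames = rng.poisson(0.1, size=(6, 4, 4))
labels = list("abbabb")

powder_b = class_powder(frames, labels, "b")
print(powder_b.shape, int(powder_b.sum()))
```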

Conversion (Basis change)

The right choice of basis can greatly enhance the effectiveness of any classification algorithm. Several conversion methods are available in the drop-down menu shown below. The one selected here, 'ang_corr_normed', first computes the polar representation of the pattern using the binning parameters specified at the top of the panel. This conversion is shown on the right in the display area. The angular correlations are then calculated from the Fourier transform intensities of each row of the polar representation and normalized to eliminate the effect of incident fluence.

https://github.com/duaneloh/Dragonfly/blob/master/images/classifier_conversion.png

One can also batch-process many patterns using multiple threads with the "Process" button. The converted data is used by the classification methods described below.
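The idea behind a normalized angular correlation can be sketched in a few lines of NumPy. The version below takes an already-computed polar array (radial bins along rows, angular bins along columns), uses the Fourier intensities of each row to obtain the circular autocorrelation, and divides each row by its zero-shift value. The function signature and this particular normalization are assumptions for illustration and need not match what classifier.py does internally.

```python
import numpy as np

def ang_corr_normed(polar):
    """Normalized angular autocorrelation of each radial row (illustrative sketch).

    polar: 2D array of shape (n_radial_bins, n_angular_bins).
    """
    polar = np.asarray(polar, dtype=float)
    # Fourier transform along the angular axis
    ft = np.fft.fft(polar, axis=1)
    # Fourier intensities (power spectrum) of each row
    power = np.abs(ft) ** 2
    # Wiener-Khinchin theorem: the inverse transform of the power
    # spectrum is the circular autocorrelation of the row
    corr = np.fft.ifft(power, axis=1).real
    # Assumed normalization: divide each row by its zero-shift value,
    # leaving all-zero rows untouched
    norm = corr[:, :1]
    safe = np.where(norm > 0, norm, 1.0)
    return corr / safe

# Synthetic polar pattern and a copy recorded at 10x the fluence
rng = np.random.default_rng(1)
polar = rng.poisson(2.0, size=(5, 36)).astype(float)
c1 = ang_corr_normed(polar)
c2 = ang_corr_normed(10 * polar)
print(np.allclose(c1, c2))
```

Scaling a pattern by a constant scales its autocorrelation by the square of that constant, so dividing by the zero-shift value removes the dependence on overall intensity. That is the sense in which the normalization is insensitive to incident fluence.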

Manifold Embedding

One can perform manifold embedding on the converted patterns generated above to see whether there are any obvious clusters of bad patterns. A few different embedding methods are provided, with parameters close to the scikit-learn defaults. Below, you can see the result of "Spectral Embedding" on 1000 patterns. Three polygonal regions of interest (ROIs) have been drawn using the "Draw ROI" button, while a fourth has been started but not completed. One can browse through the patterns within each ROI and assign a class label to all the patterns within that ROI. One can also see where manually classified patterns of a given class are embedded.

https://github.com/duaneloh/Dragonfly/blob/master/images/classifier_embedding.png
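Outside the GUI, the same two steps (embed, then label everything inside a polygon) could be sketched as follows. The embedding uses scikit-learn's SpectralEmbedding on synthetic feature vectors standing in for converted patterns. The helper points_in_polygon, the rectangular ROI and the "b" label are all hypothetical choices for illustration, not the GUI's own code.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding

def points_in_polygon(points, vertices):
    """Even-odd ray casting: True for each point inside the polygon."""
    points = np.asarray(points, dtype=float)
    x, y = points[:, 0], points[:, 1]
    inside = np.zeros(len(points), dtype=bool)
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        crosses = (y0 > y) != (y1 > y)
        with np.errstate(divide="ignore", invalid="ignore"):
            # x position where the edge meets that horizontal line
            x_at_y = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
        inside ^= crosses & (x < x_at_y)
    return inside

# Stand-in for converted patterns: two synthetic groups of feature vectors
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 0.3, size=(40, 16)),
    rng.normal(1.5, 0.3, size=(40, 16)),
])

# The RBF affinity keeps the similarity graph connected for this toy data
embedder = SpectralEmbedding(n_components=2, affinity="rbf", random_state=0)
coords = embedder.fit_transform(features)

# Hypothetical rectangular ROI covering the left half of the embedding
pad = 0.05 * (coords.max() - coords.min())
mid = 0.5 * (coords[:, 0].min() + coords[:, 0].max())
lo, hi = coords.min(axis=0) - pad, coords.max(axis=0) + pad
roi = [(lo[0], lo[1]), (mid, lo[1]), (mid, hi[1]), (lo[0], hi[1])]

# Assign one class letter to every pattern inside the ROI
selected = points_in_polygon(coords, roi)
labels = np.full(len(coords), "", dtype="<U1")
labels[selected] = "b"
print(coords.shape, int(selected.sum()))
```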

Neural Network (Multi-layer Perceptron)

While manifold embedding can be quite effective at identifying bad patterns, it does not scale well as the number of patterns increases. A trained neural network can classify patterns quickly and in parallel. Manual classification and manifold embedding can be used to classify a few hundred patterns, which can then be used to train the perceptron. The screenshot below shows the classifier acting on the whole data set of 44,039 patterns, classifying 4,942 cosmic ray events, 18,006 multiple hits and 21,091 single hits in around 1 minute on a 32-core machine.

https://github.com/duaneloh/Dragonfly/blob/master/images/classifier_mlp.png
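The train-on-a-few-hundred, predict-on-everything workflow can be sketched with scikit-learn's MLPClassifier. Whether the GUI uses this exact class is an assumption; the synthetic features, the class letters "c", "m" and "s", and the network size are likewise illustrative choices rather than the settings used for the screenshot above.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
class_names = np.array(["c", "m", "s"])  # hypothetical class letters
centers = np.array([-2.0, 0.0, 2.0])     # one synthetic cluster per class

def make_patterns(n_per_class):
    """Synthetic stand-in for converted patterns with known labels."""
    feats = [rng.normal(c, 0.3, size=(n_per_class, 16)) for c in centers]
    labels = np.repeat(class_names, n_per_class)
    return np.vstack(feats), labels

# A few hundred labelled patterns for training, a larger set to classify
train_x, train_y = make_patterns(100)
all_x, all_y = make_patterns(1000)

mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
mlp.fit(train_x, train_y)
predicted = mlp.predict(all_x)

# Per-class tallies, plus accuracy against the known synthetic labels
counts = {str(name): int(np.sum(predicted == name)) for name in class_names}
accuracy = float(np.mean(predicted == all_y))
print(counts, accuracy)
```

With real data there are no ground-truth labels for the full set, so it is worth browsing a sample of each predicted class before trusting the tallies.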

Extending the classifier

This Python GUI has been written in a modular way so that you can add other classification or conversion algorithms to improve performance. We would love to include any extensions you would like to contribute. Contact Kartik Ayyer (kartik.ayyer [at] desy.de) if you need help implementing your ideas, or if you need a copy of the unclassified data in order to follow along with this tutorial.