Second Task

Although binary classification is a good start, determining particle concentrations offers additional advantages.

Challenge

This task is about much more than just classifying an image; however, the experience you gained in the first task will help you here. You receive datasets recorded from samples that contained either no particles, particles of a single physical size, or mixtures of multiple physical sizes. Your system should therefore be able to detect particles with different visibilities.

Below you can see a possible detection pipeline whose steps can be followed to estimate the sizes of the contained particles:

(Image: Task2_pipeline_image)

After the temporal preprocessing, the preprocessed images are segmented and the resulting blobs are detected. The blob positions are stored for each frame individually and connected with those from previous or following frames whenever there is a spatial overlap between them. The structure holding the positional and temporal information of overlapping blobs is from now on called a trace. The trace filter then sorts out traces that do not fulfil certain criteria, e.g., traces whose blobs appear only for a very short time or whose time series lacks the characteristic particle shape. In this way, traces that turn out to be false detections can be removed.
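To make the trace-building and trace-filtering steps concrete, here is a minimal sketch. The `Blob`, `Trace`, and helper names, the bounding-box format, and the `min_length` criterion are illustrative assumptions, not part of the provided code frame:

```python
from dataclasses import dataclass, field

@dataclass
class Blob:
    frame: int   # frame index the blob was detected in
    box: tuple   # bounding box (x0, y0, x1, y1)

@dataclass
class Trace:
    blobs: list = field(default_factory=list)

def boxes_overlap(a, b):
    """True if two (x0, y0, x1, y1) boxes intersect spatially."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def build_traces(boxes_per_frame):
    """Connect blobs of consecutive frames into traces via spatial overlap."""
    open_traces, finished = [], []
    for frame_idx, boxes in enumerate(boxes_per_frame):
        still_open, matched = [], set()
        for trace in open_traces:
            last_box = trace.blobs[-1].box
            hit = next((i for i, box in enumerate(boxes)
                        if i not in matched and boxes_overlap(last_box, box)), None)
            if hit is None:
                finished.append(trace)            # no overlap: trace ends here
            else:
                trace.blobs.append(Blob(frame_idx, boxes[hit]))
                matched.add(hit)
                still_open.append(trace)
        for i, box in enumerate(boxes):
            if i not in matched:                  # unmatched blob starts a new trace
                still_open.append(Trace([Blob(frame_idx, box)]))
        open_traces = still_open
    return finished + open_traces

def filter_traces(traces, min_length=3):
    """Trace filter: drop traces that are visible only for a very short time."""
    return [t for t in traces if len(t.blobs) >= min_length]
```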

Feel free to deviate from this pipeline where it makes sense to you. You may also create a completely different approach with a more accurate or faster detection system. Just make sure that your system can predict the number of particles from a given path containing a dataset.
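For orientation, the required contract can be as small as a single function that maps a dataset path to a particle count. The function name and everything inside it are hypothetical placeholders; only the signature (path in, count out) matters:

```python
import os

def predict_particle_count(dataset_path: str) -> int:
    """Estimate how many particles the dataset at dataset_path contains."""
    frame_files = sorted(os.listdir(dataset_path))  # raw images of the recording
    # ... preprocessing, blob detection, and trace filtering go here ...
    traces = []                                     # placeholder for surviving traces
    return len(traces)
```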

Syntax
Here you are free in how you use the data. You can preprocess as in Task 1 and then continue in two dimensions until you include the raw data again to determine the particle counts. Alternatively, you can view the input directly as three-dimensional blocks of any size or, without using blocks, feed the raw images one by one into a recurrent network. You may also change the window sizes if you want to.
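As one example of the block-based view, here is a minimal sketch assuming the raw recording is available as a NumPy array of shape (num_frames, H, W); the block size and stride are free parameters you can tune:

```python
import numpy as np

def to_blocks(frames: np.ndarray, block_size: int = 60, stride: int = 60):
    """Cut a (num_frames, H, W) stack into (block_size, H, W) blocks."""
    blocks = [frames[s:s + block_size]
              for s in range(0, len(frames) - block_size + 1, stride)]
    return np.stack(blocks)  # shape: (num_blocks, block_size, H, W)

# Example: 600 dummy frames of 64x64 pixels -> ten non-overlapping blocks
frames = np.zeros((600, 64, 64), dtype=np.float32)
print(to_blocks(frames).shape)  # (10, 60, 64, 64)
```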

Provided Code
In task2_code_frame you can find a code frame to start with. Classification and blob detection are not implemented; develop your own ideas on how to implement them. Make sure that the test analysis at the end of the code works so that you can produce results for your submission when we provide previously unknown test datasets.
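One simple option for the missing blob detection is thresholding followed by connected-component labelling, sketched below with scipy.ndimage. This is an assumption about one possible approach, not the required solution, and the function name and parameters are illustrative:

```python
import numpy as np
from scipy import ndimage

def detect_blobs(image: np.ndarray, threshold: float = 0.5, min_area: int = 4):
    """Return bounding boxes (x0, y0, x1, y1) of bright connected regions."""
    mask = image > threshold              # segmentation by thresholding
    labels, num = ndimage.label(mask)     # connected-component labelling
    boxes = []
    for sl in ndimage.find_objects(labels):
        ys, xs = sl
        # drop regions whose bounding box is too small to be a particle
        if (ys.stop - ys.start) * (xs.stop - xs.start) >= min_area:
            boxes.append((xs.start, ys.start, xs.stop - 1, ys.stop - 1))
    return boxes
```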

Restrictions
Make sure to use no more than 6 GB of VRAM and 8 GB of RAM while testing (you may use more while training).
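To keep an eye on these limits during testing, you could log memory usage yourself. The sketch below assumes PyTorch for the VRAM side and the psutil package for RAM; both are common choices, not requirements of the task:

```python
import psutil
import torch

def report_memory():
    """Print current RAM usage and peak VRAM allocation of this process."""
    ram_gb = psutil.Process().memory_info().rss / 1024**3
    print(f"RAM used:  {ram_gb:.2f} GiB (limit: 8 GiB)")
    if torch.cuda.is_available():
        vram_gb = torch.cuda.max_memory_allocated() / 1024**3
        print(f"VRAM peak: {vram_gb:.2f} GiB (limit: 6 GiB)")
```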

Efficient use of resources
The results of your system are the most important aspect, but also consider the resource consumption of your approach. Task 3 will build on this task and requires a system that is executable on the Odroid N2+, an embedded device with limited resources. So it makes sense to use resources efficiently already now and to keep the next task in mind during development.

Where to find the available datasets
https://tu-dortmund.sciebo.de/s/rTpNvzQBefsC1XS
Please note that the annotations were made manually on preprocessed images that were calculated with a block size of 100 raw images. If you use other block sizes, the number of frames on which particles are visible will change. The particles dataset class in the code frame will handle this conversion for you.

If you want to use the preprocessed files directly (instead of the given on-the-fly preprocessing, or just for better visualization), you can download them at https://tu-dortmund.sciebo.de/s/TEBj972tO1TnJ6D. But be aware that the window size of 100 must then also be matched in the particles dataset by passing 100 as the window size (the default is 60)!
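A minimal usage sketch, assuming the particles dataset class accepts a window-size argument; the class and parameter names below are hypothetical and may differ in task2_code_frame:

```python
dataset = ParticlesDataset(
    path="path/to/preprocessed_dataset",
    window_size=100,  # must match the block size of the downloaded files
)                     # the default of 60 only fits the on-the-fly preprocessing
```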

Scoring
Your system will be evaluated on previously unknown datasets and has to predict the number of particles that can be found starting from the raw data. With the known number of particles and your predicted number, we compute the score

score = 1.0 - abs(count_predicted - count_known) / max(count_predicted, count_known)

to evaluate your approach.
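A direct translation of the scoring formula into Python, for self-checking your predictions. The guard for the case where both counts are zero is our assumption, since the formula above leaves that case undefined:

```python
def score(count_predicted: int, count_known: int) -> float:
    """Score in [0, 1]: 1.0 for an exact count, lower for larger deviations."""
    if max(count_predicted, count_known) == 0:
        return 1.0  # both counts zero: perfect prediction (assumed convention)
    return 1.0 - abs(count_predicted - count_known) / max(count_predicted, count_known)

print(score(8, 10))  # 2 off relative to the larger count of 10 -> 0.8
```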