First Task - Konnsy/REAML2022-hackathon GitHub Wiki

An important goal of the first task of the hackathon is to get acquainted with the data and the difficulties associated with it. You receive a code frame that is missing a neural network to perform a binary classification, while the training and evaluation parts are already given. While looking at the preprocessing, we encourage you to try to improve this part or even implement a completely different idea for the analysis. Creative solutions can pay off, but also require a good understanding of the data.

Challenge
In this task of the hackathon we provide positive and negative datasets containing raw images, i.e., recordings that have not been preprocessed. Your task is to determine for each example whether particles of interest can be recognized in it. Besides yielding a direct classification of samples as infected or non-infected, the first task should help you become familiar with the data and thereby lay a foundation for the second task.

Since you cannot spot the particles in a single raw image, you have to apply temporal preprocessing before a 2D classification, or use a spatiotemporal (3D) input in your model. You may use the approach from the presentation or come up with your own solution.
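As a rough illustration of what temporal preprocessing can look like, the following sketch collapses a block of frames into a single 2D image by subtracting the mean of the first frames from the mean of the last frames. This is only one possible idea and not necessarily the approach from the presentation; the function name `temporal_difference`, the parameter `k`, and the synthetic test data are assumptions made for this example.

```python
import numpy as np


def temporal_difference(block: np.ndarray, k: int = 10) -> np.ndarray:
    """Collapse a (w, x, y) block of frames into one (x, y) image.

    Assumed approach (not taken from the code frame): the mean of the
    last k frames minus the mean of the first k frames, so that changes
    that persist over time stand out against a static background.
    """
    if block.ndim != 3:
        raise ValueError("expected a block of shape (w, x, y)")
    w = block.shape[0]
    if not 0 < k <= w // 2:
        raise ValueError("k must be between 1 and w // 2")
    frames = block.astype(np.float32)
    early = frames[:k].mean(axis=0)   # average of the first k frames
    late = frames[-k:].mean(axis=0)   # average of the last k frames
    return late - early


# Synthetic stand-in for one raw block: 60 noisy frames of 32 x 32 pixels
# with a small patch that becomes brighter halfway through the block.
rng = np.random.default_rng(0)
block = rng.normal(100.0, 0.1, size=(60, 32, 32)).astype(np.float32)
block[30:, 10:13, 10:13] += 5.0
diff = temporal_difference(block)
```

The resulting 2D image could then be passed to an ordinary 2D classifier. Averaging over several frames also reduces noise compared to subtracting two single frames.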

Make the distinction between samples with particles and samples without particles as reliable as possible. To score the submitted models we will calculate the share of correctly classified images in new datasets, where the background characteristics may differ from those in the training data. Feel free to augment the training data and to experiment with your training code since only your final classification method will be scored.
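A minimal augmentation sketch, assuming each training sample is a NumPy array of shape (w, x, y): it flips the block along the spatial axes at random and applies a random global gain to imitate varying background brightness. The function name `augment_block` and the gain range are hypothetical choices. The intent is to apply it to samples inside your training code, leaving raw_dataset.py untouched.

```python
import numpy as np


def augment_block(block: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a randomly augmented copy of a (w, x, y) block.

    The time axis is left as it is, because the order of the frames
    may carry the signal; only the spatial axes are flipped.
    """
    out = block.astype(np.float32)        # astype returns a copy
    if rng.random() < 0.5:
        out = out[:, ::-1, :]             # flip along x
    if rng.random() < 0.5:
        out = out[:, :, ::-1]             # flip along y
    gain = float(rng.uniform(0.9, 1.1))   # assumed brightness variation
    return np.ascontiguousarray(out * gain, dtype=np.float32)


rng = np.random.default_rng(42)
sample = np.full((60, 16, 16), 100.0, dtype=np.float32)
augmented = augment_block(sample, rng)
```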

Figure: Task 1 example pipeline

Syntax
Your system has to take a tensor of size (b, w, x, y) for batch size b, block size w (the number of frames in a block) and image dimensions x and y. For this input it has to output a tensor of size (b, 1) whose values lie between 0 and 1 and represent the confidence that a particle is contained (a higher value means higher trust in the presence of at least one particle).
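The following sketch shows one way to satisfy this interface. It assumes that PyTorch is used, which this page does not state, and the class name `TinyClassifier` and all layer sizes are made up for illustration. It treats the w frames as input channels of a small 2D CNN and is meant as a starting point for checking shapes, not as a recommended architecture.

```python
import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical baseline mapping (b, w, x, y) to (b, 1) confidences."""

    def __init__(self, window: int = 60):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(window, 16, kernel_size=3, padding=1),  # frames as channels
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # (b, 32, 1, 1) for any x and y
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.features(x).flatten(1)      # (b, 32)
        return torch.sigmoid(self.head(z))   # (b, 1), values in [0, 1]


model = TinyClassifier(window=60)
batch = torch.randn(4, 60, 64, 64)           # b=4, w=60, x=y=64
with torch.no_grad():
    out = model(batch)
```

The final sigmoid keeps every output between 0 and 1, and the adaptive pooling layer makes the model independent of the exact image dimensions.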

Provided Code
In task1_code_frame you can find a code frame to start with. The only operation that is not implemented is the classification method. Try to define a good network and suitable hyperparameters. Data augmentation could also be helpful. The dataset in raw_dataset.py must remain unchanged in order to preserve comparability (including the given window size of 60 frames). Keep the test method at the end unchanged so that you can evaluate new, previously unknown datasets and submit your results.

Restrictions
Please make sure to use no more than 6 GB of VRAM and 8 GB of RAM (you may use more while training).

How to obtain the datasets
At https://tu-dortmund.sciebo.de/s/Fr899VLesLbj6Ao you can find positive and negative example images for the binary classification task.

If you want to use the preprocessed files directly (instead of the given preprocessing on the fly or just for a better visualization) you can download them at https://tu-dortmund.sciebo.de/s/qmbXSIiO45Gw8jj.

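Scoring
All submissions will be evaluated on previously unknown datasets. To score the predictions we will calculate, for each dataset, the accuracy (tp / (tp + fn) + tn / (tn + fp)) * 0.5, i.e., the mean of the share of positive examples classified correctly and the share of negative examples classified correctly (each normalized by the total count of its class). Here tp, fn, tn and fp denote the numbers of true positives, false negatives, true negatives and false positives. The total accuracy is calculated as the mean over all dataset accuracies.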
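To make the formula concrete, here is a small sketch of how such a score could be computed. The function names and the threshold of 0.5 for turning a confidence into a decision are assumptions, since this page does not state how confidences are binarized. The sketch also assumes that every dataset contains both positive and negative examples.

```python
def balanced_accuracy(labels, confidences, threshold=0.5):
    """Compute (tp / (tp + fn) + tn / (tn + fp)) * 0.5 for one dataset.

    labels are 0 or 1; confidences are model outputs in [0, 1].
    The threshold is an assumption made for this example.
    """
    tp = fn = tn = fp = 0
    for label, confidence in zip(labels, confidences):
        predicted_positive = confidence >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("dataset needs positive and negative examples")
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def total_accuracy(datasets):
    """Mean of the per-dataset accuracies over (labels, confidences) pairs."""
    scores = [balanced_accuracy(labels, conf) for labels, conf in datasets]
    return sum(scores) / len(scores)


# Two small made-up datasets
first = ([1, 1, 1, 0], [0.9, 0.8, 0.2, 0.1])    # tp=2, fn=1, tn=1, fp=0
second = ([1, 0, 0, 0], [0.7, 0.6, 0.3, 0.2])   # tp=1, fn=0, tn=2, fp=1
score = total_accuracy([first, second])
```

Because both classes contribute equally to this score, a model that always predicts the majority class reaches only 0.5, regardless of how imbalanced a dataset is.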