Dataset - simula/biomedia-2020 GitHub Wiki

Development Dataset (Released)

The development dataset for the task is the VISEM dataset, which contains data from 85 male participants aged 18 years or older. For each participant, we include a set of measurements from a standard semen analysis, a video of live spermatozoa, a sperm fatty acid profile, the fatty acid composition of serum phospholipids, study participant-related data, and WHO analysis data. The dataset contains over 35 gigabytes of videos, with each video lasting between two and seven minutes. Each video has a resolution of 640 x 480 and runs at 50 frames per second.

In total, the dataset consists of six CSV files (five containing data and one mapping video IDs to study participant IDs), a text file describing some of the columns of the CSV files, and a folder containing the videos. The name of each video file contains the video's ID, the date it was recorded, and a short optional description, followed at the end by the code of the person who assessed the video. One row in each CSV file represents a participant. The provided CSV files are:

  • semen_analysis_data.csv: The results of standard semen analysis.
  • fatty_acids_spermatozoa.csv: The levels of several fatty acids in the spermatozoa of the participants.
  • fatty_acids_serum.csv: The levels of fatty acids in the serum phospholipids (measured from the blood of the participants).
  • sex_hormones.csv: The serum levels of sex hormones measured in the blood of the participants.
  • study_participant_related_data.csv: General information about the participants such as age, abstinence time and Body Mass Index (BMI).
  • videos.csv: Overview of which video file belongs to which participant.
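
Because one row in each CSV file represents a participant, the files can be combined into one record per participant. The following is a minimal sketch of that idea, not an official loader. The file names come from the list above; the delimiter and the name of the participant ID column (`ID` here) are assumptions and should be checked against the description file shipped with the dataset. `load_tables` and `merge_by_participant` are hypothetical helper names.

```python
import csv
from pathlib import Path

# File names as listed above. The delimiter and the ID column name used
# below are assumptions, not documented facts about the dataset.
CSV_FILES = [
    "semen_analysis_data.csv",
    "fatty_acids_spermatozoa.csv",
    "fatty_acids_serum.csv",
    "sex_hormones.csv",
    "study_participant_related_data.csv",
    "videos.csv",
]


def load_tables(dataset_dir, delimiter=","):
    """Read every listed CSV file found in dataset_dir into a list of row dicts."""
    tables = {}
    for name in CSV_FILES:
        path = Path(dataset_dir) / name
        if not path.exists():
            continue  # skip files that are not present
        with path.open(newline="", encoding="utf-8") as handle:
            tables[name] = list(csv.DictReader(handle, delimiter=delimiter))
    return tables


def merge_by_participant(tables, id_column="ID"):
    """Combine the rows of all tables into one dict per participant ID.

    Column names are prefixed with the file stem so that columns from
    different files cannot overwrite each other.
    """
    merged = {}
    for name, rows in tables.items():
        prefix = Path(name).stem
        for row in rows:
            record = merged.setdefault(row[id_column], {})
            for column, value in row.items():
                if column != id_column:
                    record[f"{prefix}:{column}"] = value
    return merged
```

All values are kept as strings; converting them to numbers is left to the caller, since the column types differ between files.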

The following files are provided in the Development dataset:

  • Development dataset archive contains the videos and associated metadata for 85 different participants.
  • Development dataset features archive contains the extracted visual feature descriptors for all videos (from the first 2 frames of every second).
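
To make the sampling concrete: at 50 frames per second, taking the first 2 frames of every second selects frames 0, 1, 50, 51, 100, 101, and so on. The sketch below computes those indices under the assumptions that frames are numbered from 0 and that only full seconds are sampled; `sampled_frame_indices` is a hypothetical helper, not part of the released feature extraction code.

```python
FPS = 50  # frame rate stated for the videos


def sampled_frame_indices(duration_seconds, fps=FPS, frames_per_second=2):
    """Return the 0-based indices of the first frames of every full second."""
    indices = []
    for second in range(int(duration_seconds)):
        start = second * fps  # index of the first frame in this second
        indices.extend(range(start, start + frames_per_second))
    return indices
```

Under these assumptions, a two-minute video (120 seconds, 6,000 frames) yields 240 sampled frames, and a seven-minute video (420 seconds, 21,000 frames) yields 840.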

Test Dataset (Released)

The following files are provided in the Test dataset:

  • Test dataset archive contains sets of videos and associated meta-data for 85 different participants split into three folds for three-fold cross-validation.
  • Test dataset features archive contains the extracted visual feature descriptors for all videos (from the first 2 frames of every second) split into three folds for three-fold cross-validation.
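
Three-fold cross-validation means that each fold is held out once while the other two are used for training. The following sketch shows that loop for folds given as lists of participant IDs. How the folds are stored in the archive is not described here, so reading them from disk is left out; `cross_validation_splits` is a hypothetical helper.

```python
def cross_validation_splits(folds):
    """Yield (train_ids, validation_ids), holding out one fold at a time."""
    for held_out in range(len(folds)):
        validation = list(folds[held_out])
        train = [
            participant_id
            for index, fold in enumerate(folds)
            if index != held_out
            for participant_id in fold
        ]
        yield train, validation
```

With three folds, the generator produces three splits, and every participant appears in exactly one validation set.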