DMD gaze related action annotation criteria - Vicomtech/DMD-Driver-Monitoring-Dataset GitHub Wiki
The DMD dataset contains events of different nature: distraction, drowsiness, hands and gaze. In this section, we present the criteria used to produce the currently available annotations of DMD. This page covers only the temporal gaze-related annotations.
The DMD dataset is composed of synchronized video streams from 3 different cameras. Each camera was placed to capture the activity of a certain region of the vehicle's cabin, focusing on specific parts of the driver: one stream captures the body activity, one the face and head, and another the hands' activity. We therefore refer to these streams as the body, face and hands cameras, respectively.
To annotate the recording sessions, we created a mosaic video which synchronously merges the body, face and hands camera streams. This mosaic video should be passed to the temporal annotation tool (TaTo) to start annotating a sequence or to correct a previously annotated session.
The defined levels describe temporal actions or events which occur while the driver performs gaze-related actions. We defined 3 levels of annotation, meaning that up to 3 annotations can simultaneously describe one frame. Each level has its own set of labels. Within each level, the labels are mutually exclusive: for each level, at most one label is allowed per frame.
The gaze-related annotation levels are:
Some annotation levels require a label for every frame in the video; this is represented with a full cell in the above table (Level 1). Levels that can have intervals without any label are represented with a shorter filled cell (Levels 0 and 2).
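The coverage rule above can be sketched as follows. This is a hypothetical illustration, not the official DMD tooling or file format: annotations for one level are modeled as `(start, end, label)` frame intervals (end exclusive), and only Level 1 must cover every frame of the clip.

```python
# Sketch (hypothetical structures, not the official DMD annotation format) of
# the coverage rule: Level 1 must label every frame, while Levels 0 and 2 may
# leave unlabeled gaps.

def covers_all_frames(intervals, num_frames):
    """Return True if the (start, end, label) intervals, with end exclusive,
    jointly cover frames 0..num_frames-1 without gaps."""
    next_needed = 0
    for start, end, _label in sorted(intervals):
        if start > next_needed:        # a gap before this interval
            return False
        next_needed = max(next_needed, end)
    return next_needed >= num_frames

# Level 1 (gaze zone) must cover the whole clip:
level1 = [(0, 40, "front"), (40, 90, "left_mirror"), (90, 120, "front")]
assert covers_all_frames(level1, 120)

# Level 2 (blinking) may have gaps, so partial coverage is acceptable:
level2 = [(38, 45, "blinking")]
assert not covers_all_frames(level2, 120)
```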
The following sections describe the criteria to be taken when annotating gaze-related actions in the DMD dataset.
An occlusion is an event that happens when more than 50–60% of the camera view is covered by the driver's own body or any other object and the scene is not recognizable. Since the dataset contains streams from 3 different cameras and each camera focuses on specific parts of the driver (i.e. face, body and hands), special attention should be given to the relevant (target) part of the driver. For instance, if in the hands video the hands and the wheel cannot be recognized, then there is an occlusion.
To annotate this level, all three streams (face camera, body camera and hands camera) should be considered equally to assign the corresponding labels.
If there is an occlusion in one of the cameras in a frame, you should label the frame with one of the following labels:
Key | Label | Description
---|---|---
0 | Face occlusion | The stream from the face camera is occluded and the action the driver is performing cannot be recognized
1 | Body occlusion | The stream from the body camera is occluded and the action the driver is performing cannot be recognized
2 | Hands occlusion | The stream from the hands camera is occluded and the action the driver is performing cannot be recognized
Examples
✔️ There may be some ambiguity when deciding whether there is an occlusion, especially in the hands camera, since actions such as talking on the phone, combing hair or applying makeup can occlude part of the scene. However, if it is still possible to confidently recognize the driver's actions, it should not be considered an occlusion.
✔️ In this level, only one camera can be annotated as occluded. We have not observed any case in which two or three video streams are occluded simultaneously.
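The at-most-one-camera rule can be sketched as a small validation step. The key codes come from the table above; the function name and label strings are hypothetical, not part of the official TaTo tool:

```python
# Sketch (hypothetical names) of the Level 0 rule: at most one camera may be
# marked as occluded in a given frame.

OCCLUSION_KEYS = {
    0: "face_occlusion",
    1: "body_occlusion",
    2: "hands_occlusion",
}

def occlusion_label(pressed_keys):
    """Map the annotator's key presses to a single occlusion label, or None.
    Raises ValueError if more than one camera is flagged, since simultaneous
    occlusions have not been observed in the dataset."""
    labels = [OCCLUSION_KEYS[k] for k in pressed_keys if k in OCCLUSION_KEYS]
    if len(labels) > 1:
        raise ValueError("only one camera may be annotated as occluded")
    return labels[0] if labels else None

assert occlusion_label([2]) == "hands_occlusion"
assert occlusion_label([]) is None
```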
In this level, it is required to identify the gaze zone at which the driver is looking. In every video, the driver looks for several seconds at each of a set of predefined gaze regions in the car, and the order of the regions is the same for every video. Short blinks while gazing at a region receive the same annotation as the other frames of that region. At a transition between regions, try to annotate the current gazing region for as long as possible. If this is not possible (for example, due to blinking during the transition), annotate the next region to be looked at.
To annotate this level, the face camera is primarily used, although the body camera can be useful to validate the annotation.
Key | Label | Description |
---|---|---|
0 | left_mirror | The driver is looking at the left outer mirror of the vehicle. |
1 | left | The driver is looking at the left window of the vehicle. |
2 | front | The driver is looking directly in front, through the front window. |
3 | center_mirror | The driver is looking at the center mirror, seeing the back of the vehicle. |
4 | front_right | The driver is looking to the right side of the front window, at the right side of the center mirror. |
5 | right_mirror | The driver is looking at the right outer mirror of the vehicle. |
6 | right | The driver is looking at the windows on the right side of the vehicle. |
7 | infotainment | The driver is looking at the infotainment section of the vehicle, i.e. the area containing the radio, temperature controls and the shift knob. |
8 | steering_wheel | The driver is looking at the steering wheel. |
9 | not_valid | Any region the driver is looking at that does not correspond to the previous locations and is not a special gaze movement (blinking, or a transition between 2 regions) should be annotated with this label. This label is also used while hand actions are performed, as the gaze is not relevant at that point. |
Diagram
This annotation should be present during a blink. Blinks usually occur when the driver changes gaze zone.
To annotate this level, the face camera is primarily used.
Key | Label | Description |
---|---|---|
0 | Blinking | The annotation should span from the moment the driver starts closing the eyes until they are completely open again. Take into account that some people don't fully close their eyes when blinking; they only close them halfway. |
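Putting the three levels together, a per-frame annotation can be sketched as below. This is a hypothetical structure for illustration, not the official TaTo output format: one optional occlusion label (Level 0), one mandatory gaze zone (Level 1), and an optional blinking flag (Level 2).

```python
# Sketch (hypothetical structure) combining the three gaze-related annotation
# levels for a single frame. Key codes and label names follow the tables above.
from dataclasses import dataclass
from typing import Optional

GAZE_ZONES = {0: "left_mirror", 1: "left", 2: "front", 3: "center_mirror",
              4: "front_right", 5: "right_mirror", 6: "right",
              7: "infotainment", 8: "steering_wheel", 9: "not_valid"}
OCCLUSIONS = {0: "face_occlusion", 1: "body_occlusion", 2: "hands_occlusion"}

@dataclass
class FrameAnnotation:
    gaze_zone: str                    # Level 1: required for every frame
    occlusion: Optional[str] = None   # Level 0: may be absent
    blinking: bool = False            # Level 2: may be absent

    def is_valid(self) -> bool:
        """Each level carries at most one label, and the gaze zone must be
        one of the ten predefined regions."""
        if self.gaze_zone not in GAZE_ZONES.values():
            return False
        if self.occlusion is not None and self.occlusion not in OCCLUSIONS.values():
            return False
        return True

# A blink during a transition is annotated with the next region looked at:
frame = FrameAnnotation(gaze_zone="center_mirror", blinking=True)
assert frame.is_valid()
```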