{notes} {paper} {project page} {code, Keras}
# Contextual Attention for Hand Detection in the Wild, ICCV'19 (arXiv 1904.04882)

Supreeth Narasimhaswamy†, Zhengwei Wei†, Yang Wang, Justin Zhang, Minh Hoai
## Objective
Perform hand detection in the wild (but mostly third-person views, by the look of it)
Release datasets for this task
## Datasets

- TV-Hand dataset
  - images come from the ActionThread dataset (4757 videos), with one to two frames extracted per video
  - split by source video: images from 2433 videos for training, 810 for validation, 1514 for testing
  - 9498 images: 4853 train / 1618 validation / 3027 test (split totals are sanity-checked in the snippet after this list)
  - number of hands: 4085 train / 1362 validation / 3199 test
  - image height: 360 pixels
- COCO-Hand dataset
  - obtained by automatically annotating a subset of Microsoft's COCO dataset
  - 26,499 images with 45,671 hands
  - COCO-Hand-S: a final verification step keeps only images with good and complete annotations, 4534 images with 10,845 hands
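As a quick consistency check on the TV-Hand split sizes above (a throwaway Python snippet; the dictionary layout is mine, not anything from the paper or the released code):

```python
# Per-split counts copied from the notes above; only the totals are derived.
tv_hand = {
    "videos": {"train": 2433, "val": 810, "test": 1514},   # expect 4757 total
    "images": {"train": 4853, "val": 1618, "test": 3027},  # expect 9498 total
    "hands":  {"train": 4085, "val": 1362, "test": 3199},
}

for quantity, splits in tv_hand.items():
    print(f"TV-Hand {quantity}: {splits} -> total {sum(splits.values())}")
```

The totals come out to 4757 videos, 9498 images, and 8646 hands, consistent with the dataset-level figures.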
## Method

Hand-CNN: extend Mask R-CNN with an attention module that incorporates contextual cues into the detection process, capturing non-local dependencies between features (hands are constrained by the arms and bodies they attach to).

On the data side, the paper is explicit:

> Undoubtedly, the availability of this large-scale dataset is one reason for the impressive performance of our hand detector.
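The notes above don't reproduce the exact module, so here is a minimal Keras sketch (Keras to match the released code) of a generic non-local attention block of the kind the contextual attention builds on. The `non_local_block` helper, channel counts, and input shape are illustrative assumptions, not the authors' implementation:

```python
from tensorflow.keras import layers, Model

def non_local_block(x, inter_channels=128):
    """Generic non-local attention: every spatial position attends to every other."""
    h, w, c = x.shape[1], x.shape[2], x.shape[3]
    theta = layers.Conv2D(inter_channels, 1)(x)  # queries
    phi = layers.Conv2D(inter_channels, 1)(x)    # keys
    g = layers.Conv2D(inter_channels, 1)(x)      # values
    # Flatten spatial dimensions so attention runs over all H*W positions
    theta = layers.Reshape((h * w, inter_channels))(theta)
    phi = layers.Reshape((h * w, inter_channels))(phi)
    g = layers.Reshape((h * w, inter_channels))(g)
    # Pairwise affinities between positions: (batch, HW, HW)
    attn = layers.Dot(axes=(2, 2))([theta, phi])
    attn = layers.Softmax(axis=-1)(attn)
    # Aggregate value features with the attention weights: (batch, HW, inter_channels)
    out = layers.Dot(axes=(2, 1))([attn, g])
    out = layers.Reshape((h, w, inter_channels))(out)
    out = layers.Conv2D(c, 1)(out)  # project back to the input channel count
    return layers.Add()([x, out])   # residual connection, as in non-local networks

# Toy usage on a backbone-sized feature map
inp = layers.Input(shape=(32, 32, 256))
model = Model(inp, non_local_block(inp))
model.summary()
```

This kind of attention is what lets a hand region pull in evidence from contextually related regions (arms, bodies) elsewhere in the image.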
## Experiments

Improvements upon the Mask R-CNN baseline on two datasets (average precision):

- Oxford-Hand: 69.9% → 73.0%
- TV-Hand: 59.9% → 60.3%

So the baseline is already pretty strong, and it might well be faster?