COCO - xyfJASON/image-datasets GitHub Wiki
Links
Official website | Papers with Code | Guide
Brief introduction
Copied from paperswithcode.
The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.
Statistics (COCO2017)
Numbers: 163,957
Splits: 118,287 / 5,000 / 40,670 / 123,403 (train / valid / test / unlabeled)
Resolution: Mostly around 640x480
Annotations (copied from paperswithcode):
- object detection: bounding boxes and per-instance segmentation masks with 80 object categories,
- captioning: natural language descriptions of the images (see MS COCO Captions),
- keypoints detection: containing more than 200,000 images and 250,000 person instances labeled with keypoints (17 possible keypoints, such as left eye, nose, right hip, right ankle),
- stuff image segmentation: per-pixel segmentation masks with 91 stuff categories, such as grass, wall, sky (see MS COCO Stuff),
- panoptic: full scene segmentation, with 80 thing categories (such as person, bicycle, elephant) and a subset of 91 stuff categories (grass, sky, road),
- dense pose: more than 39,000 images and 56,000 person instances labeled with DensePose annotations – each labeled person is annotated with an instance id and a mapping between image pixels that belong to that person body and a template 3D model. The annotations are publicly available only for training and validation images.
Usage
File structure
Please organize the downloaded dataset in the following file structure:
root
├── annotations
│ ├── captions_train2017.json
│ ├── captions_val2017.json
│ ├── image_info_test2017.json
│ ├── image_info_test-dev2017.json
│ ├── image_info_unlabeled2017.json
│ ├── instances_train2017.json
│ ├── instances_val2017.json
│ ├── panoptic_train2017.json
│ ├── panoptic_train2017
│ │ ├── 000000000009.png
│ │ ├── ...
│ │ └── 000000581929.png
│ ├── panoptic_val2017.json
│ ├── panoptic_val2017
│ │ ├── 000000000139.png
│ │ ├── ...
│ │ └── 000000581781.png
│ ├── person_keypoints_train2017.json
│ ├── person_keypoints_val2017.json
│ ├── stuff_train2017.json
│ ├── stuff_train2017_pixelmaps
│ │ ├── 000000000009.png
│ │ ├── ...
│ │ └── 000000581929.png
│ ├── stuff_val2017.json
│ └── stuff_val2017_pixelmaps
│ ├── 000000000139.png
│ ├── ...
│ └── 000000581781.png
├── train2017
│ ├── 000000000009.jpg
│ ├── ...
│ └── 000000581929.jpg
├── val2017
│ ├── 000000000139.jpg
│ ├── ...
│ └── 000000581781.jpg
├── test2017
│ ├── 000000000001.jpg
│ ├── ...
│ └── 000000581918.jpg
└── unlabeled2017
├── 000000000008.jpg
├── ...
└── 000000581931.jpg