CelebA - xyfJASON/image-datasets GitHub Wiki

Links

Official website | Papers with Code | Google drive | Baidu drive (password: rp0s)

Brief introduction

The following description is copied from the official website.

CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. CelebA has large diversities, large quantities, and rich annotations, including

10,177 number of identities,
202,599 number of face images, and
5 landmark locations, 40 binary attributes annotations per image.

The dataset can be employed as the training and test sets for the following computer vision tasks: face attribute recognition, face recognition, face detection, landmark (or facial part) localization, and face editing & synthesis.

Statistics

Number of images: 202,599

Splits: 162,770 / 19,867 / 19,962 (train / valid / test)
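
The split sizes above can be re-derived from Eval/list_eval_partition.txt. The sketch below assumes that each line of that file has the form `<filename> <code>` and that the codes 0 / 1 / 2 mean train / valid / test; please verify both assumptions against your copy of the file and its README. The helper name `count_splits` and the sample rows are made up for illustration.

```python
from collections import Counter

# Assumed mapping of partition codes to split names; check the dataset README.
SPLIT_NAMES = {0: "train", 1: "valid", 2: "test"}


def count_splits(lines):
    """Count images per split from lines of the form '<filename> <code>'."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _filename, code = line.split()
        counts[SPLIT_NAMES[int(code)]] += 1
    return counts


# Tiny made-up sample in the assumed format, not real dataset rows.
sample = ["000001.jpg 0", "000002.jpg 0", "000003.jpg 1", "000004.jpg 2"]
sample_counts = count_splits(sample)
print(dict(sample_counts))

# The published split sizes add up to the full image count.
assert 162_770 + 19_867 + 19_962 == 202_599
```

To run it on the real file, pass an open file object, e.g. `count_splits(open(path))`, since the function only iterates over lines.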

Resolution:

Aligned: 178×218
In-the-wild: varies, from 200+ to 2000+ pixels

Annotations:

10,177 identities
5 landmark locations per image
40 binary attribute annotations per image
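
As a rough illustration of how the 40 binary attributes can be read, here is a minimal parsing sketch. It assumes that list_attr_celeba.txt stores the image count on the first line, the attribute names on the second line, and then one row per image with a filename followed by -1 / 1 values. The function name `parse_attr_lines` is hypothetical, and the sample uses only three attribute columns with made-up values to stay short.

```python
def parse_attr_lines(lines):
    """Parse attribute text into (names, {filename: [0/1, ...]}).

    Assumed layout: line 1 = image count, line 2 = attribute names,
    remaining lines = filename followed by one -1/1 value per attribute.
    """
    rows = [ln.split() for ln in lines if ln.strip()]
    declared = int(rows[0][0])
    names = rows[1]
    attrs = {}
    for row in rows[2:]:
        filename, values = row[0], row[1:]
        if len(values) != len(names):
            raise ValueError(
                f"{filename}: expected {len(names)} values, got {len(values)}"
            )
        # Map -1 -> 0 and 1 -> 1 so the labels are usable as binary targets.
        attrs[filename] = [(int(v) + 1) // 2 for v in values]
    if declared != len(attrs):
        raise ValueError(f"header declares {declared} rows, found {len(attrs)}")
    return names, attrs


# Made-up sample in the assumed format (3 attributes instead of 40).
sample = [
    "2",
    "Eyeglasses Male Smiling",
    "000001.jpg -1 -1  1",
    "000002.jpg  1  1 -1",
]
names, attrs = parse_attr_lines(sample)
print(names)
print(attrs["000001.jpg"])
```

Converting -1 / 1 to 0 / 1 is a design choice for convenience with binary classification losses; keep the raw values if you need to match the file exactly.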

Files

Anno
- identity_CelebA.txt
- list_attr_celeba.txt
- list_bbox_celeba.txt
- list_landmarks_align_celeba.txt
- list_landmarks_celeba.txt
Eval
- list_eval_partition.txt
Img
- img_celeba.7z (In-The-Wild Images)
- img_align_celeba_png.7z (Align&Cropped Images, PNG Format)
- img_align_celeba.zip (Align&Cropped Images, JPG Format)
README.txt

Usage

Note: The authors provide two versions of the dataset: aligned and in-the-wild. torchvision only supports loading the aligned version.

File structure

Please extract img_align_celeba.zip and organize the downloaded files in the following structure:

root
└── celeba
    ├── identity_CelebA.txt
    ├── list_attr_celeba.txt
    ├── list_bbox_celeba.txt
    ├── list_eval_partition.txt
    ├── list_landmarks_align_celeba.txt
    ├── list_landmarks_celeba.txt
    └── img_align_celeba
        ├── 000001.jpg
        ├── ...
        └── 202599.jpg
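
Before loading the dataset, it can help to confirm that the layout above is in place. The following is a minimal sketch using only the standard library; the helper name `missing_entries` is hypothetical, and it checks only that the listed files and the image folder exist, not their contents.

```python
from pathlib import Path

# File names taken from the structure listed above.
EXPECTED_FILES = [
    "identity_CelebA.txt",
    "list_attr_celeba.txt",
    "list_bbox_celeba.txt",
    "list_eval_partition.txt",
    "list_landmarks_align_celeba.txt",
    "list_landmarks_celeba.txt",
]
IMAGE_DIR = "img_align_celeba"


def missing_entries(root):
    """Return the expected paths (relative to root) that do not exist."""
    base = Path(root).expanduser() / "celeba"
    missing = [
        f"celeba/{name}" for name in EXPECTED_FILES if not (base / name).is_file()
    ]
    if not (base / IMAGE_DIR).is_dir():
        missing.append(f"celeba/{IMAGE_DIR}")
    return missing


if __name__ == "__main__":
    # An empty list means every expected entry was found.
    print(missing_entries("~/data/CelebA"))
```

An empty result only means the names match; it does not verify checksums or that all 202,599 images were extracted.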

Example

from torchvision.datasets import CelebA

# `root` is the directory that contains the `celeba` folder shown above.
root = '~/data/CelebA'

train_set = CelebA(root=root, split='train')
valid_set = CelebA(root=root, split='valid')
test_set = CelebA(root=root, split='test')
all_set = CelebA(root=root, split='all')
print(len(train_set))  # 162770
print(len(valid_set))  # 19867
print(len(test_set))   # 19962
print(len(all_set))    # 202599