MNIST - xyfJASON/image-datasets GitHub Wiki
Links
Official website | Papers with Code
Brief introduction
Copied from Papers with Code.
The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has a training set of 60,000 examples and a test set of 10,000 examples. It is drawn from two larger collections, NIST Special Database 3 (digits written by employees of the United States Census Bureau) and Special Database 1 (digits written by high school students), both of which contain monochrome images of handwritten digits. The digits have been size-normalized and centered in a fixed-size image. The original black and white (bilevel) images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were then centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field.
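The centering step described above can be sketched in a few lines of NumPy. This is only an illustration, not the original NIST/MNIST preprocessing code: the function name `center_by_mass`, the whole-pixel translation, and the clipping at the canvas border are all assumptions, and the anti-aliased size normalization is not reproduced.

```python
import numpy as np

def center_by_mass(digit, size=28):
    """Place a small grayscale digit on a size x size canvas so that its
    intensity-weighted center of mass lands near the canvas center.

    Hypothetical helper: translation is rounded to whole pixels.
    """
    digit = np.asarray(digit, dtype=np.float64)
    h, w = digit.shape
    total = digit.sum()
    if total == 0:
        raise ValueError("an empty image has no center of mass")

    # Intensity-weighted mean row and column of the digit
    rows, cols = np.indices(digit.shape)
    cy = (rows * digit).sum() / total
    cx = (cols * digit).sum() / total

    # Top-left offset that moves the center of mass to the canvas center
    top = int(round((size - 1) / 2 - cy))
    left = int(round((size - 1) / 2 - cx))

    # Clip so the digit stays fully inside the canvas (an assumption)
    top = min(max(top, 0), size - h)
    left = min(max(left, 0), size - w)

    canvas = np.zeros((size, size), dtype=np.float64)
    canvas[top:top + h, left:left + w] = digit
    return canvas
```

For a 20x20 input, the digit can shift by at most 4 pixels from the geometric center in each direction before the clipping takes effect.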
Statistics
Number of images: 70,000
Splits (train / test): 60,000 / 10,000
Resolution: 28×28
Labels: 10 classes (digits 0-9)
Usage
File structure
Please organize the dataset in the following file structure:
root
└── MNIST
    └── raw
        ├── t10k-images-idx3-ubyte
        ├── t10k-images-idx3-ubyte.gz
        ├── t10k-labels-idx1-ubyte
        ├── t10k-labels-idx1-ubyte.gz
        ├── train-images-idx3-ubyte
        ├── train-images-idx3-ubyte.gz
        ├── train-labels-idx1-ubyte
        └── train-labels-idx1-ubyte.gz
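Before loading, it can help to confirm that the layout above is in place. The following is a minimal sketch using only the standard library. The helper name `missing_mnist_files` is hypothetical, and checking only the uncompressed files (not the `.gz` archives) is an assumption about which files matter.

```python
from pathlib import Path

# File names taken from the tree above (uncompressed versions only).
EXPECTED_FILES = [
    "t10k-images-idx3-ubyte",
    "t10k-labels-idx1-ubyte",
    "train-images-idx3-ubyte",
    "train-labels-idx1-ubyte",
]

def missing_mnist_files(root):
    """Return the expected raw files that are absent under root/MNIST/raw."""
    raw_dir = Path(root).expanduser() / "MNIST" / "raw"
    return [name for name in EXPECTED_FILES if not (raw_dir / name).is_file()]
```

An empty list means all four uncompressed files were found; for example, `missing_mnist_files('~/data/MNIST')` checks the root used in the example on this page.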
Example
from torchvision.datasets import MNIST

# root is the directory that contains MNIST/raw (see the file structure above)
train_set = MNIST(root='~/data/MNIST', train=True)
test_set = MNIST(root='~/data/MNIST', train=False)
print(len(train_set))  # 60000
print(len(test_set))   # 10000
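To inspect the raw files without torchvision, the uncompressed `*-ubyte` files can be read directly. The sketch below assumes the standard IDX layout: two zero bytes, a data-type byte (`0x08` for unsigned bytes), a dimension-count byte, then one big-endian 32-bit size per dimension, followed by the data. The function name `read_idx` is hypothetical, and only the unsigned-byte type is handled.

```python
import struct
import numpy as np

def read_idx(path):
    """Read an unsigned-byte IDX file into a NumPy array (hypothetical helper)."""
    with open(path, "rb") as f:
        data = f.read()

    # Magic number: 2 zero bytes, 1 data-type byte, 1 dimension-count byte
    zero, dtype_code, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError("not an unsigned-byte IDX file")

    # One big-endian unsigned 32-bit integer per dimension
    header_size = 4 + 4 * ndim
    shape = struct.unpack(">" + "I" * ndim, data[4:header_size])

    # The returned array is a read-only view over the file contents
    return np.frombuffer(data, dtype=np.uint8, offset=header_size).reshape(shape)
```

Under this assumption, reading `train-images-idx3-ubyte` should give an array of shape (60000, 28, 28) and `train-labels-idx1-ubyte` one of shape (60000,), matching the statistics on this page.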