MNIST_Handwritten_dataset - davidrmiller/neural2d GitHub Wiki

Overview

The MNIST data set consists of 60,000 training images of handwritten digits, plus a second set of 10,000 images that can be used for validation. The following instructions apply to the 60,000-image training set; the validation set is ignored for now. Once training on the training set works, you can adapt the same procedure to extract the 10,000 validation images if you need them.

To train neural2d on the MNIST data set, you will need a topology config file, an input data config file, and the 60,000 training images. If all the files are in the right locations, you can start training with the Makefile. From the directory where the Makefile lives, execute:

 make test-mnist

Or directly invoke the training with the command:

 ./neural2d images/mnist/topology-mnist.txt images/mnist/inputData-mnist.txt weights-mnist.txt

If that fails, see the detailed instructions that follow.

Detailed instructions

  1. Download and uncompress the four database files from http://yann.lecun.com/exdb/mnist/ and place them in the images/mnist/ subdirectory. You should have:

    images/mnist/train-images.idx3-ubyte (47040016 bytes)
    images/mnist/train-labels.idx1-ubyte (60008 bytes)
    images/mnist/t10k-images.idx3-ubyte (7840016 bytes)
    images/mnist/t10k-labels.idx1-ubyte (10008 bytes)
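
The byte counts listed above follow from the IDX file layout: a 4-byte magic number, one 4-byte big-endian size per dimension, then one unsigned byte per element. For the training images that is 16 + 60000 × 28 × 28 = 47040016 bytes. The following is a minimal sketch for checking a download against its own header. The function names `expected_idx_size` and `size_matches` are hypothetical and are not part of neural2d.

```python
import os
import struct


def expected_idx_size(header: bytes) -> int:
    """Return the file size implied by an IDX header holding unsigned-byte data."""
    # Magic number: two zero bytes, a data-type byte (0x08 = unsigned byte),
    # and a byte giving the number of dimensions.
    zero, dtype, ndim = struct.unpack(">HBB", header[:4])
    if zero != 0 or dtype != 0x08:
        raise ValueError("not an unsigned-byte IDX header")
    # Each dimension size is a 4-byte big-endian unsigned integer.
    dims = struct.unpack(">%dI" % ndim, header[4:4 + 4 * ndim])
    count = 1
    for d in dims:
        count *= d
    # Header bytes plus one byte per element.
    return 4 + 4 * ndim + count


def size_matches(path: str) -> bool:
    """Compare the size on disk with the size implied by the file's header."""
    with open(path, "rb") as f:
        header = f.read(16)  # enough for up to three dimensions
    return expected_idx_size(header) == os.path.getsize(path)
```

Calling `size_matches("images/mnist/train-images.idx3-ubyte")` from the top-level directory should return `True` for a complete, uncompressed file. A `False` result or a `ValueError` usually means the file is truncated or still compressed.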

  2. Run the Python script makeTrainDataForNeural2d.py to extract the 60,000 .bmp files from the MNIST database files. Navigate to the images/mnist subdirectory and run the script from there. It requires Python 3.x. Depending on how your Python interpreter is installed, you may have to use one of the following lines to invoke it:

    ./makeTrainDataForNeural2d.py
    or
    python ./makeTrainDataForNeural2d.py
    or
    python3 ./makeTrainDataForNeural2d.py

If the script succeeds, you will have 60,000 .bmp files in the images/mnist/train-data/ subdirectory, and an input config file named "inputData-mnist.txt" in images/mnist/. You should have:

 images/mnist/train-data/0.bmp      # from 0.bmp through 59999.bmp  
 images/mnist/inputData-mnist.txt  
 images/mnist/topology-mnist.txt
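
If training later fails with a missing-file error, it can help to confirm this layout programmatically. The sketch below is an illustration only: `check_mnist_layout` is a hypothetical helper, not something shipped with neural2d, and it assumes it is run from the top-level directory so that the relative path images/mnist resolves.

```python
import os


def check_mnist_layout(base="images/mnist", expected_images=60000):
    """Return a list of problems found; an empty list means the layout looks right."""
    problems = []
    # The two config files named in the listing above.
    for name in ("inputData-mnist.txt", "topology-mnist.txt"):
        path = os.path.join(base, name)
        if not os.path.isfile(path):
            problems.append("missing " + path)
    train_dir = os.path.join(base, "train-data")
    if not os.path.isdir(train_dir):
        problems.append("missing directory " + train_dir)
        return problems
    # Images are expected to be named 0.bmp through (expected_images - 1).bmp.
    present = set(os.listdir(train_dir))
    missing = [i for i in range(expected_images) if "%d.bmp" % i not in present]
    if missing:
        problems.append("%d of %d .bmp files missing, first is %d.bmp"
                        % (len(missing), expected_images, missing[0]))
    return problems
```

Printing the result of `check_mnist_layout()` shows what still needs to be generated or moved, if anything.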

The first few lines of inputData-mnist.txt look like:

 images/mnist/train-data/0.bmp -1 -1 -1 -1 -1 1 -1 -1 -1 -1  
 images/mnist/train-data/1.bmp 1 -1 -1 -1 -1 -1 -1 -1 -1 -1  
 images/mnist/train-data/2.bmp -1 -1 -1 -1 1 -1 -1 -1 -1 -1  
 images/mnist/train-data/3.bmp -1 1 -1 -1 -1 -1 -1 -1 -1 -1  
 images/mnist/train-data/4.bmp -1 -1 -1 -1 -1 -1 -1 -1 -1 1  
 . . .
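
Each line pairs an image path with ten target values, one per digit class: 1 in the position of the correct digit and -1 everywhere else. The sample lines above therefore correspond to the digits 5, 0, 4, 1, and 9. The sketch below reproduces that encoding. It is inferred from the sample lines rather than taken from makeTrainDataForNeural2d.py, and the function names are hypothetical.

```python
def target_values(label, num_classes=10):
    """Return the target outputs for a digit: 1 at the label's index, -1 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range: %r" % (label,))
    return [1 if i == label else -1 for i in range(num_classes)]


def input_data_line(image_path, label):
    """Format one line in the style of inputData-mnist.txt: path, then ten targets."""
    return image_path + " " + " ".join(str(v) for v in target_values(label))
```

For example, `input_data_line("images/mnist/train-data/0.bmp", 5)` yields the first sample line shown above.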

With all the files in place, you should be able to train on the data set by using the Makefile in the top-level directory:

 make test-mnist

Or invoke neural2d directly:

 ./neural2d images/mnist/topology-mnist.txt images/mnist/inputData-mnist.txt weights-mnist.txt