AI‐24sp‐2024‐04‐17‐Afternoon - TheEvergreenStateCollege/upper-division-cs-23-24 GitHub Wiki
Today we will progress towards
- Training an MNIST classifier through 30 epochs
- Tuning it through hyperparameters to increase the success rate from its initial value
- Saving the model (its parameters) into a file
- Loading the model (its parameters) from a file to validate our model
- Understanding stochastic gradient descent (SGD), in preparation for understanding backpropagation next week
In your local development environment or GitPod workspace, start from a clean working directory by committing, pushing, and merging any changes that you are currently working on.
Start on the `main` branch and pull the latest changes.
```
cd <repo_dir>
git checkout main
git pull
```
You may choose a new `prototype-xx`, where `xx` is a number from `01` to `15`, that you have not worked on before.
You may choose a teammate of your choice or work solo. Change to that directory
```
cd ai-24sp/prototypes/prototype-xx
```
Read the README.md file in that directory to understand its current status from the previous team and what needs to be done.
Make sure you have the data downloaded and unzipped from previous labs, and that you have a `requirements.txt` file with any Python packages you need. Currently it should contain:
```
pillow
numpy
pbjson
```
You can install the packages in this file by running this command:
```
pip3 install -r requirements.txt
```
You can list the data files you have and check their sizes. (Your username and group name will look different.)
```
$ ls -lh *ubyte
-rw-r--r-- 1 username staff 7.5M Jul 21 2000 t10k-images-idx3-ubyte
-rw-r--r-- 1 username staff 9.8K Jul 21 2000 t10k-labels-idx1-ubyte
-rw-r--r-- 1 username staff 45M Apr 2 18:04 train-images-idx3-ubyte
-rw-r--r-- 1 username staff 59K Apr 2 18:04 train-labels-idx1-ubyte
```
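If you want to sanity-check the files beyond their sizes, the MNIST idx format begins each file with a big-endian magic number and item count; `2051` marks image files and `2049` label files. A small sketch:

```python
import struct

def read_idx_header(f):
    """Read the first 8 bytes of an MNIST idx file: a magic number
    (2051 for image files, 2049 for label files) and the item count,
    both big-endian unsigned 32-bit integers."""
    magic, count = struct.unpack(">II", f.read(8))
    return magic, count
```

For `train-images-idx3-ubyte` you should see magic `2051` and a count of `60000`.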
Also make sure that you are gitignoring them so they are not accidentally committed into the monorepo.
```
$ cat ../.gitignore
*.png
*ubyte
*.gz
```
You can copy files from this pull request
https://github.com/TheEvergreenStateCollege/upper-division-cs/pull/1372/files
If you save them over files in your prototype directory, you can run `git diff` to show the changes:
```
git diff ./network.py
```
The diffs may reveal typographical errors.
You should copy over three new files, which you can use to do the following tests:
```
loader.py      # utility functions
train.py       # this was the old load_mnist.py, which imports utility functions from loader
run_single.py  # loads a model and runs it on a single image
```
You may also use files from this repo:
https://github.com/MichalDanielDobrzanski/DeepLearningPython.git
If you clone it locally, it includes pickled MNIST data, and you can compare models trained from it with the prototypes we have been working on, which train from the Yann LeCun data from scratch.
You may find it useful to create a new file for instantiating and running a neural network, called `run.py`, with contents like this:
```python
import network
import mnist_loader

training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
training_data = list(training_data)
net = network.Network([784, 30, 10])
net.SGD(training_data, 50, 10, 0.1, test_data=test_data)
```
You can then run this to start training for comparison:
```
python3 run.py
```
After you've resolved any differences, running `train.py` will start training an MNIST classifier with some default hyperparameters:
- a learning rate (`eta`) of 5.0
- 30 epochs, complete runs through all mini-batches
- a mini-batch size of 10, the number of datapoints from the training data to consider together in order to take one step in SGD
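The step these hyperparameters control can be sketched in isolation. This is an illustration of mini-batch SGD, not the prototype's actual `network.py`; `grad_fn` is a placeholder standing in for backpropagation:

```python
import numpy as np

def sgd_step(weights, biases, mini_batch, eta, grad_fn):
    """One SGD step: sum per-example gradients over a mini-batch of
    (input, label) pairs, then move each parameter against the
    average gradient, scaled by the learning rate eta."""
    gw = [np.zeros_like(w) for w in weights]
    gb = [np.zeros_like(b) for b in biases]
    for x, y in mini_batch:
        dgw, dgb = grad_fn(x, y)  # per-example gradients from backprop
        gw = [a + d for a, d in zip(gw, dgw)]
        gb = [a + d for a, d in zip(gb, dgb)]
    m = len(mini_batch)
    weights = [w - (eta / m) * g for w, g in zip(weights, gw)]
    biases = [b - (eta / m) * g for b, g in zip(biases, gb)]
    return weights, biases
```

One epoch is simply this step repeated over every mini-batch in a shuffled pass through the training data.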
Create a dev diary entry for AI Homework 03, dated today, and log your training runs to show the progress you've made, including the success rate (number of images successfully classified out of the test set of 10,000), and the hyperparameter settings you used to achieve it.
Currently, after your training run is over, your model is lost. No matter how high of a success rate you achieved, you can't reuse that classifier or distribute it.
Not to worry: we will now add some lines to your `network.py` to persist your MNIST classifier to a file, using the package `pbjson`, or Packed Binary JavaScript Object Notation.
Those of you who have taken web programming courses before will recognize the JSON format. This is merely a binary-optimized version of it that saves space through compression.
Add these lines to `network.py` (without the plus signs or the other diff-format characters).

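The diff itself isn't reproduced here, but a minimal sketch of what the save/load pair might look like follows. It uses the standard `json` module as a stand-in; `pbjson` exposes a similar `dump`/`load` interface on a file opened in binary mode (an assumption — check the package's documentation), and the method names mirror the `saveToPBJSON` call used in `train.py`:

```python
import json
import numpy as np

class Network:
    def __init__(self, sizes):
        self.sizes = sizes
        self.num_layers = len(sizes)
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]

    def saveToPBJSON(self, filename):
        # ndarrays are not JSON-serializable; convert to nested lists.
        data = {
            "sizes": self.sizes,
            "weights": [w.tolist() for w in self.weights],
            "biases": [b.tolist() for b in self.biases],
        }
        # With pbjson this would be pbjson.dump(data, f) on a file
        # opened "wb"; json is used here as a stand-in.
        with open(filename, "w") as f:
            json.dump(data, f)

    @classmethod
    def loadFromPBJSON(cls, filename):
        with open(filename) as f:
            data = json.load(f)
        net = cls(data["sizes"])
        net.weights = [np.array(w) for w in data["weights"]]
        net.biases = [np.array(b) for b in data["biases"]]
        return net
```

The key idea is that a model *is* its `sizes`, `weights`, and `biases`; anything that round-trips those three reproduces the classifier.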
At the end of `train.py`, note the following line:
```python
nn.saveToPBJSON("model.pbjson")
```
You can change the filename `model.pbjson` here, or rename the file after it's been created and you know its success rate.
One example:
```
mv model.pbjson model-30epochs-10batch-1_0eta.pbjson
```
In your dev diary, write how big you expect the file to be before you list its size, then include its final size and speculate as to what any discrepancies are due to.
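As a starting point for that estimate, you can count the parameters of a `[784, 30, 10]` network. This is a back-of-the-envelope sketch; the actual file size also depends on the serialization format and any compression:

```python
# Parameter count of a fully connected [784, 30, 10] network:
# one weight per connection between adjacent layers, one bias per
# non-input neuron.
sizes = [784, 30, 10]
weights = sum(x * y for x, y in zip(sizes[:-1], sizes[1:]))  # 784*30 + 30*10
biases = sum(sizes[1:])                                      # 30 + 10
total = weights + biases

print(total)      # 23860 parameters
print(total * 8)  # 190880 bytes if stored as raw 8-byte floats
```

Comparing that raw-float figure against the file on disk tells you how much the text encoding inflates, or the compression shrinks, the model.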
Load the model you just created by editing the file `run_model.py` to include:
- the actual model filename that you saved above
- a random index `i` indicating which image (out of 60,000) you'd like to test
In the example below, we test on image and label 138 (with 0-based indexing) and get back a successful result.
```
$ python3 run_model.py
Number of images 60000
width 28
height 28
start image data at byte 108208
end image data at byte 108992
length of sliced 1D image data <class 'list'>
Shape of data straight from file (1, 784, 1)
Number of training labels 60000
Saved image 138 as mnist.png
/Users/cryptogoth/src/upper-division-cs/ai-24sp/prototypes/prototype-00/network.py:174: RuntimeWarning: overflow encountered in exp
  return 1.0/(1.0+np.exp(-z))
Classified image 138 as 5
Label 138 is b'\x05'
```
You can view that particular training image in the file `mnist.png` (this file is overwritten every time).
However, this model only had a 72% success rate, so on a different example we see a mistake: an '8' image is classified as a '7'.
```
$ python3 run_model.py
Number of images 60000
width 28
height 28
start image data at byte 107424
end image data at byte 108208
length of sliced 1D image data <class 'list'>
Shape of data straight from file (1, 784, 1)
Number of training labels 60000
Saved image 137 as mnist.png
/Users/cryptogoth/src/upper-division-cs/ai-24sp/prototypes/prototype-00/network.py:174: RuntimeWarning: overflow encountered in exp
  return 1.0/(1.0+np.exp(-z))
Classified image 137 as 7
Label 137 is b'\x08'
```
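The `RuntimeWarning` in both runs comes from `np.exp(-z)` overflowing to infinity for large negative `z`. It is harmless here (the result still rounds to 0), but a numerically stable sigmoid avoids it entirely. A sketch you could swap into `network.py`; the prototype currently uses the plain formula:

```python
import numpy as np

def sigmoid(z):
    """Numerically stable sigmoid: np.exp is only ever called on
    non-positive arguments, so it can never overflow."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))     # exp of a non-positive value
    ez = np.exp(z[~pos])                          # likewise non-positive here
    out[~pos] = ez / (1.0 + ez)
    return out
```

Both branches compute the same function; they just rearrange it so the exponent is always ≤ 0.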
Optional note: For a proper validation, you want to hold back a portion of your training data as "validation data", and you would measure your model's success rate as a percentage of validation data classified correctly.
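One simple way to carve out that held-back portion (a sketch; it assumes the loader yields a list of `(image, label)` pairs):

```python
import random

def split_validation(training_data, n_validation=10000, seed=0):
    """Shuffle and split (image, label) pairs so the model never sees
    the validation examples during training. A fixed seed keeps the
    split reproducible across runs."""
    data = list(training_data)
    random.Random(seed).shuffle(data)
    return data[n_validation:], data[:n_validation]

# e.g. 60,000 MNIST pairs -> 50,000 for training, 10,000 for validation
```

Measuring the success rate on the validation slice, rather than on data the model trained on, gives a more honest estimate of how it will do on unseen digits.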
You can publish your `.pbjson` file to exchange with another team by creating a release:
https://github.com/TheEvergreenStateCollege/upper-division-cs/releases
You can use the tag `mnist-unoptimized-models`.
You can also post in the Discord channel, attach to an email, or another method. Committing and pushing to the class monorepo is discouraged, as this is a generated file and many of your classmates may be generating similar ones.
Try a model from another team and see if it can classify some of the same images yours can.
https://github.com/TheEvergreenStateCollege/upper-division-cs/releases/tag/mnist-unoptimized-models
Print out the models' `sizes` and `num_layers` to see that they are the same, and that the `weights` and `biases` members have the same shapes but contain different values.
Why are they different, if they are trained from the same data?
Include responses to these prompts in your dev diary.
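Once both models are loaded, the comparison can be automated. A sketch, assuming both networks expose `sizes`, `num_layers`, and `weights` as in the prototype:

```python
import numpy as np

def compare_models(a, b):
    """Check that two trained networks share an architecture, then
    return the per-layer mean absolute difference of their weights
    (0.0 would mean identical training outcomes)."""
    assert a.sizes == b.sizes and a.num_layers == b.num_layers, \
        "architectures differ"
    diffs = []
    for wa, wb in zip(a.weights, b.weights):
        assert wa.shape == wb.shape, "weight shapes differ"
        diffs.append(float(np.abs(wa - wb).mean()))
    return diffs
```

Nonzero differences are expected even with identical training data, since random initialization and mini-batch shuffling send each run down a different path.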
If you finish the previous steps, read and implement one of the optimizations described in Michael Nielsen's Chapter 3 notes.
Any increase in the success rate from your original model above is beneficial, no matter how small.
Describe which optimization you chose in your dev diary, and any challenges you encounter, including screenshots, error stack traces and code snippets.
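For example, one of the Chapter 3 improvements, the cross-entropy cost, is compact enough to sketch here. This is illustrative only; wiring it into the prototype's backpropagation is the actual exercise:

```python
import numpy as np

def cross_entropy_cost(a, y):
    """Cross-entropy cost for output activations a and a one-hot
    label vector y. np.nan_to_num guards the 0*log(0) cases at
    a == 0 or a == 1."""
    return float(np.sum(np.nan_to_num(
        -y * np.log(a) - (1 - y) * np.log(1 - a))))

def output_delta(a, y):
    # Output-layer error under cross-entropy: just (a - y), with no
    # sigmoid-derivative factor, which avoids the learning slowdown
    # when output neurons saturate.
    return a - y
```

Swapping the quadratic cost for this one changes only the cost function and the output-layer delta; the rest of backpropagation stays the same.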
Try modifying `run_model.py` so that, after loading a model, you continue training it with SGD.
That way, you can have checkpoints for very long training sessions, and you can improve a model from another team.
If you do, you can re-publish the improved model as a new file release on GitHub.