AI‐24sp‐2024‐04‐17‐Afternoon - TheEvergreenStateCollege/upper-division-cs-23-24 GitHub Wiki
Today we will progress towards
- Training an MNIST classifier through 30 epochs
- Tuning it through hyperparameters to increase the success rate from its initial value
- Saving the model (its parameters) into a file
- Loading the model (its parameters) from a file to validate our model
- Understanding stochastic gradient descent (SGD), in preparation for understanding backpropagation next week
In your local development environment or GitPod workspace, start from a clean working directory by committing, pushing, and merging any changes that you are currently working on.
Start on the `main` branch and pull the latest changes.
```
cd <repo_dir>
git checkout main
git pull
```
You may choose a new `prototype-xx`, where `xx` is a number from `01` to `15`, that you have not worked on before.
You may choose a teammate of your choice or work solo. Change to that directory
```
cd ai-24sp/prototypes/prototype-xx
```
Read the README.md file in that directory to understand its current status from the previous team and what needs to be done.
Make sure you have the data downloaded and unzipped from previous labs, and that you have a `requirements.txt` file with any Python packages you need. Currently it should contain:
```
pillow
numpy
pbjson
```
You can install the packages in this file by running this command:
```
pip3 install -r requirements.txt
```
You can list the data files you have and check their sizes. (Your username and group name will look different.)
```
$ ls -lh *ubyte
-rw-r--r-- 1 username staff 7.5M Jul 21 2000 t10k-images-idx3-ubyte
-rw-r--r-- 1 username staff 9.8K Jul 21 2000 t10k-labels-idx1-ubyte
-rw-r--r-- 1 username staff 45M Apr 2 18:04 train-images-idx3-ubyte
-rw-r--r-- 1 username staff 59K Apr 2 18:04 train-labels-idx1-ubyte
```
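If you want to sanity-check the files beyond their sizes, the MNIST idx format begins each file with a big-endian magic number and item count; `2051` marks image files and `2049` label files. A small sketch:

```python
import struct

def read_idx_header(f):
    """Read the first 8 bytes of an MNIST idx file: a magic number
    (2051 for image files, 2049 for label files) and the item count,
    both big-endian unsigned 32-bit integers."""
    magic, count = struct.unpack(">II", f.read(8))
    return magic, count
```

For `train-images-idx3-ubyte` you should see magic `2051` and a count of `60000`.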
Also make sure that you are gitignoring them so they are not accidentally committed into the monorepo.
```
$ cat ../.gitignore
*.png
*ubyte
*.gz
```
You can copy files from this pull request
https://github.com/TheEvergreenStateCollege/upper-division-cs/pull/1372/files
If you save them over files in your prototype directory, you can run `git diff` to show the changes:
```
git diff ./network.py
```
The diffs may reveal typographical errors.
You should copy over three new files, which you can use to do the following tests:
```
loader.py      # utility functions
train.py       # this was the old load_mnist.py, which imports utility functions from loader
run_single.py  # loads a model and runs it on a single image
```
You may also use files from this repo:
https://github.com/MichalDanielDobrzanski/DeepLearningPython.git
If you clone it locally, it includes pickled MNIST data, and you can compare models trained from it with the prototypes we have been working on, which train from the Yann LeCun data from scratch.
You may find it useful to create a new file for instantiating and running a neural network, called `run.py`, with contents like this:
```python
import network
import mnist_loader

training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
training_data = list(training_data)
net = network.Network([784, 30, 10])
net.SGD(training_data, 50, 10, 0.1, test_data=test_data)
```
You can then run this to start training for comparison:
```
python3 run.py
```
After you've resolved any differences, running `train.py` will start training an MNIST classifier with some default hyperparameters:
- a learning rate (`eta`) of 5.0
- 30 epochs, complete runs through all mini-batches
- a mini-batch size of 10, the number of datapoints from the training data to consider together in order to take one step in SGD
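The step these hyperparameters control can be sketched in isolation. This is an illustration of mini-batch SGD, not the prototype's actual `network.py`; `grad_fn` is a placeholder standing in for backpropagation:

```python
import numpy as np

def sgd_step(weights, biases, mini_batch, eta, grad_fn):
    """One SGD step: sum per-example gradients over a mini-batch of
    (input, label) pairs, then move each parameter against the
    average gradient, scaled by the learning rate eta."""
    gw = [np.zeros_like(w) for w in weights]
    gb = [np.zeros_like(b) for b in biases]
    for x, y in mini_batch:
        dgw, dgb = grad_fn(x, y)  # per-example gradients from backprop
        gw = [a + d for a, d in zip(gw, dgw)]
        gb = [a + d for a, d in zip(gb, dgb)]
    m = len(mini_batch)
    weights = [w - (eta / m) * g for w, g in zip(weights, gw)]
    biases = [b - (eta / m) * g for b, g in zip(biases, gb)]
    return weights, biases
```

One epoch is simply this step repeated over every mini-batch in a shuffled pass through the training data.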
Create a dev diary entry for AI Homework 03, dated today, and log your training runs to show the progress you've made, including the success rate (number of images successfully classified out of the test set of 10,000), and the hyperparameter settings you used to achieve it.
Currently, after your training run is over, your model is lost. No matter how high of a success rate you achieved, you can't reuse that classifier or distribute it.
Not to worry: we will now add some lines to your `network.py` to persist your MNIST classifier to a file, using the package `pbjson`, or Packed Binary JavaScript Object Notation.
Those of you who have taken web programming courses before will recognize the JSON format. This is merely a binary-optimized version of it that saves space through compression.
Add these lines to `network.py` (without the plus signs or the other diff-format characters).

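The diff itself isn't reproduced here, but a minimal sketch of what the save/load pair might look like follows. It uses the standard `json` module as a stand-in; `pbjson` exposes a similar `dump`/`load` interface on a file opened in binary mode (an assumption — check the package's documentation), and the method names mirror the `saveToPBJSON` call used in `train.py`:

```python
import json
import numpy as np

class Network:
    def __init__(self, sizes):
        self.sizes = sizes
        self.num_layers = len(sizes)
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]

    def saveToPBJSON(self, filename):
        # ndarrays are not JSON-serializable; convert to nested lists.
        data = {
            "sizes": self.sizes,
            "weights": [w.tolist() for w in self.weights],
            "biases": [b.tolist() for b in self.biases],
        }
        # With pbjson this would be pbjson.dump(data, f) on a file
        # opened "wb"; json is used here as a stand-in.
        with open(filename, "w") as f:
            json.dump(data, f)

    @classmethod
    def loadFromPBJSON(cls, filename):
        with open(filename) as f:
            data = json.load(f)
        net = cls(data["sizes"])
        net.weights = [np.array(w) for w in data["weights"]]
        net.biases = [np.array(b) for b in data["biases"]]
        return net
```

The key idea is that a model *is* its `sizes`, `weights`, and `biases`; anything that round-trips those three reproduces the classifier.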
At the end of `train.py`, note the following line:
```python
nn.saveToPBJSON("model.pbjson")
```
You can change the filename `model.pbjson` here, or rename the file after it's been created and you know its success rate.
One example:
```
mv model.pbjson model-30epochs-10batch-1_0eta.pbjson
```
In your dev diary, write how big you expect the file to be before you list its size, then include its final size and speculate as to what any discrepancies are due to.
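As a starting point for that estimate, you can count the parameters of a `[784, 30, 10]` network. This is a back-of-the-envelope sketch; the actual file size also depends on the serialization format and any compression:

```python
# Parameter count of a fully connected [784, 30, 10] network:
# one weight per connection between adjacent layers, one bias per
# non-input neuron.
sizes = [784, 30, 10]
weights = sum(x * y for x, y in zip(sizes[:-1], sizes[1:]))  # 784*30 + 30*10
biases = sum(sizes[1:])                                      # 30 + 10
total = weights + biases

print(total)      # 23860 parameters
print(total * 8)  # 190880 bytes if stored as raw 8-byte floats
```

Comparing that raw-float figure against the file on disk tells you how much the text encoding inflates, or the compression shrinks, the model.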
Load the model you just created by editing the file `run_model.py` to include:
- the actual model filename that you saved above
- a random index `i` indicating which image (out of 60,000) you'd like to test
In the example below, we test on image and label 138 (with 0-based indexing) and get back a successful result.
```
$ python3 run_model.py
Number of images 60000
width 28
height 28
start image data at byte 108208
end image data at byte 108992
length of sliced 1D image data <class 'list'>
Shape of data straight from file (1, 784, 1)
Number of training labels 60000
Saved image 138 as mnist.png
/Users/cryptogoth/src/upper-division-cs/ai-24sp/prototypes/prototype-00/network.py:174: RuntimeWarning: overflow encountered in exp
  return 1.0/(1.0+np.exp(-z))
Classified image 138 as 5
Label 138 is b'\x05'
```
You can view that particular training image in the file `mnist.png` (this file is overwritten every time).
However, this model only had a 72% success rate, so on a different example we see a mistake: an '8' image is classified as a '7'.
```
$ python3 run_model.py
Number of images 60000
width 28
height 28
start image data at byte 107424
end image data at byte 108208
length of sliced 1D image data <class 'list'>
Shape of data straight from file (1, 784, 1)
Number of training labels 60000
Saved image 137 as mnist.png
/Users/cryptogoth/src/upper-division-cs/ai-24sp/prototypes/prototype-00/network.py:174: RuntimeWarning: overflow encountered in exp
  return 1.0/(1.0+np.exp(-z))
Classified image 137 as 7
Label 137 is b'\x08'
```
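The `RuntimeWarning` in both runs comes from `np.exp(-z)` overflowing to infinity for large negative `z`. It is harmless here (the result still rounds to 0), but a numerically stable sigmoid avoids it entirely. A sketch you could swap into `network.py`; the prototype currently uses the plain formula:

```python
import numpy as np

def sigmoid(z):
    """Numerically stable sigmoid: np.exp is only ever called on
    non-positive arguments, so it can never overflow."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))     # exp of a non-positive value
    ez = np.exp(z[~pos])                          # likewise non-positive here
    out[~pos] = ez / (1.0 + ez)
    return out
```

Both branches compute the same function; they just rearrange it so the exponent is always ≤ 0.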
Optional note: For a proper validation, you want to hold back a portion of your training data as "validation data", and you would measure your model's success rate as a percentage of validation data classified correctly.
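One simple way to carve out that held-back portion (a sketch; it assumes the loader yields a list of `(image, label)` pairs):

```python
import random

def split_validation(training_data, n_validation=10000, seed=0):
    """Shuffle and split (image, label) pairs so the model never sees
    the validation examples during training. A fixed seed keeps the
    split reproducible across runs."""
    data = list(training_data)
    random.Random(seed).shuffle(data)
    return data[n_validation:], data[:n_validation]

# e.g. 60,000 MNIST pairs -> 50,000 for training, 10,000 for validation
```

Measuring the success rate on the validation slice, rather than on data the model trained on, gives a more honest estimate of how it will do on unseen digits.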
You can publish your `.pbjson` file to exchange with another team by creating a release:
https://github.com/TheEvergreenStateCollege/upper-division-cs/releases
You can use the tag `mnist-unoptimized-models`.
You can also post in the Discord channel, attach to an email, or another method. Committing and pushing to the class monorepo is discouraged, as this is a generated file and many of your classmates may be generating similar ones.
Try a model from another team and see if it can classify some of the same images yours can.
https://github.com/TheEvergreenStateCollege/upper-division-cs/releases/tag/mnist-unoptimized-models
Print out the models' `sizes` and `num_layers` to see that they are the same, and that the `weights` and `biases` members have the same shapes but contain different values.
Why are they different, if they are trained from the same data?
Include responses to these prompts in your dev diary.
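Once both models are loaded, the comparison can be automated. A sketch, assuming both networks expose `sizes`, `num_layers`, and `weights` as in the prototype:

```python
import numpy as np

def compare_models(a, b):
    """Check that two trained networks share an architecture, then
    return the per-layer mean absolute difference of their weights
    (0.0 would mean identical training outcomes)."""
    assert a.sizes == b.sizes and a.num_layers == b.num_layers, \
        "architectures differ"
    diffs = []
    for wa, wb in zip(a.weights, b.weights):
        assert wa.shape == wb.shape, "weight shapes differ"
        diffs.append(float(np.abs(wa - wb).mean()))
    return diffs
```

Nonzero differences are expected even with identical training data, since random initialization and mini-batch shuffling send each run down a different path.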
If you finish the previous steps, read and implement one of the optimizations described in Michael Nielsen's Chapter 3 notes.
Any increase in the success rate from your original model above is beneficial, no matter how small.
Describe which optimization you chose in your dev diary, and any challenges you encounter, including screenshots, error stack traces and code snippets.
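For example, one of the Chapter 3 improvements, the cross-entropy cost, is compact enough to sketch here. This is illustrative only; wiring it into the prototype's backpropagation is the actual exercise:

```python
import numpy as np

def cross_entropy_cost(a, y):
    """Cross-entropy cost for output activations a and a one-hot
    label vector y. np.nan_to_num guards the 0*log(0) cases at
    a == 0 or a == 1."""
    return float(np.sum(np.nan_to_num(
        -y * np.log(a) - (1 - y) * np.log(1 - a))))

def output_delta(a, y):
    # Output-layer error under cross-entropy: just (a - y), with no
    # sigmoid-derivative factor, which avoids the learning slowdown
    # when output neurons saturate.
    return a - y
```

Swapping the quadratic cost for this one changes only the cost function and the output-layer delta; the rest of backpropagation stays the same.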
Try modifying `run_model.py` so that, after loading a model, you continue training it with SGD.
That way, you can have checkpoints for very long training sessions, and you can improve a model from another team.
If you do, you can re-publish the improved model as a new file release on GitHub.