AI_Homework4_Response
Link to original assignment
3Blue1Brown Chapter 3: Backpropagation
Question 0. In this equation, match the _symbols_ with their meaning:
- A. $\sigma$ --> iii. Sigmoid, or squishing function, to smooth outputs to the 0.0 to 1.0 range
- B. $w_i$ --> iv. Weights from the previous layer to this neuron
- C. $a_i$ --> i. Activations from previous layer
- D. $b$ --> ii. Bias of this neuron, or threshold for firing
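For reference, these symbols come from the single-neuron equation in the reading; a LaTeX sketch of that relationship, assuming $n$ inputs from the previous layer and writing $a$ for this neuron's resulting activation:

```latex
a = \sigma\!\left( w_1 a_1 + w_2 a_2 + \cdots + w_n a_n + b \right)
  = \sigma\!\left( \sum_{i=1}^{n} w_i a_i + b \right)
```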
Question 1. Calculate the cost of this "trash" output of a neural network against our desired label of the digit "3".
Answer: 3.3585
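The cost in the reading is the sum of squared differences between the output activations and the one-hot vector for the desired digit. A minimal Python sketch of that calculation, using made-up "trash" activations (the assignment's actual values aren't reproduced here, so this prints a different number than 3.3585):

```python
import numpy as np

# Hypothetical "trash" output activations for digits 0-9 (not the assignment's actual values).
output = np.array([0.4, 0.9, 0.1, 0.3, 0.8, 0.2, 0.7, 0.6, 0.95, 0.5])

# Desired label "3" as a one-hot vector: 1.0 for digit 3, 0.0 everywhere else.
desired = np.zeros(10)
desired[3] = 1.0

# Cost for this single training example: sum of squared differences.
cost = np.sum((output - desired) ** 2)
print(cost)
```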
Question 2. Suppose we are feeding this image forward through our neural network and want to increase the classification of it as the digit "2". Answer this question about decreasing the cost of the "output-2" node firing:
B: Increase the bias associated with the digit-2 neuron and decrease the bias associated with all the other neurons.
Question 3. What does the phrase "neurons that fire together wire together" mean in the context of increasing weights in layer $L$ in proportion to how they are activated in the previous layer $a_L$ ?
During backpropagation, weight values are adjusted according to how much they contributed to a correct result. The weights feeding into a node that matches the target, especially those attached to strongly activated neurons in the previous layer, receive the largest increases; a weight between two neurons that fire together gets "wired" more strongly. A small sketch of that proportionality follows.
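A minimal sketch, assuming a single output neuron with squared-error cost (names such as `a_prev`, `target`, and the learning rate are illustrative, not from the lab code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical previous-layer activations and incoming weights for one neuron.
a_prev = np.array([0.05, 0.90, 0.40])   # the 0.90 neuron "fires" strongly
w = np.array([0.10, 0.20, 0.30])
b = 0.0
target = 1.0                             # we want this neuron to fire

z = np.dot(w, a_prev) + b
a = sigmoid(z)

# Gradient of the squared-error cost (a - target)^2 with respect to each weight.
# Each weight's gradient is scaled by its source activation a_prev[i].
grad_w = 2 * (a - target) * a * (1 - a) * a_prev

# A gradient-descent step increases most the weight whose source neuron fired most.
w_new = w - 0.5 * grad_w
print(grad_w)   # largest magnitude at index 1, where a_prev is largest
print(w_new)
```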
Question 4. The following image shows which of the following:
- changes to the weights of all the neurons "requested" by each training data
- changes to the biases of all the neurons "requested" by each training data
- changes to the activations of the previous layer
- changes to the activation of the output layer
In addition to the answer you chose above, what other choices are changes that backpropagation can actually make to the neural network?
- changes to the biases of all the neurons "requested" by each training data
Reasoning: the activations themselves aren't parameters that backpropagation can set directly, but the bias values are re-evaluated along with the weights.
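A minimal sketch of what a backpropagation update actually writes to, assuming the layer's gradients `dW` and `db` have already been computed for one mini-batch (all names and values here are illustrative, not from the lab code):

```python
import numpy as np

learning_rate = 3.0

# Hypothetical parameters and gradients for one layer.
W = np.random.randn(10, 100)     # weights into the output layer
b = np.random.randn(10)          # biases of the output layer
dW = np.random.randn(10, 100)    # stand-in for dC/dW from backprop
db = np.random.randn(10)         # stand-in for dC/db from backprop

# Backpropagation / gradient descent updates weights and biases...
W -= learning_rate * dW
b -= learning_rate * db

# ...but never writes to the training images or their activations directly;
# activations only change on the next forward pass because W and b changed.
```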
Question 5A. In the reading, calculating the cost-function gradient $\nabla C$ by mini-batches to find the direction of steepest descent is compared to a...
Selected answer in bold.
- a cautious person calculating how to get down a hill
- **a drunk stumbling quickly down a hill**
- a cat leaping gracefully down a hill
- a bunch of rocks tumbling down a hill
Question 5B. What is the closest analogy to calculating the best update changes $\nabla C$ by mini-batches?
- passing laws by electing a new president and waiting for an entire election's paper ballots to be completely counted
- asking a single pundit on a television show what laws should be changed
- asking a random-sized group of people to make a small change to any law in the country, repeated $n$ times, allowing a person the possibility to be chosen multiple times
- making a small change to one law at a time chosen by random groups of $n$ people, until everyone in the country has been asked at least once
Question 6. If each row in this image is a mini-batch, what is the mini-batch size? Remember in our MNIST `train.py` in last week's lab, the mini-batch size was 10...
Answer: 12
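For comparison, a minimal sketch of how a mini-batch size shows up when slicing training data, loosely in the spirit of last week's `train.py` (this is not the lab's actual code; the shuffling and slicing here are a generic pattern):

```python
import random

# Hypothetical training set of (image, label) pairs.
training_data = [(f"img_{i}", i % 10) for i in range(60000)]
mini_batch_size = 10   # the value used in last week's MNIST train.py

random.shuffle(training_data)
mini_batches = [
    training_data[k:k + mini_batch_size]
    for k in range(0, len(training_data), mini_batch_size)
]

print(len(mini_batches))      # 6000 mini-batches of 10 examples each
print(len(mini_batches[0]))   # 10
```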
Question #1. For our neural network with layers of [784, 100, 10], what is the size (number of elements) of the $\nabla C$ (cost function changes) vector below:
- Weights: (784 × 100) + (100 × 10) = 78400 + 1000 = 79400
- Biases: 100 + 10 = 110
- Total: 79400 + 110 = 79510 elements
Answer the question again for this smaller neural network:
( )-w1-> ( ) -w2-> ( ) -w3-> ( )
NN with layers: [1, 1, 1, 1]
- Weights: (1 × 1) + (1 × 1) + (1 × 1) = 3
- Biases: 1 + 1 + 1 = 3
- Total: 3 + 3 = 6 elements
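A quick Python check of both counts, assuming the usual convention that every layer after the input layer has one bias per neuron (the helper name `gradient_size` is just for illustration):

```python
def gradient_size(layer_sizes):
    """Number of elements in the gradient: one per weight plus one per bias."""
    weights = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
    biases = sum(layer_sizes[1:])   # the input layer has no biases
    return weights + biases

print(gradient_size([784, 100, 10]))   # 79510
print(gradient_size([1, 1, 1, 1]))     # 6
```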
Question #2. Match the following symbols to their definitions.
- A. $a^{(L-1)}$ --> i. Activations from the previous layer
- B. $\sigma$ --> iv. Sigmoid, or squishing function, to smooth outputs to the 0.0 to 1.0 range
- C. $b^{(L)}$ --> ii. Bias of the current layer
- D. $w^{(L)}$ --> v. Weights from the previous layer to this neuron
- E. $a^{(L)}$ --> iii. Activations of the current layer
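These symbols fit together in the layer-$L$ forward relation from the reading; a LaTeX sketch, writing $z^{(L)}$ for the weighted sum before the sigmoid, as in Question #3 below:

```latex
z^{(L)} = w^{(L)} a^{(L-1)} + b^{(L)}, \qquad
a^{(L)} = \sigma\!\left( z^{(L)} \right)
```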
Question #3. In this tree diagram, we see how the cost function for the first training example, $C_0$, depends on the activation of the output layer $a^{(L)}$. In turn, $a^{(L)}$ depends on the weighted output (before the sigmoid function) $z^{(L)}$, which itself depends on the incoming weights $w^{(L)}$ and activations $a^{(L-1)}$ from the previous layer and the bias of the current layer $b^{(L)}$. What is the relationship of this second, extended diagram to the first one? Choices: (choose all that apply)
Answers marked in bold
- 1. There is no relationship
- 2. The bottom half of the second diagram is the same as the first diagram
- 3. The second diagram extends backward into the neural network, showing a previous layer $L-2$ whose outputs the layer $L-1$ depends on.
- 4. The second diagram can be extended further back to layer $L-3$, all the way to the first layer $L$
- 5. The first diagram is an improved version of the second diagram with fewer dependencies
- 6. Drawing a path between two quantities in either diagram will show you which partial derivatives to "chain together" in calculating $\nabla C$
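As a concrete instance of option 6, the path from $w^{(L)}$ up to $C_0$ in the diagram corresponds to chaining these partial derivatives; a LaTeX sketch of the chain-rule expression from the reading:

```latex
\frac{\partial C_0}{\partial w^{(L)}}
  = \frac{\partial z^{(L)}}{\partial w^{(L)}}
    \, \frac{\partial a^{(L)}}{\partial z^{(L)}}
    \, \frac{\partial C_0}{\partial a^{(L)}}
```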
...