AI_Homework4_Response
Link to original assignment
3Blue1Brown Chapter 3: Backpropagation
Question 0. In this equation, match the _symbols_ with their meaning:
- A. $\sigma$ --> iii. Sigmoid, or squishing function, to smooth outputs to the 0.0 to 1.0 range
- B. $w_i$ --> iv. Weights from the previous layer to this neuron
- C. $a_i$ --> i. Activations from previous layer
- D. $b$ --> ii. Bias of this neuron, or threshold for firing
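For reference, these symbols come from the single-neuron equation in the reading; a LaTeX sketch of that relationship, assuming $n$ inputs from the previous layer and writing $a$ for this neuron's resulting activation:

```latex
a = \sigma\!\left( w_1 a_1 + w_2 a_2 + \cdots + w_n a_n + b \right)
  = \sigma\!\left( \sum_{i=1}^{n} w_i a_i + b \right)
```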
Question 1. Calculate the cost of this "trash" output of a neural network against our desired label of the digit "3".
Answer: 3.3585
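The cost in the reading is the sum of squared differences between the output activations and the one-hot vector for the desired digit. A minimal Python sketch of that calculation, using made-up "trash" activations (the assignment's actual values aren't reproduced here, so this prints a different number than 3.3585):

```python
import numpy as np

# Hypothetical "trash" output activations for digits 0-9 (not the assignment's actual values).
output = np.array([0.4, 0.9, 0.1, 0.3, 0.8, 0.2, 0.7, 0.6, 0.95, 0.5])

# Desired label "3" as a one-hot vector: 1.0 for digit 3, 0.0 everywhere else.
desired = np.zeros(10)
desired[3] = 1.0

# Cost for this single training example: sum of squared differences.
cost = np.sum((output - desired) ** 2)
print(cost)
```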
Question 2. Suppose we are feeding this image forward through our neural network and want to increase the classification of it as the digit "2". Answer this question about decreasing the cost of the "output-2" node firing:
B: Increase the bias associated with the digit-2 neuron and decrease the bias associated with all the other neurons.
Question 3. What does the phrase "neurons that fire together wire together" mean in the context of increasing weights in layer $L$ in proportion to how they are activated in the previous layer $a_L$ ?
During backpropagation, weight values are adjusted according to how much they contributed to a correct result. The weights feeding into a node that matches the target, especially those attached to strongly activated neurons in the previous layer, receive the largest increases; a weight between two neurons that fire together gets "wired" more strongly. A small sketch of that proportionality follows.
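A minimal sketch, assuming a single output neuron with squared-error cost (names such as `a_prev`, `target`, and the learning rate are illustrative, not from the lab code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical previous-layer activations and incoming weights for one neuron.
a_prev = np.array([0.05, 0.90, 0.40])   # the 0.90 neuron "fires" strongly
w = np.array([0.10, 0.20, 0.30])
b = 0.0
target = 1.0                             # we want this neuron to fire

z = np.dot(w, a_prev) + b
a = sigmoid(z)

# Gradient of the squared-error cost (a - target)^2 with respect to each weight.
# Each weight's gradient is scaled by its source activation a_prev[i].
grad_w = 2 * (a - target) * a * (1 - a) * a_prev

# A gradient-descent step increases most the weight whose source neuron fired most.
w_new = w - 0.5 * grad_w
print(grad_w)   # largest magnitude at index 1, where a_prev is largest
print(w_new)
```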
Question 4. The following image shows which of the following:
- changes to the weights of all the neurons "requested" by each training data
- changes to the biases of all the neurons "requested" by each training data
- changes to the activations of the previous layer
- changes to the activation of the output layer
In addition to the answer you chose above, what other choices are changes that backpropagation can actually make to the neural network?
- changes to the biases of all the neurons "requested" by each training data
Reasoning: the activations themselves aren't parameters that backpropagation can set directly, but the bias values are re-evaluated along with the weights.
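A minimal sketch of what a backpropagation update actually writes to, assuming the layer's gradients `dW` and `db` have already been computed for one mini-batch (all names and values here are illustrative, not from the lab code):

```python
import numpy as np

learning_rate = 3.0

# Hypothetical parameters and gradients for one layer.
W = np.random.randn(10, 100)     # weights into the output layer
b = np.random.randn(10)          # biases of the output layer
dW = np.random.randn(10, 100)    # stand-in for dC/dW from backprop
db = np.random.randn(10)         # stand-in for dC/db from backprop

# Backpropagation / gradient descent updates weights and biases...
W -= learning_rate * dW
b -= learning_rate * db

# ...but never writes to the training images or their activations directly;
# activations only change on the next forward pass because W and b changed.
```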
Question 5A. In the reading, calculating the cost-function gradient $\nabla C$ by mini-batches to find the direction of steepest descent is compared to a...
Selected answer in bold.
- a cautious person calculating how to get down a hill
- **a drunk stumbling quickly down a hill**
- a cat leaping gracefully down a hill
- a bunch of rocks tumbling down a hill
Question 5B. What is the closest analogy to calculating the best update changes $\nabla C$ by mini-batches?
- passing laws by electing a new president and waiting for an entire election's paper ballots to be completely counted
- asking a single pundit on a television show what laws should be changed
- asking a random-sized group of people to make a small change to any law in the country, repeated $n$ times, allowing a person the possibility to be chosen multiple times
- making a small change to one law at a time chosen by random groups of $n$ people, until everyone in the country has been asked at least once
Question 6. If each row in this image is a mini-batch, what is the mini-batch size? Remember in our MNIST `train.py` in last week's lab, the mini-batch size was 10...
Answer: 12
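For comparison, a minimal sketch of how a mini-batch size shows up when slicing training data, loosely in the spirit of last week's `train.py` (this is not the lab's actual code; the shuffling and slicing here are a generic pattern):

```python
import random

# Hypothetical training set of (image, label) pairs.
training_data = [(f"img_{i}", i % 10) for i in range(60000)]
mini_batch_size = 10   # the value used in last week's MNIST train.py

random.shuffle(training_data)
mini_batches = [
    training_data[k:k + mini_batch_size]
    for k in range(0, len(training_data), mini_batch_size)
]

print(len(mini_batches))      # 6000 mini-batches of 10 examples each
print(len(mini_batches[0]))   # 10
```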
Question #1. For our neural network with layers of [784, 100, 10], what is the size (number of elements) of the $\nabla C$ (cost function changes) vector below:
- Weights: (784 × 100) + (100 × 10) = 78400 + 1000 = 79400
- Biases: 100 + 10 = 110
- Total: 79400 + 110 = 79510 elements
Answer the question again for this smaller neural network:
( )-w1-> ( ) -w2-> ( ) -w3-> ( )
NN with layers: [1, 1, 1, 1]
- Weights: (1 × 1) + (1 × 1) + (1 × 1) = 3
- Biases: 1 + 1 + 1 = 3
- Total: 3 + 3 = 6 elements
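A quick Python check of both counts, assuming the usual convention that every layer after the input layer has one bias per neuron (the helper name `gradient_size` is just for illustration):

```python
def gradient_size(layer_sizes):
    """Number of elements in the gradient: one per weight plus one per bias."""
    weights = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
    biases = sum(layer_sizes[1:])   # the input layer has no biases
    return weights + biases

print(gradient_size([784, 100, 10]))   # 79510
print(gradient_size([1, 1, 1, 1]))     # 6
```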
Question #2. Match the following symbols to their definitions.
- A. $a^{(L-1)}$ --> i. Activations from the previous layer
- B. $\sigma$ --> iv. Sigmoid, or squishing function, to smooth outputs to the 0.0 to 1.0 range
- C. $b^{(L)}$ --> ii. Bias of the current layer
- D. $w^{(L)}$ --> v. Weights from the previous layer to this neuron
- E. $a^{(L)}$ --> iii. Activations of the current layer
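These symbols fit together in the layer-$L$ forward relation from the reading; a LaTeX sketch, writing $z^{(L)}$ for the weighted sum before the sigmoid, as in Question #3 below:

```latex
z^{(L)} = w^{(L)} a^{(L-1)} + b^{(L)}, \qquad
a^{(L)} = \sigma\!\left( z^{(L)} \right)
```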
Question #3. In this tree diagram, we see how the cost function for the first training example, $C_0$, depends on the activation of the output layer $a^{(L)}$. In turn, $a^{(L)}$ depends on the weighted output (before the sigmoid function) $z^{(L)}$, which itself depends on the incoming weights $w^{(L)}$ and activations $a^{(L-1)}$ from the previous layer and the bias of the current layer $b^{(L)}$. What is the relationship of this second, extended diagram to the first one? Choices: (choose all that apply)
Answers marked in bold
- 1. There is no relationship
- 2. The bottom half of the second diagram is the same as the first diagram
- 3. The second diagram extends backward into the neural network, showing a previous layer $L-2$ whose outputs the layer $L-1$ depends on.
- 4. The second diagram can be extended further back to layer $L-3$, all the way to the first layer $L$
- 5. The first diagram is an improved version of the second diagram with fewer dependencies
- 6. Drawing a path between two quantities in either diagram will show you which partial derivatives to "chain together" in calculating $\nabla C$
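As a concrete instance of option 6, the path from $w^{(L)}$ up to $C_0$ in the diagram corresponds to chaining these partial derivatives; a LaTeX sketch of the chain-rule expression from the reading:

```latex
\frac{\partial C_0}{\partial w^{(L)}}
  = \frac{\partial z^{(L)}}{\partial w^{(L)}}
    \, \frac{\partial a^{(L)}}{\partial z^{(L)}}
    \, \frac{\partial C_0}{\partial a^{(L)}}
```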
...