Griffin AI Homework 03 - TheEvergreenStateCollege/upper-division-cs-23-24 GitHub Wiki

https://www.deeplearningbook.org/contents/intro.html

  1. What is connectionism and the distributed representation approach? How does it relate to the MNIST classification of learning the idea of a circle, regardless of whether it is the top of the digit 9 or the top / bottom of the digit 8?

Connectionism is the name deep learning went by during its second wave, in the 1980s and 1990s. The distributed representation approach describes each input by many features, and each feature takes part in describing many different inputs, so something is classified by the combination of features it matches. An example from the reading: "animals become intelligent when many of their neurons work together." So for an 8 and a 9, the neurons that detect the top circle would fire for both, while the neurons for the parts they don't share would also fire, affecting the next layers.
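A rough sketch of the shared-feature idea, in Python. The feature names here (`top_loop`, `bottom_loop`, `vertical_tail`) are hand-picked and hypothetical; a real network learns its own features rather than being given them.

```python
# Hypothetical, hand-picked features for illustration only.
FEATURES = ["top_loop", "bottom_loop", "vertical_tail"]

# Each digit is described by a combination of features,
# and one feature can take part in describing several digits.
DIGITS = {
    "8": {"top_loop", "bottom_loop"},
    "9": {"top_loop", "vertical_tail"},
}

def encode(digit):
    """Return a 0/1 vector saying which features are active for a digit."""
    return [1 if feature in DIGITS[digit] else 0 for feature in FEATURES]

shared = DIGITS["8"] & DIGITS["9"]

print(encode("8"))  # [1, 1, 0]
print(encode("9"))  # [1, 0, 1]
print(shared)       # {'top_loop'}
```

The `top_loop` unit is active for both digits, and the units they don't share are what tell them apart.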

  1. What are some factors that have led to recent progress in the ability of deep learning to mimic intelligent human tasks?
  • The amount of training data has increased.
  • Earlier algorithms were too computationally costly for the hardware of the time; faster computers now make it practical to train larger models.
  1. How many neurons are in the average human brain, versus the number of simulated neurons in the biggest AI supercomputer described in the book chapter? Now in the year 2024, how many neurons can the biggest supercomputer simulate? (You may use a search engine or an AI chat itself to speculate).

Humans: ~86 billion neurons; computers: 10-1000?

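The chapter says that, unless new technologies allow faster scaling, artificial neural networks won't catch up to the human brain's neuron count until around the 2050s.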


Gradient Descent --

https://www.3blue1brown.com/lessons/gradient-descent

Cost function: a value that describes the difference between the expected output and the output that we get. A higher cost means that our output is further from the expected output. In this way, we can see how far off our models are.
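As a small sketch of this, assuming a squared-error cost (these notes don't fix a formula, so that choice and the example numbers are my assumptions):

```python
def cost(output, expected):
    """Sum of squared differences between the model output and the target."""
    return sum((o - e) ** 2 for o, e in zip(output, expected))

# Made-up example: the expected answer is the third category.
target = [0.0, 0.0, 1.0]
close_guess = [0.1, 0.1, 0.8]  # mostly right
far_guess = [0.9, 0.8, 0.1]    # mostly wrong

print(cost(close_guess, target))  # small cost
print(cost(far_guess, target))    # much larger cost
```

The guess that is further from the target gets the larger cost, which is what lets us compare models.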

We want to reduce our cost function, i.e. to find the input that produces the minimum output.

To do this, take an input and find the slope at that point. If the slope is negative, shift the input to the right (increase it); if the slope is positive, shift it to the left (decrease it). Repeat until the slope is close to zero.
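The slope-following rule can be sketched for a function of one input. The example function `f`, the starting point, and the `learning_rate` and `steps` values are all assumptions picked for illustration.

```python
def slope(f, x, h=1e-6):
    """Estimate the slope of f at x with a small central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

def gradient_descent(f, x, learning_rate=0.1, steps=100):
    """Repeatedly step against the slope: right if negative, left if positive."""
    for _ in range(steps):
        x -= learning_rate * slope(f, x)
    return x

def f(x):
    # Example cost curve with its minimum at x = 3.
    return (x - 3) ** 2

x_min = gradient_descent(f, x=-4.0)
print(x_min)  # very close to 3
```

Subtracting the slope handles both cases at once: a negative slope increases `x`, a positive slope decreases it.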


However, even if you find a minimum, it isn't guaranteed to be the lowest valley: it may be a local minimum rather than the global one.
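A sketch of how the starting point decides which valley you end up in. The curve `f(x) = x**4 - 3*x**2 + x` is an assumed example chosen because it has two valleys; the step size and starting points are also assumptions.

```python
def f(x):
    # Example curve with two valleys: a shallow one on the right
    # and a deeper one on the left.
    return x**4 - 3 * x**2 + x

def f_slope(x):
    # Derivative of f, worked out by hand.
    return 4 * x**3 - 6 * x + 1

def descend(x, learning_rate=0.01, steps=1000):
    """Follow the slope downhill from the starting point x."""
    for _ in range(steps):
        x -= learning_rate * f_slope(x)
    return x

right = descend(2.0)   # settles in the right-hand valley
left = descend(-2.0)   # settles in the left-hand valley

print(right, f(right))
print(left, f(left))
```

Both runs stop where the slope is zero, but the run that started on the right ends at a higher cost than the one that started on the left.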

Backpropagation --

Stochastic Gradient Descent --

  1. A network takes time to learn. Its weights and biases are unadjusted to begin with, and without training data to adjust them, the network has no idea how to categorize inputs.

  2. (1, 3, 4)

  3. self.weights = (6, 13, 2) self.biases = (5, 13, 2)

  4. Changing the learning rate will affect the size of each step in Stochastic Gradient Descent. A higher learning rate will update the weights faster; however, it may step completely over a local minimum.

  5. In stochastic gradient descent, we take a single input, run it through the model, and find the difference between the target and the actual output. This difference is our cost function. We use it to update all parameters (weights and biases).

Whereas in gradient descent, we take all of the inputs and run them through the model. The cost function is an average of all of the differences. Then we update the weights and biases once, based on the average we calculated.
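The difference between the two update schemes, and the effect of the learning rate from answer 4, can be sketched on a toy model. Everything here is an assumption for illustration: a one-weight model `prediction = w * x`, a squared-error cost, made-up data where the right answer is `w = 2`, and the function names.

```python
import random

# Made-up training data following y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def sample_slope(w, x, y):
    """Slope of the squared error (w*x - y)**2 with respect to w."""
    return 2 * x * (w * x - y)

def batch_gradient_descent(xs, ys, learning_rate=0.01, epochs=50):
    """Average the slope over all inputs, then update w once per pass."""
    w = 0.0
    for _ in range(epochs):
        avg = sum(sample_slope(w, x, y) for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * avg
    return w

def stochastic_gradient_descent(xs, ys, learning_rate=0.01, epochs=50, seed=0):
    """Update w after every single input, visiting inputs in random order."""
    rng = random.Random(seed)
    order = list(range(len(xs)))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            w -= learning_rate * sample_slope(w, xs[i], ys[i])
    return w

w_batch = batch_gradient_descent(xs, ys)
w_sgd = stochastic_gradient_descent(xs, ys)
print(w_batch, w_sgd)  # both end up close to 2

# A much larger learning rate takes steps so big that each one jumps
# over the minimum and lands further away than before.
w_too_fast = batch_gradient_descent(xs, ys, learning_rate=0.2, epochs=20)
print(w_too_fast)
```

With four inputs, the stochastic version makes four updates per pass while the batch version makes one, which is the trade-off described in answer 5.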

Human Writing:

  1. As a writing tutor, I think schools shouldn't care if students use ChatGPT. Most people don't need to know the rules of grammar. They are going to follow a template that their teacher gave them anyway. I'd write out my answers to a question, then ask ChatGPT to organize/structure them. Then I'd ask ChatGPT to fix any grammar issues. As long as the substance originated from the student, I don't care what tools they used to reach that point. It's our job as the current generation to build tools for future generations, so their lives are easier. If we don't allow tools like ChatGPT, then we are failing those next generations.

  2. Framing is when you take a story and only show the parts that paint the picture you want. Framing crops out a lot of details and nuance, and can be used to control a narrative.

  3. The system prompt is like the frame. It's the context that the AI is given. If the AI is given a biased view, then it'll be biased.

  4. If an AI is fed the standard story, then that is what it knows and will relay.

  5. The information the AI is telling the person is biased. I'm not actually sure how this would affect programming languages, but I imagine the AI may have "preferences" for the syntax/structure of each language.