AI HW 07 Griffin - TheEvergreenStateCollege/upper-division-cs-23-24 GitHub Wiki
hw 07
The self-attention mechanism computes a weight for each token embedding in a sequence, to weigh the importance of that embedding.
The goal of the attention mechanism is to compute a context vector.
Q. Is a context vector computed for each token?
A. yes:
In the above example, each token embedding has 3 dimensions. A weight is calculated for each embedding using a query embedding, which is one of the input embeddings.
How I understand this: a query embedding is chosen (each input will have its turn), and a weight is calculated for each input embedding by taking its dot product with the query. For example, with x2 as the query, the weight for input 1 = (0.5 * 0.4) + (0.8 * 0.1) + (0.6 * 0.8) = 0.76
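The dot product above can be checked with a few lines of Python. This is a minimal sketch that uses only the two embeddings whose values appear in the example; the names `x1`, `x2`, `dot`, and `score_21` are chosen here for illustration.

```python
# Values taken from the worked example in the text.
x1 = [0.4, 0.1, 0.8]  # input embedding 1
x2 = [0.5, 0.8, 0.6]  # input embedding 2, used here as the query


def dot(a, b):
    """Dot product: multiply matching components and add them up."""
    return sum(p * q for p, q in zip(a, b))


# Unnormalized attention weight of input 1 with respect to query x2.
score_21 = dot(x2, x1)
print(round(score_21, 2))
```

Repeating the same call with every input embedding in place of `x1` would give the full row of weights for the query x2.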
So here we are creating attention weights. These weights are then used to create a context vector for one specific query embedding (look at example below).
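A rough sketch of how the weights turn into a context vector, under some loud assumptions: only the two embeddings whose values appear in these notes are used as inputs, softmax is assumed as the normalization step (the notes do not say which normalization was used), and all names (`inputs`, `softmax`, `context`) are made up for illustration.

```python
import math

# Only the two embeddings given in the notes; a real sequence has more.
x1 = [0.4, 0.1, 0.8]
x2 = [0.5, 0.8, 0.6]
inputs = [x1, x2]
query = x2  # compute the context vector z2 for the second input


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def softmax(scores):
    """Assumed normalization: turns scores into positive weights that sum to 1."""
    largest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# One unnormalized score per input, then normalize.
scores = [dot(query, x) for x in inputs]
weights = softmax(scores)

# Context vector: weighted sum of the input embeddings, one dimension at a time.
context = [
    sum(w * x[d] for w, x in zip(weights, inputs))
    for d in range(len(query))
]
print(weights)
print(context)
```

The context vector has the same number of dimensions as the inputs, and because the weights sum to 1, each of its components lands between the smallest and largest value of that component across the inputs.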
Q. So does the context vector z2 replace x2 as the input embedding?
Here is my simplified self attention mechanism:
After normalizing the output this is what I get:
Q. Why does one element have a 1 and all of the other elements have a zero?
A.
Question 0: it is the process of translating abstract ideas into concrete data. For example, you could store data about the time of day you play a song, or the type of weather when you play a song.
Question 1:
Each day someone wears a different color shirt, depending on the day of the week: x is the day of the week, and y is the color of the shirt they wear.
Question 2: batch is the number of input samples (iii), max length is the number of tokens per sample (ii), and stride is the number of tokens that are stepped over from one sample to the next (i). For example, if one sample starts at index 0 and the stride is 5, the second sample starts at index 5.
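The three terms can be illustrated with a small sliding-window sketch. This is not the course's actual data loader; `make_samples`, `make_batches`, and the toy `token_ids` list are hypothetical names and data used only to show how max length, stride, and batch size interact.

```python
def make_samples(token_ids, max_length, stride):
    """Cut a token list into windows of max_length tokens, stepping by stride."""
    samples = []
    for start in range(0, len(token_ids) - max_length + 1, stride):
        samples.append(token_ids[start:start + max_length])
    return samples


def make_batches(samples, batch_size):
    """Group samples into batches of batch_size (the last batch may be smaller)."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


# Toy data: token IDs 0..19 so each ID equals its own index.
token_ids = list(range(20))

samples = make_samples(token_ids, max_length=4, stride=5)
batches = make_batches(samples, batch_size=2)

for sample in samples:
    print(sample)
```

With a stride of 5, the first sample starts at index 0 and the second at index 5, matching the example above. Because the stride is larger than the max length here, one token is skipped between consecutive samples; a stride smaller than the max length would make the samples overlap instead.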
Question 3: [7, 128]
Question 4:
Question 5: A. 3 B. ii
Human Writing:
Article 1 (thoughts on permacomputing): What is permacomputing? Permaculture focuses on sustainable resources, so permacomputing focuses on the energy usage of computers. I think this article can be summed up by the quote: "Instead of planned obsolescence, there should be planned longevity." The article emphasizes the costs of producing computer parts, and how humans should strive to reduce waste and instead increase the longevity of our devices.
The first part of the article talks about computer waste and how to reduce/reuse computer parts, and it lays out a blueprint for creating hardware with longevity and reusability in mind.
The second part is about cooperation. It brings up two philosophies: yin and yang hacking. Yang focuses on increasing power in every way, forcing an environment into a clean and controlled space (think a corn field with rows of corn), whereas yin is about adapting to the machine/environment. Yin tries to find the essence of a computer to understand what it is good at. I like to think of the ugly duckling as an example, but replace the duckling with a robot that doesn't know its function yet. So the robot goes on a journey to find its function.
The third part shows that going forward is not the only way to progress; there are opportunities in every direction/front.
There is a part about artificial intelligence. It talks about how we shouldn't be competing with AI, but should instead recognize that we have different minds/capabilities.
When Windows 11 came out, a ton of processors went down in price because they don't have a chip built into them that is required for Windows 11. This chip (a Trusted Platform Module, or TPM) stores cryptographic keys for security purposes. Although these processors lack some of the security features, they've been used for a long time and still have plenty of good uses.
I love this article. There is a beauty to junk. Junkyards are filled with potential. I have a Raspberry Pi specifically for playing retro games. A lot of my friends have built computers for playing games in 4K, which is awesome, but it feels like overkill for anything else. Also, I feel like because iPhones have so many functions, it's hard to focus when I pick up my phone. It has so many functions that I forget half of them and/or become unfocused to the point it slows me down.
Article 2: This article talks about the incoming energy apocalypse caused by AI's power and water consumption.
The first part is a warning that AI is using too much power, and the second part offers solutions: nuclear fusion, improving efficiency, and creating limits on energy consumption.
Both articles consider the forward progress of AI to be problematic. The first article focuses on computers in general and on reducing our waste/increasing longevity, whereas the second article focuses on AI. The second article does talk about reducing waste, but it also offers more solutions specific to the AI problem.
This second article scares me. Humans are already facing energy and water problems. We need to consider how worthwhile those AI-generated photos are, because they cost a lot.