AI HW 08 Griff - TheEvergreenStateCollege/upper-division-cs-23-24 GitHub Wiki

https://github.com/TheEvergreenStateCollege/upper-division-cs/wiki/AI%E2%80%90Homework%E2%80%9008

The query is the word currently in focus, and we want to find out how important each of the other words in the sentence is relative to that query.

init -- initializes the trainable weight matrices W_query, W_key and W_value.

forward -- we compute the context vectors. First we multiply the queries by the keys, which gives the attention scores; then we normalize these scores using softmax to get the attention weights. Finally, we create each context vector as a weighted sum of the values, using the attention weights as the weights.
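The init and forward steps described above can be sketched in PyTorch as follows. This is a minimal sketch, not the book's exact listing: the class name `SelfAttention` is made up for illustration, and dividing the scores by the square root of the key dimension before softmax is an assumption (the standard scaled dot-product form), since these notes only say "normalize with softmax".

```python
import torch
import torch.nn as nn


class SelfAttention(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, d_in, d_out):
        super().__init__()
        # init: one trainable projection each for queries, keys and values
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x):
        # x has shape (num_tokens, d_in)
        queries = self.W_query(x)
        keys = self.W_key(x)
        values = self.W_value(x)
        # attention scores: every query against every key
        attn_scores = queries @ keys.T  # (num_tokens, num_tokens)
        # assumption: scale by sqrt(d_out) before softmax
        attn_weights = torch.softmax(attn_scores / keys.shape[-1] ** 0.5, dim=-1)
        # context vectors: weighted sum of the values
        return attn_weights @ values  # (num_tokens, d_out)


torch.manual_seed(123)
x = torch.rand(6, 3)  # 6 tokens, 3 dimensions each
sa = SelfAttention(d_in=3, d_out=2)
context_vecs = sa(x)
print(context_vecs.shape)  # torch.Size([6, 2])
```

Each of the 6 tokens gets one context vector of size d_out = 2.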

3.5

Causal -- prevents the model from accessing future information in a sequence, i.e. the next word only depends on previous words, not future words.

Basically, we mask the future tokens and then renormalize the attention weights of the unmasked tokens so that each row adds up to 1.

To implement a mask, see page 86 of Raschka.
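One common way to build such a mask is sketched below. It is not necessarily the exact code on the page cited above. It assumes random stand-in attention scores and fills the future positions with negative infinity before softmax, so those positions end up with weight 0 and each row still sums to 1.

```python
import torch

torch.manual_seed(123)
context_length = 6
# stand-in attention scores for 6 tokens (random, for illustration only)
attn_scores = torch.rand(context_length, context_length)

# True above the diagonal marks the "future" positions for each row
mask = torch.triu(torch.ones(context_length, context_length), diagonal=1).bool()

# -inf becomes exactly 0 after softmax, so future tokens get no weight
masked_scores = attn_scores.masked_fill(mask, float("-inf"))
attn_weights = torch.softmax(masked_scores, dim=-1)

print(attn_weights)
print(attn_weights.sum(dim=-1))  # every row sums to 1
```

Masking before softmax does the masking and the renormalization in one step, instead of zeroing the weights and dividing each row by its sum afterwards.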

Multi Head -- run the attention mechanism multiple times (in parallel), where each head has its own weight matrices, and then combine the outputs of all the heads.
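A rough sketch of the multi-head idea, assuming the simplest possible design: several independent heads whose outputs are concatenated along the last dimension. The class names `Head` and `MultiHeadWrapper` are hypothetical, the causal mask and dropout are left out to keep it short, and the heads here run one after another in a Python loop rather than truly in parallel.

```python
import torch
import torch.nn as nn


class Head(nn.Module):  # hypothetical single attention head, no mask or dropout
    def __init__(self, d_in, d_out):
        super().__init__()
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x):
        # x has shape (b, num_tokens, d_in)
        q, k, v = self.W_query(x), self.W_key(x), self.W_value(x)
        scores = q @ k.transpose(1, 2)  # (b, num_tokens, num_tokens)
        weights = torch.softmax(scores / k.shape[-1] ** 0.5, dim=-1)
        return weights @ v  # (b, num_tokens, d_out)


class MultiHeadWrapper(nn.Module):  # hypothetical name
    def __init__(self, d_in, d_out, num_heads):
        super().__init__()
        # each head gets its own, separately initialized weight matrices
        self.heads = nn.ModuleList([Head(d_in, d_out) for _ in range(num_heads)])

    def forward(self, x):
        # concatenate the per-head context vectors along the last dimension
        return torch.cat([head(x) for head in self.heads], dim=-1)


torch.manual_seed(123)
x = torch.rand(2, 6, 3)
mha = MultiHeadWrapper(d_in=3, d_out=2, num_heads=2)
out = mha(x)
print(out.shape)  # torch.Size([2, 6, 4])
```

With 2 heads of d_out = 2 each, every token ends up with a context vector of size 4. More efficient implementations fold all heads into one set of larger matrices so they are computed in a single batched multiplication.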

Dropout -- a technique used to prevent overfitting, where randomly chosen attention weights are zeroed out (masked) during training. Dropout is turned off at inference time.
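A small sketch of what dropout does to a matrix of attention weights, assuming a 50% rate and a matrix of ones as stand-in weights so the effect is easy to see. `torch.nn.Dropout` zeroes random entries in training mode and scales the survivors by 1 / (1 - p); in eval mode it does nothing.

```python
import torch

torch.manual_seed(123)
dropout = torch.nn.Dropout(p=0.5)
attn_weights = torch.ones(6, 6)  # stand-in attention weights

dropout.train()  # training mode: dropout is active
dropped = dropout(attn_weights)
print(dropped)  # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)

dropout.eval()  # eval mode: dropout is a no-op
unchanged = dropout(attn_weights)
print(torch.equal(unchanged, attn_weights))  # True
```

The rescaling keeps the expected sum of the weights roughly the same whether or not dropout is active.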

Questions:

d_in = 3: the number of dimensions in each token embedding
d_out = 2: the number of dimensions in each context vector
context_length = 6: the number of tokens considered for each context vector
dropout = 0.0 (0%): the fraction of attention weights randomly masked when calculating the context vector

b = 2 # batch size (number of input sequences in the batch)
num_tokens = 6 # number of tokens per sequence
d_in = 3 # number of dimensions per token

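keys
values
queries
Each of these comes from multiplying x by a weight matrix that maps d_in to d_out, so the last dimension changes from 3 to 2: x.shape is [2, 6, 3], but keys, values and queries all have dimensions of [2, 6, 2].

So we have queries = [2, 6, 2] and keys = [2, 6, 2]. Then we transpose (swap the places of) the second and third dimensions of keys, i.e. keys = [2, 2, 6], and matrix multiply queries by the transposed keys to get attention scores with dimensions [2, 6, 6].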
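One way to check these shapes is to run the projections on a random batch and print them. This sketch assumes the projections are `nn.Linear(d_in, d_out)` layers, so the last dimension of the output is d_out rather than d_in.

```python
import torch
import torch.nn as nn

torch.manual_seed(123)
b, num_tokens, d_in, d_out = 2, 6, 3, 2
x = torch.rand(b, num_tokens, d_in)

W_query = nn.Linear(d_in, d_out, bias=False)
W_key = nn.Linear(d_in, d_out, bias=False)

queries = W_query(x)
keys = W_key(x)
keys_t = keys.transpose(1, 2)  # swap the token and feature dimensions
attn_scores = queries @ keys_t

print(x.shape)            # torch.Size([2, 6, 3])
print(queries.shape)      # torch.Size([2, 6, 2])
print(keys_t.shape)       # torch.Size([2, 2, 6])
print(attn_scores.shape)  # torch.Size([2, 6, 6])
```

The batch dimension (2) is carried along unchanged; the matrix multiplication happens over the last two dimensions of each sequence.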

Human Writing

What is cybernetics? I think that cybernetics is how our bodies communicate within ourselves, i.e. how our brain and nervous system communicate with the rest of the body, and also how machines can replace/enhance and interface with these bodily communication systems.

What is the relationship of cybernetics to artificial intelligence? Humans are already like cyborgs in the sense that we use phones to enhance or replace functionality in our own brains. For example, we use calculators when our brains can't compute large or complex operations. Cybernetics studies how the brain interfaces with new technology, such as Neuralink, which aims to interface directly with the brain.

What is ethics? I accidentally read the text before the question, so I'm not sure what I thought before, but I believe ethics, as they describe it in the lecture, is the set of rules that a person creates for themselves to follow. Many people might have similar lists, and people are also often unethical by their own rules.

What is the cybernetics of cybernetics? It is "the profound insight that a brain is required to write a theory of a brain." In other words, "you reflect upon your reflections." For example, a person may have an idea of how their brain works, but they also reflect on why they have that idea.

What kinds of non-verbal experiences is it possible to have with a large language model that is focused mostly on verbal interactions? You can show an LLM a picture and it can describe what it sees, but it still translates what it sees into text.

How can cybernetics affect artificial intelligence, in particular language interactions and recent LLM progress? Cybernetics is important for understanding how humans and artificial intelligence interface. Humans already see things, write them down, and upload them to the internet, and then an LLM trains on that. But what if a computer were instead connected directly to a person's brain, so that the images they see are translated into words and fed into an LLM?

How can artificial intelligence, in particular LLM progress, affect cybernetics? It brings up the question of what happens when two intelligent beings interface, assuming artificial intelligence is actually intelligence. What is intelligence? Imagine a brain and an LLM connected directly. Are we already doing this, but with a slow feedback loop between the brain and LLMs on a computer?