Riley AI - TheEvergreenStateCollege/upper-division-cs-23-24 GitHub Wiki
HW-1 ✔️
The case for CGI resurrection is a tricky one. On one hand, I think it could theoretically be a great way to pay homage to an actor. I say theoretically because I can't think of a single instance where an actor's likeness was recreated post-mortem with CGI without it being ominous. An example of this is the scene that paid homage to actress Nancy Marchand in the series The Sopranos, who passed away during the third season. The scene acts as a farewell to the character, but this intention is spoiled by the uncanny-valley effect caused by the early-2000s CGI. While it's good in theory, in reality the use of CGI can never fully replicate the actor being revived. For this reason, and because it's impossible to know whether the actor would agree to being recreated, I believe it shouldn't be done. Even if these actors were to sign away the use of their likeness before their death, they can't agree to a specific use of it. There are other ways to pay homage to an actor than literally recreating their physicality.
Here is a link to the MNIST classifier I made:
https://github.com/TheEvergreenStateCollege/upper-division-cs/pull/1551
I already made an MNIST classifier in prototype-01, but I coded another as I went through the reading because I wanted to understand it better.
Lab-3 ✔️
So I followed most of the instructions, with some shortcuts I found were easier. I recorded 5 minutes of a food review from Reviewbrah (Report of the Week). I installed TTS while I was recording the 5 minutes of the video. I used a free online converter to convert the .webm to a .ogg, then dragged the .ogg into my codespace and ran it through sox to remove the silent portions. I ran the TTS with the converted .wav and got decently convincing voice synthesis. I used his intro in the first test, and one of his iconic sayings: "My disappointment is immeasurable... and my day is ruined". I didn't realize how much he emphasizes certain words in his intro, which I attempted to recreate with the text, but I didn't end up nailing it. Also, he was outside for the review, which added a lot of background noise to the recording. Next time I will use a review that is filmed inside to minimize that.

Questions:
Has listening to your chosen voice produced an emotional reaction for you? Say as much as you're comfortable.
Not really. I imagine it would be more emotional had I used my own voice or that of someone I know.
Has it changed your ethical considerations that you wrote about at the beginning of the lab?
I already had reservations about how ethical voice synthesis was, and this kind of reinforced my feeling that it can be really dangerous depending on whose voice you are synthesizing.
What would you better like to understand about the voice cloning process?
How deep the neural network goes, and how the audio files compare to the handwriting PNGs.
HW-3 ✔️
Technical Reading-
Connectionism and distributed representation means that each input is represented by many features, and each feature participates in representing many different inputs. Neurons activate for different combinations of features, which is similar to MNIST classification. The approaches are similar because instead of taking the entire digit and determining what it is from its entirety, the network categorizes common features that can be combined to determine what the input is.
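As a toy illustration of the idea above, a distributed representation describes each input by shared features instead of one dedicated symbol per input. The feature names and values below are invented for illustration, not taken from the reading:

```python
# Toy sketch of a distributed representation: each digit is described by
# a combination of shared features rather than by a single dedicated symbol.
features = ["has_loop", "has_vertical_stroke", "has_horizontal_bar"]

# 1 = feature present, 0 = absent (rough caricatures, not real data)
digits = {
    "0": [1, 0, 0],
    "1": [0, 1, 0],
    "4": [0, 1, 1],
    "8": [1, 0, 0],  # crudely shares "has_loop" with 0
}

# Two different digits can share features, which is what lets a network
# generalize: a unit that detects loops helps classify both 0 and 8.
shared = [f for f, a, b in zip(features, digits["0"], digits["8"]) if a and b]
print(shared)  # ['has_loop']
```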
-
More data is collected, meaning bigger datasets and more resources for deep learning algorithms. Also there are more computational resources to run larger models.
-
The human brain has 10^11 (100,000,000,000) neurons, while the biggest AI supercomputer, OMP-1, has around 3,162,277 (10^6.5?). I googled the latter question and learned that the Sequoia supercomputer can simulate 530 billion neurons, which is less than 1% of the information-processing capacity of a human.
-
Because the weights and biases are initialized to random values before training.
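A minimal sketch of that random initialization, showing that two different seeds give two different starting points (plain-Python stand-in, not the actual NumPy/PyTorch initializers):

```python
import random

# Initialize the same 3-weight layer twice with different random seeds.
# The starting weights differ, so repeated training runs can converge to
# different (but often similarly good) solutions.
def init_weights(seed, n=3):
    rng = random.Random(seed)
    return [rng.uniform(-1, 1) for _ in range(n)]

w1 = init_weights(seed=0)
w2 = init_weights(seed=1)
print(w1 != w2)  # True: different random starting points
```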
-
(1,3,4)
-
[(7, 13), (2, 13)] for self.weights and [(1, 13), (1, 2)] for self.bias
-
D > C > A > B
-
A larger learning rate means faster training, but it also means that you may overshoot the minimum, making it take longer to reach it.
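The tradeoff can be sketched with gradient descent on f(x) = x²: a moderately large rate oscillates around the minimum, and an even larger one overshoots so far it diverges (toy values, not from the reading):

```python
# Gradient descent on f(x) = x**2, whose gradient is 2*x.
def descend(lr, steps=20, x=1.0):
    for _ in range(steps):
        x = x - lr * 2 * x  # step downhill along the gradient
    return x

small = descend(lr=0.1)    # shrinks x by 0.8 each step: settles toward 0
large = descend(lr=0.99)   # multiplies x by -0.98 each step: oscillates
diverge = descend(lr=1.5)  # multiplies x by -2 each step: blows up
print(abs(small), abs(large), abs(diverge))
```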
-
Stochastic comes from the fact that the path towards the cost minimum is not as direct as in gradient descent. The two differ in that SGD doesn't accumulate weight updates like GD optimization does.
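A toy sketch of the difference: full-batch GD accumulates gradients over all examples before making one update, while SGD updates immediately after each example, so its path is noisier. The data, model, and learning rate below are invented for illustration:

```python
# Fit w in y = w * x by squared error; the data satisfies y = 2x exactly.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
lr = 0.05

def grad(w, x, y):
    return 2 * (w * x - y) * x  # d/dw of (w*x - y)**2

# Full-batch GD: accumulate gradients over all examples, then one update.
w_gd = 0.0
for _ in range(50):
    g = sum(grad(w_gd, x, y) for x, y in data) / len(data)
    w_gd -= lr * g

# SGD: update right after each example (a noisier, less direct path).
w_sgd = 0.0
for _ in range(50):
    for x, y in data:
        w_sgd -= lr * grad(w_sgd, x, y)

print(round(w_gd, 3), round(w_sgd, 3))  # both approach 2.0
```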
Human Writing:
How do the essays hang together?
Don’t know what this means
Is 4’s essay, which responds to exactly the same prompt, an improvement on that of 3.5?
I’d say so; at least GPT-4 provides more analysis and understands that “bring truth to a gunfight” isn’t literal.
What do you think of ChatGPT as a student writer? Would you want to use ChatGPT (or other AI) for an assignment like this?
I don’t really have a strong opinion on the use of ChatGPT as a student writer. I personally don’t use it, but I know a lot of people do. I don’t use ChatGPT on any written assignment since I think it runs counter to the purpose of higher education.
If you did, how would you use it? For a first draft? To help edit a first draft you wrote yourself? Would you just submit ChatGPT’s version as is, maybe ‘making it your own’ a bit by changing a few words or adding a few things of your own?
As a programmer, I use it to explain code that I may not understand. It’s pretty good at explaining coding concepts/functions.
If you would use ChatGPT in any way, would you do that merely for convenience, or do you think it would contribute to your development as a thinker and academic writer?”
In the context of my usage of it, I think it benefits me by providing me with convenient answers to specific problems.
-
I think framing in journalism is kind of the bias of a story, whether the story is viewed positively or negatively, or what its purpose is.
-
Framing relates to system prompts because there is human influence and opinion in framing, which is replicated in system prompts.
-
The most common and relevant information is going to be expressed by both?
-
Because you are receiving help from both.
Programming MNIST Classifier:
Jonah and I worked on prototype-3
Lab-4 ✔️
Output 1: Counting total words and printing the first hundred words
Total number of characters: 120284
The Project Gutenberg eBook of Betty Crocker picture cooky book
This ebook is for the use of
Output 2: Splitting on whitespace and punctuation marks
['\ufeffThe', 'Project', 'Gutenberg', 'eBook', 'of', 'Betty', 'Crocker', 'picture', 'cooky', 'book', 'This', 'ebook', 'is', 'for', 'the', 'use', 'of', 'anyone', 'anywhere', 'in', 'the', 'United', 'States', 'and', 'most', 'other', 'parts', 'of', 'the', 'world']
Output 3: Assigning vocabulary words to token IDs
('!', 0)
('"', 1)
('#72443]', 2)
('$1', 3)
('$5', 4)
('(', 5)
(')', 6)
('***', 7)
('*2', 8)
('*4', 9)
('*In', 10)
('*Or', 11)
('*⅓', 12)
(',', 13)
('-', 14)
('--', 15)
('.', 16)
('000', 17)
('1', 18)
('1-cup', 19)
('10', 20)
('11', 21)
('12', 22)
('128', 23)
('12″', 24)
('13', 25)
('14', 26)
('14-15', 27)
('15', 28)
('15-oz', 29)
('1500', 30)
('16', 31)
('16-21', 32)
('17', 33)
('175', 34)
('18', 35)
('19', 36)
('1939', 37)
('1948', 38)
('1¼', 39)
('1¼″', 40)
('1½', 41)
('1½″', 42)
('1¾', 43)
('1″', 44)
('1⅓', 45)
('1⅔', 46)
('1⅛', 47)
('1⅜', 48)
('2', 49)
('20', 50)
Output 4: Encoding a sentence from the dataset
[3375, 1025, 654, 1854, 2462, 301, 437, 2583, 1662, ... 16]
Output 5: Round-trip decoding back to a sentence from token IDs
"The Project Gutenberg eBook of Betty Crocker picture cooky book ..."
Output 6: Encoding a new test sentence not from the dataset
Traceback (most recent call last):
File "/workspaces/upper-division-cs/ai-24sp/assignments/rilesbe/week5/tokenizer.py", line 135, in <module>
tokenizer.encode(text)
File "/workspaces/upper-division-cs/ai-24sp/assignments/rilesbe/week5/tokenizer.py", line 14, in encode
ids = [self.str_to_int[s] for s in preprocessed]
File "/workspaces/upper-division-cs/ai-24sp/assignments/rilesbe/week5/tokenizer.py", line 14, in <listcomp>
ids = [self.str_to_int[s] for s in preprocessed]
KeyError: 'Hello'
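This KeyError is what the meta-tokens in Outputs 7 and 8 address. A minimal sketch of a tokenizer with an `<|unk|>` fallback (the toy vocabulary here is invented; the real one is built from the cookbook text):

```python
import re

# Toy vocabulary; the real one comes from the Betty Crocker dataset.
vocab = {",": 0, ".": 1, "Betty": 2, "Crocker": 3, "book": 4,
         "<|endoftext|>": 5, "<|unk|>": 6}

class SimpleTokenizer:
    def __init__(self, vocab):
        self.str_to_int = vocab
        self.int_to_str = {i: s for s, i in vocab.items()}

    def encode(self, text):
        tokens = [t for t in re.split(r'([,.]|\s)', text) if t.strip()]
        # Fall back to <|unk|> for words not in the vocabulary,
        # instead of raising a KeyError.
        return [self.str_to_int.get(t, self.str_to_int["<|unk|>"]) for t in tokens]

    def decode(self, ids):
        text = " ".join(self.int_to_str[i] for i in ids)
        return re.sub(r'\s+([,.])', r'\1', text)  # re-attach punctuation

tok = SimpleTokenizer(vocab)
print(tok.encode("Hello, Betty Crocker book."))  # 'Hello' maps to 6 (<|unk|>)
print(tok.decode(tok.encode("Hello, Betty Crocker book.")))
```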
Output 7: Adding meta-tokens to end of vocabulary
('\ufeffThe', 3375)
('<|endoftext|>', 3376)
('<|unk|>', 3377)
Output 8: Encoding a sentence using both known and unknown tokens
"The Project Gutenberg eBook of Betty Crocker picture cooky book ..."
HW-4 ✔️
0.
A = iii
B = iv
C = i
D = ii
1. 4.2216
2. Increase the bias associated with the digit-2 neuron and decrease the biases associated with all the other neurons.
3. The biggest increases in weights (strengthening of connections) happen between the most active neurons (or the ones that could be the most active)
4. changes to the weights of all the neurons "requested" by each training data.
Possible changes by backpropagation include:
changes to the activations of the previous layer, changes to the biases of all the neurons "requested" by each training data
5. a drunk stumbling quickly down a hill
making a small change to one law at a time chosen by random groups of people, until everyone in the country has been asked at least once
6. 12
7. (784 * 100) + (100 * 10)
Would it be 3 since that is the number of weights?
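A worked check of the arithmetic for a [784, 100, 10] network (the layer sizes come from the question; biases are counted separately since the question asks about weights):

```python
# Each pair of adjacent layers contributes (inputs * outputs) weights,
# and each non-input layer contributes one bias per neuron.
layers = [784, 100, 10]
weights = sum(a * b for a, b in zip(layers, layers[1:]))  # 784*100 + 100*10
biases = sum(layers[1:])                                  # 100 + 10
print(weights, biases, weights + biases)  # 79400 110 79510
```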
8.
A = Activations from the previous layer
B = Sigmoid, or squashing function, to smooth outputs to the 0.0 to 1.0 range
C = Bias of the current layer
D = Weights from the previous layer to this neuron
E = Activations of the current layer
9. 2, 3, 4, 6
Human Writing
Both of these instances of AI are used on people who may not agree to have their likeness recreated. I don’t think that any safeguard would sway somebody from using recordings to clone public figures. I do believe there should be punishment for using this technology to produce misinformation, or using it for fraudulent purposes.
I used a public figure’s voice instead of my own, so I still don’t really know what it would feel like having my voice recreated with AI. My hesitance to use my own voice in the first place may be telling of how I feel and would feel about it. I do believe it should be my right to recreate my voice if I want. In a situation where I could no longer speak, it may be beneficial to have a way to recreate my voice. But of course this could be used negatively to make me say things I wouldn’t. I think families should have the right to create AI-simulated personalities of deceased family members, but I don’t know if you should actually do that. I don't think there is much of a difference between interacting with an AI-generated personality of the past and reenactors/reenactments. I do believe there are some parts of history that are hard to relive, which introduces some ethical issues.
HW-5 ✔️
Chapter 1:
-
An LLM, a large language model, is a neural network designed to understand, generate, and respond to human-like text. A GPT, a generative pre-trained transformer, is a type of large language model and a prominent framework for generative artificial intelligence. The terms can be synonymous.
-
We believe ABCD most closely resembles a question and answer model, in particular StackOverflow, Reddit, and Quora since they are most commonly used for getting questions answered. They have multiple labels for data points and they can be ranked by getting upvoted, liked, or even verified as being correct by the website.
-
C
-
3 or more layers
-
Yes
a. T, pre-training uses a large diverse dataset.
b. F, swapped definitions.
c. T, if the pre-fine-tuned model is available.
d. T, pretraining generates labels itself; categorization can be added for certain tasks.
e. T, pretraining uses more data than fine-tuning.
f. T, fine-tuning requires a pre-trained model.
-
A, B, E?
-
A tokenizer; it takes in input and breaks it into different sections, like the images that were used to determine the handwritten number.
-
B; I think this is the only one that isn't based on any prior data: it just generates.
-
Generation without prior examples.
-
GPT-3 has 96 transformer layers and 175 billion parameters in total
Chapter 2:
-
Text is categorical, which isn't compatible with math operations that are used to train neural networks.
-
An embedding is a mapping from words, images, etc. to points in a continuous vector space. Dimensions capture relationships between embeddings, so an embedding can have many dimensions.
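As a sketch, an embedding layer is just a lookup table: each token ID selects one row of a matrix, and that row is the token's point in the vector space. The rows below reuse the first three rows of the 6x3 embedding weights printed in the 2.7 Embedding output further down:

```python
# A 3-dimensional embedding for a tiny vocabulary. Each row is the vector
# for one token ID (values taken from the 2.7 output; normally learned).
embedding_matrix = [
    [0.3374, -0.1778, -0.1690],  # token ID 0
    [0.9178,  1.5810,  1.3010],  # token ID 1
    [1.2753, -0.2010, -0.1606],  # token ID 2
]

def embed(token_id):
    return embedding_matrix[token_id]  # plain row lookup, no math needed

print(embed(2))  # the 3-d vector for token ID 2
```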
-
256, 12288
-
C, B, D, A
Human Writing:
I think my general philosophy when sourcing a data set will take into consideration the subject matter of the data. I think that a data set should be used respectfully, meaning that data sets including personal data shouldn’t be used without permission. In cases where human metrics are sold by companies, my philosophical beliefs tend to go against using data sets that contain that data. Past experiences of data breaches, and news where a person’s identity was compromised because their data contained identifiable information, have led me to this philosophy. David Deutsch seems in favor of AI use as a tool, and draws many parallels between the ways in which humans learn and how AI learns. He didn’t really say anything that I thought affected my philosophy on AI use.
I think using AI to generate things that haven’t been created before is fascinating. I saw a Twitch streamer use AI to create new cooking recipes, and then he actually followed the recipes to create the AI-generated dish. The outcome always came out bad, since taste isn’t something that AI can understand. I think using AI in ways that are fun, and in ways that point out its incapability at performing certain tasks, is really interesting, which may be the opposite of how most people view AI and why AI is being seen as such an important technology. Another interesting use of AI is in creativity. I have had a couple of fun experiences using AI to create new worlds, or asking how our current lives would be different had certain historical events not taken place (or had others taken place).
I think that in some circumstances, AI use can be creative. At just the surface level of use, I don’t believe that AI can create anything ground-breaking or new. But when you are really able to use AI in a way that digs deep, with prompts that are extremely detailed and that pull from ideas that haven’t been connected yet, I think that AI could be used to create some cool ideas. This forces the question: are humans creative? Aren't our new ideas just recreated old ideas? I believe that David Deutsch has a similar or adjacent definition of creativity, saying that once you define creativity, you have already confined it. I think that instead of asking if AI is creative or not, we should be asking how AI creativity is any different from human creativity. I think for AI to be intelligent it has to have some creative capability. I believe that creativity is important, but it shouldn’t be viewed as just creating a completely original idea, because I don’t think that there are any completely original ideas that aren’t in some way influenced by other concepts. I think generally any use of creativity that produces hate or prejudice is an example of the negative effect of creative freedom. Just as creativity is capable of producing good ideas, it is just as capable of producing bad ones.
HW-6 ✔️
1. Blue - Encryption
Red - Decryption
<|unk|> - token for words not in vocabulary
<|endoftext|> - start/end of segment
-
<|endoftext|>, you can tell how many original documents were in the dataset by analyzing how many of these special tokens there are.
It lets the GPT learn connections between the long word and shorter words based on common parts, like "pneumonia", "microscope", or "volcano".
Advantage: actual language learning; it can be applied to different words with the same common parts.
This approach can work in any language, not just English.
Advantage: same as above, but you would need to specify a language.
All words are broken down in the same way, so the process is deterministic and results from repeated runs will be similar.
Disadvantage: makes it harder to produce dissimilar words.
The system will handle any word it encounters in chat or inference, even if it has never seen the word during training.
Disadvantage when it comes to typos.
It is easier than looking up the word in a hashtable, not finding it, and using a single <|unk|> unknown special token.
Advantage: fewer special tokens.
- Common words have smaller integers; less common words have larger ones. Words that can be split into parts receive their own IDs, and they are not always separated by a space from the token before them. I would write a function that looks up the token to see if it is a word; if it isn’t, the function would split the word into sub-parts to determine what words are present in sequence.
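The lookup-then-split function described above can be sketched as a greedy longest-match over an invented subword vocabulary (real BPE merges are learned from data; this only illustrates the splitting, and happens to reproduce the "what / soever" split in the 2.6 output):

```python
# Invented subword vocabulary for illustration.
subwords = {"what", "so", "ever", "micro", "scope", "pneu", "monia"}

def split_word(word):
    pieces, i = [], 0
    while i < len(word):
        # Take the longest vocabulary piece starting at position i.
        for j in range(len(word), i, -1):
            if word[i:j] in subwords:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # unknown character: emit it alone
            i += 1
    return pieces

print(split_word("whatsoever"))  # ['what', 'so', 'ever']
print(split_word("microscope"))  # ['micro', 'scope']
```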
Human Writing:
The case of Hachette v. Internet Archive relates to our training data sets for the final project because we are using literary works that are available on the internet. There are many sites that provide free ebooks, the one I am using for my final project is a Betty Crocker cookbook I found on Project Gutenberg. The similarity comes in that the Internet Archive is providing a similar service. I am assuming that the books on Project Gutenberg are in the public domain, which may not be the case with the Internet Archive.
I believe the main point for the publishers is that they make no profit from the works they publish when their publications are being accessed for free, exactly as with a library. They feel as though there is a difference between a book being physical and digital. When a book is physically in a library and people can access it for free, it is fine because it is likely going through a government agency, and the publishers likely receive some financial gain from providing books to libraries. When this process is done digitally, without the government oversight providing them financial gain, it becomes a problem because the Internet Archive is, *gasp*, providing information for free.
The counterpoint, in defense of the Internet Archive (toward which I have an extreme bias), sees no difference between a physical library and the service they are providing. It is unclear how the Internet Archive procures its digital versions of cultural artifacts, but it is very likely that someone pays for them, since they only allow one person to have possession of each at a time. Despite this, they are being attacked by publishing houses for doing the exact same thing a physical library does. They view themselves as being in the right because of the double-standard they see as apparent.
This whole case reminds me of the overall copyright system that is in place to protect the intellectual property of a company or person. It’s about the possession of an idea, or at least gatekeeping the ability for anyone else to make money off of an idea. The publishing houses view themselves as in the right because of systems like intellectual property; they do not want others to benefit off of the thing that makes them money, because this in turn makes them lose profit if someone is accessing their product without initially buying it.
I wouldn’t be able to act as a judge because my decision-making would not be supported by the law. I am fully aware that the publishing houses are likely in the right legally and the verdict in the end will be in favor of them. There will have to be some form of legislation protecting the right of internet libraries to function legally, which would provide publishing houses with the financial motivation to provide their books to them.
If I were to train a GPT based on the work of my classmates, ethical questions like: do they consent to having their work used and in some sense, replicated, would come up.
5/16/24 Thursday Lab ✔️
2.6 Sampling
PyTorch version: 2.2.2+cu121
Total number of characters: 120284
The Project Gutenberg eBook of Betty Crocker picture cooky book
This ebook is for the use of a
torch.Size([8, 4, 256])
Total number of characters: 120284
The Project Gutenberg eBook of Betty Crocker picture cooky book
This ebook is for the use of a
tiktoken version: 0.6.0
39158
x: [198, 10919, 15485, 13]
y: [10919, 15485, 13, 921]
[198] ----> 10919
[198, 10919] ----> 15485
[198, 10919, 15485] ----> 13
[198, 10919, 15485, 13] ----> 921
----> what
what ----> soever
whatsoever ----> .
whatsoever. ----> You
[tensor([[171, 119, 123, 464]]), tensor([[ 119, 123, 464, 4935]])]
[tensor([[ 119, 123, 464, 4935]]), tensor([[ 123, 464, 4935, 20336]])]
Inputs:
tensor([[ 171, 119, 123, 464],
[ 4935, 20336, 46566, 286],
[29504, 9325, 15280, 4286],
[ 4255, 88, 1492, 198],
[ 220, 220, 220, 220],
[ 198, 1212, 47179, 318],
[ 329, 262, 779, 286],
[ 2687, 6609, 287, 262]])
Targets:
tensor([[ 119, 123, 464, 4935],
[20336, 46566, 286, 29504],
[ 9325, 15280, 4286, 4255],
[ 88, 1492, 198, 220],
[ 220, 220, 220, 198],
[ 1212, 47179, 318, 329],
[ 262, 779, 286, 2687],
[ 6609, 287, 262, 1578]])
2.7 Embedding
Parameter containing:
tensor([[ 0.3374, -0.1778, -0.1690],
[ 0.9178, 1.5810, 1.3010],
[ 1.2753, -0.2010, -0.1606],
[-0.4015, 0.9666, -1.1481],
[-1.1589, 0.3255, -0.6315],
[-2.8400, -0.7849, -1.4096]], requires_grad=True)
tensor([[ 1.2753, -0.2010, -0.1606],
[-0.4015, 0.9666, -1.1481],
[-2.8400, -0.7849, -1.4096],
[ 0.9178, 1.5810, 1.3010]], grad_fn=<EmbeddingBackward0>)
2.8 Positional
Total number of characters: 120284
The Project Gutenberg eBook of Betty Crocker picture cooky book
This ebook is for the use of a
Total number of characters: 120284
The Project Gutenberg eBook of Betty Crocker picture cooky book
This ebook is for the use of a
torch.Size([8, 4, 256])
Token IDs:
tensor([[ 171, 119, 123, 464],
[ 4935, 20336, 46566, 286],
[29504, 9325, 15280, 4286],
[ 4255, 88, 1492, 198],
[ 220, 220, 220, 220],
[ 198, 1212, 47179, 318],
[ 329, 262, 779, 286],
[ 2687, 6609, 287, 262]])
Inputs shape:
torch.Size([8, 4])
torch.Size([8, 4, 256])
torch.Size([4, 256])
torch.Size([8, 4, 256])
3.3.1 Untrainable
tensor([0.9544, 1.4950, 1.4754, 0.8434, 0.7070, 1.0865])
tensor(0.9544)
tensor(0.9544)
Attention weights: tensor([0.1455, 0.2278, 0.2249, 0.1285, 0.1077, 0.1656])
Sum: tensor(1.0000)
Attention weights: tensor([0.1385, 0.2379, 0.2333, 0.1240, 0.1082, 0.1581])
Sum: tensor(1.)
Attention weights: tensor([0.1385, 0.2379, 0.2333, 0.1240, 0.1082, 0.1581])
Sum: tensor(1.)
tensor([0.4419, 0.6515, 0.5683])
3.3.2 Trainable
tensor([[0.9995, 0.9544, 0.9422, 0.4753, 0.4576, 0.6310],
[0.9544, 1.4950, 1.4754, 0.8434, 0.7070, 1.0865],
[0.9422, 1.4754, 1.4570, 0.8296, 0.7154, 1.0605],
[0.4753, 0.8434, 0.8296, 0.4937, 0.3474, 0.6565],
[0.4576, 0.7070, 0.7154, 0.3474, 0.6654, 0.2935],
[0.6310, 1.0865, 1.0605, 0.6565, 0.2935, 0.9450]])
tensor([[0.9995, 0.9544, 0.9422, 0.4753, 0.4576, 0.6310],
[0.9544, 1.4950, 1.4754, 0.8434, 0.7070, 1.0865],
[0.9422, 1.4754, 1.4570, 0.8296, 0.7154, 1.0605],
[0.4753, 0.8434, 0.8296, 0.4937, 0.3474, 0.6565],
[0.4576, 0.7070, 0.7154, 0.3474, 0.6654, 0.2935],
[0.6310, 1.0865, 1.0605, 0.6565, 0.2935, 0.9450]])
tensor([[0.2098, 0.2006, 0.1981, 0.1242, 0.1220, 0.1452],
[0.1385, 0.2379, 0.2333, 0.1240, 0.1082, 0.1581],
[0.1390, 0.2369, 0.2326, 0.1242, 0.1108, 0.1565],
[0.1435, 0.2074, 0.2046, 0.1462, 0.1263, 0.1720],
[0.1526, 0.1958, 0.1975, 0.1367, 0.1879, 0.1295],
[0.1385, 0.2184, 0.2128, 0.1420, 0.0988, 0.1896]])
Row 2 sum: 1.0
All row sums: tensor([1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000])
tensor([[0.4421, 0.5931, 0.5790],
[0.4419, 0.6515, 0.5683],
[0.4431, 0.6496, 0.5671],
[0.4304, 0.6298, 0.5510],
[0.4671, 0.5910, 0.5266],
[0.4177, 0.6503, 0.5645]])
Previous 2nd context vector: tensor([0.4419, 0.6515, 0.5683])
HW-7 ✔️
Question 0. Which of the following vectors is a one-hot encoding?
The vector to the right is a one-hot encoding.
How would you describe a one-hot encoding in English?
Converting categorical information into a format that can be used in a learning algorithm.
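A minimal sketch of that conversion for MNIST-style labels 0-9: exactly one position is 1, and all others are 0.

```python
def one_hot(label, num_classes=10):
    # Build a vector of zeros with a single 1 at the label's index.
    vec = [0] * num_classes
    vec[label] = 1
    return vec

print(one_hot(2))  # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```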
Question 1. What is an (x,y) training example (in English)?
Hint: In MNIST, x is a 784-pixel image, and y is a single character label from 0 to 9.
An (x, y) training example is represented as a tuple where x is the input data and y is the single label for that input.
Question 2. We call large texts for training a GPT "self-labeling" because we can sample from the
text in a sliding window (or batches of words).
Match the following terms (A,B,C) with its definition below (1,2,3):
A. max_length = ii. chunk size, or number of token IDs to group together into one x or y of a training example (x,y)
B. stride = i. the number of token IDs to "slide" forward from one (x,y) training example to the next (x,y) training example
C. batch size = iii. the number of (x,y) training examples returned in each call to next of our dataloader's iterator.
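The three terms above can be sketched as a sliding window over made-up token IDs (the real dataloader wraps this in PyTorch tensors):

```python
token_ids = list(range(10))  # stand-in for real token IDs
max_length, stride = 4, 2    # chunk size and how far the window slides

examples = []
for i in range(0, len(token_ids) - max_length, stride):
    x = token_ids[i : i + max_length]
    y = token_ids[i + 1 : i + max_length + 1]  # y is x shifted by one token
    examples.append((x, y))

print(examples[0])  # ([0, 1, 2, 3], [1, 2, 3, 4])
print(examples[1])  # ([2, 3, 4, 5], [3, 4, 5, 6])

batch_size = 2
batch = examples[:batch_size]  # one "call to next" yields this many pairs
```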
Question 3. Because embedding is stored as a matrix, and we studied how neural network weights can also be stored
in a matrix, we can view the operation of transforming an input vector into an embedding as a two-layer neural network.
For example, this neural network has layers of [4,3] meaning 4 nodes in the first layer and 3 nodes in the 2nd layer.
We ignore biases for now, or assume they are biases of all zeros.
The weights for the above neural network are a matrix that takes column vectors of size 4 to column vectors of size 3. Using the rules of matrix multiplication, what is the size of this weights matrix?
3 x 4, since the matrix must take column vectors of size 4 to column vectors of size 3.
Question 4. In the above problem, we can treat the input to this [4,3] neural network as a singletoken ID (as a one-hot encoding) that we wish to convert to its embedding (in a higher-dimensional feature space). To embed a batch of 8 chunks, we form a matrix from the column vectors of each chunk, and multiply that by the embeddings matrix. If the embeddings matrix goes from a vocabulary of size 6 to an output dimension of 12, what is the shape of the output matrix when we embed a batch of 8 chunks?
8 x 12
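A worked check that multiplying a one-hot vector by the embedding matrix just selects a row, and that a batch of 8 gives an 8 x 12 output. The matrix values are toy numbers, and each chunk is simplified to a single one-hot token:

```python
vocab_size, out_dim, batch = 6, 12, 8

# Toy embedding matrix: entry [i][j] = i + j/100, so rows are recognizable.
E = [[i + j / 100 for j in range(out_dim)] for i in range(vocab_size)]

def one_hot_times_E(token_id):
    # Explicit matrix-vector product of a one-hot row with E.
    one_hot = [1 if i == token_id else 0 for i in range(vocab_size)]
    return [sum(one_hot[i] * E[i][j] for i in range(vocab_size))
            for j in range(out_dim)]

assert one_hot_times_E(3) == E[3]  # the matmul reduces to a row lookup

output = [one_hot_times_E(t % vocab_size) for t in range(batch)]
print(len(output), len(output[0]))  # 8 12
```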
Question 5. A) Which token ID did you get an embedding for? (Remember it is 0-based indexing) B) Which of the following is true? i) Your vocabulary has 4 token IDs in it, and the embedding output dimension is 7 ii) Your vocabulary has 7 token IDs in it, and the embedding output dimension is 4 iii) Both iv) Neither
3, ii
Human Writing:
How would you summarize the main point or thesis of each article?
-
The first article introduces the idea of permacomputing, which I had never heard of. It seems to be summarized as nature doing most of the work, while humans design, build, and maintain it.
-
The second article discusses the massive impact that AI can have on the environment, and insists that more needs to be done by both their makers to make AI more sustainable, as well as by legislators to incentivize sustainable energy.
How would you divide and expand the thesis of each article above into two or three main parts? That is: for the permacomputing article, how would you summarize its main sections?
- The permacomputing article discusses several ideas of autonomous systems between humans and technology. It's separated into topics: the energy consumption of technology, the need for computers to be observable, and how to design and maintain these computers.
for the AI energy article, how would you summarize its main sections?
- This article summarizes recent events regarding AI's massive energy consumption, as well as discussing what is being done and what needs to be done in order to make AI sustainable.
What are two pieces of evidence or arguments that each article provides to support their thesis?
-
Permacomputing: Evidence 1 Evidence 2
-
AI: Evidence 1 Evidence 2
What is a related piece of evidence or arguments that you've found independently (e.g. through reading / watching the news, search engines)?
-
Permacomputing: I found this article on the actual website for permacomputing, and it gives a good explanation and introduction to the concept.
-
AI: I found this article also in support of the original article, discussing the same concern of AI's both energy and water consumption.
How would you describe the overall attitude of each article?
- I would describe the first article as more hypothetical, describing the ideals of how tech should be. I would describe the second as more based on reality. It still offers hypothetical solutions and "how things should be", but mainly focuses on things that are actually happening.
How would you describe the approach each article takes?
- This first article describes how things should be, and the second article describes how things are.
To what extent do you agree or disagree with the thesis of each article, as you've stated it above?
- Both articles seem to be progressive in terms of sustainability, arguing that technology isn't sustainable and that it should be. I agree with both.
Do you find the pieces of evidence or arguments that you provide convincing? Why or why not?
- I find both of the pieces of evidence I provided convincing. While I think that permacomputing seems a little over-ambitious and over-idealistic, I still agree with it.
HW-8 ✔️
AI Reading and Questions
3_4_1_manual_attention.py
tensor([0.8509, 1.6635])
keys.shape: torch.Size([6, 2])
values.shape: torch.Size([6, 2])
tensor(3.4720)
tensor([2.6322, 3.4720, 3.4191, 1.9135, 1.5041, 2.5524])
tensor([0.1526, 0.2764, 0.2662, 0.0918, 0.0687, 0.1442])
tensor([0.9890, 0.7616])
3_4_2_compact_class.py
tensor([[1.1324, 0.2969],
[1.1891, 0.3058],
[1.1888, 0.3058],
[1.1468, 0.2991],
[1.1562, 0.3007],
[1.1533, 0.3001]], grad_fn=<MmBackward0>)
tensor([[0.2183, 0.0421],
[0.2211, 0.0403],
[0.2212, 0.0401],
[0.2214, 0.0381],
[0.2235, 0.0356],
[0.2202, 0.0400]], grad_fn=<MmBackward0>)
3_5_1_causal_mask.py
tensor([[1.0815, 0.7990],
[1.0917, 0.8059],
[1.0915, 0.8058],
[1.0673, 0.7900],
[1.0737, 0.7936],
[1.0712, 0.7929]], grad_fn=<MmBackward0>)
tensor([[-0.0739, 0.0713],
[-0.0748, 0.0703],
[-0.0749, 0.0702],
[-0.0760, 0.0685],
[-0.0763, 0.0679],
[-0.0754, 0.0693]], grad_fn=<MmBackward0>)
tensor([[0.1921, 0.1646, 0.1652, 0.1550, 0.1721, 0.1510],
[0.2041, 0.1659, 0.1662, 0.1496, 0.1665, 0.1477],
[0.2036, 0.1659, 0.1662, 0.1498, 0.1664, 0.1480],
[0.1869, 0.1667, 0.1668, 0.1571, 0.1661, 0.1564],
[0.1830, 0.1669, 0.1670, 0.1588, 0.1658, 0.1585],
[0.1935, 0.1663, 0.1666, 0.1542, 0.1666, 0.1529]],
grad_fn=<SoftmaxBackward0>)
tensor([[1., 0., 0., 0., 0., 0.],
[1., 1., 0., 0., 0., 0.],
[1., 1., 1., 0., 0., 0.],
[1., 1., 1., 1., 0., 0.],
[1., 1., 1., 1., 1., 0.],
[1., 1., 1., 1., 1., 1.]])
tensor([[0.1921, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.2041, 0.1659, 0.0000, 0.0000, 0.0000, 0.0000],
[0.2036, 0.1659, 0.1662, 0.0000, 0.0000, 0.0000],
[0.1869, 0.1667, 0.1668, 0.1571, 0.0000, 0.0000],
[0.1830, 0.1669, 0.1670, 0.1588, 0.1658, 0.0000],
[0.1935, 0.1663, 0.1666, 0.1542, 0.1666, 0.1529]],
grad_fn=<MulBackward0>)
tensor([[1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.5517, 0.4483, 0.0000, 0.0000, 0.0000, 0.0000],
[0.3800, 0.3097, 0.3103, 0.0000, 0.0000, 0.0000],
[0.2758, 0.2460, 0.2462, 0.2319, 0.0000, 0.0000],
[0.2175, 0.1983, 0.1984, 0.1888, 0.1971, 0.0000],
[0.1935, 0.1663, 0.1666, 0.1542, 0.1666, 0.1529]],
grad_fn=<DivBackward0>)
tensor([[0.2899, -inf, -inf, -inf, -inf, -inf],
[0.4656, 0.1723, -inf, -inf, -inf, -inf],
[0.4594, 0.1703, 0.1731, -inf, -inf, -inf],
[0.2642, 0.1024, 0.1036, 0.0186, -inf, -inf],
[0.2183, 0.0874, 0.0882, 0.0177, 0.0786, -inf],
[0.3408, 0.1270, 0.1290, 0.0198, 0.1290, 0.0078]],
grad_fn=<MaskedFillBackward0>)
tensor([[1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.5517, 0.4483, 0.0000, 0.0000, 0.0000, 0.0000],
[0.3800, 0.3097, 0.3103, 0.0000, 0.0000, 0.0000],
[0.2758, 0.2460, 0.2462, 0.2319, 0.0000, 0.0000],
[0.2175, 0.1983, 0.1984, 0.1888, 0.1971, 0.0000],
[0.1935, 0.1663, 0.1666, 0.1542, 0.1666, 0.1529]],
grad_fn=<SoftmaxBackward0>)
3_5_2_dropout.py
tensor([[0.7475, 0.3051],
[0.7687, 0.3144],
[0.7678, 0.3139],
[0.7419, 0.3035],
[0.7337, 0.2963],
[0.7533, 0.3097]], grad_fn=<MmBackward0>)
tensor([[ 0.0308, -0.2223],
[ 0.0332, -0.2235],
[ 0.0338, -0.2238],
[ 0.0391, -0.2271],
[ 0.0466, -0.2317],
[ 0.0334, -0.2237]], grad_fn=<MmBackward0>)
tensor([[0.1636, 0.1570, 0.1584, 0.1700, 0.1958, 0.1552],
[0.1651, 0.1585, 0.1597, 0.1690, 0.1912, 0.1564],
[0.1653, 0.1589, 0.1600, 0.1689, 0.1901, 0.1568],
[0.1660, 0.1622, 0.1629, 0.1680, 0.1800, 0.1609],
[0.1676, 0.1671, 0.1670, 0.1661, 0.1657, 0.1666],
[0.1648, 0.1586, 0.1598, 0.1692, 0.1909, 0.1568]],
grad_fn=<SoftmaxBackward0>)
tensor([[1., 0., 0., 0., 0., 0.],
[1., 1., 0., 0., 0., 0.],
[1., 1., 1., 0., 0., 0.],
[1., 1., 1., 1., 0., 0.],
[1., 1., 1., 1., 1., 0.],
[1., 1., 1., 1., 1., 1.]])
tensor([[0.1636, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.1651, 0.1585, 0.0000, 0.0000, 0.0000, 0.0000],
[0.1653, 0.1589, 0.1600, 0.0000, 0.0000, 0.0000],
[0.1660, 0.1622, 0.1629, 0.1680, 0.0000, 0.0000],
[0.1676, 0.1671, 0.1670, 0.1661, 0.1657, 0.0000],
[0.1648, 0.1586, 0.1598, 0.1692, 0.1909, 0.1568]],
grad_fn=<MulBackward0>)
tensor([[1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.5102, 0.4898, 0.0000, 0.0000, 0.0000, 0.0000],
[0.3413, 0.3282, 0.3305, 0.0000, 0.0000, 0.0000],
[0.2518, 0.2461, 0.2471, 0.2550, 0.0000, 0.0000],
[0.2010, 0.2005, 0.2004, 0.1993, 0.1988, 0.0000],
[0.1648, 0.1586, 0.1598, 0.1692, 0.1909, 0.1568]],
grad_fn=<DivBackward0>)
tensor([[-0.2365, -inf, -inf, -inf, -inf, -inf],
[-0.1865, -0.2441, -inf, -inf, -inf, -inf],
[-0.1782, -0.2339, -0.2238, -inf, -inf, -inf],
[-0.1030, -0.1356, -0.1297, -0.0855, -inf, -inf],
[ 0.0203, 0.0161, 0.0157, 0.0080, 0.0044, -inf],
[-0.1898, -0.2437, -0.2334, -0.1524, 0.0181, -0.2604]],
grad_fn=<MaskedFillBackward0>)
tensor([[1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.5102, 0.4898, 0.0000, 0.0000, 0.0000, 0.0000],
[0.3413, 0.3282, 0.3305, 0.0000, 0.0000, 0.0000],
[0.2518, 0.2461, 0.2471, 0.2550, 0.0000, 0.0000],
[0.2010, 0.2005, 0.2004, 0.1993, 0.1988, 0.0000],
[0.1648, 0.1586, 0.1598, 0.1692, 0.1909, 0.1568]],
grad_fn=<SoftmaxBackward0>)
next
tensor([[0., 0., 0., 2., 2., 0.],
[2., 2., 2., 2., 0., 2.],
[0., 0., 2., 2., 0., 0.],
[0., 2., 0., 2., 0., 2.],
[0., 0., 2., 2., 2., 0.],
[2., 0., 2., 2., 0., 0.]])
tensor([[2.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.6827, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.5037, 0.0000, 0.0000, 0.5099, 0.0000, 0.0000],
[0.0000, 0.4009, 0.4008, 0.3986, 0.3976, 0.0000],
[0.0000, 0.3172, 0.0000, 0.3384, 0.0000, 0.0000]],
grad_fn=<MulBackward0>)
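The 2.0 entries above come from dropout rescaling: with a drop rate of 0.5 (my assumption, based on the values in the printout), surviving entries get multiplied by 1/(1-0.5) = 2 so the expected row sums stay the same. A minimal sketch:

```python
import torch

torch.manual_seed(123)
dropout = torch.nn.Dropout(0.5)  # drops ~half the entries while in training mode

example = torch.ones(6, 6)
dropped = dropout(example)
# Every surviving 1.0 is rescaled to 2.0; dropped entries become 0.0
print(dropped)
```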
3_5_3_causal_class.py
torch.Size([2, 6, 3])
true tensor([[[ 0.0415, -inf, -inf, -inf, -inf, -inf],
[-0.0368, -0.1403, -inf, -inf, -inf, -inf],
[-0.0419, -0.1344, -0.1363, -inf, -inf, -inf],
[-0.0152, -0.1017, -0.1036, -0.0560, -inf, -inf],
[-0.1216, 0.0100, 0.0134, 0.0158, 0.0720, -inf],
[ 0.0326, -0.1628, -0.1674, -0.0944, -0.2025, -0.0556]],
[[ 0.0415, -inf, -inf, -inf, -inf, -inf],
[-0.0368, -0.1403, -inf, -inf, -inf, -inf],
[-0.0419, -0.1344, -0.1363, -inf, -inf, -inf],
[-0.0152, -0.1017, -0.1036, -0.0560, -inf, -inf],
[-0.1216, 0.0100, 0.0134, 0.0158, 0.0720, -inf],
[ 0.0326, -0.1628, -0.1674, -0.0944, -0.2025, -0.0556]]],
grad_fn=<MaskedFillBackward0>)
after tensor([[[ 0.0415, -inf, -inf, -inf, -inf, -inf],
[-0.0368, -0.1403, -inf, -inf, -inf, -inf],
[-0.0419, -0.1344, -0.1363, -inf, -inf, -inf],
[-0.0152, -0.1017, -0.1036, -0.0560, -inf, -inf],
[-0.1216, 0.0100, 0.0134, 0.0158, 0.0720, -inf],
[ 0.0326, -0.1628, -0.1674, -0.0944, -0.2025, -0.0556]],
[[ 0.0415, -inf, -inf, -inf, -inf, -inf],
[-0.0368, -0.1403, -inf, -inf, -inf, -inf],
[-0.0419, -0.1344, -0.1363, -inf, -inf, -inf],
[-0.0152, -0.1017, -0.1036, -0.0560, -inf, -inf],
[-0.1216, 0.0100, 0.0134, 0.0158, 0.0720, -inf],
[ 0.0326, -0.1628, -0.1674, -0.0944, -0.2025, -0.0556]]],
grad_fn=<MaskedFillBackward0>)
tensor([[[ 0.2548, -0.1170],
[ 0.3255, -0.1794],
[ 0.3464, -0.2005],
[ 0.3217, -0.1864],
[ 0.2670, -0.1758],
[ 0.2892, -0.1746]],
[[ 0.2548, -0.1170],
[ 0.3255, -0.1794],
[ 0.3464, -0.2005],
[ 0.3217, -0.1864],
[ 0.2670, -0.1758],
[ 0.2892, -0.1746]]], grad_fn=<UnsafeViewBackward0>)
context_vecs.shape: torch.Size([2, 6, 2])
When we call this class later in the section, what exact numbers do we give for each parameter value?
d_in
- 3
d_out
- 2
context_length
- 6
dropout
- Unknown
Give the values of the variables on line 16:
b
- 2
num_tokens
- 6
d_in
- 3
Give the shape of the variables on lines 18-20:
keys
- 2 x 6 x 2
values
- 2 x 6 x 2 (same as keys; each projection maps d_in=3 to d_out=2)
queries
- 2 x 6 x 2
Give the shape of the variable on line 22
attn_scores
- 2 x 6 x 6 (queries @ keys.transpose(1, 2) compares every token with every token)
and line 25 attn_weights
- 2 x 6 x 6 (softmax doesn't change the shape)
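To double-check the shapes instead of guessing, I sketched just the projection and scoring steps with the batch from the section (b=2, num_tokens=6, d_in=3, d_out=2):

```python
import torch

b, num_tokens, d_in, d_out = 2, 6, 3, 2
x = torch.randn(b, num_tokens, d_in)

# Each projection maps d_in -> d_out, so keys/values/queries all come out (2, 6, 2)
W_query = torch.nn.Linear(d_in, d_out, bias=False)
W_key = torch.nn.Linear(d_in, d_out, bias=False)
W_value = torch.nn.Linear(d_in, d_out, bias=False)

queries = W_query(x)   # (2, 6, 2)
keys = W_key(x)        # (2, 6, 2)
values = W_value(x)    # (2, 6, 2)

# Scores compare every query against every key within the sequence
attn_scores = queries @ keys.transpose(1, 2)                      # (2, 6, 6)
attn_weights = torch.softmax(attn_scores / d_out**0.5, dim=-1)    # (2, 6, 6)

print(queries.shape, keys.shape, values.shape)
print(attn_scores.shape, attn_weights.shape)
```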
Exercise 3.2
torch.Size([2, 6, 3])
true tensor([[[ 0.1143, -inf, -inf, -inf, -inf, -inf],
[ 0.2433, 0.3102, -inf, -inf, -inf, -inf],
[ 0.2413, 0.3057, 0.2955, -inf, -inf, -inf],
[ 0.1402, 0.1980, 0.1899, 0.1253, -inf, -inf],
[ 0.1385, 0.1384, 0.1366, 0.0701, 0.0651, -inf],
[ 0.1674, 0.2514, 0.2401, 0.1637, -0.0316, 0.2814]],
[[ 0.1143, -inf, -inf, -inf, -inf, -inf],
[ 0.2433, 0.3102, -inf, -inf, -inf, -inf],
[ 0.2413, 0.3057, 0.2955, -inf, -inf, -inf],
[ 0.1402, 0.1980, 0.1899, 0.1253, -inf, -inf],
[ 0.1385, 0.1384, 0.1366, 0.0701, 0.0651, -inf],
[ 0.1674, 0.2514, 0.2401, 0.1637, -0.0316, 0.2814]]],
grad_fn=<MaskedFillBackward0>)
after tensor([[[ 0.1143, -inf, -inf, -inf, -inf, -inf],
[ 0.2433, 0.3102, -inf, -inf, -inf, -inf],
[ 0.2413, 0.3057, 0.2955, -inf, -inf, -inf],
[ 0.1402, 0.1980, 0.1899, 0.1253, -inf, -inf],
[ 0.1385, 0.1384, 0.1366, 0.0701, 0.0651, -inf],
[ 0.1674, 0.2514, 0.2401, 0.1637, -0.0316, 0.2814]],
[[ 0.1143, -inf, -inf, -inf, -inf, -inf],
[ 0.2433, 0.3102, -inf, -inf, -inf, -inf],
[ 0.2413, 0.3057, 0.2955, -inf, -inf, -inf],
[ 0.1402, 0.1980, 0.1899, 0.1253, -inf, -inf],
[ 0.1385, 0.1384, 0.1366, 0.0701, 0.0651, -inf],
[ 0.1674, 0.2514, 0.2401, 0.1637, -0.0316, 0.2814]]],
grad_fn=<MaskedFillBackward0>)
tensor([[[-0.1263, -0.4881],
[-0.2705, -0.5097],
[-0.3103, -0.5140],
[-0.3055, -0.4562],
[-0.2315, -0.4149],
[-0.2854, -0.4101]],
[[-0.1263, -0.4881],
[-0.2705, -0.5097],
[-0.3103, -0.5140],
[-0.3055, -0.4562],
[-0.2315, -0.4149],
[-0.2854, -0.4101]]], grad_fn=<UnsafeViewBackward0>)
context_vecs.shape: torch.Size([2, 6, 2])
true tensor([[[0.3111, -inf, -inf, -inf, -inf, -inf],
[0.1655, 0.2602, -inf, -inf, -inf, -inf],
[0.1667, 0.2602, 0.2577, -inf, -inf, -inf],
[0.0510, 0.1080, 0.1064, 0.0643, -inf, -inf],
[0.1415, 0.1875, 0.1863, 0.0987, 0.1121, -inf],
[0.0476, 0.1192, 0.1171, 0.0731, 0.0477, 0.0966]],
[[0.3111, -inf, -inf, -inf, -inf, -inf],
[0.1655, 0.2602, -inf, -inf, -inf, -inf],
[0.1667, 0.2602, 0.2577, -inf, -inf, -inf],
[0.0510, 0.1080, 0.1064, 0.0643, -inf, -inf],
[0.1415, 0.1875, 0.1863, 0.0987, 0.1121, -inf],
[0.0476, 0.1192, 0.1171, 0.0731, 0.0477, 0.0966]]],
grad_fn=<MaskedFillBackward0>)
after tensor([[[0.3111, -inf, -inf, -inf, -inf, -inf],
[0.1655, 0.2602, -inf, -inf, -inf, -inf],
[0.1667, 0.2602, 0.2577, -inf, -inf, -inf],
[0.0510, 0.1080, 0.1064, 0.0643, -inf, -inf],
[0.1415, 0.1875, 0.1863, 0.0987, 0.1121, -inf],
[0.0476, 0.1192, 0.1171, 0.0731, 0.0477, 0.0966]],
[[0.3111, -inf, -inf, -inf, -inf, -inf],
[0.1655, 0.2602, -inf, -inf, -inf, -inf],
[0.1667, 0.2602, 0.2577, -inf, -inf, -inf],
[0.0510, 0.1080, 0.1064, 0.0643, -inf, -inf],
[0.1415, 0.1875, 0.1863, 0.0987, 0.1121, -inf],
[0.0476, 0.1192, 0.1171, 0.0731, 0.0477, 0.0966]]],
grad_fn=<MaskedFillBackward0>)
tensor([[[-0.4519, 0.2216],
[-0.5874, 0.0058],
[-0.6300, -0.0632],
[-0.5675, -0.0843],
[-0.5526, -0.0981],
[-0.5299, -0.1081]],
[[-0.4519, 0.2216],
[-0.5874, 0.0058],
[-0.6300, -0.0632],
[-0.5675, -0.0843],
[-0.5526, -0.0981],
[-0.5299, -0.1081]]], grad_fn=<CatBackward0>)
context_vecs.shape: torch.Size([2, 6, 2])
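The CatBackward0 in the last tensor comes from concatenating the outputs of multiple heads. Exercise 3.2 asks for 2-dimensional context vectors, which you get from two heads with d_out=1 each. My sketch of the wrapper (names follow the book's conventions, but this is my own condensed version, run on a random stand-in batch):

```python
import torch
import torch.nn as nn

class CausalAttention(nn.Module):
    def __init__(self, d_in, d_out, context_length, dropout):
        super().__init__()
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular mask marks the "future" positions to block
        self.register_buffer(
            "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1)
        )

    def forward(self, x):
        b, num_tokens, d_in = x.shape
        queries, keys, values = self.W_query(x), self.W_key(x), self.W_value(x)
        attn_scores = queries @ keys.transpose(1, 2)
        attn_scores.masked_fill_(
            self.mask.bool()[:num_tokens, :num_tokens], float("-inf")
        )
        attn_weights = torch.softmax(attn_scores / keys.shape[-1]**0.5, dim=-1)
        attn_weights = self.dropout(attn_weights)
        return attn_weights @ values

class MultiHeadAttentionWrapper(nn.Module):
    def __init__(self, d_in, d_out, context_length, dropout, num_heads):
        super().__init__()
        self.heads = nn.ModuleList(
            CausalAttention(d_in, d_out, context_length, dropout)
            for _ in range(num_heads)
        )

    def forward(self, x):
        # Each head gives (b, num_tokens, d_out); concatenating on the last
        # dimension gives d_out * num_heads features per token
        return torch.cat([head(x) for head in self.heads], dim=-1)

torch.manual_seed(123)
batch = torch.randn(2, 6, 3)
# Exercise 3.2: two heads with d_out=1 each -> 2-dimensional context vectors
mha = MultiHeadAttentionWrapper(d_in=3, d_out=1, context_length=6,
                                dropout=0.0, num_heads=2)
context_vecs = mha(batch)
print(context_vecs.shape)  # torch.Size([2, 6, 2])
```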
Human Writing
What is cybernetics?
- I am not really sure what cybernetics is, as the essay does not give a single clear definition of it. Gregory Bateson says: "Cybernetics is a branch of mathematics dealing with problems of control, recursiveness and information". Another definition says: "cybernetics arises when effectors (say, a motor, an engine, our muscles, etc.) are connected to a sensory organ which in turn acts with its signals upon the effectors. It is this circular organization which sets cybernetic systems apart from others that are not so organized".
What is the relationship of cybernetics to artificial intelligence?
- I would say they are related in their replication of human features. It seems cybernetics arises when effectors (like our bodies) are connected to a sensory organ, which in turn acts back upon the effectors. AI is the attempt to replicate human intelligence, using artificial neurons to make decisions and draw connections between data. Both are focused on our relationship with technology: trying to adapt our bodies to be like technology and vice versa.
What is ethics... ...in your personal understanding before reading this essay?
- Ethics are the way people determine right from wrong. There are certain things that we as humans determine are right and wrong for us to do in the context of personal belief, religion, and general societal taboo.
...as described in this essay?
- This essay is extremely hard for me to read. Between all of the rambling, he says "ethics does not become explicit and that language does not degenerate into moralizations". I barely understand what this means, but it comes the closest to giving a definition of ethics.
As far as your personal view is similar to or different from the meaning of ethics implied in the essay, say more about how your view developed
- My views didn't develop much, because I could barely understand what the essay was saying. Between the French poems and the tangents, it was frustrating to read, so I wasn't able to get a good sense of which definition of ethics was being implied.