AI Homework - Torsten
In this part we were assigned prototypes with a partner. Our group did prototype 02. You can find our work at this path in our class repo:
ai24-sp/prototypes/prototype-02
In the 2016 film "Rogue One," a character named Grand Moff Tarkin appears. The actor who played the character, Peter Cushing, passed away in 1994. For the 2016 film, however, the people who made "Rogue One" wanted to bring Tarkin back to life using CGI. A company called Tyburn Film Productions says it should have control over whether or not Tarkin comes back to life in movies, claiming that a deal made back in 1993 gives it this control. But the people who made "Rogue One" and Peter Cushing's family disagree. They say they had the right to bring Tarkin back to life in the movie because of agreements made earlier, and they argue that they did nothing wrong by bringing Tarkin back.
This big case is all about who has the right to decide what happens to a character in a movie or show, especially if the actor is no longer alive. If I were an actor myself, I would hate companies using AI to make me look alive again and using my likeness after I am gone. But that is just my opinion; I would not know how Cushing would feel about it. Maybe some people are OK with it and others are not, which makes this topic difficult.
The reading talks about artificial intelligence (AI), with a focus on deep learning, which involves building complex concepts from simpler ones in a layered graph structure. In the beginning of the reading, it mentions a famous AI achievement: IBM's Deep Blue chess-playing system defeating world champion Garry Kasparov in 1997. AI systems need the ability to acquire their own knowledge by extracting patterns from raw data. This capability is known as machine learning. Machine learning allowed computers to tackle problems involving knowledge of the real world and make decisions that appear subjective. A simple machine learning algorithm called logistic regression can determine whether to recommend cesarean delivery (Mor-Yosef et al., 1990). Another simple machine learning algorithm called naive Bayes can separate legitimate e-mail from spam e-mail. How well these algorithms perform depends on how much data they are given.
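As a toy illustration of the kind of simple machine learning algorithm the reading mentions, here is a minimal logistic regression sketch in plain numpy. This is my own made-up example (toy data, hand-picked learning rate), not the cited medical or spam systems:

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 examples with 2 features each, and binary labels.
X = np.array([[0.2, 0.9], [0.8, 0.1], [0.1, 0.8], [0.9, 0.2]])
y = np.array([1, 0, 1, 0])

w = np.zeros(2)
b = 0.0
lr = 0.5  # learning rate

# Gradient descent on the logistic (cross-entropy) loss.
for _ in range(1000):
    p = sigmoid(X @ w + b)           # predicted probabilities
    grad_w = X.T @ (p - y) / len(y)  # gradient of the loss w.r.t. the weights
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

print(sigmoid(X @ w + b))  # probabilities close to [1, 0, 1, 0]
```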
After this, Duck and I trained prototype 04. Before we began the training, we had to add train.py, run_model.py, and load_mnist to our prototype 04 directory. Below is the result from our training:
Number of images 60000
width 28
height 28
start image data at byte 16
end image data at byte 47040016
length of sliced 1D image data <class 'list'>
Shape of data straight from file (60000, 784, 1)
Training image shape (60000, 784, 1)
Number of training labels 60000
Training labels shape (60000, 10, 1)
Number of images 10000
width 28
height 28
start image data at byte 16
end image data at byte 7840016
length of sliced 1D image data <class 'list'>
Shape of data straight from file (10000, 784, 1)
Number of training labels 10000
/workspace/upper-division-cs/ai-24sp/prototypes/prototype-04/network.py:137: RuntimeWarning: overflow encountered in exp
return 1.0/(1.0+np.exp(-z))
Epoch 0: 4020 / 10000
Epoch 1: 5030 / 10000
Epoch 2: 5107 / 10000
Epoch 3: 5453 / 10000
Epoch 4: 5970 / 10000
Epoch 5: 5978 / 10000
Epoch 6: 6129 / 10000
Epoch 7: 6135 / 10000
Epoch 8: 6240 / 10000
Epoch 9: 6163 / 10000
Epoch 10: 6265 / 10000
Epoch 11: 6356 / 10000
Epoch 12: 6787 / 10000
Epoch 13: 6906 / 10000
Epoch 14: 6861 / 10000
Epoch 15: 6906 / 10000
Epoch 16: 7168 / 10000
Epoch 17: 7321 / 10000
Epoch 18: 7119 / 10000
Epoch 19: 7475 / 10000
Epoch 20: 7539 / 10000
Epoch 21: 7354 / 10000
Epoch 22: 7672 / 10000
Epoch 23: 7472 / 10000
Epoch 24: 7540 / 10000
Epoch 25: 7661 / 10000
Epoch 26: 7684 / 10000
Epoch 27: 7664 / 10000
Epoch 28: 7661 / 10000
Epoch 29: 7658 / 10000
Epoch 30: 7540 / 10000
Epoch 31: 7546 / 10000
Training did not finish because it was too slow and I had to go to work.
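The RuntimeWarning in the log comes from np.exp(-z) overflowing when z is a large negative number. One common fix (a sketch, not necessarily how our network.py should be changed) is a numerically stable sigmoid that never exponentiates a large positive number:

```python
import numpy as np

def sigmoid(z):
    # Stable sigmoid for numpy arrays:
    # for z >= 0 use 1 / (1 + exp(-z)), where exp(-z) <= 1;
    # for z < 0 use exp(z) / (1 + exp(z)), where exp(z) < 1.
    out = np.empty_like(z, dtype=float)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out
```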
Connectionism, in terms of this class, is the study of how the neurons in a neural network connect, meaning how they work together. Distributed representation is a way of encoding information in a system where each input is represented by many features, and each feature is involved in the representation of many possible inputs. For example, the text describes a system that can recognize cars, trucks, and birds: "One way of representing these inputs would be to have a separate neuron or hidden unit that activates for each of the nine possible combinations: red truck, red car, red bird, green truck, and so on. This requires nine different neurons, and each neuron must independently learn the concept of color and object identity. One way to improve on this situation is to use a distributed representation, with three neurons describing the color and three neurons describing the object identity. This requires only six neurons total instead of nine, and the neuron describing redness is able to learn about redness from images of cars, trucks and birds, not just from images of one specific category of objects." (Deep Learning book)
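To make the quoted example concrete, here is a tiny sketch (my own illustration, not from the book) contrasting the nine one-hot units with the 3 + 3 distributed representation:

```python
import numpy as np

colors = ["red", "green", "blue"]
objects = ["car", "truck", "bird"]

def one_hot_joint(color, obj):
    # One neuron per (color, object) pair: 9 units, exactly one active.
    v = np.zeros(len(colors) * len(objects))
    v[colors.index(color) * len(objects) + objects.index(obj)] = 1.0
    return v

def distributed(color, obj):
    # 3 color units + 3 object units: 6 units, two active.
    c = np.zeros(len(colors)); c[colors.index(color)] = 1.0
    o = np.zeros(len(objects)); o[objects.index(obj)] = 1.0
    return np.concatenate([c, o])

print(one_hot_joint("red", "truck"))  # 9-dimensional, one 1
print(distributed("red", "truck"))    # 6-dimensional; the "red" unit is shared
```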
A factor that has led to recent progress in the ability of deep learning to mimic intelligent human tasks is the growth of neural networks. Older neural networks were trained on smaller datasets. Now, with faster computers with larger memories and the availability of larger datasets, larger networks are able to achieve higher accuracy on more complex tasks. This trend looks set to continue for decades. The availability of GPUs and CPUs has also helped make these models more powerful.
The average human brain contains around 86 billion neurons.
4) The neural network outputs "trash," or something that is far from right, because our model is not trained; it was just using the random weights and biases it was assigned. To fix the model's performance, we need a cost function, which calculates how bad our network is (see the sketch below).
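For reference, a minimal sketch of a quadratic cost in my own notation: the squared difference between the network's 10 output activations and the one-hot encoding of the correct digit.

```python
import numpy as np

def quadratic_cost(output, desired):
    # Half the sum of squared differences between the output
    # activations and the one-hot label vector.
    return 0.5 * np.sum((output - desired) ** 2)

desired = np.zeros(10); desired[3] = 1.0  # the label "3" as a one-hot vector
trash = np.random.rand(10)                # an untrained network's output
print(quadratic_cost(trash, desired))     # large cost = bad network
```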
5) (1, 3, 4)
6) [(13, 7), (2, 13)]
7) D > C > A > B
8) When you change the learning rate to a larger value, you take bigger steps, so you could find your minimum faster. However, with such bigger steps, you could also overshoot your minimum.
9) In stochastic gradient descent, the gradient is computed using only one randomly chosen example from the dataset at a time. "Stochastic" means randomly determined, which gives the method its name, because it estimates the gradient from random examples drawn from the dataset (see the sketch below).
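A small sketch (my own toy example) of both ideas from questions 8 and 9: the learning rate scales the step size, and stochastic gradient descent steps downhill using one randomly chosen example at a time:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: y = 3x plus noise; we fit a single weight w.
xs = rng.uniform(-1, 1, 100)
ys = 3.0 * xs + rng.normal(0, 0.1, 100)

w = 0.0
eta = 0.1  # learning rate: bigger steps converge faster but can overshoot

for step in range(1000):
    i = rng.integers(len(xs))               # "stochastic": one random example
    grad = 2 * (w * xs[i] - ys[i]) * xs[i]  # gradient of (w*x - y)^2 w.r.t. w
    w -= eta * grad                         # step downhill, scaled by eta

print(w)  # close to 3.0
```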
Question 0:
A. Sigmoid, or squishing function, to smooth outputs to the 0.0 to 1.0 range
B. Weights from the previous layer to this neuron
C. Activations from the previous layer
D. Bias of this neuron, or threshold for firing
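Putting those pieces together, one neuron's activation (in my own notation) is the sigmoid of the weighted sum of the previous layer's activations plus the bias:

```python
import numpy as np

def neuron_activation(weights, prev_activations, bias):
    # a = sigmoid(w . a_prev + b): B and C feed the weighted sum,
    # D shifts it, and A squashes the result into (0, 1).
    z = np.dot(weights, prev_activations) + bias
    return 1.0 / (1.0 + np.exp(-z))
```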
Question 1:
The cost between this "trash" output of the neural network and our desired label of the digit "3" is 3.3585.
Question 2:
Since we want to increase the activation of the digit-2 output neuron we've been discussing, how should its associated bias be adjusted? What about the biases associated with all the other output neurons for this training example?
Increase the bias associated with the digit-2 neuron and decrease the biases associated with all the other neurons.
Question 3:
The phrase "neurons that fire together wire together" means the connections between certain neurons get stronger.
Question 4:
Answer: changes to the weights of all the neurons "requested" by each training example
Question 5:
Answer: a drunk stumbling quickly down a hill
Question 6:
If each row in this image is a mini-batch, what is the mini-batch size?
I would say 12 or 100. I counted 12 boxes in each row, but this quote from the reading, "a bunch of mini-batches, having, say, 100 training examples each," makes me think 100 too. I also don't fully understand the question.
Question #2
a) Activations from the previous layer
b) Sigmoid, or squishing function, to smooth outputs to the 0.0 to 1.0 range
c) Bias of the current layer
d) Weights from the previous layer to this neuron
e) Activations of the current layer
Question #3
- The bottom half of the second diagram is the same as the first diagram.
- The second diagram extends backward into the neural network, showing a previous layer L-2 whose outputs the layer L-1 depends on.
- Drawing a path between two quantities in either diagram shows which partial derivatives to "chain together" in calculating the gradient of C (see the sketch below).
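A tiny worked version of that chaining, my own example for one neuron with one weight: the path from w to the cost C goes through z and a, so dC/dw = dz/dw * da/dz * dC/da.

```python
import numpy as np

# One neuron, one weight: z = w*a_prev + b, a = sigmoid(z), C = (a - y)^2
w, b, a_prev, y = 0.5, 0.1, 0.8, 1.0

z = w * a_prev + b
a = 1.0 / (1.0 + np.exp(-z))

dC_da = 2 * (a - y)  # how the cost changes with the activation
da_dz = a * (1 - a)  # derivative of the sigmoid
dz_dw = a_prev       # how z changes with the weight

dC_dw = dz_dw * da_dz * dC_da  # chain the partial derivatives together
print(dC_dw)
```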
What thoughts or feelings do you have right now, before hearing your synthesized voice?
Years ago I would never have imagined I would be able to clone my voice and have it say whatever I would like. I think while it sounds cool from a broad perspective, it can be used for bad things.
Should you have the right to synthesize your own voice? What are possible advantages or misuses of it?
I think a person should have the right to synthesize their voice if they weigh the ethical considerations. One advantage of synthesizing your voice could be using it for presentations. I hate presentations, so using something that sounds like me to give my presentation sounds cool, but I know others will have different opinions on that matter. One misuse of cloning people's voices is that it can be used to make someone say something they would not say. For example, it could lead to misinformation in politics, such as a supporter of one candidate cloning the other candidate's voice, making them say something horrific, and posting it on the internet hoping the other candidate loses supporters. So AI voice cloning could lead to manipulation.
Should family members have the right to create AI simulated personalities of the deceased?
Personally, once I pass I would not like my voice to be played with or brought back. Nor would I want an AI-simulated personality of someone I knew or loved, because it would not make me feel better in any way; I know it's not really them. I think families should have the right to do what is ethical to them for deceased loved ones, whether that is letting them rest or "bringing them back to life" to try to bring joy.
- A GPT is a specific kind of LLM; the two differ in architecture, training method, and applications. GPTs are trained using unsupervised learning on large amounts of text data. They learn to predict the next word in a sequence given the previous words.
- d) Posts on Quora and their replies
- The transformer architecture behind GPT, from the paper "Attention Is All You Need," was originally designed for which task? Answer: C, machine translation from one human language to another.
- In the Raschka text, a neural network with three or more layers is considered deep learning (p. 3).
- Our MNIST classifier is a deep learning neural network by the definition above because it has 4 layers: an input layer, two hidden layers, and an output layer (as sketched below).
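For example, in the Nielsen-style network code we used, the layer sizes are passed as a list. The hidden-layer sizes below are my assumption for illustration; only the 784-pixel input and 10-digit output are fixed by MNIST:

```python
import network  # the network.py from our prototype directory

# Input layer of 784 pixels, two hidden layers, output layer of 10 digits:
# 4 layers total, so it counts as deep learning by the >= 3 layers definition.
net = network.Network([784, 16, 16, 10])
```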
- a) True. Pre-training LLMs requires access to significant resources and is very expensive. Not only is it expensive, it takes weeks to months, whereas fine-tuning for a specific task takes less time.
b) False. Pre-training is typically done on large amounts of unlabeled text data, while fine-tuning often involves using meticulously labeled data for specific tasks.
c) True. A model can be fine-tuned by different people than the ones who originally pre-trained it. Many pretrained LLMs are available as open-source models and can be used as general-purpose tools to write, extract, and edit texts that were not part of the training data. LLMs can also be fine-tuned on specific tasks with relatively small datasets, reducing the computational resources needed and improving performance on the specific task (p. 13).
d) True. Pre-training produces a more general-purpose model, and fine-tuning specializes it for certain tasks.
e) True. Fine-tuning usually uses less data than pre-training.
f) True. Pre-training can produce a model from scratch, but fine-tuning can only change an existing model.
- A and B: the existing words in sentences it has already produced in the past, and prompts from the user.
- Chat with ChatGPT: https://chat.openai.com/share/2b8f32fa-6ab0-4fc2-be80-c91143120e25
- C) Decoder, which translates from a higher-dimensional vector space of features back to words. The GPT is a decoder used for generative and predictive purposes, which is what our MNIST classifier did by predicting a digit 0-9 given an input image of a handwritten digit.
- C) An example of zero-shot learning that we have encountered in this class is using ChatGPT or a similar AI chat to answer a question it has never seen before, with no examples. I can ask GPT anything without giving it examples, though providing examples would get a better answer from GPT.
- Zero-shot learning is the ability of the model to generalize to completely unseen tasks without any prior specific examples. With few-shot learning, the model is provided a few examples (see the made-up prompts below).
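A hypothetical pair of prompts to show the difference; the task and reviews here are my own invention, not from the reading:

```python
# Zero-shot: the model gets the task description and no examples.
zero_shot = 'Classify the sentiment of this review: "The movie was great."'

# Few-shot: a few worked examples come before the real question.
few_shot = """Classify the sentiment of each review.
Review: "Terrible plot." -> negative
Review: "Loved every minute." -> positive
Review: "The movie was great." ->"""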
- GPT-3 has 175 billion parameters.
- LLMs cannot operate on words directly because raw text is categorical, so it isn't compatible with the mathematical operations used to implement and train neural networks (p. 19). We need a way to turn text into numbers the computer can work with, which is embedding.
- Embedding is the process of turning data such as raw text into a vector format; in other words, into numbers that the computer handles better than raw text (see the sketch below).
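A minimal sketch of what an embedding layer does (toy sizes of my own choosing): it is just a lookup into a trainable matrix with one row per token ID.

```python
import numpy as np

vocab_size, embedding_dim = 6, 3  # tiny toy sizes; GPT-3 uses 12,288 dims
rng = np.random.default_rng(0)

# One trainable row of `embedding_dim` numbers per token in the vocabulary.
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

token_ids = np.array([2, 5, 1])        # a tokenized sentence as IDs
vectors = embedding_matrix[token_ids]  # the lookup: shape (3, 3)
print(vectors)
```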
- GPT-3 uses an embedding size of 12,288 dimensions.
- Put the following steps in order for processing and preparing our dataset for LLM training:
A. Adding position embeddings to the token word embeddings
B. Giving unique token IDs (numbers) to each token
C. Breaking up natural human text into tokens, which could include punctuation, whitespace, and special "meta" tokens like "end-of-text" and "unknown"
D. Converting token IDs to their embeddings, for example, using Word2Vec
A: C -> B -> D -> A
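A sketch of the whole pipeline in that order, with toy stand-ins of my own (a whitespace tokenizer, a three-word vocabulary, and random matrices in place of learned embeddings):

```python
import numpy as np

text = "the cat sat"

# C: break the text into tokens
tokens = text.split()

# B: give each token a unique ID
vocab = {"the": 0, "cat": 1, "sat": 2}
token_ids = np.array([vocab[t] for t in tokens])

# D: convert token IDs to embeddings (random stand-in for learned vectors)
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(len(vocab), 4))[token_ids]

# A: add position embeddings to the token embeddings
position_embeddings = rng.normal(size=(len(tokens), 4))
input_embeddings = token_embeddings + position_embeddings
print(input_embeddings.shape)  # (3, 4)
```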
Q: What is a use of AI that you are interested in, either that you've seen publicly or that you've considered as a thought experiment?
A: Originally, I wanted to do game development as a career, more particularly programming. But with the growth of AI, I got inspired and impressed by chatbots such as ChatGPT and other cool things AI can do, from watching some documentaries. A documentary I enjoyed was the show NOVA on PBS. I don't usually watch this show, but they recently released an episode about artificial intelligence, which I got excited about right away. After watching it, a use of AI I am interested in is using AI to combat diseases and illnesses. In the documentary, they had a model that could detect early signs of cancer. I also like chatbots, so I would like to do something along those lines, like my own chatbot, or use AI to do something good for the world.
Creativity is the use of imagination to generate something unique. In my opinion, content produced through generative AI is creative because it fits my definition of creativity: generative AI can create unique images that humans might never have thought of. David Deutsch does not have a definition of creativity because he believes creativity is fundamentally impossible to define. He claims that once something is defined, you can set up a formal system around that definition, and anything that does not follow the system is considered outside it.
Q: Is it necessary for AIs to be creative in order to be considered intelligent, or vice versa?
A: My opinion is no. AIs are creative, as in the case of generative AI, because they follow human prompts and algorithms, not because they are necessarily "intelligent." AIs even make mistakes humans would not make, which was shared in the podcast, where ChatGPT was told to fix something and did not fix it at all.
Encoding text from words to token IDs (integers)
Decoding from token IDs (integers) to words.
The <|unk|> token is added to represent new and unknown words that were not part of the training data, meaning not part of the vocabulary. The <|endoftext|> token is used to separate two unrelated text sources. Their token IDs are 783 and 784.
The special token that separates the original documents in the final combined dataset is the <|endoftext|> token.
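A minimal vocabulary-based tokenizer sketch, my own simplification of the Raschka-style tokenizer, showing both directions and both special tokens:

```python
class SimpleTokenizer:
    def __init__(self, vocab):
        # vocab maps token string -> integer ID, including the special tokens.
        self.str_to_id = vocab
        self.id_to_str = {i: s for s, i in vocab.items()}

    def encode(self, text):
        # Encoding: words -> token IDs. Unknown words map to <|unk|>.
        return [self.str_to_id.get(tok, self.str_to_id["<|unk|>"])
                for tok in text.split()]

    def decode(self, ids):
        # Decoding: token IDs -> words.
        return " ".join(self.id_to_str[i] for i in ids)

vocab = {"hello": 0, "world": 1, "<|endoftext|>": 2, "<|unk|>": 3}
tok = SimpleTokenizer(vocab)
ids = tok.encode("hello strange world <|endoftext|> hello")
print(ids)              # [0, 3, 1, 2, 0]
print(tok.decode(ids))  # hello <|unk|> world <|endoftext|> hello
```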
It lets the GPT learn connections between the long word and shorter words based on common parts, like "pneumonia", "microscope", or "volcano". - I think this is an advantage.
This approach can work in any language, not just English. - I am not sure whether this is a disadvantage or an advantage.
All words are broken down in the same way, so the process is deterministic and results from repeated runs will be similar. - Disadvantage, I think, because longer words are better: then it can make new words that can be broken down further.
The system will handle any word it encounters in chat or inference, even if it has never seen the word during training. - Advantage; byte pair encoding breaks unknown words into known subword pieces, so it does its job.
Smaller integer token IDs tend to correspond to more common words or characters, because these tokens are encountered more frequently in the training data.
Larger integer token IDs tend to correspond to less common words or characters, because these tokens are encountered less frequently in the training data.
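We can see this with the BPE tokenizer from the tiktoken library (assuming it is installed; the exact IDs printed depend on the GPT-2 vocabulary):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("gpt2")

# A common word encodes to few tokens with relatively small IDs;
# a rare long word is split into several subword tokens.
print(enc.encode("the"))
print(enc.encode("pneumonoultramicroscopicsilicovolcanoconiosis"))
```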
How does this case relate to selecting training datasets for GPTs, such as our final project in this class? A: When selecting training datasets for GPTs, you must be aware of copyright issues. Using copyrighted material without proper permissions can lead to legal consequences. So when choosing text to train your LLM, make sure the material is in the public domain or properly licensed.
Describe the main point of view of Hachette and other publishers on the plaintiff side of this case (the party who claim they have been wronged). A: The main point of view of Hachette and the other publishers on the plaintiff side of the Hachette v. Internet Archive case is that the Internet Archive's Controlled Digital Lending (CDL) program violates their copyrights and has caused significant financial harm to their businesses.
Describe the main point of view of Internet Archive, the founder Brewster Kahle, the Electronic Frontier Foundation, or other parties on the defendant side of this case (the party defending against claims that they have done anything wrong). A: The main point of view of the Internet Archive, its founder Brewster Kahle, the Electronic Frontier Foundation (EFF), and other parties on the defendant side of the Hachette v. Internet Archive case is centered on the belief that the Controlled Digital Lending (CDL) program is a lawful extension of traditional library lending, serving the public good by ensuring continued access to books.
What other news event or legal case is similar to this case that is interesting to you? A: I think it is similar to the case about the Star Wars character being brought back to life using CGI, and whether it is right or wrong to bring back the deceased.
Which of the above arguments are convincing to you? If you were the judge in this case, how would you form your opinion? A: I think the defendants have a strong argument. Their system works just like a regular library from what I understand, but digital, and they have said there is no financial harm done. As of right now, I would side with them.