wshine ai hw5 - TheEvergreenStateCollege/upper-division-cs-23-24 GitHub Wiki

AI Homework 5

Chapter 1 Questions

  1. What is the difference between a GPT and an LLM? Are the terms synonymous?

    The terms are not synonymous. LLM is the more general term and includes various types of neural networks and goals; GPT is a specific subset of LLMs that focuses on text generation/completion.

  2. Labeled training pairs of questions and answers, in the model of "InstructGPT" are most similar to which of the following?

    A. Posts from Stackoverflow which have responses to them.

    B. Posts on Twitter/X and their replies

    C. Posts on Reddit and their replies

    D. Posts on Quora and their replies

  3. The GPT architecture in the paper "Attention is All You Need" was originally designed for which task?

    C. Machine translation from one human language to another

  4. How many layers of neural networks are considered "deep learning" in the Raschka text?

    3 or more.

  5. Is our MNIST classifier a deep learning neural network by this definition?

    No; our MNIST classifier only had one hidden layer, so it doesn't meet that definition (see the sketch below).
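
A rough sketch of that structure, assuming the usual layer sizes from class (784 input pixels, 30 hidden units, 10 output classes; the exact numbers are an assumption here). There is only one hidden layer between input and output:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed layer sizes: 784 input pixels -> 30 hidden units -> 10 output classes.
W1 = np.random.randn(30, 784)   # input -> hidden weights
b1 = np.random.randn(30, 1)
W2 = np.random.randn(10, 30)    # hidden -> output weights
b2 = np.random.randn(10, 1)

x = np.random.rand(784, 1)      # a flattened 28x28 image

hidden = sigmoid(W1 @ x + b1)   # the single hidden layer
output = sigmoid(W2 @ hidden + b2)
print(output.argmax())          # predicted digit (0-9)
```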

  6. For each statement about how pre-training is related to fine-tuning for GPTs:

    • If the statement is true, write "True" and give an example.
    • If the statement is false, write "False" and give a counter-example.

    A. Pre-training is usually much more expensive and time-consuming than fine-tuning.

    True. I believe you need a lot more data (and compute) for pre-training. But the data used in pre-training does not need to be labeled, which makes it a lot easier to gather and preprocess.

    B. Pre-training is usually done with meticulously labeled data while finetuning is usually done on large amounts of unlabeled or self-labeling data.

    False. It's the reverse: pre-training is usually done on large amounts of unlabeled or self-labeling text, while fine-tuning (e.g., InstructGPT's labeled question-answer pairs) uses meticulously labeled data.

    C. A model can be fine-tuned by different people than the ones who originally pre-trained a model.

    True. For example, anyone can take a publicly released pre-trained model and fine-tune it on their own data without being part of the team that originally pre-trained it.

    D. Pre-training is to produce a more general-purpose model, and fine-tuning specializes it for certain tasks.

    True. For example, pre-training a model on web article content, then fine-tuning it so it can not only answer questions but also generate completely new articles.

    E. Fine-tuning usually uses less data than pre-training.

    True.

    F. Pre-training can produce a model from scratch, but fine-tuning can only change an existing model.

    True.

  7. GPTs work by predicting the next word in a sequence, given which of the following as inputs or context?

    B. Prompts from the user

    C. A system prompt that frames the conversation or instructs the GPT to behave in a certain role or manner

    E. The trained model which includes the encoder, decoder, and the attention mechanism weights and biases

  8. The reading distinguishes between these three kinds of tasks that you might ask an AI to do:

    • Predicting the next word in a sequence (for a natural language conversation)
    • Classifying items, such as a piece of mail as spam, or a passage of text as an example of Romantic vs. realist literature
    • Answering questions on a subject after being trained with question-answer examples

    Open your favorite AI chat (these are probably all GPTs currently) such as OpenAI ChatGPT, Google's Gemini, Anthropic's Claude, etc.

    gpt chat session

    I think predicting the next word in a sequence is very important here, but on its own it is not enough to accomplish classifying items.

  9. Which of the following components of the GPT architecture might be neural networks, similar to the MNIST classifier we have been studying? Explain your answer.

    A. Encoder, that translates words into a higher-dimensional vector space of features

    C. Decoder, that translates from a higher-dimensional vector space of features back to words

    I guess either of these could be. So far I understand encoding/decoding as creating a dictionary of words to numbers that will then be used in a neural network for training, but I do not quite get what it means to translate words into a higher-dimensional vector space of features. Would that actually involve a neural network, or is it just encoding data in order to train a neural network?
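
For what it's worth, here is a minimal sketch (assuming PyTorch, with made-up sizes) of how a token-to-vector mapping can itself be a trainable layer rather than a fixed dictionary, which is why the encoder side can count as part of the neural network:

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim = 50257, 256          # assumed sizes for illustration
embedding = nn.Embedding(vocab_size, embedding_dim)

token_ids = torch.tensor([12, 345, 6789])       # hypothetical token IDs
vectors = embedding(token_ids)                  # shape: (3, 256)

# The lookup table is a parameter matrix that gets updated by backpropagation,
# just like the weights in the MNIST classifier.
print(vectors.shape, embedding.weight.requires_grad)  # torch.Size([3, 256]) True
```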

  10. What is an example of zero-shot learning that we have encountered in this class already? Choose all that apply and explain.

B. Using the YourTTS model for text-to-speech to clone a voice the model has never heard before

C. Using ChatGPT or a similar AI chat to answer a question it has never seen before with no examples

Both of these involve generating something from input the model was not trained on and has no examples to go off of; this is what zero-shot means.

  1. What is zero-shot learning, and how does it differ from few-shot or many-shot learning?

    Zero-shot learning is when a model can generate output for some input that it has not seen before and has no examples to go off of. Few-shot learning would require a few examples from the user before generating an output.

  2. What is the number of model parameters quoted for any of the OpenAI models: GPT-3, GPT-3.5, or GPT-4?

    GPT-3 has 175 billion parameters.

Chapter 2 Questions (Part 1)

  1. Why can't LLMs operate on words directly? (Hint: think of how the MNIST neural network works, with nodes that have weighted inputs, that are converted with a sigmoid, and fire to the next layer. These are represented as matrices of numbers, which we practiced multiplying with Numpy arrays. The text-to-speech system TTS similarly does not operate directly on sound data.)

A neural network requires numeric input; it has to run the input through mathematical operations like matrix multiplications and activation functions, so words must first be converted to numbers.
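
As a minimal sketch of that idea (toy whitespace tokenizer, made-up sizes), the text has to be mapped to integer IDs before any of the matrix math can happen:

```python
import numpy as np

text = "the verdict was read aloud"
words = text.split()                       # toy whitespace tokenization
vocab = {w: i for i, w in enumerate(sorted(set(words)))}
token_ids = np.array([vocab[w] for w in words])
print(token_ids)                           # [2 3 4 1 0]

# Only now can the input participate in the network's matrix operations,
# e.g. as one-hot rows multiplied by a weight matrix.
one_hot = np.eye(len(vocab))[token_ids]    # shape: (5, vocab_size)
W = np.random.randn(len(vocab), 8)         # assumed 8-dimensional layer
activations = one_hot @ W
print(activations.shape)                   # (5, 8)
```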

  1. What is an embedding? What does the dimension of an embedding mean?

    An embedding is data that has been transformed into a numeric vector format. An embedding maps a discrete object (a word, in our case) to a continuous space where we can visualize and see relationships between embeddings, or at least we can if we pick a small dimension to map them to.

    I'm not actually sure what each individual dimension would represent in this case, but I understand that the more dimensions an embedding has, the more distinctions we can express between different embeddings. This comes at the cost of more computation.
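
A small sketch of that relationship, using made-up 3-dimensional vectors: each dimension is one learned feature axis, and more dimensions give the model more room to encode distinctions between tokens. Similarity between embeddings can then be measured numerically, for example with cosine similarity:

```python
import numpy as np

# Made-up 3-dimensional embeddings, for illustration only.
cat = np.array([0.9, 0.1, 0.3])
dog = np.array([0.8, 0.2, 0.25])
car = np.array([0.1, 0.9, 0.7])

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(cat, dog))  # close to 1: similar words
print(cosine_similarity(cat, car))  # noticeably lower: dissimilar words
```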

  2. What is the dimension of the embedding used in our example of Edith Wharton's short story "The Verdict"? What is it for GPT-3?

    256 for the example in the book; 12,288 for GPT-3.

  3. Put the following steps in order for processing and preparing our dataset for LLM training

    A. Adding position embeddings to the token word embeddings

    B. Giving unique token IDs (numbers) to each token

    C. Breaking up natural human text into tokens, which could include punctuation, whitespace, and special "meta" tokens like "end-of-text" and "unknown"

    D. Converting token IDs to their embeddings, for example, using Word2Vec

    C, B, D, A
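
A minimal end-to-end sketch of that order, assuming PyTorch and a toy whitespace tokenizer (the book uses tiktoken and GPT-2 sizes; the values here are placeholders):

```python
import torch
import torch.nn as nn

text = "the verdict was read aloud"

# C. Break text into tokens (toy whitespace tokenizer here).
tokens = text.split()

# B. Assign a unique token ID (number) to each token.
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
token_ids = torch.tensor([vocab[tok] for tok in tokens])

# D. Convert token IDs to embeddings via a learned lookup table.
embedding_dim = 256                      # assumed, matching the book's example
token_emb = nn.Embedding(len(vocab), embedding_dim)(token_ids)

# A. Add position embeddings to the token embeddings.
pos_emb = nn.Embedding(len(tokens), embedding_dim)(torch.arange(len(tokens)))
input_embeddings = token_emb + pos_emb
print(input_embeddings.shape)            # torch.Size([5, 256])
```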

Human Writing

I don't believe generative AI content creation is creative; it's systematic and calculated. When I think of creativity I think of something more spontaneous or imaginative, and I'm not sure AI can do that. I found it interesting that David Deutsch's definition of creativity was that creativity is impossible to define. I get the point he is making, and I think I agree with it given the context of AI, but I honestly didn't follow most of his example. Creativity is important for two main reasons. One is that an AI being able to truly display creativity would probably be a huge milestone achievement and would lead to unknown use cases of AI (at least unknown to me). The other is that creativity is something a lot of people take pride in, and if AI could truly do the same I believe that could negatively impact us, almost like it would devalue human creativity. It was mentioned in the reading that AI could free people up to be creative. I agree with this: AI can take on repetitive, uninteresting, or uninspiring tasks, allowing a person to spend more time finding creative solutions to more interesting problems.

The two uses of AI I am currently interested in are decision-making for computer-controlled characters in games, and the creation or enhancement of developer tools using AI. I don't know a whole lot about how AI is currently used in developer tools, but I find things like linters, language servers, and compilers very interesting and want to explore more of how and whether AI is currently used by these tools. I also want to explore how AI could be used to improve a developer's workflow in a positive way, whether they are experienced or new to the field, and do so in a way that does not take away from their experience or learning.
