AI_Homework5_ResponseUnfinished - TheEvergreenStateCollege/upper-division-cs-23-24 GitHub Wiki

Chapter 1 Questions

1. What is the difference between a GPT and an LLM? Are the terms synonymous?

The terms are not synonymous. LLMs don't necessarily use transformers. GPTs are models built on the transformer architecture and are trained specifically for text generation (like predictive-text suggestions), whereas "LLM" covers a wider range of architectures and capabilities.

2a. Labeled training pairs of questions and answers, in the model of "InstructGPT" are most similar to which of the following?
  • A. Posts from Stackoverflow which have responses to them.
  • B. Posts on Twitter/X and their replies
  • C. Posts on Reddit and their replies
  • D. Posts on Quora and their replies
  • E. Images of handwritten digits and a label of 0 through 9 in the MNIST classification task
2b. For each one, are there multiple labels for a training datapoint, and if so, is there a way to rank the quality or verity (closeness to the truth) of the labels with respect to their training datapoint?
  • A. Posts from Stackoverflow which have responses to them.
  • B. Posts on Twitter/X and their replies
  • C. Posts on Reddit and their replies
  • D. Posts on Quora and their replies
  • E. Images of handwritten digits and a label of 0 through 9 in the MNIST classification task
3. The architecture introduced in the paper "Attention Is All You Need" (the transformer, on which GPT is based) was originally designed for which task?
  • A. Passing the Turing test
  • B. Making up song lyrics
  • C. Machine translation from one human language to another
  • D. Writing a novel
4. How many layers of neural networks is considered "deep learning" in the Raschka text?
5. Is our MNIST classifier a deep learning neural network by this definition?
6. For each statement about how pre-training is related to fine-tuning for GPTs:
  • If the statement is true, write "True" and give an example.
  • If the statement is false, write "False" and give a counter-example.
| Statement | T/F |
| --- | --- |
| A. Pre-training is usually much more expensive and time-consuming than fine-tuning. | |
| B. Pre-training is usually done with meticulously labeled data, while fine-tuning is usually done on large amounts of unlabeled or self-labeling data. | |
| C. A model can be fine-tuned by different people than the ones who originally pre-trained it. | |
| D. Pre-training produces a more general-purpose model, and fine-tuning specializes it for certain tasks. | |
| E. Fine-tuning usually uses less data than pre-training. | |
| F. Pre-training can produce a model from scratch, but fine-tuning can only change an existing model. | |
7. GPTs work by predicting the next word in a sequence, given which of the following as inputs or context?
  • A. The existing words in sentences it has already produced in the past.
  • B. Prompts from the user
  • C. A system prompt that frames the conversation or instructs the GPT to behave in a certain role or manner
  • D. New labeled pairs that represent up-to-date information that was not present at the time of training
  • E. The trained model, which includes the encoder, decoder, and the attention mechanism weights and biases
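To make "context" in question 7 concrete, here is a minimal sketch (not the graded answer) of next-token prediction as a loop: a stand-in scoring function, a tiny invented vocabulary, and greedy decoding replace a real trained model, but the key point is that each new token is predicted from everything already in the sequence.

```python
import numpy as np

# Toy vocabulary and a stand-in "model": a real GPT would instead use its
# trained transformer weights (embeddings, attention, decoder layers).
vocab = ["<eot>", "the", "model", "predicts", "next", "word"]

def toy_next_token_logits(context_ids):
    # Hypothetical scoring rule invented for this sketch: it just favors
    # the token whose ID follows the last token seen in the context.
    scores = np.full(len(vocab), -1.0)
    scores[(context_ids[-1] + 1) % len(vocab)] = 1.0
    return scores

# The context the model conditions on: the system prompt, the user's prompt,
# and every token it has already generated in this conversation.
context = [1, 2]                              # token IDs for "the model"
for _ in range(3):
    logits = toy_next_token_logits(context)
    next_id = int(np.argmax(logits))          # greedy choice of the next token
    context.append(next_id)                   # generated text joins the context

print(" ".join(vocab[i] for i in context))    # "the model predicts next word"
```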
8. The reading distinguishes between these three kinds of tasks that you might ask an AI to do:
  • Predicting the next word in a sequence (for a natural language conversation)
  • Classifying items, such as a piece of mail as spam, or a passage of text as an example of Romantic vs. realist literature
  • Answering questions on a subject after being trained with question-answer examples
Open your favorite AI chat (these are probably all GPTs currently) such as OpenAI ChatGPT, Google's Gemini, Anthropic's Claude, etc. Have a conversation where you try to understand how these three tasks are the same or different. In particular, is one of these tasks general-purpose enough to implement the other two tasks? Copy and paste a link to your chat into your dev diary entry.
9. Which of the following components of the GPT architecture might be neural networks, similar to the MNIST classifier we have been studying? Explain your answer.
  • A. Encoder, that translates words into a higher-dimensional vector space of features
  • B. Tokenizer, that breaks up the incoming text into different textual parts
  • C. Decoder, that translates from a higher-dimensional vector space of features back to words
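As a point of comparison while answering question 9 (an illustrative sketch with made-up shapes and random values, not an answer key), the snippet below contrasts a purely rule-based tokenizer, which is only string manipulation, with encode/decode-style steps implemented as weight matrices like the layers in our MNIST classifier.

```python
import re
import numpy as np

rng = np.random.default_rng(3)

# Tokenizer: plain string processing, no weights to learn.
def tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("Hello, world!")
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = np.array([vocab[t] for t in tokens])

# Encoder-like step: a weight matrix mapping token IDs into a feature space.
embed = rng.normal(size=(len(vocab), 8))     # learned in a real model
features = embed[ids]

# Decoder-like step: another weight matrix mapping features back to
# scores over the vocabulary (which token should come next).
unembed = rng.normal(size=(8, len(vocab)))   # also learned in a real model
logits = features @ unembed
print(logits.shape)                          # (num_tokens, vocab_size)
```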
10. What is an example of zero-shot learning that we have encountered in this class already? Choose all that apply and explain.
  • A. Using an MNIST classifier trained on numeric digits to classify alphabetic letters instead.
  • B. Using the YourTTS model for text-to-speech to clone a voice the model has never heard before
  • C. Using ChatGPT or a similar AI chat to answer a question it has never seen before with no examples
  • D. Using spam filters in Outlook by marking a message as spam to improve Microsoft's model of your email reading habits
11. What is zero-shot learning, and how does it differ from few-shot or many-shot learning?
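One way to see the zero-shot vs. few-shot distinction is in how a prompt is assembled; the sketch below uses invented prompt wording (not drawn from the reading): zero-shot gives the task with no solved examples, few-shot includes a handful, and many-shot would include many more.

```python
task = "Classify the sentiment of: 'The verdict surprised everyone.'"

# Zero-shot: the task alone, with no solved examples in the prompt.
zero_shot_prompt = task

# Few-shot: a handful of solved examples precede the task.
examples = [
    ("I loved this story.", "positive"),
    ("The ending was a letdown.", "negative"),
]
few_shot_prompt = "\n".join(f"Text: {t}\nLabel: {l}" for t, l in examples)
few_shot_prompt += "\n" + task

print(zero_shot_prompt)
print("---")
print(few_shot_prompt)
```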
12. What is the number of model parameters quoted for GPT-3, a predecessor of the model used to power the first ChatGPT product?

Chapter 2 Questions (Part 1)

1. Why can't LLMs operate on words directly? (Hint: think of how the MNIST neural network works, with nodes that take weighted inputs, convert them with a sigmoid, and fire to the next layer. These are represented as matrices of numbers, which we practiced multiplying with Numpy arrays. The text-to-speech system TTS similarly does not operate directly on sound data.)
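As a reminder of why everything must become numbers first, here is a minimal sketch of one neural-network layer (random values and assumed shapes, not the course's MNIST code): inputs are a vector of numbers, multiplied by a weight matrix and squashed by a sigmoid. There is no way to feed a raw string into this arithmetic without encoding it numerically first.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A single layer like the ones in our MNIST network: it only understands
# vectors and matrices of numbers, never raw text or raw audio.
x = rng.normal(size=4)            # a 4-dimensional numeric input
W = rng.normal(size=(3, 4))       # weights: 3 output nodes, 4 inputs each
b = np.zeros(3)                   # biases for the 3 output nodes

activation = sigmoid(W @ x + b)   # weighted sum, then sigmoid "firing"
print(activation)                 # three numbers between 0 and 1

# A string has no defined product with W, so words must first be converted
# to numbers (token IDs, then embedding vectors) before an LLM can use them.
```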
2. What is an embedding? What does the dimension of an embedding mean?
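The following small sketch treats an embedding as a lookup table, with a made-up vocabulary of 6 tokens and an assumed embedding dimension of 3 (chosen only for readability); the "dimension" is simply the length of each row vector.

```python
import numpy as np

rng = np.random.default_rng(1)

vocab_size = 6        # number of distinct token IDs (toy example)
embedding_dim = 3     # the "dimension" of the embedding: length of each vector

# The embedding is a lookup table: one learned row vector per token ID.
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

token_ids = np.array([4, 0, 2])           # some token IDs from a tokenizer
vectors = embedding_matrix[token_ids]     # shape: (3 tokens, 3 dimensions)
print(vectors.shape)                      # (3, 3)
```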
3. What is the dimension of the embedding used in our example of Edith Wharton's short story "The Verdict"? What is it for GPT-3?
4. Put the following steps in order for processing and preparing our dataset for LLM training:
  • A. Adding position embeddings to the token word embeddings
  • B. Giving unique token IDs (numbers) to each token.
  • C. Breaking up natural human text into tokens, which could include punctuation, whitespace, and special "meta" tokens like "end-of-text" and "unknown"
  • D. Converting token IDs to their embeddings, for example, using Word2Vec
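For reference while ordering the steps above, here is a rough sketch of a typical text-preparation pipeline, assuming a naive regex tokenizer and randomly initialized embedding tables (invented for illustration; this is not the book's code and is not offered as the graded answer).

```python
import re
import numpy as np

rng = np.random.default_rng(2)
text = "The verdict was clear."

# Tokenize: split raw text into tokens (words, punctuation, special tokens).
tokens = re.findall(r"\w+|[^\w\s]", text) + ["<|endoftext|>"]

# Assign a unique integer ID to every distinct token (a toy vocabulary).
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
token_ids = np.array([vocab[tok] for tok in tokens])

# Look up an embedding vector for each token ID.
embedding_dim = 4
token_embedding = rng.normal(size=(len(vocab), embedding_dim))
x = token_embedding[token_ids]                    # (num_tokens, embedding_dim)

# Add position embeddings so the model also knows *where* each token sits.
position_embedding = rng.normal(size=(len(tokens), embedding_dim))
x = x + position_embedding

print(x.shape)   # (number of tokens, embedding dimension)
```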

Human Writing

Read/listen to Part I of an interview of David Deutsch by Naval Ravikant and Brett Hall. Write a response to this piece of 500 words or more. Use any of the questions below to guide your thinking, or react against them.
  • Form and describe a potential philosophy of when you will source a dataset for AI training. When and in what contexts do you hope to gather data for training an AI model?
    ◦ Are there any principles or past experiences that have led you to choose this philosophy?
    ◦ What is David Deutsch's apparent philosophy of AI use? How have his thoughts affected your policy, if at all?
  • What is a use of AI that you are interested in, either that you've seen publicly or that you've considered as a thought experiment?
  • What is creativity?
    ◦ What is David Deutsch's apparent definition of creativity? Or, what does he think creativity is not? (anti-definitions)
    ◦ Is it necessary for AIs to be creative in order to be considered intelligent, or vice versa?
    ◦ Is creativity important? Why or why not?
      ▪ Expand your answer above to include both your own personal preferences and circumstances, as well as within human society as a whole.
    ◦ Are there any negative effects of creativity or creative freedom? Why or why not?
