NDeanon_AI_Homework_07 - TheEvergreenStateCollege/upper-division-cs-23-24 GitHub Wiki
Question 0: Which of the following vectors is a one-hot encoding? How would you describe a one-hot encoding in English?
A: The vector with a single hot bit (the one on the right side of the top row). In English, a one-hot encoding is a vector in which exactly one entry is 1 ("hot") and every other entry is 0; the position of the 1 identifies which item is being encoded.
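The definition above can be checked mechanically. A minimal sketch (the example vectors and the helper name `is_one_hot` are made up for illustration, not from the assignment):

```python
# A vector is a one-hot encoding iff exactly one entry is 1
# and every other entry is 0.
def is_one_hot(vec):
    # Sorting puts the single 1 (if any) at the end, so a one-hot
    # vector of length n sorts to n-1 zeros followed by one 1.
    return sorted(vec) == [0] * (len(vec) - 1) + [1]

print(is_one_hot([0, 0, 1, 0]))  # True: exactly one hot bit
print(is_one_hot([0, 1, 1, 0]))  # False: two hot bits
print(is_one_hot([0, 0, 0, 0]))  # False: no hot bit
```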
Question 1. What is an (x,y) training example (in English)?
A: In an (x,y) training example, x is the input the model receives and y is the target output it should learn to predict.
Question 2. We call large texts for training a GPT "self-labeling" because we can sample from the text in a sliding window (or batches of words).
Match the following terms (A,B,C) with its definition below (1,2,3):
A. max_length = ii
B. stride = i
C. batch_size = iii
i. the number of token IDs to "slide" forward from one (x,y) training example to the next
ii. the chunk size: the number of token IDs grouped together into one x or y of a training example (x,y)
iii. the number of (x,y) training examples returned by each call to `next()` on our dataloader's iterator
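The three terms above can be sketched together in a few lines. This is a hedged illustration, not the course's actual dataloader: `token_ids` stands in for a tokenized training text, and the helper names are made up.

```python
# Self-labeling with a sliding window: each y is x shifted one token right.
token_ids = list(range(20))  # pretend this is a tokenized text

max_length = 4   # chunk size: token IDs per x (and per y)
stride = 2       # how far the window slides between examples
batch_size = 3   # (x, y) pairs returned per batch

def sliding_window_pairs(ids, max_length, stride):
    """Yield (x, y) pairs sampled from the text in a sliding window."""
    for start in range(0, len(ids) - max_length, stride):
        x = ids[start : start + max_length]
        y = ids[start + 1 : start + max_length + 1]  # next-token targets
        yield x, y

pairs = list(sliding_window_pairs(token_ids, max_length, stride))
# Group consecutive pairs into batches of batch_size.
batches = [pairs[i : i + batch_size] for i in range(0, len(pairs), batch_size)]

print(pairs[0])  # ([0, 1, 2, 3], [1, 2, 3, 4])
print(pairs[1])  # ([2, 3, 4, 5], [3, 4, 5, 6]) -- slid forward by stride=2
print(len(batches[0]))  # 3 training examples per batch
```

Note how no human labels are needed: the "label" y for every x comes from the text itself, which is why the setup is called self-labeling.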
Question 3. Because an embedding is stored as a matrix, and we studied how neural network weights can also be stored in a matrix, we can view the operation of transforming an input vector into an embedding as a two-layer neural network.
The embedding matrix maps each word in the vocabulary to a vector in the output space. With 7 vocabulary words and 128 output dimensions, the matrix has 7 rows and 128 columns, so its shape is (7, 128).
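The "two-layer network" view can be made concrete: looking up token i in the embedding matrix is the same as multiplying a one-hot row vector by that matrix. A minimal sketch in plain Python (the shapes match the text's 7-word vocabulary; the embedding dimension is shrunk from 128 to 3 only to keep the values readable, and the matrix entries are made up):

```python
vocab_size = 7
embed_dim = 3  # the text's example uses 128; 3 keeps the demo small

# A (vocab_size, embed_dim) embedding matrix with recognizable entries:
# row r holds [10r, 10r+1, 10r+2].
E = [[row * 10 + col for col in range(embed_dim)] for row in range(vocab_size)]

def one_hot(index, size):
    return [1.0 if j == index else 0.0 for j in range(size)]

def embed_via_matmul(token_id):
    """one_hot(token_id) @ E -- the one-hot 'layer' times the weight matrix."""
    x = one_hot(token_id, vocab_size)
    return [sum(x[r] * E[r][c] for r in range(vocab_size)) for c in range(embed_dim)]

token_id = 5
print(embed_via_matmul(token_id))  # [50.0, 51.0, 52.0]
print(E[token_id])                 # [50, 51, 52] -- the same row
```

Because the one-hot vector zeroes out every row except row `token_id`, the matrix multiply just selects that row, which is why libraries implement embeddings as a direct row lookup rather than an actual multiplication.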