AI_Homework7_Response - TheEvergreenStateCollege/upper-division-cs-23-24 GitHub Wiki


Question 0. Which of the following vectors is a one-hot encoding? How would you describe a one-hot encoding in English?

The vector displayed on the right is the one-hot encoding of the vector to the left.

A one-hot encoding is a binary vector that is all zeros except for a single 1 at the index being represented. When a one-hot vector is multiplied by a matrix (such as an embedding matrix), the result is the row of that matrix indicated by the position of the hot 1.
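A minimal sketch of this row-selection behavior (the numbers below are made up for illustration), assuming PyTorch:

```python
import torch

# A one-hot vector: all zeros except a single 1 at the index being encoded.
# Here index 2 (0-based) is "hot".
one_hot = torch.tensor([0., 0., 1., 0.])

# A small example matrix: one row per possible index (4 rows, 3 columns).
embedding_matrix = torch.tensor([
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
])

# Multiplying the one-hot vector by the matrix selects the row at the hot index.
selected_row = one_hot @ embedding_matrix
print(selected_row)         # tensor([0.7000, 0.8000, 0.9000])
print(embedding_matrix[2])  # the same row, selected directly by index
```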

Question 1. What is an (x, y) training example (in English)?

An (x, y) training example is an example of the form (input, target): x is the training data and y is its corresponding label, i.e. the output the model should learn to produce from x.
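As a minimal sketch of one such pair for a GPT trained on next-token prediction (the token IDs below are made up), y is simply x shifted forward by one token:

```python
# Hypothetical token IDs for a short text.
token_ids = [40, 367, 2885, 1464, 1807]

max_length = 4
# x is a chunk of token IDs; y is the same chunk shifted one token ahead,
# so the target at each position is "the next token".
x = token_ids[0:max_length]        # [40, 367, 2885, 1464]
y = token_ids[1:max_length + 1]    # [367, 2885, 1464, 1807]
```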

Question 2. We call large texts for training a GPT "self-labeling" because we can sample training examples by sliding a window (or batches of words) across the text. Match the following terms (A, B, C) with their definitions below (i, ii, iii); a dataloader sketch illustrating these terms follows the list.
A. max_length --> ii. chunk size, or number of token IDs to group together into one x or y of a training example (x,y)

B. stride --> i. the number of token IDs to "slide" forward from one (x,y) training example to the next (x,y) training example

C. batch size --> iii. the number of (x,y) training examples returned in each call to next of our dataloader's iterator.
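A minimal sketch of how max_length, stride, and batch size interact in a sliding-window dataloader (the class name and token IDs are made up; this mirrors, but is not necessarily identical to, the course's dataloader):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class GPTDatasetSketch(Dataset):
    """Slides a window of max_length token IDs across the text,
    moving forward by `stride` IDs between (x, y) training examples."""
    def __init__(self, token_ids, max_length, stride):
        self.inputs = []
        self.targets = []
        for start in range(0, len(token_ids) - max_length, stride):
            x = token_ids[start:start + max_length]
            y = token_ids[start + 1:start + max_length + 1]
            self.inputs.append(torch.tensor(x))
            self.targets.append(torch.tensor(y))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

token_ids = list(range(100))                 # stand-in for tokenized text
dataset = GPTDatasetSketch(token_ids, max_length=4, stride=4)
loader = DataLoader(dataset, batch_size=8)   # each call to next() yields 8 (x, y) examples
x_batch, y_batch = next(iter(loader))
print(x_batch.shape, y_batch.shape)          # torch.Size([8, 4]) torch.Size([8, 4])
```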

Question 3. Because an embedding is stored as a matrix, and we studied how neural network weights can also be stored in a matrix, we can view the operation of transforming an input vector into an embedding as a two-layer network.
Given an input of 7 token IDs and an output (embedding) size of 128, the resulting embedding would be a matrix with 7 rows, each row spanning 128 columns. A matrix's shape is written in the form (rows, cols), so the shape would appear as (7, 128).
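A minimal sketch confirming that shape (the vocabulary size and token IDs are assumptions made for illustration), using PyTorch's embedding layer:

```python
import torch

vocab_size = 50257    # assumed vocabulary size (GPT-2's), for illustration only
embedding_dim = 128   # output size from the question
torch.manual_seed(0)

embedding = torch.nn.Embedding(vocab_size, embedding_dim)

token_ids = torch.tensor([40, 367, 2885, 1464, 1807, 3619, 402])  # 7 made-up token IDs
embedded = embedding(token_ids)

print(embedded.shape)  # torch.Size([7, 128]) -> (rows, cols) = (7, 128)
```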