AI-24sp-2024-05-29-Morning

AI Self-Hosting, Spring 2024

Week 09 - Thursday

Big Picture and Review

In Weeks 6 and 7, we

  • tokenized our source data into word parts
  • then encoded these word parts into integers (token IDs)
  • divided up our sequence of token IDs into chunks of context_length
  • paired a training input x with a target y, the same window shifted one token ahead (the predicted next words), by "sliding the window" further along the text (see the sketch after this list)
    • this is a self-labeling pair
  • grouped them into batches of (x,y)
    • to make matrix multiplication more efficient
  • encoded them into a higher-dimensional embedding vector space
  • added positional encoding
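
Putting those steps together, here is a minimal sketch of the sliding-window pairing and batching, assuming PyTorch and the `tiktoken` GPT-2 tokenizer; the class name `SlidingWindowDataset`, the sample text, and the small `context_length`/`batch_size` values are illustrative stand-ins, not the exact course code.

```python
import tiktoken
import torch
from torch.utils.data import Dataset, DataLoader

class SlidingWindowDataset(Dataset):
    """Pairs each window x of context_length token IDs with y, the same
    window shifted one token ahead (the "self-labeling" pair)."""
    def __init__(self, token_ids, context_length, stride):
        self.inputs, self.targets = [], []
        for i in range(0, len(token_ids) - context_length, stride):
            self.inputs.append(torch.tensor(token_ids[i : i + context_length]))
            self.targets.append(torch.tensor(token_ids[i + 1 : i + context_length + 1]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

raw_text = "the quick brown fox jumps over the lazy dog " * 200   # stand-in source data
token_ids = tiktoken.get_encoding("gpt2").encode(raw_text)        # word parts -> token IDs

dataset = SlidingWindowDataset(token_ids, context_length=8, stride=8)
loader = DataLoader(dataset, batch_size=4, shuffle=True, drop_last=True)

x_batch, y_batch = next(iter(loader))   # batches of (x, y) for efficient matrix multiplication
print(x_batch.shape, y_batch.shape)     # torch.Size([4, 8]) torch.Size([4, 8])
```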

So, one batch at a time, we are left with a tensor of shape

$$ (\text{batch\_size},\ \text{context\_length},\ \text{embedding\_dimension}) $$
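
A sketch of the last two list items, showing how a batch of token IDs becomes that tensor; the GPT-2-sized vocabulary, the embedding dimension of 256, and the learned (GPT-style) positional embedding are assumptions for illustration.

```python
import torch
import torch.nn as nn

vocab_size, context_length, embedding_dim = 50257, 8, 256

token_embedding = nn.Embedding(vocab_size, embedding_dim)          # token ID -> vector
position_embedding = nn.Embedding(context_length, embedding_dim)   # position -> vector

x_batch = torch.randint(0, vocab_size, (4, context_length))        # (batch_size, context_length)

token_vectors = token_embedding(x_batch)                               # (4, 8, 256)
position_vectors = position_embedding(torch.arange(context_length))   # (8, 256), broadcast over the batch
input_embeddings = token_vectors + position_vectors                    # (4, 8, 256)

print(input_embeddings.shape)  # torch.Size([4, 8, 256]) = (batch_size, context_length, embedding_dimension)
```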

We learned the pattern of feeding an existing conversation back into the model to predict the next word, both at training time and at inference time.
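
As a hedged sketch of that feed-back loop at inference time: `model` below is a placeholder for any module that maps a `(batch, seq_len)` tensor of token IDs to `(batch, seq_len, vocab_size)` logits, and greedy `argmax` decoding is just one possible sampling choice.

```python
import torch

def generate(model, token_ids, max_new_tokens, context_length):
    """Repeatedly predict the next token and append it to the running sequence."""
    for _ in range(max_new_tokens):
        context = token_ids[:, -context_length:]       # keep only the most recent window
        with torch.no_grad():
            logits = model(context)                    # (batch, seq_len, vocab_size)
        next_token = torch.argmax(logits[:, -1, :], dim=-1, keepdim=True)  # greedy pick
        token_ids = torch.cat([token_ids, next_token], dim=1)              # feed it back in
    return token_ids
```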

In Week 8, we learned about embeddings.

We also learned how different words can influence each other ("pay attention to" each other) to shift themselves in the embedding space.

To calculate attention, we used a "query" matrix for the "influencee" and a "key" matrix for potential "influencers" to project each word into vectors that meet in a shared "query-key" space,

where we can take the overlap (dot product) of a query vector with each key vector to score how strongly a potential influencer should shift the influencee.
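
A minimal sketch of that query-key overlap, assuming the `(batch_size, context_length, embedding_dim)` embeddings from above; the projection size of 64 and the scaled softmax are illustrative choices, not necessarily the exact variant covered in class.

```python
import torch
import torch.nn as nn

embedding_dim, qk_dim = 256, 64
W_query = nn.Linear(embedding_dim, qk_dim, bias=False)  # projects the "influencee"
W_key   = nn.Linear(embedding_dim, qk_dim, bias=False)  # projects potential "influencers"

x = torch.randn(4, 8, embedding_dim)        # (batch_size, context_length, embedding_dim)
queries = W_query(x)                        # (4, 8, 64)
keys    = W_key(x)                          # (4, 8, 64)

# Overlap (dot product) of each query with every key: a larger score means
# that influencer shifts the influencee more in the embedding space.
scores  = queries @ keys.transpose(1, 2)    # (4, 8, 8)
weights = torch.softmax(scores / qk_dim ** 0.5, dim=-1)  # normalized attention weights
print(weights.shape)                        # torch.Size([4, 8, 8])
```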