# AI Self-Hosting, Spring 2024

## Week 09 - Thursday

### Big Picture and Review
In Weeks 6 and 7, we (a code sketch of these steps follows the list):
- tokenized our source data into word parts
- then encoded these letter fragments into integers (token IDs)
- divided our sequence of token IDs into chunks of `context_length`
- paired a training input `x` with a predicted next word `y` by "sliding the window" one position further in the text; this is a self-labeling pair
- grouped them into batches of `(x, y)`
  - to make matrix multiplication more efficient
- encoded them into a higher-dimensional embedding vector space
- added positional encoding
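
Here is a minimal sketch of the chunking, window-sliding, and batching steps, assuming PyTorch tensors and a toy list of token IDs (the `context_length` and `batch_size` values are illustrative, not the exact ones we used in class):

```python
import torch

# Toy example: pretend these are the token IDs produced by our tokenizer.
token_ids = list(range(100))   # stand-in for the encoded source text
context_length = 8             # illustrative chunk size
batch_size = 4

# Slide a window through the token IDs: x is a chunk, and y is the same
# chunk shifted one position to the right (the "next word" labels).
xs, ys = [], []
for start in range(len(token_ids) - context_length):
    xs.append(token_ids[start : start + context_length])
    ys.append(token_ids[start + 1 : start + 1 + context_length])

x = torch.tensor(xs)   # shape: (num_pairs, context_length)
y = torch.tensor(ys)

# Group the self-labeled (x, y) pairs into batches for efficient matmuls.
x_batch, y_batch = x[:batch_size], y[:batch_size]
print(x_batch.shape, y_batch.shape)  # both (batch_size, context_length)
```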
So for each batch, we are left with a tensor of shape

$$(\text{batch\_size},\ \text{context\_length},\ \text{embedding\_dimension})$$
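
A rough sketch of the last two list items above, assuming PyTorch's `torch.nn.Embedding` for both the token and positional lookups (the vocabulary size and embedding dimension are illustrative):

```python
import torch

vocab_size = 50257          # e.g. the GPT-2 BPE vocabulary size
embedding_dimension = 256   # illustrative
context_length = 8
batch_size = 4

token_embedding = torch.nn.Embedding(vocab_size, embedding_dimension)
position_embedding = torch.nn.Embedding(context_length, embedding_dimension)

# A batch of token IDs, shape (batch_size, context_length).
x_batch = torch.randint(0, vocab_size, (batch_size, context_length))

tok = token_embedding(x_batch)                          # (batch_size, context_length, embedding_dimension)
pos = position_embedding(torch.arange(context_length))  # (context_length, embedding_dimension), broadcasts over the batch
embedded = tok + pos
print(embedded.shape)  # torch.Size([4, 8, 256])
```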
We learned the pattern of feeding an existing conversation back into the model to predict the next word, both at training time and at inference time.
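
As a sketch of that feedback loop at inference time, assuming a model that returns logits of shape `(batch, seq_len, vocab_size)` and using greedy decoding for simplicity (the `generate` helper below is hypothetical, not a specific library API):

```python
import torch

def generate(model, token_ids, max_new_tokens, context_length):
    """Greedy autoregressive decoding: repeatedly append the model's
    most likely next token and feed the longer sequence back in."""
    for _ in range(max_new_tokens):
        # Only the most recent context_length tokens fit in the window.
        context = token_ids[:, -context_length:]
        logits = model(context)                 # (batch, seq_len, vocab_size)
        next_token_logits = logits[:, -1, :]    # scores for the next token only
        next_token = next_token_logits.argmax(dim=-1, keepdim=True)
        token_ids = torch.cat([token_ids, next_token], dim=1)
    return token_ids
```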
In Week 8, we learned about embeddings.
We also learned how different words can influence each other ("pay attention to" each other) to shift themselves in the embedding space.
To calculate attention, we used a "query" matrix for the "influencee" and a "key" matrix for the potential "influencers" to generate vectors for each word that meet in a shared query-key space, where we can take the overlap (the dot product) to measure how strongly a potential influencer should shift the influencee.
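
A minimal sketch of that calculation, assuming the standard scaled dot-product formulation; the projection matrices below are random placeholders just to show shapes, and the full mechanism also uses a "value" matrix to produce the shifted vectors:

```python
import torch

torch.manual_seed(0)
context_length, embedding_dimension, qk_dimension = 8, 256, 64

# Learned projection matrices (random here, purely to illustrate shapes).
W_query = torch.randn(embedding_dimension, qk_dimension)
W_key = torch.randn(embedding_dimension, qk_dimension)

# One sequence of embedded tokens: (context_length, embedding_dimension).
embedded = torch.randn(context_length, embedding_dimension)

queries = embedded @ W_query   # each word as an "influencee"
keys = embedded @ W_key        # each word as a potential "influencer"

# Overlap in query-key space: dot products, scaled, then softmaxed into
# weights saying how strongly each influencer shifts each influencee.
scores = queries @ keys.T / (qk_dimension ** 0.5)
attention_weights = torch.softmax(scores, dim=-1)
print(attention_weights.shape)  # (context_length, context_length)
```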