Griffin AI 5 8 2024 - TheEvergreenStateCollege/upper-division-cs-23-24 GitHub Wiki
Link to code:
https://github.com/rasbt/LLMs-from-scratch/blob/main/ch02/01_main-chapter-code/ch02.ipynb
--Tokenization: splitting data into smaller parts, e.g. dividing a sentence into its separate words.
Output 1: Count Characters
Output 2: Split by Whitespace
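A minimal sketch of the whitespace split, assuming Python's standard `re` module. The sample sentence is made up for illustration and is not taken from the dataset.

```python
import re

# Hypothetical sample sentence (not from the dataset).
text = "Hello, world. This is a test."

# Split on single whitespace characters. The capturing group keeps
# each whitespace character as its own item in the result.
parts = re.split(r"(\s)", text)
print(parts)

# Drop the whitespace items so only the word-like pieces remain.
tokens = [p for p in parts if p.strip()]
print(tokens)
```

Punctuation stays attached to the words here ("Hello," and "test."), which is why a later step would also need to split on punctuation.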
Output 3: Converting Tokens into token IDs
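One way to picture this step: collect the unique tokens, sort them, and number them. The sketch below assumes a small made-up token list. The names `tokens`, `vocab`, and `token_ids` are only illustrative.

```python
# Hypothetical token list for illustration.
tokens = ["the", "cat", "sat", "on", "the", "mat", "."]

# Sort the unique tokens so every run assigns the same IDs.
unique_tokens = sorted(set(tokens))

# The vocabulary maps each token string to an integer ID.
vocab = {token: idx for idx, token in enumerate(unique_tokens)}
print(vocab)

# Look up each token to turn the sentence into token IDs.
token_ids = [vocab[t] for t in tokens]
print(token_ids)
```

The repeated word "the" gets the same ID both times because the vocabulary holds each token only once.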
Output 4: Encode a sentence from the dataset (tokens -> token IDs)
-prints the token IDs
Output 5: Decode the encoded text
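A rough sketch of how encoding and decoding fit together, assuming a tiny hand-made vocabulary. The class name `SimpleTokenizer` and the regular expressions are assumptions for illustration and may differ from the linked notebook.

```python
import re


class SimpleTokenizer:
    """Illustrative tokenizer: a vocabulary lookup in both directions."""

    def __init__(self, vocab):
        self.str_to_int = vocab
        self.int_to_str = {i: s for s, i in vocab.items()}

    def encode(self, text):
        # Split on punctuation, double dashes, and whitespace,
        # then discard the empty and whitespace-only pieces.
        parts = re.split(r'([,.:;?_!"()\']|--|\s)', text)
        tokens = [p.strip() for p in parts if p.strip()]
        return [self.str_to_int[t] for t in tokens]

    def decode(self, ids):
        # Join tokens with spaces, then remove the space that ends up
        # in front of punctuation marks.
        text = " ".join(self.int_to_str[i] for i in ids)
        return re.sub(r'\s+([,.?!"()\'])', r"\1", text)


# Hypothetical vocabulary built from a handful of tokens.
vocab = {tok: i for i, tok in enumerate(sorted({"the", "cat", "sat", ",", "."}))}
tokenizer = SimpleTokenizer(vocab)

ids = tokenizer.encode("the cat, the cat sat.")
decoded = tokenizer.decode(ids)
print(ids)
print(decoded)
```

Decoding is just the reverse lookup (ID -> token) plus some cleanup of the spacing around punctuation.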
Output 6:
I get a KeyError because the word 'delete' is not a key in the vocabulary.
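The error can be reproduced in a few lines, assuming the vocabulary is a plain Python dict. The two-word vocabulary below is made up; only the word 'delete' comes from the note above.

```python
# Hypothetical vocabulary that does not contain the word "delete".
vocab = {"hello": 0, "world": 1}

tokens = ["hello", "delete"]

try:
    # Indexing a dict with a missing key raises KeyError.
    ids = [vocab[t] for t in tokens]
except KeyError as err:
    missing = err.args[0]
    print(f"KeyError: {missing!r} is not in the vocabulary")

# A membership check finds unknown tokens without raising an error.
unknown = [t for t in tokens if t not in vocab]
print(unknown)
```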
Output 7: add meta-tokens
-<|endoftext|> is a token placed between unrelated texts; it marks where one text ends and the next one begins
-<|unk|> marks unknown words
Output 8: encoding a sentence with known and unknown tokens
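A sketch of how the two special tokens might be used when encoding, assuming they are appended to the end of a small made-up vocabulary. The class name `TokenizerWithUnk`, the sample sentences, and the regular expressions are assumptions for illustration.

```python
import re


class TokenizerWithUnk:
    """Illustrative tokenizer that maps unknown words to <|unk|>."""

    def __init__(self, vocab):
        self.str_to_int = vocab
        self.int_to_str = {i: s for s, i in vocab.items()}

    def encode(self, text):
        parts = re.split(r'([,.:;?_!"()\']|--|\s)', text)
        tokens = [p.strip() for p in parts if p.strip()]
        # Replace every token missing from the vocabulary with <|unk|>
        # instead of letting the lookup raise a KeyError.
        tokens = [t if t in self.str_to_int else "<|unk|>" for t in tokens]
        return [self.str_to_int[t] for t in tokens]

    def decode(self, ids):
        text = " ".join(self.int_to_str[i] for i in ids)
        return re.sub(r'\s+([,.?!"()\'])', r"\1", text)


# Hypothetical vocabulary; the special tokens get the last two IDs.
words = sorted({"the", "cat", "sat", "."})
words.extend(["<|endoftext|>", "<|unk|>"])
vocab = {tok: i for i, tok in enumerate(words)}
tokenizer = TokenizerWithUnk(vocab)

# Two unrelated texts joined by <|endoftext|>; "dog" is not in the vocabulary.
text = " <|endoftext|> ".join(["the cat sat.", "the dog sat."])
ids = tokenizer.encode(text)
decoded = tokenizer.decode(ids)
print(ids)
print(decoded)
```

The decoded text shows `<|unk|>` where "dog" used to be, so the original unknown word cannot be recovered from the token IDs.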