AI_LabReport_Week5 - TheEvergreenStateCollege/upper-division-cs-23-24 GitHub Wiki

## Building a Large Language Model From Scratch: Chapters 1 and 2

Lab Directions found here

Output 1: Counting total words and printing the first hundred words
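
A minimal sketch of this step, assuming the dataset has already been read into a single string. The `raw_text` sample below is a hypothetical stand-in, not the lab's actual dataset, and "word" here means a whitespace-separated chunk.

```python
# Hypothetical stand-in for the lab dataset (an assumption, not the real text).
raw_text = "The quick brown fox jumps over the lazy dog. " * 20

# str.split() with no arguments splits on any run of whitespace.
words = raw_text.split()
total_words = len(words)
first_hundred = words[:100]

print("Total words:", total_words)
print(" ".join(first_hundred))
```

With a real file, `raw_text` would instead come from reading the file, for example `open(path, encoding="utf-8").read()`.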

Output 2: Splitting on whitespace and punctuation marks
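
One way to do this split is with `re.split` and a capturing group, so the punctuation marks are kept as their own tokens. The pattern and the sample sentence below are assumptions for illustration; the lab's exact pattern may differ.

```python
import re

# Hypothetical sample sentence.
text = "Hello, world. Is this-- a test?"

# The capturing group keeps the delimiters (punctuation, "--", whitespace)
# in the result instead of discarding them.
pieces = re.split(r'([,.:;?_!"()\']|--|\s)', text)

# Drop empty strings and pure-whitespace pieces.
tokens = [p.strip() for p in pieces if p.strip()]

print(tokens)
# ['Hello', ',', 'world', '.', 'Is', 'this', '--', 'a', 'test', '?']
```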

Output 3: Assigning vocabulary words to token IDs
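
A sketch of building the vocabulary, assuming token IDs are assigned by sorting the unique tokens and numbering them from zero. The token list is a hypothetical example.

```python
# Hypothetical token list, as produced by a tokenization step.
tokens = ["the", "cat", "sat", "on", "the", "mat", "."]

# Unique tokens in sorted order, so the mapping is deterministic.
all_tokens = sorted(set(tokens))

# Map each token string to an integer ID.
vocab = {token: token_id for token_id, token in enumerate(all_tokens)}

for token, token_id in vocab.items():
    print(token, token_id)
```

Sorting is a design choice rather than a requirement: any stable ordering works as long as the same mapping is used for both encoding and decoding.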

Output 4: Encoding a sentence from the dataset
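
Encoding can be sketched as tokenizing the sentence and looking up each token in the vocabulary. The `tokenize` and `encode` helpers, the corpus, and the sentence below are hypothetical names and data chosen for illustration.

```python
import re

def tokenize(text):
    """Split on whitespace and punctuation, keeping punctuation as tokens."""
    pieces = re.split(r'([,.:;?_!"()\']|--|\s)', text)
    return [p.strip() for p in pieces if p.strip()]

# Hypothetical two-sentence corpus standing in for the dataset.
corpus = "The cat sat on the mat. The dog sat on the log."
vocab = {tok: i for i, tok in enumerate(sorted(set(tokenize(corpus))))}

def encode(text):
    """Convert text to a list of token IDs (every token must be in vocab)."""
    return [vocab[tok] for tok in tokenize(text)]

ids = encode("The dog sat on the mat.")
print(ids)  # [1, 3, 7, 6, 8, 5, 0]
```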

Output 5: Round-trip decoding back to a sentence from token IDs
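
A sketch of the round trip, assuming decoding joins the tokens with spaces and then removes the space that would otherwise appear before punctuation. The helper names and sample sentence are hypothetical.

```python
import re

def tokenize(text):
    pieces = re.split(r'([,.:;?_!"()\']|--|\s)', text)
    return [p.strip() for p in pieces if p.strip()]

# Hypothetical one-sentence corpus.
sentence = "The dog sat on the mat."
vocab = {tok: i for i, tok in enumerate(sorted(set(tokenize(sentence))))}
id_to_token = {i: tok for tok, i in vocab.items()}  # inverse mapping

def encode(text):
    return [vocab[tok] for tok in tokenize(text)]

def decode(ids):
    text = " ".join(id_to_token[i] for i in ids)
    # Remove the space inserted before closing punctuation.
    return re.sub(r'\s+([,.:;?!])', r'\1', text)

round_trip = decode(encode(sentence))
print(round_trip)  # The dog sat on the mat.
```

The round trip is exact here only because the sentence uses single spaces; runs of whitespace in the input are not preserved by this scheme.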

Output 6: Encoding a new test sentence not from the dataset
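
If the encoder is a plain dictionary lookup, a word that never appeared in the dataset has no token ID, and Python raises `KeyError`. The sketch below shows that behavior with a hypothetical vocabulary; whether the lab's output showed exactly this error is an assumption.

```python
# Hypothetical vocabulary built from a dataset that never contained "dog".
vocab = {".": 0, "The": 1, "cat": 2, "sat": 3}

def encode(tokens):
    """Look up each token; raises KeyError for tokens missing from vocab."""
    return [vocab[tok] for tok in tokens]

missing = None
try:
    encode(["The", "dog", "sat", "."])
except KeyError as err:
    missing = err.args[0]  # the token that was not found
    print("Unknown token:", missing)
```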

Output 7: Adding meta-tokens to end of vocabulary

Output 8: Encoding a sentence using both known and unknown tokens
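
A sketch of the meta-token approach, assuming two special tokens are appended after the sorted vocabulary and that unknown words fall back to one of them. The token names `<|endoftext|>` and `<|unk|>`, the corpus, and the helper names are assumptions for illustration.

```python
import re

def tokenize(text):
    pieces = re.split(r'([,.:;?_!"()\']|--|\s)', text)
    return [p.strip() for p in pieces if p.strip()]

# Hypothetical corpus; "dog" does not appear in it.
corpus = "The cat sat on the mat."
all_tokens = sorted(set(tokenize(corpus)))

# Append the meta-tokens so they receive the highest IDs.
all_tokens.extend(["<|endoftext|>", "<|unk|>"])
vocab = {tok: i for i, tok in enumerate(all_tokens)}

def encode(text):
    """Map known tokens to their IDs and anything else to <|unk|>."""
    unk_id = vocab["<|unk|>"]
    return [vocab.get(tok, unk_id) for tok in tokenize(text)]

ids = encode("The dog sat on the mat.")
print(ids)  # [1, 8, 5, 4, 6, 3, 0]
```

Here the unknown word "dog" is encoded as ID 8, the `<|unk|>` token, so decoding would no longer recover the original word.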
