token - AshokBhat/ml GitHub Wiki
- In the context of LLMs, a token is a unit of text that the model processes, such as a word, part of a word, or a punctuation mark.
- 1 token ~= 4 characters of common English text.
- This translates to roughly 0.75 words per token.
- 100 tokens ~= 75 words
- The collected works of Shakespeare are about 900,000 words, or roughly 1.2M tokens.
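
The rules of thumb above can be turned into a quick estimator. The following is a minimal sketch, not a real tokenizer: the function and constant names are hypothetical, and the ratios are the approximate figures from this page, which only hold for common English text. Actual counts depend on the specific model's tokenizer.

```python
import math

# Approximate ratios from the rules of thumb above (assumptions, not exact values).
CHARS_PER_TOKEN = 4      # 1 token ~= 4 characters of common English text
WORDS_PER_TOKEN = 0.75   # 1 token ~= 0.75 words


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate a token count from the number of characters in the text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate a token count from a word count."""
    return round(word_count / WORDS_PER_TOKEN)


def estimate_words_from_tokens(token_count: int) -> int:
    """Estimate a word count from a token count."""
    return round(token_count * WORDS_PER_TOKEN)


if __name__ == "__main__":
    # 100 tokens is roughly 75 words.
    print(estimate_words_from_tokens(100))
    # Shakespeare's collected works: about 900,000 words.
    print(estimate_tokens_from_words(900_000))
```

Estimates like these are useful for rough budgeting, for example checking whether a document is likely to fit in a context window. For exact counts, run the text through the tokenizer of the model in question.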