token - AshokBhat/ml GitHub Wiki

About

  • In the context of large language models (LLMs), a token is a unit of text that the model processes, such as a word, part of a word, or a punctuation mark.

Rule of thumb

  • 1 token ≈ 4 characters of common English text.
  • This translates to roughly 0.75 words per token.
  • 100 tokens ≈ 75 words.
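
The rule of thumb can be turned into a quick estimator. This is a minimal sketch, not a real tokenizer: the function and constant names are hypothetical, and the ratios are only the approximations listed above for common English text. Actual counts depend on the model's tokenizer.

```python
# Rule-of-thumb ratios from the list above (approximations for common English text).
CHARS_PER_TOKEN = 4      # about 4 characters per token
WORDS_PER_TOKEN = 0.75   # about 0.75 words per token


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate the token count of a string from its character count."""
    return round(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate the token count from a known word count."""
    return round(word_count / WORDS_PER_TOKEN)
```

For example, `estimate_tokens_from_words(75)` returns 100, matching the last bullet. Use an estimate like this only for rough budgeting; for exact counts, run the text through the tokenizer of the model in question.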

Specific examples

  • The collected works of Shakespeare run to about 900,000 words, or roughly 1.2M tokens.
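
This figure is consistent with the rule of thumb of roughly 0.75 words per token. As a quick check, assuming that ratio holds across the whole text:

```latex
\frac{900{,}000 \text{ words}}{0.75 \text{ words/token}} = 1{,}200{,}000 \text{ tokens} \approx 1.2\text{M tokens}
```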
