What is Tokenization?

LLMs don’t work with words the way we do. They break language into tokens—pieces that can be smaller or larger than words—and use those to learn and generate text. This post walks through what tokenization is, why it matters, and how it shapes everything from model behavior to prompt limits.
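To make the idea concrete, here is a toy sketch of subword tokenization: a greedy longest-match splitter over a tiny made-up vocabulary. This is an illustration only, not the algorithm any real LLM uses (production tokenizers like BPE are trained from data and work quite differently), but it shows how a word can break into familiar sub-word pieces.

```python
# Toy subword tokenizer: greedily match the longest known piece at each
# position. The vocabulary below is invented purely for illustration.
VOCAB = {"token", "ization", "iz", "un", "able"}

def tokenize(word, vocab):
    """Split `word` into subword tokens by greedy longest-match.

    Falls back to a single character when no vocabulary piece matches,
    so every input can always be tokenized.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking down to one character.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens

print(tokenize("tokenization", VOCAB))   # ['token', 'ization']
print(tokenize("untokenizable", VOCAB))  # ['un', 'token', 'iz', 'able']
```

Note how "untokenizable" never appears in the vocabulary, yet it still tokenizes cleanly into pieces the vocabulary does know. Real tokenizers rely on the same property to handle rare words, typos, and new coinages without an "unknown word" escape hatch.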