Going Deeper with AI: A Brief on Tokenisation

Tokenisation is the process of splitting text into smaller pieces known as tokens. LLMs and we humans interpret text differently.

Given the following phrase: What’s the story? Morning glory.

An LLM could view it as:

  • What’s
  • the
  • story?
  • Morning
  • glory.

Each of these is a single token, and each token is assigned a unique ID.

A different LLM could view it as:

  • What
  • ’s
  • the
  • story?
  • Morn
  • ing
  • glory.
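The two splits above can be sketched in code. This is a toy illustration, not a real LLM tokenizer: the whitespace split and the hypothetical subword split below stand in for the two tokenizers, and the IDs are just assigned in order of first appearance.

```python
phrase = "What’s the story? Morning glory."

# Tokenizer A: a simple whitespace split, like the first example above
tokens_a = phrase.split()

# Tokenizer B: a hypothetical subword split, like the second example above
tokens_b = ["What", "’s", "the", "story?", "Morn", "ing", "glory."]

def assign_ids(tokens):
    """Map each distinct token to a unique integer ID."""
    ids = {}
    for tok in tokens:
        if tok not in ids:
            ids[tok] = len(ids)
    return ids

print(tokens_a)            # ['What’s', 'the', 'story?', 'Morning', 'glory.']
print(assign_ids(tokens_a))
print(assign_ids(tokens_b))
```

Real tokenizers learn their splits from data (e.g. byte-pair encoding) rather than using fixed rules like these, but the token-to-ID mapping works the same way.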

It seems to me that splitting words into subwords like this helps the LLM recognise patterns shared across related words.

Here’s an article by Sean Trott that goes in-depth on tokenisation: https://seantrott.substack.com/p/tokenization-in-large-language-models.

LLM providers use token counts to price both input and output.
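Here is a rough sketch of how that pricing works. The token counts and per-million-token rates below are made-up numbers for illustration, not any provider's actual rates.

```python
# Hypothetical request: counts and prices are assumed, not real rates
input_tokens = 1_200
output_tokens = 300

price_per_million_input = 3.00    # dollars per 1M input tokens (assumed)
price_per_million_output = 15.00  # dollars per 1M output tokens (assumed)

# Cost scales linearly with the number of tokens in each direction
cost = (input_tokens / 1_000_000) * price_per_million_input \
     + (output_tokens / 1_000_000) * price_per_million_output

print(f"${cost:.4f}")
```

Output tokens are typically priced higher than input tokens, which is why the two rates are separate.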

For my purposes right now, as a newbie AI orchestrator, knowing what tokens are should suffice.