Everyone uses ChatGPT to generate text these days, but far fewer people understand how it works and how it processes text under the hood.
April 2, 2026 | 12 min read
Before a Large Language Model (LLM) can understand or generate text, it first needs to break it down into smaller units called tokens.
A token is not always a word. It can be:

- A whole word (apple)
- A subword (app + le)
- Single characters (a, p, p, l, e)
- Punctuation (., !)

Plain Text
Input: "ChatGPT is amazing!" Tokens: ["Chat", "G", "PT", " is", " amazing", "!"]
Different models split the same text into different tokens, depending on the tokenizer they were trained with.
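To see this in practice, here is a minimal sketch using OpenAI's tiktoken library (assuming it is installed via pip install tiktoken). The cl100k_base encoding is just one example; other tokenizers will split the same text differently.

Python
# Inspect how one real tokenizer splits a sentence.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by several OpenAI models

ids = enc.encode("ChatGPT is amazing!")
pieces = [enc.decode([i]) for i in ids]     # map each ID back to its text

print(ids)     # integer token IDs
print(pieces)  # the text fragment behind each ID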
Tokenization is the process of converting raw text into tokens. Most modern LLMs use subword tokenization techniques like:

- Byte Pair Encoding (BPE)
- WordPiece
- SentencePiece

Subword tokenization is used because:

- It handles rare and unseen words gracefully
- It keeps the vocabulary size manageable
- It balances word-level meaning with character-level flexibility
Plain Text
Word: "unbelievable" Tokenized: ["un", "believ", "able"]
Token training refers to how a model learns relationships between tokens as it processes its training data.
The model is trained to predict the next token given previous tokens.
Plain Text
Input: "The sky is" Target: "blue"
During training:

- The model predicts a probability distribution over the next token
- The prediction is compared against the actual next token using a loss function (typically cross-entropy)
- The weights are updated through backpropagation
This process is repeated over billions of tokens.
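To make the loop concrete, here is a toy PyTorch sketch of a single training step. The tiny embedding-plus-linear "model" and the made-up token IDs are placeholders just to show the predict, compare, update cycle; a real LLM would be a transformer trained on vastly more data.

Python
# Toy next-token prediction training step (not a real LLM).
import torch
import torch.nn as nn

vocab_size, dim = 100, 32
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

tokens = torch.tensor([[5, 17, 42, 8]])          # pretend IDs for "The sky is blue"
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # each position predicts the next token

logits = model(inputs)                           # (batch, seq, vocab) scores
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)

loss.backward()       # compute gradients
optimizer.step()      # nudge the weights toward better predictions
optimizer.zero_grad()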
PyTorch and TensorFlow are the two most popular deep learning frameworks used to build LLMs.
Transformers are the core architecture behind LLMs.
Main components:

- Token embeddings
- Positional encoding
- Multi-head self-attention
- Feed-forward networks
- Residual connections with layer normalization
Tokens are just integers, and an integer by itself carries no meaning a model can use. So we convert each token into a vector called an embedding.
Plain Text
Token ID: 101 → [0.21, -0.33, 0.89, ..., 0.12]
This vector:

- Captures the token's meaning
- Is learned during training
- Typically has hundreds or thousands of dimensions
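In PyTorch, this lookup is a single module. The vocabulary size and dimension below are illustrative, not tied to any particular model.

Python
# An embedding table: integer token IDs in, learned vectors out.
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=50_000, embedding_dim=768)

token_ids = torch.tensor([101, 2009, 318])  # hypothetical token IDs
vectors = embedding(token_ids)

print(vectors.shape)  # torch.Size([3, 768]) — one vector per token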
Transformers process all tokens in parallel, so they have no built-in sense of order. We have to add positional information explicitly.
Plain Text
"dog bites man" ≠ "man bites dog"
The fix: add a positional encoding to each token embedding.
Plain Text
Final Input = Token Embedding + Positional Encoding
This helps the model understand sequence order.
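Here is a sketch of the classic sinusoidal encoding from the original transformer paper. Treat it as one common scheme: many modern LLMs use learned or rotary position embeddings instead.

Python
# Sinusoidal positional encoding: each position gets a unique
# pattern of sines and cosines, added directly to the embeddings.
import torch

def positional_encoding(seq_len: int, dim: int) -> torch.Tensor:
    pos = torch.arange(seq_len).unsqueeze(1).float()  # (seq_len, 1)
    i = torch.arange(0, dim, 2).float()               # even dimension indices
    freq = 1.0 / (10000 ** (i / dim))
    pe = torch.zeros(seq_len, dim)
    pe[:, 0::2] = torch.sin(pos * freq)
    pe[:, 1::2] = torch.cos(pos * freq)
    return pe

token_embeddings = torch.randn(10, 512)  # 10 tokens, 512-dim embeddings
final_input = token_embeddings + positional_encoding(10, 512)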
Self-attention is the most important part of the transformer.
Each token looks at other tokens to understand context.
Plain Text
Sentence: "The bank of the river" "bank" attends to "river" → meaning = river bank
Plain Text
Attention(Q, K, V) = softmax(QK^T / sqrt(d)) V
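The formula translates almost line for line into code. Q, K, and V below are random stand-ins for the query, key, and value matrices that would normally be computed from the token embeddings.

Python
# Scaled dot-product attention, directly from the formula above.
import math
import torch

def attention(Q, K, V):
    d = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d)  # QK^T / sqrt(d)
    weights = torch.softmax(scores, dim=-1)          # attention weights per token
    return weights @ V                               # blend the value vectors

Q = K = V = torch.randn(5, 64)  # 5 tokens, 64 dimensions
out = attention(Q, K, V)        # (5, 64): context-aware representations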
After attention, each token's representation passes through a position-wise feed-forward network.
Plain Text
FFN(x) = max(0, xW1 + b1)W2 + b2
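Expressed as a PyTorch module (the sizes are illustrative; the hidden layer is conventionally about four times wider than the model dimension):

Python
# The feed-forward network: two linear layers with a ReLU between,
# applied to each token position independently.
import torch.nn as nn

dim, hidden = 512, 2048
ffn = nn.Sequential(
    nn.Linear(dim, hidden),  # xW1 + b1
    nn.ReLU(),               # max(0, ·)
    nn.Linear(hidden, dim),  # (·)W2 + b2
)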
A transformer block combines:
Plain Text
Input
  ↓
Multi-Head Attention
  ↓
Add & Normalize
  ↓
Feed Forward
  ↓
Add & Normalize
  ↓
Output
LLMs stack dozens or hundreds of these blocks.
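Putting the pieces together, here is a minimal block matching the diagram above, built from PyTorch's stock layers. Real implementations add dropout and causal masking, and often normalize before each sublayer rather than after.

Python
# A minimal transformer block: attention, then feed-forward,
# each wrapped in a residual connection and layer normalization.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, dim=512, heads=8, hidden=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim)
        )
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)  # multi-head self-attention
        x = self.norm1(x + attn_out)      # add & normalize
        x = self.norm2(x + self.ffn(x))   # feed forward, add & normalize
        return x

x = torch.randn(1, 10, 512)  # (batch, tokens, dim)
out = TransformerBlock()(x)  # same shape, now context-mixed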
Now for the most exciting part: text generation.
The model starts with a prompt:

Plain Text
"Once upon a time"

The prompt is tokenized:

Plain Text
["Once", " upon", " a", " time"]

The model then predicts a probability for every possible next token:

Plain Text
Possible outputs:
"there" → 40%
"was"   → 35%
"a"     → 10%

A token is chosen, appended to the sequence, and the whole thing is fed back in:

Plain Text
"Once upon a time there"
This loop continues until:

- An end-of-sequence token is generated, or
- A maximum output length is reached
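Here is the whole loop as a runnable sketch, assuming the transformers and torch packages are installed. GPT-2 serves purely as a small example model, and greedy decoding (always picking the most likely token) keeps the code simple; real systems usually sample.

Python
# Generate text token by token with a small pretrained model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("Once upon a time", return_tensors="pt").input_ids
for _ in range(20):                               # generate up to 20 tokens
    with torch.no_grad():
        logits = model(ids).logits                # (1, seq_len, vocab_size)
    next_id = logits[0, -1].argmax().view(1, 1)   # greedy: most likely token
    ids = torch.cat([ids, next_id], dim=1)        # append and feed back in
    if next_id.item() == tokenizer.eos_token_id:  # stop at end-of-sequence
        break

print(tokenizer.decode(ids[0]))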
LLMs do not "think" like humans. They:

- Convert text into tokens
- Turn tokens into vectors
- Mix those vectors together with attention
- Predict the next token, over and over

Yet, this simple mechanism leads to:

- Coherent essays and stories
- Working code
- Fluent, context-aware conversation
Understanding LLM internals reveals that there is no magic inside: just tokens, embeddings, attention, and probabilities.
From a simple next-token prediction system emerges something that feels intelligent.
And that is the beauty of modern AI.