~/ How an LLM processes text in the backend

Everyone uses ChatGPT to generate text today; let's understand how it actually works and how it processes text behind the scenes.

April 2, 2026 | 12 min read

AI
LLM
ChatGPT
Generative AI
Natural Language Processing
Text Processing
Machine Learning
Deep Learning

Before a Large Language Model (LLM) can understand or generate text, it first needs to break it down into smaller units called tokens.

A token is not always a word. It can be:

  • A full word (apple)
  • A subword (app + le)
  • A character (a, p, p, l, e)
  • Even punctuation (. , !)

Plain Text

Input: "ChatGPT is amazing!" Tokens: ["Chat", "G", "PT", " is", " amazing", "!"]

Different models split the same text differently, depending on the tokenizer they were trained with.


Tokenization is the process of converting raw text into tokens. Most modern LLMs use subword tokenization techniques like:

  • Byte Pair Encoding (BPE)
  • WordPiece
  • SentencePiece

Why subword tokenization? Because it:

  • Reduces vocabulary size
  • Handles unknown and rare words gracefully
  • Works across multiple languages

Plain Text

Word: "unbelievable" Tokenized: ["un", "believ", "able"]

A tokenizer typically works like this:

  1. Text is normalized (e.g. lowercased, extra spaces removed)
  2. The tokenizer matches known patterns from its vocabulary
  3. Text is split into the smallest meaningful units
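
You can see this in practice with OpenAI's open-source tiktoken library. A minimal sketch (the exact token IDs depend on the tokenizer, so treat the printed values as illustrative):

Python

# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models

text = "ChatGPT is amazing!"
token_ids = enc.encode(text)                   # text → integer token IDs
tokens = [enc.decode([t]) for t in token_ids]  # decode each ID back to its text

print(token_ids)  # a list of integers; exact values depend on the tokenizer
print(tokens)     # e.g. ['Chat', 'G', 'PT', ' is', ' amazing', '!'] (varies by tokenizer)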

Token training refers to how a model learns relationships between tokens during training.

The model is trained on one simple objective: predict the next token given the previous tokens.

Plain Text

Input: "The sky is" Target: "blue"

During training:

  • Input tokens → model
  • Model predicts a probability distribution over the vocabulary
  • Loss is calculated (how far the prediction was from the actual token)
  • Weights are updated using backpropagation

This process is repeated over billions of tokens.
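
As a toy illustration of one training step (the model, sizes, and token IDs below are all made up; a real LLM is a transformer trained at vastly larger scale):

Python

import torch
import torch.nn as nn

# Toy stand-in for an LLM: embeds 3 tokens and predicts the 4th.
vocab_size = 50_000
model = nn.Sequential(
    nn.Embedding(vocab_size, 128),   # token IDs → vectors
    nn.Flatten(),                    # (1, 3, 128) → (1, 384)
    nn.Linear(3 * 128, vocab_size),  # → logits over the whole vocabulary
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

input_ids = torch.tensor([[101, 2342, 88]])  # "The sky is" (made-up IDs)
target_id = torch.tensor([3391])             # "blue" (made-up ID)

logits = model(input_ids)                              # predicted distribution
loss = nn.functional.cross_entropy(logits, target_id)  # distance from the actual token
loss.backward()        # backpropagation computes gradients
optimizer.step()       # weights are updated
optimizer.zero_grad()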


PyTorch and TensorFlow are the two most popular deep learning frameworks used to build LLMs.

PyTorch:

  • Developed by Facebook (Meta)
  • Dynamic computation graph
  • Easy debugging
  • Widely used in research

TensorFlow:

  • Developed by Google
  • Static + dynamic graphs
  • Production-ready tools
  • Strong deployment ecosystem

Python

import torch
import torch.nn as nn

linear = nn.Linear(10, 5)  # fully connected layer: 10 inputs → 5 outputs
x = torch.randn(1, 10)     # one random input vector
output = linear(x)
print(output)              # tensor of shape (1, 5)

Transformers are the core architecture behind LLMs.

Main components:

  1. Token Embedding
  2. Positional Encoding
  3. Multi-Head Attention
  4. Feed Forward Network
  5. Transformer Block (stacked layers)

Token IDs are just integers, and a model cannot learn meaning from raw integers directly.

So we convert each token into a vector, called an embedding.

Plain Text

Token ID: 101 → [0.21, -0.33, 0.89, ..., 0.12]

This vector:

  • Captures meaning
  • Places similar words closer in vector space
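
In PyTorch this lookup is a single layer; a minimal sketch (the vocabulary and vector sizes are illustrative):

Python

import torch
import torch.nn as nn

# A learnable lookup table: one vector per token ID.
embedding = nn.Embedding(num_embeddings=50_000, embedding_dim=768)

token_ids = torch.tensor([101])  # a single token ID
vector = embedding(token_ids)    # look up its vector
print(vector.shape)              # torch.Size([1, 768])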

Transformers do not understand order naturally.

So we add positional information.

Plain Text

"dog bites man" ≠ "man bites dog"

Add positional encoding to embeddings.

Plain Text

Final Input = Token Embedding + Positional Encoding

This helps the model understand sequence order.
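
One common scheme is the fixed sinusoidal encoding from the original transformer paper; a sketch is below (many modern LLMs use learned or rotary position embeddings instead):

Python

import torch

def positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    # PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    position = torch.arange(seq_len).unsqueeze(1)                  # (seq_len, 1)
    div_term = 10_000.0 ** (torch.arange(0, d_model, 2) / d_model)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position / div_term)
    pe[:, 1::2] = torch.cos(position / div_term)
    return pe

# Final input = token embedding + positional encoding
pe = positional_encoding(seq_len=4, d_model=768)
print(pe.shape)  # torch.Size([4, 768])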


This is the most important part of transformers.

Each token looks at other tokens to understand context.

Plain Text

Sentence: "The bank of the river" "bank" attends to "river" → meaning = river bank

  1. Create Query (Q), Key (K), Value (V)
  2. Compute attention scores:

Plain Text

Attention(Q, K, V) = softmax(QK^T / sqrt(d)) V

  3. Multiple heads → multiple perspectives
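
The formula translates almost line for line into PyTorch; a single-head sketch with toy sizes and no masking:

Python

import math
import torch
import torch.nn.functional as F

def attention(Q, K, V):
    # Attention(Q, K, V) = softmax(QK^T / sqrt(d)) V
    d = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d)  # how much each token attends to the others
    weights = F.softmax(scores, dim=-1)              # each row sums to 1
    return weights @ V                               # weighted mix of the value vectors

seq_len, d = 5, 64
Q, K, V = (torch.randn(seq_len, d) for _ in range(3))
print(attention(Q, K, V).shape)  # torch.Size([5, 64])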

After attention, the data passes through a feed-forward neural network.

Plain Text

FFN(x) = max(0, xW1 + b1)W2 + b2

  • Applies non-linearity
  • Helps the model learn complex patterns
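
Expressed as PyTorch layers (the 4x hidden expansion is a typical but illustrative choice):

Python

import torch
import torch.nn as nn

d_model = 512
ffn = nn.Sequential(
    nn.Linear(d_model, 4 * d_model),  # xW1 + b1
    nn.ReLU(),                        # max(0, ·)
    nn.Linear(4 * d_model, d_model),  # (·)W2 + b2
)

x = torch.randn(5, d_model)  # 5 token positions
print(ffn(x).shape)          # torch.Size([5, 512])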

A transformer block combines:

  1. Multi-head attention
  2. Add & Norm
  3. Feed Forward
  4. Add & Norm

Plain Text

Input
  ↓
Multi-Head Attention
  ↓
Add & Normalize
  ↓
Feed Forward
  ↓
Add & Normalize
  ↓
Output

LLMs stack dozens or hundreds of these blocks.
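
Putting the pieces together, a sketch of one post-norm block (real LLMs vary: pre-norm layouts, causal masking, different activations):

Python

import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)  # self-attention: Q = K = V = x
        x = self.norm1(x + attn_out)      # add & normalize (residual connection)
        x = self.norm2(x + self.ffn(x))   # feed forward, then add & normalize
        return x

block = TransformerBlock()
x = torch.randn(1, 5, 512)  # (batch, seq_len, d_model)
print(block(x).shape)       # torch.Size([1, 5, 512])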


Now comes the most exciting part: text generation.

1. Input Prompt

Plain Text

"Once upon a time"

2. Tokenization

Plain Text

["Once", " upon", " a", " time"]

3. Forward Pass

  • Tokens → embeddings
  • Pass through transformer layers
  • Output = probability distribution over vocabulary

4. Next Token Prediction

Plain Text

Possible outputs:

"there" → 40%
"was"   → 35%
"a"     → 10%

5. Sampling Strategy

  • Greedy (pick highest)
  • Top-k
  • Top-p (nucleus sampling)
  • Temperature scaling
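
A sketch combining temperature scaling with top-k (greedy would simply take the argmax; top-p keeps tokens by cumulative probability instead of a fixed count):

Python

import torch

def sample_next_token(logits, temperature=1.0, top_k=50):
    logits = logits / temperature                     # <1 sharpens, >1 flattens the distribution
    top_logits, top_ids = torch.topk(logits, top_k)   # keep only the k most likely tokens
    probs = torch.softmax(top_logits, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)  # sample among the survivors
    return top_ids[choice].item()

logits = torch.randn(100)  # fake logits over a 100-token vocabulary
print(sample_next_token(logits, temperature=0.8, top_k=10))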

6. Append Token

Plain Text

"Once upon a time there"

7. Repeat

This loop continues until:

  • End token is reached
  • Max length is hit

Python

# Simplified pseudocode of the full generation loop
tokens = tokenizer("Once upon a time")  # steps 1–2: tokenize the prompt
for _ in range(max_length):             # step 7: repeat
    logits = model(tokens)              # step 3: forward pass → distribution
    next_token = sample(logits)         # steps 4–5: predict and sample
    tokens.append(next_token)           # step 6: append the token
text = tokenizer.decode(tokens)

LLMs do not "think" like humans.

They:

  • Predict the next token
  • Based on statistical patterns
  • Learned from massive amounts of text

Yet, this simple mechanism leads to:

  • Conversations
  • Code generation
  • Creativity

Understanding LLM internals reveals:

  • It's all math + probability
  • Transformers enable context understanding
  • Token prediction powers everything

From a simple next-token prediction system emerges something that feels intelligent.

And that is the beauty of modern AI.