Build a Transformer from Scratch

From raw text to generated output: build a working transformer language model step by step in pure Python.

Step 1: Setup & Data

Load a small text corpus and split it into words — this is the raw material every later step builds on.
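A minimal pure-Python sketch of this step (the corpus text and variable names are illustrative, not necessarily the tutorial's exact code):

```python
import random

random.seed(42)  # fix the RNG so every run is reproducible

# Toy training corpus (illustrative; substitute any small text you like)
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the log . "
    "the cat saw the dog ."
)
words = corpus.split()
print(f"{len(words)} tokens, {len(set(words))} unique")
```

A whitespace split is enough here because the later steps use a word-level tokenizer; subword schemes like BPE would need more setup.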

Step 2: Tokenizer

Build a simple word-level tokenizer with encode/decode methods.

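One way to sketch this in pure Python — a vocabulary built from the training text, with `encode` and `decode` as inverse mappings (class and method names are illustrative):

```python
class Tokenizer:
    """Word-level tokenizer: one integer id per unique word."""

    def __init__(self, text):
        self.vocab = sorted(set(text.split()))
        self.stoi = {w: i for i, w in enumerate(self.vocab)}  # string -> id
        self.itos = {i: w for w, i in self.stoi.items()}      # id -> string

    def encode(self, text):
        return [self.stoi[w] for w in text.split()]

    def decode(self, ids):
        return " ".join(self.itos[i] for i in ids)

tok = Tokenizer("the cat sat on the mat")
ids = tok.encode("the cat sat")
print(ids, "->", tok.decode(ids))  # round-trips back to the input text
```

The key property to check: `decode(encode(text))` returns the original text for any text whose words are in the vocabulary.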

Step 3: Embeddings

Create an embedding layer that converts token IDs to dense vectors.

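In pure Python an embedding layer is just a lookup table of randomly initialized vectors, one row per token id (a sketch; dimensions and init scale are illustrative):

```python
import random

random.seed(0)

class Embedding:
    """Maps each token id to a learned dense vector via table lookup."""

    def __init__(self, vocab_size, dim):
        # one randomly initialized vector per token id; in training these
        # rows would be updated by gradient descent
        self.weight = [[random.gauss(0, 0.02) for _ in range(dim)]
                       for _ in range(vocab_size)]

    def __call__(self, ids):
        return [self.weight[i] for i in ids]

emb = Embedding(vocab_size=10, dim=4)
vectors = emb([3, 1, 3])
print(len(vectors), len(vectors[0]))  # one 4-dim vector per input id
```

Note that the same id always maps to the same vector — repeated tokens share one row of the table.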

Step 4: Single-Head Attention

Implement the core QKV attention mechanism.

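A pure-Python sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The causal mask is an assumption on my part — a language model needs it so positions cannot attend to the future, though the tutorial may introduce it later:

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V, causal=True):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    Kt = [list(col) for col in zip(*K)]  # transpose K
    scaled = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, Kt)]
    if causal:  # each position may only attend to itself and earlier positions
        for i, row in enumerate(scaled):
            for j in range(i + 1, len(row)):
                row[j] = float("-inf")  # softmax turns -inf into weight 0
    weights = [softmax(row) for row in scaled]
    return matmul(weights, V)

# Toy example: 3 positions, head size 2
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = attention(Q, K, V)
print(out[0])  # position 0 can only see itself, so its output equals V[0]
```

The 1/sqrt(d_k) scaling keeps the dot products from growing with head size, which would otherwise push the softmax into near-one-hot saturation.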

Step 5: Multi-Head Attention

Run multiple attention heads in parallel and concatenate.

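A self-contained sketch: each head gets its own Q/K/V projections into a smaller `d_head = d_model // n_heads` subspace, the head outputs are concatenated back to `d_model`, and a final output projection mixes them (class layout and init are illustrative):

```python
import math
import random

random.seed(0)

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    d_k = len(K[0])
    scores = matmul(Q, [list(c) for c in zip(*K)])
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.02) for _ in range(cols)] for _ in range(rows)]

class MultiHeadAttention:
    def __init__(self, d_model, n_heads):
        assert d_model % n_heads == 0
        d_head = d_model // n_heads
        # one (Wq, Wk, Wv) projection triple per head, plus an output projection
        self.heads = [tuple(rand_matrix(d_model, d_head) for _ in range(3))
                      for _ in range(n_heads)]
        self.Wo = rand_matrix(d_model, d_model)

    def __call__(self, X):
        outs = [attention(matmul(X, Wq), matmul(X, Wk), matmul(X, Wv))
                for Wq, Wk, Wv in self.heads]
        # concatenate the per-head outputs along the feature dimension
        concat = [sum((h[i] for h in outs), []) for i in range(len(X))]
        return matmul(concat, self.Wo)

mha = MultiHeadAttention(d_model=8, n_heads=2)
X = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]  # 4 tokens
out = mha(X)
print(len(out), len(out[0]))  # output has the same shape as the input
```

Splitting `d_model` across heads keeps the total compute roughly equal to one full-width head while letting each head specialize in a different attention pattern.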

Step 6: Transformer Block

Combine attention, feedforward, residual connections, and layer normalization.

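A sketch of one block, using single-head attention for brevity (the multi-head module from Step 5 would slot in the same way). This uses the post-norm layout of the original Transformer paper — residual add, then layer norm — which the tutorial's version may vary:

```python
import math
import random

random.seed(0)

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    d_k = len(K[0])
    scores = matmul(Q, [list(c) for c in zip(*K)])
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)

def layer_norm(x, eps=1e-5):
    """Normalize one vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.02) for _ in range(cols)] for _ in range(rows)]

class TransformerBlock:
    def __init__(self, d_model, d_ff):
        self.Wq, self.Wk, self.Wv = (rand_matrix(d_model, d_model)
                                     for _ in range(3))
        self.W1 = rand_matrix(d_model, d_ff)  # feedforward expand
        self.W2 = rand_matrix(d_ff, d_model)  # feedforward contract

    def __call__(self, X):
        # attention sublayer: residual add, then normalize (post-norm layout)
        A = attention(matmul(X, self.Wq), matmul(X, self.Wk), matmul(X, self.Wv))
        X = [layer_norm([a + x for a, x in zip(ar, xr)]) for ar, xr in zip(A, X)]
        # position-wise feedforward with ReLU, again residual + norm
        H = [[max(0.0, h) for h in row] for row in matmul(X, self.W1)]
        F = matmul(H, self.W2)
        return [layer_norm([f + x for f, x in zip(fr, xr)]) for fr, xr in zip(F, X)]

block = TransformerBlock(d_model=8, d_ff=16)
X = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
out = block(X)
print(len(out), len(out[0]))  # shape is preserved: 4 tokens x 8 dims
```

Because the block maps an n-by-d_model input to an n-by-d_model output, blocks can be stacked to arbitrary depth — that stacking is what "scaling up" mostly means.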

Step 7: Text Generation

Build autoregressive generation: predict one token at a time and feed each prediction back in as input.

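The generation loop itself is the point of this step, so the sketch below uses a stand-in model — a bigram count table over a toy corpus — where the tutorial's code would call the transformer for next-token logits. The loop structure is the same either way:

```python
# Stand-in model: a bigram count table plays the role of the transformer's
# next-token logits so the generation loop can run on its own.
corpus = "the cat sat on the mat the dog sat on the log".split()
vocab = sorted(set(corpus))
stoi = {w: i for i, w in enumerate(vocab)}

counts = [[0] * len(vocab) for _ in vocab]
for a, b in zip(corpus, corpus[1:]):
    counts[stoi[a]][stoi[b]] += 1

def next_token_logits(ids):
    # conditions only on the last token; a real transformer sees the whole context
    return counts[ids[-1]]

def generate(prompt_ids, max_new_tokens):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # greedy argmax
        ids.append(next_id)  # feed the prediction back in
    return ids

out = generate([stoi["the"]], 5)
print(" ".join(vocab[i] for i in out))
```

Greedy argmax is deterministic and tends to loop; the next step replaces it with temperature and top-p sampling.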

Step 8: Temperature & Top-p Sampling

Add temperature and nucleus sampling for controlled generation.

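A sketch of both techniques: temperature divides the logits before the softmax (below 1.0 sharpens the distribution, above 1.0 flattens it), and top-p keeps only the smallest set of tokens whose cumulative probability reaches `p`, then samples from that renormalized nucleus. Names and the example logits are illustrative:

```python
import math
import random

random.seed(0)

def softmax(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]  # temperature rescales logits
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_top_p(logits, temperature=1.0, top_p=0.9):
    probs = softmax(logits, temperature)
    # sort token ids by probability, highest first
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # keep the smallest prefix whose cumulative probability reaches top_p
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # renormalize over the kept nucleus and draw one sample
    total = sum(probs[i] for i in kept)
    r = random.random() * total
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]

logits = [2.0, 1.0, 0.1, -1.0]
samples = [sample_top_p(logits, temperature=0.8, top_p=0.9) for _ in range(100)]
print(sorted(set(samples)))  # the low-probability tail never gets sampled
```

Unlike a fixed top-k cutoff, the nucleus adapts its size: a confident distribution may keep one token while a flat one keeps many.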

What You Built:

  • A tokenizer that converts text to integers and back
  • An embedding layer that gives tokens vector meaning
  • Single-head and multi-head attention mechanisms
  • A full transformer block with residual connections and normalization
  • Autoregressive text generation
  • Temperature and top-p sampling for controlled output

In production, these same components are scaled to billions of parameters and trained on trillions of tokens.