L13A

Foundations Checkpoint

5 min

You have completed the foundations. Here is what you now know.

What You Can Do Now

You have covered three modules and built a complete vocabulary for the LLM data path:

M1: Tokenize text. You know that raw text becomes discrete token IDs via a fixed vocabulary, that the context budget is counted in tokens, and that token boundaries are not always word boundaries.
M2: Follow the linear algebra. You can trace how token IDs become embeddings, how projections change representation size, how dot products compare vectors, how matrix multiplication batches those comparisons, and how to narrate tensor shapes in plain English.
M3: Understand the output. You know that the model produces one logit per vocabulary entry, that softmax turns logits into a probability distribution, and that the next token is chosen by argmax or sampling from that distribution.
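
As a quick refresher on the M3 step, here is a minimal sketch of softmax followed by the two selection strategies. The four-entry vocabulary and the logit values are made up for illustration; a real model has one logit for each of its tens of thousands of vocabulary entries.

```python
import math
import random

def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-entry vocabulary and made-up logits, for illustration only.
vocab = [" on", " down", " up", " quietly"]
logits = [2.0, 1.0, 0.5, -1.0]

probs = softmax(logits)

# Argmax (greedy) selection: take the highest-probability entry.
greedy_index = max(range(len(probs)), key=lambda i: probs[i])
greedy_token = vocab[greedy_index]

# Sampling: draw one entry in proportion to its probability.
rng = random.Random(0)  # seeded so repeated runs give the same draw
sampled_token = rng.choices(vocab, weights=probs, k=1)[0]

print(greedy_token)
print(sampled_token)
```

Argmax always returns the same token for the same logits, while sampling can return any entry with non-zero probability, which is why the two strategies can disagree on the same distribution.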

The Running Example So Far

The full data path for "The cat sat" — from text to next-token prediction:

"The cat sat"

↓ tokenize

[791, 2368, 3290] — 3 discrete token IDs

↓ embedding lookup

[3, d_model] — 3 token vectors

↓ projections, attention, FFN (the parts you will learn next)

[3, d_model] — transformed hidden states

↓ output projection

[|V|] — one logit per vocabulary entry (for the last token position)

↓ softmax

probabilities over vocabulary

↓ argmax or sample

" on" — the predicted next token
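
The shapes in the path above can be traced in a few lines of plain Python. This is only a toy sketch under stated assumptions: the five-entry vocabulary, `d_model = 4`, the random weights, and the helper names (`random_matrix`, `matmul`, `shape`) are all hypothetical, the token IDs differ from the ones shown above, and the transformer blocks are replaced by a placeholder that leaves the hidden states unchanged.

```python
import random

rng = random.Random(0)

# Hypothetical toy setup: a 5-entry vocabulary and d_model = 4.
vocab = {"The": 0, " cat": 1, " sat": 2, " on": 3, " mat": 4}
d_model = 4

def random_matrix(rows, cols):
    """Build a rows x cols matrix of random weights (stand-in for trained ones)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def matmul(a, b):
    """Multiply an [n, k] matrix by a [k, m] matrix, giving [n, m]."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def shape(m):
    return [len(m), len(m[0])]

embedding_table = random_matrix(len(vocab), d_model)    # [|V|, d_model]
output_projection = random_matrix(d_model, len(vocab))  # [d_model, |V|]

# Tokenize: pre-split pieces stand in for a real tokenizer.
token_ids = [vocab[piece] for piece in ["The", " cat", " sat"]]

# Embedding lookup: one row of the table per token ID.
hidden = [embedding_table[i] for i in token_ids]         # [3, d_model]

# Placeholder for the transformer blocks: shape in equals shape out.
hidden = [row[:] for row in hidden]                      # [3, d_model]

# Output projection applied to the last position only.
logits = matmul([hidden[-1]], output_projection)[0]      # [|V|]

# Softmax preserves ordering, so the argmax can be read off the logits directly.
next_id = max(range(len(logits)), key=lambda i: logits[i])

print(token_ids, shape(hidden), len(logits), next_id)
```

Because the weights are random, the chosen `next_id` is meaningless here; the point is that the shapes line up at every arrow: 3 IDs, then `[3, d_model]`, then `[|V|]`.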

What Comes Next

The step labeled "projections, attention, FFN" above is the transformer block, the engine that actually transforms token representations. Module 4 teaches it.

You will learn how tokens exchange information (attention), how each token is independently refined (FFN), how residual connections and normalization keep the signal stable, and how all of these compose into a single decoder block that repeats across layers.

The math gets denser. The shapes get more complex. But every operation is built from the tools you already have: projections, dot products, matrix multiplies, softmax. Nothing in M4 requires ideas you have not yet seen — only new combinations of familiar operations.
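
To make "new combinations of familiar operations" concrete, here is one small, hedged illustration: dot products score how similar one vector is to several others, and softmax turns those scores into weights that sum to 1. The 2-dimensional vectors are made up, and this is not presented as Module 4's exact formulation, only as a sketch of two known tools chained together.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    peak = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up 2-dimensional vectors, one per token of "The cat sat".
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = vectors[-1]  # compare the last token against every token

scores = [dot(query, v) for v in vectors]  # dot products compare vectors
weights = softmax(scores)                  # softmax normalizes the scores

print(scores)   # [1.0, 1.0, 2.0]
print(weights)
```

The vector most similar to the query receives the largest weight, and the two equally similar vectors receive equal weights.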

Want to see it in action? The Live Inference demo runs a real model (DistilGPT-2) entirely in your browser. You can watch every stage of the pipeline you just studied — tokenization, logits, softmax, sampling — on real model weights.