A guided systems curriculum

LLMs From the Ground Up

Go from zero understanding of LLMs to reading model code and reasoning about inference performance. A strict, linear curriculum for engineers who want practical understanding, not hand-waving.

What you will be able to do

By the end, you will have practical, systems-level understanding of how large language models work at inference time.

Speak the language

Define tokens, embeddings, hidden states, logits, softmax, attention, FFN, MoE, KV cache, prefill, and decode — without bluffing.
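Several of these terms fit in a few lines of code. The sketch below, using a hypothetical five-token vocabulary and made-up logit values, shows how logits become a probability distribution via softmax and how greedy decoding picks the next token:

```python
import math

# Hypothetical 5-token vocabulary and illustrative logit values.
vocab = ["the", "cat", "sat", "on", "mat"]
logits = [2.0, 0.5, 1.0, -1.0, 0.1]

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
assert abs(sum(probs) - 1.0) < 1e-9          # softmax yields a distribution
next_token = vocab[probs.index(max(probs))]  # greedy decoding picks the argmax
```

Real samplers add temperature, top-k, or top-p on top of this, but the logits-to-distribution step is the same.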

Sketch the architecture

Draw a decoder transformer block from memory. Explain what Q, K, V do. Distinguish dense, GQA, MoE.

Follow the math

Read model code without panic. Track tensor shapes through projections, attention, and FFN.
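"Tracking tensor shapes" is concrete: at each step you know the dimensions going in and coming out. A minimal NumPy sketch of multi-head attention, with toy sizes that are not from any real model, shows the shape bookkeeping:

```python
import numpy as np

# Toy sizes: sequence length T, model dim d_model, head count n_heads.
T, d_model, n_heads = 4, 8, 2
d_head = d_model // n_heads

rng = np.random.default_rng(0)
x = rng.standard_normal((T, d_model))       # hidden states: (T, d_model)

# One projection weight per role; real models often fuse or shard these.
W_q = rng.standard_normal((d_model, d_model))
W_k = rng.standard_normal((d_model, d_model))
W_v = rng.standard_normal((d_model, d_model))

def split_heads(t):
    # (T, d_model) -> (n_heads, T, d_head)
    return t.reshape(T, n_heads, d_head).transpose(1, 0, 2)

q, k, v = (split_heads(x @ W) for W in (W_q, W_k, W_v))
scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)    # (n_heads, T, T)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)              # row-wise softmax
out = weights @ v                                      # (n_heads, T, d_head)
assert out.shape == (n_heads, T, d_head)
```

Every projection and matmul here has a shape annotation; reading real model code is largely the same exercise at larger sizes.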

Reason about inference

Explain why prefill and decode are different workloads. Explain the KV cache. Explain batching policy.
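The prefill/decode asymmetry can be seen in a toy decode loop. This sketch (illustrative sizes, single head, no real model) caches K and V rows during prefill, then appends one new row per decoded token instead of recomputing history:

```python
import numpy as np

d_head = 4
rng = np.random.default_rng(1)

# Prefill: process the whole prompt at once and cache every K and V row.
prompt_len = 3
k_cache = rng.standard_normal((prompt_len, d_head))
v_cache = rng.standard_normal((prompt_len, d_head))

# Decode: each new token contributes ONE new K/V row; old rows are reused,
# so per-token cost grows with cache length rather than recomputing history.
for step in range(2):
    q = rng.standard_normal((1, d_head))        # query for the new token only
    k_new = rng.standard_normal((1, d_head))
    v_new = rng.standard_normal((1, d_head))
    k_cache = np.vstack([k_cache, k_new])
    v_cache = np.vstack([v_cache, v_new])
    scores = (q @ k_cache.T) / np.sqrt(d_head)  # (1, cache_len)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    out = w @ v_cache                           # (1, d_head)

assert k_cache.shape == (prompt_len + 2, d_head)
```

Prefill is one big parallel matmul over the prompt; decode is many small steps against a growing cache. That difference is why the two phases stress hardware differently.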

Read real code

Walk through llama.cpp model builders and connect theory to implementation artifact by artifact.

Diagnose performance

Look at a profile and produce a plausible first hypothesis. Know when a speedup needs validation.
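One kind of "plausible first hypothesis" is a back-of-envelope bandwidth bound. The numbers below are assumptions (a hypothetical 7B-parameter model in fp16 on hardware with 1 TB/s of memory bandwidth), but the arithmetic pattern is real: batch-1 decode must stream all weights per token, so bandwidth caps throughput.

```python
# Assumed figures -- not measurements from any specific hardware or model.
params = 7e9            # hypothetical 7B-parameter model
bytes_per_param = 2     # fp16 weights
bandwidth = 1.0e12      # assumed 1 TB/s memory bandwidth

bytes_per_token = params * bytes_per_param      # weights read once per token
latency_s = bytes_per_token / bandwidth
tokens_per_s = 1 / latency_s
print(f"~{tokens_per_s:.0f} tok/s upper bound at batch size 1")
```

If a measured decode rate is far below this kind of bound, the gap itself is the hypothesis to investigate; if a claimed speedup exceeds it, that claim needs validation.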

The guided path

9 modules. 48 lessons. One strict, linear sequence.

M0
Orientation
The map of the journey
3 lessons
M1
Discrete Input
Tokens, vocabulary, context
3 lessons
M2
Linear Algebra
Vectors, matrices, projections
6 lessons
M3
Probability
Logits, softmax, sampling
3 lessons
M4
Transformer Block
The dense decoder block in full
9 lessons
M5
Architecture Variants
GQA, SWA, shared-KV, MoE
6 lessons
M6
Inference Mechanics
Prefill, decode, KV cache, batching
6 lessons
M7
Performance
Bottlenecks, quantization, validation
7 lessons
M8
Case Studies
Real code walkthroughs
5 lessons

This course is not

A marketing site about AI
A research survey
A glossary dump
A random blog collection
A curriculum for researchers who train models
A textbook PDF on the web

Ready?

Start the Course