Go from zero understanding of LLMs to reading model code and reasoning about inference performance. A strict, linear curriculum for engineers who want practical understanding, not hand-waving.
By the end, you will have a practical, systems-level understanding of how large language models work at inference time.
Define tokens, embeddings, hidden states, logits, softmax, attention, FFN, MoE, KV cache, prefill, and decode — without bluffing.
Draw a decoder transformer block from memory. Explain what Q, K, and V do. Distinguish dense attention, GQA, and MoE.
Read model code without panic. Track tensor shapes through projections, attention, and FFN.
Explain why prefill and decode are different workloads. Explain what the KV cache stores and why it matters. Explain batching policy.
Walk through llama.cpp model builders and connect theory to implementation artifact by artifact.
Look at a profile and produce a plausible first hypothesis. Know when a speedup needs validation.
9 modules. 48 lessons. One strict, linear sequence.
Ready?
Start the Course