Quiz: Performance Reasoning

15 min

This quiz covers everything from M7: operator time breakdown, GEMM vs GEMV, compute-bound vs memory-bound analysis, quantization, memory layout and repacking, thread count and affinity, and validation of speedups.

You need 80% or better to proceed.

Check Yourself
Q1 (reasoning)

An operation loads a 4096×4096 FP16 weight matrix (32 MiB) and multiplies it by a single vector. It performs about 33.5 million FLOPs (one multiply and one add per weight). The hardware has a machine balance of 100 FLOP/byte. Is this operation compute-bound or memory-bound?
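As a reminder of the method rather than the answer, here is a minimal sketch of the comparison this kind of question asks for. The function names and the GEMM example are illustrative assumptions, not course material, and the byte count assumes each matrix is moved exactly once.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte of memory traffic."""
    return flops / bytes_moved


def classify(flops: float, bytes_moved: float, machine_balance: float) -> str:
    """Compare an operation's intensity with the machine balance (FLOP/byte)."""
    intensity = arithmetic_intensity(flops, bytes_moved)
    return "compute-bound" if intensity >= machine_balance else "memory-bound"


# Illustrative case (not the quiz's): a square 1024x1024x1024 FP16 GEMM.
n = 1024
gemm_flops = 2 * n**3           # one multiply and one add per inner-product term
gemm_bytes = 3 * n * n * 2      # A, B and C at 2 bytes each, assumed moved once

print(arithmetic_intensity(gemm_flops, gemm_bytes))
print(classify(gemm_flops, gemm_bytes, machine_balance=100))
```

To work the question itself, substitute the FLOP count and byte count given above and compare the result with the stated machine balance.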

Q2 (reasoning)

After quantizing a model from FP16 to Q8_0, you measure that decode speed improved by 40% but perplexity increased from 5.8 to 5.9. Is this a valid optimization?

Q3 (reasoning)

Decode processes one token per step, so its weight projections are GEMV operations. An engineer adds a GEMM tiling optimization that improves large-matrix throughput by 30%. Which phase benefits more, prefill or decode?

Q4 (reasoning)

A model is quantized from FP16 to Q4_0. Decode speed improves by 50%, but perplexity on wikitext-2 increases from 5.8 to 7.2. What should you conclude?
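When weighing a speedup against a quality change, it can help to put both on the same relative scale. The sketch below only does that arithmetic for the figures quoted in the two quantization questions. The helper names are hypothetical, and it deliberately applies no acceptance threshold, since what counts as acceptable depends on the use case.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def summarize(name: str, speedup_pct: float, ppl_before: float, ppl_after: float) -> str:
    """One-line summary pairing the speed gain with the perplexity change."""
    ppl_pct = percent_change(ppl_before, ppl_after)
    return f"{name}: +{speedup_pct:.0f}% decode speed, {ppl_pct:+.1f}% perplexity"


# Figures as quoted in the quiz questions.
print(summarize("Q8_0", 40, 5.8, 5.9))
print(summarize("Q4_0", 50, 5.8, 7.2))
```

Deciding whether either trade-off is worth taking is left to you.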

Q5 (shape)

A weight matrix has logical shape [4096, 4096]. After repacking for a SIMD kernel, what changes?