GPU Memory Math for LLMs (2026 Edition)
If you’re running models locally, thinking “model → VRAM” falls apart once you account for how the weights were trained and quantized in the first place.

There’s a better way to think about it:

VRAM (in GB) ≈ Parameters (in billions) × (effective bits per weight ÷ 8)

That’s it.

This one formula explains everything across:

- FP16 / BF16
- FP8 / INT8
- GPTQ / AWQ / NF4
- GGUF variants
- basically every format you’ll use

Here’s the core intuition:

- FP16 / BF16 → 16 bits → ~2 GB per 1B params
- FP8 / INT8 → 8 bits → ~1 GB p...
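As a rough illustration, the formula can be sketched in a few lines of Python. The function name `estimate_weight_vram_gb` and the 7B model size are hypothetical, chosen only for the example. The bit widths are the two stated above. The result covers the weights alone, as the formula does, so treat it as a lower bound rather than a full memory budget.

```python
def estimate_weight_vram_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB: params (billions) x bits per weight / 8."""
    if params_billions < 0 or bits_per_weight <= 0:
        raise ValueError("params_billions must be >= 0 and bits_per_weight > 0")
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    # Hypothetical 7B-parameter model; bit widths taken from the intuition above.
    model_size_b = 7
    for label, bits in [("FP16 / BF16", 16), ("FP8 / INT8", 8)]:
        gb = estimate_weight_vram_gb(model_size_b, bits)
        print(f"{label}: ~{gb:.1f} GB for a {model_size_b}B model")
```

To try another format, pass its effective bits per weight as the second argument. Fractional values work too, since quantized formats often average slightly more than their nominal bit width once scales and metadata are counted.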
Read more at theahmadosman.substack.com