GPU memory for LLMs can be estimated with a simple formula: VRAM (GB) ≈ parameters (billions) × bits per weight ÷ 8. FP16 needs about 2 GB per billion parameters, FP8 about 1 GB, and 4-bit about 0.5 GB, plus roughly 10-30% extra for the KV cache and other overhead.

GPU Memory Math for LLMs (2026 Edition)

If you’re running models locally, treating VRAM as a fixed number per model (“model → VRAM”) falls apart once you account for the precision the weights are stored in and how they were quantized.

There’s a better way to think about it:

VRAM (in GB) ≈ Parameters (in billions) × (effective bits per weight ÷ 8)

That’s it. This one formula applies across:

FP16 / BF16
FP8 / INT8
GPTQ / AWQ / NF4
GGUF variants
basically every format you’ll use

Here’s the core intuition:

FP16 / BF16 → 16 bits → ~2 GB per 1B params
FP8 / INT8 → 8 bits → ~1 GB p...
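To make the arithmetic concrete, here is a minimal Python sketch of the formula. The function names (`weight_vram_gb`, `total_vram_gb`) are hypothetical, and the flat overhead multiplier is an assumption standing in for the 10-30% allowance for cache and overhead. Real KV-cache usage grows with context length and batch size, so treat the result as a rough estimate, not a guarantee.

```python
# Minimal sketch of: VRAM (GB) ≈ params (billions) × (effective bits per weight ÷ 8)
# Function names are hypothetical, not from any library.

# Nominal bits per weight for the formats named in the text.
# Quantized formats often have a slightly higher *effective* value
# because they also store per-group scales; pass a fractional value if known.
NOMINAL_BITS = {
    "FP16 / BF16": 16,
    "FP8 / INT8": 8,
    "4-bit (GPTQ / AWQ / NF4)": 4,
}


def weight_vram_gb(params_billions: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def total_vram_gb(params_billions: float, bits_per_weight: float,
                  overhead: float = 0.2) -> float:
    """Weights plus a flat overhead fraction.

    The default 0.2 is an assumed midpoint of the 10-30% rule of thumb;
    actual KV-cache usage depends on context length and batch size.
    """
    return weight_vram_gb(params_billions, bits_per_weight) * (1 + overhead)


if __name__ == "__main__":
    params = 7  # example: a 7B-parameter model
    for fmt, bits in NOMINAL_BITS.items():
        low = total_vram_gb(params, bits, overhead=0.10)
        high = total_vram_gb(params, bits, overhead=0.30)
        print(f"{params}B @ {fmt}: ~{weight_vram_gb(params, bits):.1f} GB weights, "
              f"~{low:.1f}-{high:.1f} GB with overhead")
```

For a 7B model this gives 14 GB of weights at FP16 and 3.5 GB at 4-bit, before any overhead is added.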