Artificial Analysis finds NVIDIA H100 and B200 deliver roughly 5x more tokens per dollar than Google TPU v6e and roughly 2x more than AMD MI300X in AI inference benchmarks with the Llama 3.3 70B model

Artificial Analysis on X: "Google TPU v6e vs AMD MI300X vs NVIDIA H100/B200: Artificial Analysis’ Hardware Benchmarking shows NVIDIA achieving a ~5x tokens-per-dollar advantage over TPU v6e (Trillium), and a ~2x advantage over MI300X, in our key inference cost metric." https://t.co/EsWnSggjz8

In our metric for inference cost, called Cost Per Million Input and Output Tokens at Reference Speed, we see NVIDIA H100 and B200 systems achieving lower overall cost than TPU v6e and MI300X. For Llama 3.3 70B running with vLLM at a Per-Query R...