GitHub - deepreinforce-ai/CUDA-L2: CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning
CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning
🥳 Introduction
CUDA-L2 is a system that combines large language models (LLMs) and reinforcement learning (RL) to automatically optimize Half-precision General Matrix Multiply (HGEMM) CUDA kernels. CUDA-L2 systematically outperforms major matmul baselines to date, from the widely-used torch.matmul to state-of-the-art NVIDIA closed-source libraries (cuBLAS, cuBLASLt-heuristic, cuBLASLt-AutoTuning). Pap...
Read more at github.com