Processing Strings 109x Faster than Nvidia on H100
I’ve just shipped StringZilla v4, the first CUDA-capable release of my SIMD-first string processing library.
Which in English means that it is now fast not only on CPUs, but also on GPUs!
I’ve wanted to add ROCm-acceleration for AMD GPUs 🤦‍♂️
I’ve wanted to include a parallel multi-pattern search algorithm 🤦‍♂️
I’ve wanted to publish it back in December 2024 🤦‍♂️
So not everything went to plan, but “StringZilla 4 CUDA” is finally here, bringing 500+ GigaCUPS of edit-distance calculations in a pip...
Read more at ashvardanian.com