Hashed Radix Sort Outperforms Hash Tables: 1.5x Faster for Large Datasets, Study Shows

Hashed sorting is typically faster than hash tables

Problem statement: count the unique values in a large array of mostly-unique uint64s. Two standard approaches are:

- Insert into a hash table and return the number of entries.
- Sort the array, then count positions that differ from their predecessor.

Hash tables win the interview (O(n) vs O(n log n)), but sorting is typically faster in a well-tuned implementation. This problem and its variants are the inner loop of some of the world’s biggest CPU workloads.

Benchmark highlights

Here is...
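For reference, the two standard approaches named in the problem statement can be sketched as below. This is only a minimal illustration of the logic, not the tuned implementations the article benchmarks: it uses Python's built-in `set` and comparison-based `sorted` rather than a hashed radix sort, and the function names `count_unique_hash` and `count_unique_sort` are hypothetical. It says nothing about relative performance.

```python
def count_unique_hash(values):
    """Insert every value into a hash set; the number of entries is the answer."""
    seen = set()
    for v in values:
        seen.add(v)
    return len(seen)


def count_unique_sort(values):
    """Sort a copy, then count positions that differ from their predecessor."""
    if not values:
        return 0
    ordered = sorted(values)  # comparison sort, standing in for a tuned radix sort
    # The first element always counts; each later element counts only
    # when it differs from the element just before it.
    return 1 + sum(1 for prev, cur in zip(ordered, ordered[1:]) if prev != cur)


# Small illustrative input (values fit in uint64); both approaches should agree.
data = [7, 3, 7, 2**63, 3, 42]
print(count_unique_hash(data), count_unique_sort(data))
```

Both functions return the same count for any input. They differ only in how the work is done: scattered hash-table inserts in the first, a sort followed by one sequential pass in the second.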