Hashed sorting is typically faster than hash tables
Problem statement: count the unique values in a large array of mostly-unique uint64s. Two standard approaches are:
Insert into a hash table and return the number of entries.
Sort the array, then count positions that differ from their predecessor.
Hash tables win the interview (O(n)O(n) vs O(nlogn)O(n \log n)), but sorting is typically faster in a well-tuned implementation. This problem and its variants are the inner loop of some of the world’s biggest CPU workloads.
Benchmark highlights
Here is...
Read more at reiner.org