Binary search is probably one of the simplest and most obvious solutions to finding a specific value in a sorted vector — in fact, I’d bet it’s the first non-naive algorithm most developers come up with and implement when they start out, very likely without even knowing its name. While the algorithm is suboptimal for some workloads, it’s all round *not that bad*, and it should also be a decent fit for modern CPU caches when working with relatively static data sets.
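
For reference, the classic halving search described above might be sketched like this — a minimal illustration in plain C, assuming an ascending array of ints (the function name is mine, not from any particular library):

```c
#include <stddef.h>

/* Classic binary search over a sorted int array.
   Works on the half-open range [lo, hi) and halves it each
   iteration. Returns the index of `key`, or -1 if absent. */
static long binary_search(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2; /* avoids overflow of lo + hi */
        if (a[mid] < key)
            lo = mid + 1;
        else if (a[mid] > key)
            hi = mid;
        else
            return (long)mid;
    }
    return -1;
}
```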

However, due to how different cache tiers map cache lines even on modern CPUs, binary search over sets whose size is close to a power of 2 turns out to be a rather pathological case for CPU caches: the power-of-two probe offsets keep landing in the same cache sets, evicting each other. Paul Khuong has done a stellar job of spotting this problem and analyzing solutions for it. Being a sucker for elegant solutions, I was won over by the — in hindsight — obvious one of simply doing a ternary rather than a binary search:

> There’s little we can do about hardware, and changing our interface to avoid round element sizes or counts isn’t always possible. The only remaining knob to improve the performance and consistency of binary search is the division factor.
>
> The successor of two is three.
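
The three-way split can be sketched as follows — a hedged illustration of the idea, not Khuong's actual implementation (the function name and int-array assumption are mine). Splitting the range into thirds instead of halves means successive probe offsets no longer follow the power-of-two strides that alias into the same cache sets:

```c
#include <stddef.h>

/* 3-way ("ternary") search over a sorted int array.
   Each iteration picks two pivots that split the half-open range
   [lo, hi) into three roughly equal parts, then keeps the part
   that can contain `key`. Returns its index, or -1 if absent. */
static long ternary_search(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (hi - lo > 2) {
        size_t third = (hi - lo) / 3;
        size_t m1 = lo + third;      /* first pivot  */
        size_t m2 = hi - third;      /* second pivot */
        if (key < a[m1]) {
            hi = m1;                 /* key is in [lo, m1)  */
        } else if (key < a[m2]) {
            lo = m1;                 /* key is in [m1, m2)  */
            hi = m2;
        } else {
            lo = m2;                 /* key is in [m2, hi)  */
        }
    }
    /* At most two candidates left; scan them directly. */
    for (size_t i = lo; i < hi; i++)
        if (a[i] == key)
            return (long)i;
    return -1;
}
```

Note that each iteration now costs up to two comparisons instead of one, so this trades a little extra compute for friendlier cache behavior — a trade that only pays off when the data set size sits near the pathological powers of two.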