Just divide by 3

Binary search is probably one of the simplest and most easily thought up solution to finding a specific numerically representable value in a vector — in fact, I’d bet it’s the first non-naive algorithm most developers come up with and implement when they first start developing, very likely without knowing even the name of it. While the algorithm itself is suboptimal for some workloads, it’s all round not that bad and it should also be a decent fit for modern CPU caches when working with relatively static data sets.

However, due to the nature of how different cache tiers map cache lines even on modern CPUs, binary search in sets of sizes relatively close to a power of 2 turns out to be a rather pathological case for CPU caches. Paul Khoung has done a stellar job in spotting and analyzing solutions for this problem. Being a sucker for elegant solutions, the one that really did it for me was the — in hindsight — obvious choice of just doing a ternary rather than a binary search:

There’s little we can do about hardware, and changing our interface to avoid round element sizes or counts isn’t always possible. The only remaining knob to improve the performance and consistency of binary search is the division factor.

The successor of two is three.

September 27, 2012 |

Standardizing stagnation

As of this Monday, I’m now officially the CTO of the Danish startup IconFinder, a task I’ve been looking forward to taking on me for almost half a year now. This is my first time running all the shots in the technical department of a company of this size and with these expectations, and naturally this brings with it both equal parts excitement and fear. Excitement about the opportunity to “do things right” after having spent two years on the other side of the table, and fear of failing at delivering on exactly that promise — to both the company but, more critically, also myself.

The first step in avoiding failure is to try and learn as much as you can from everyone else; what has and hasn’t worked for them, and what solutions have they wound up with. Five-ten years ago this would have been a very difficult task and the outcome of such research would have probably been a lot closer to general management thinking than what is beneficial for a startup or any company trying to innovate. Somewhere along the line, this started changing, though. A trend that seems to start with Google publicly talking about their experiments with the now infamous “20 % time,” which has cast off successes like Gmail, spurred an increasingly vocal discussion about the importance of being “a great place to work” and “working great.” Companies like 37signals have added more fuel to the fire with their provocative experiments and claims, and more recently GitHub has made popular what will probably be their longest lasting legacy; the “asynchronous workflow.”

These three companies are only the tip of the iceberg. In fact, there’s now such an abundance of descriptions of and recipes for The Best Place To Work™ and The Best Way To Work® that it seems almost too easy. Pick a successful company and mimic what they’re doing. Hell, if four day work weeks and letting people work on what they want works so great, why not make them standard business practise?

The truth is of course, that this will never work. I’m not David Heinemeier Hansson, and Zach Holman won’t be one of my first employees. I’m Nick Bruun and someone else will be at least as awesome an early employee. That doesn’t mean that these ideas won’t have merit for IconFinder, but simply that they will not be directly applicable and that an attempt at trying to fit the company into this “best practise frame” will most likely hamper the company rather than benefit it, no matter how tempting it is. In many ways, I find this to be analogous to the problem Nick Pollard describes with code standards:

Even worse, coding standards are stasis. By locking yourself in to a particular point in time, a particular way of thinking, you get stuck in one exact form. You get left behind. How often have we seen a big company get locked into some ‘best-practice’ only for a startup to run rings around them because they don’t have a ball and chain tied to their leg? Some new technology is invented but not embraced, because it doesn’t fit the one true way?

So, while mimicking and standardising is a tempting solution to a problem — management — that seems like it only takes focus away from doing real work, the truth is that we need to approach this area just like we approach product development. Nick Pollard hits the nail on the head with his code standard, and I think it really isn’t too far from what we should or at least what I will actively be striving for in every aspect of a company:

Simplicity.
Brevity.
Elegance.

The rest is up to you.

September 8, 2012 |

Indistinguishable password hashing

One of the biggest problems when engineering a password storage scheme using salting and hashing, as should be the minimum, is to separate user-specific salts from user-specific password hashes, in order to reduce the direct applicability of a rainbow table (don’t ever assume that your “algorithm” is safe from the hashes and salts themselves.) There are a lot of more or less silly suggestions out there as to how to do this, but recently Jeremy Spilman wrote a two part series on what seems like a reasonably good approach.

While his initial idea was inherently flawed, he corrected a lot of this in the follow-up post. The “algorithm” relies on the complexity of modern hash algorithms while storing salted hashes relationally disjoint from users but with verifiable joint traits and the ability to increase the entropy of the joint relationship through noisy data. If you’re still storing hashes with a relational link, try thinking about how low the complexity of getting one specific user’s password is.

July 13, 2012 |

Easy C++ benchmarking

The bottleneck in modern server software has moved back to the code. This forces us to make sure that our code is fast enough — the hayai framework provides a convenient solution to that problem.

February 7, 2012 |