Google's Spanner

September 17, 2012

Yesterday, I was discussing with Rasmus Makwarth of Opbeat how it's a shame that almost all startups these days are built on a pure business case rather than a technologically driven vision, as was definitely the case for Google and, in part, for Skype among others. These days, doing a startup is all about picking a framework, copy-pasting some code and running some database migrations, and you're done. Well, almost, but it's definitely different from 2000, for better or worse, which also means that in a lot of areas technological advances are not really that necessary.

Luckily, Google hasn't forgotten its roots and, despite being a bunch of hypocritical dicks, they still keep pumping out awesome and inspirational white papers about some of their software design work. The MapReduce and GFS white papers, for example, sparked the whole Hadoop debacle, a design that has since been mostly retired at Google but is now "all the rage" in the startup world (I mean, how can you be a respectable startup without a Hadoop cluster!?). Far more interesting white papers have been released since, among which Dremel is a very worthy read in terms of pragmatic software design.

A short while ago, Google published another very interesting white paper about "Spanner," their globally distributed database, which is once again a display of pragmatic and simple software design well worth a read or two. While maybe not an eye-opening revolution in database design, it's a brilliant solution to large problems like the following, which the paper describes for Google's advertising backend:

This backend was originally based on a MySQL database that was manually sharded many ways.

[..]

Resharding this revenue-critical database as it grew in the number of customers and their data was extremely costly. The last resharding took over two years of intense effort, and involved coordination and testing across dozens of teams to minimize risk.

Maybe there's a reason Google are among the few who're pushing the limit...