Do you really need a “NoSQL” Database?

Visit me at Strata! (logo (C) O’Reilly Media)

I’m fortunate enough to have been selected to speak at Strata 2015 in a few weeks on one of my favorite topics, database history. My talk is about the tradeoffs between going with a highly denormalized NoSQL database vs a normalized relational database. In the talk, we get to explore what the creator of the relational model, Edgar Codd, saw as the major problem with the databases of his time. Could his criticisms of pre-relational databases apply to today’s milieu of non-relational databases?

Supposedly, relational databases aren’t appropriate for today’s “web scale” problems. Instead, we’re supposed to look at NoSQL technologies to solve problems. It’s not at all that simple. Companies like Stack Overflow and Facebook famously use relational databases to store enormous amounts of data. If that’s the case, then what problem is really getting solved? These companies are also heavy users of non-relational technologies. Why? For what use cases?

In truth, setting aside the trendiness of one solution over another, different problems simply call for different solutions. Peek a layer deeper and we can see that NoSQL is in fact a fairly vacuous term, as is the SQL vs NoSQL divide. There are simply families of databases with different features and affordances. Some of these come in the form of distributed vs non-distributed. Sharded vs non-sharded. Query language vs no query language. Tabular, wide row, or document structured. Strictly linearizable consistency vs last writer wins vs something in between. Which set of features is appropriate for your problem?

One trend, however, that we’ll see come out of database history in my talk and in the “NoSQL” realm is the tradeoff between normalization and denormalization. The latter grants a higher degree of availability by breaking up rich data relationships, at the cost of losing data consistency features. As it turns out, without normalization, moving to many NoSQL databases feels a bit like going from programming in Ruby to programming in C. Certainly great power and control comes with being able to manipulate every part of the distributed database. You can control, per query, for example, what consistency options should be used. Many problems call for targeted solutions that need fine-grained control. But with great power comes responsibility. Suddenly you’re even more in the driver’s seat, with fewer protections to make sure your data stays consistent. Do you have the skills to take this level of control of your data? Do you really need to be programming in C? Or will a relational database suit you perfectly fine?
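To make that tradeoff a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the `users`/`posts` schema, the `author_name` field, and the `rename_author` helper are all hypothetical, and a plain list of dicts stands in for a document store rather than any real NoSQL API. The normalized side stores the author’s name once and joins to it; the denormalized side copies the name into every post, so the application has to keep the copies in sync itself.

```python
import sqlite3

# Normalized: the author's name lives in exactly one row.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, "
    "user_id INTEGER REFERENCES users(id), title TEXT)"
)
db.execute("INSERT INTO users VALUES (1, 'alice')")
db.executemany(
    "INSERT INTO posts VALUES (?, 1, ?)", [(1, "first"), (2, "second")]
)

# A single UPDATE is enough; every join picks up the new name.
db.execute("UPDATE users SET name = 'alice_s' WHERE id = 1")
normalized_names = [
    row[0]
    for row in db.execute(
        "SELECT u.name FROM posts p JOIN users u ON u.id = p.user_id "
        "ORDER BY p.id"
    )
]

# Denormalized: each post "document" carries its own copy of the name.
# (A list of dicts is a stand-in for a document store, not a real API.)
posts = [
    {"id": 1, "title": "first", "author_name": "alice"},
    {"id": 2, "title": "second", "author_name": "alice"},
]


def rename_author(docs, old, new):
    """Hypothetical helper: the application rewrites every copy itself."""
    for doc in docs:
        if doc["author_name"] == old:
            doc["author_name"] = new


# Simulate a rename that only reaches some of the documents,
# for example because the process died partway through.
rename_author(posts[:1], "alice", "alice_s")
denormalized_names = [doc["author_name"] for doc in posts]

print(normalized_names)
print(denormalized_names)
```

In the normalized case both posts report the new name, because there was only ever one copy to change. In the denormalized case the second post still carries the old name until something notices and repairs it. Reads of a single post never need a join, which is part of the availability appeal, but the consistency work has moved into your code.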

It’s a fun topic, and if you’re confused about how to solve your problem, come see my talk. Hopefully we can chat about some ideas. Feel free to find me at Strata or email us to see how OpenSource Connections can be your trusted NoSQL advisor for selecting the correct database.