Why FoundationDB Might Be All It's Cracked Up To Be

March 6, 2013 Doug Turnbull
Category: Big Data

When I first heard about FoundationDB, I couldn't imagine how it could be anything but vaporware. It seemed like unicorns crapping happy rainbows to solve all your problems. As I'm learning more about it, though, I realize it could actually be something groundbreaking.

NoSQL: Let's Review…

So, I need to step back and explain one reason NoSQL databases have been revolutionary. In the days of yore, we used to normalize all our data across multiple tables in a single database living on a single machine. Unfortunately, Moore's law eventually crapped out and, maybe more importantly, hard drive space stopped increasing massively. Our data and our demands on it only kept growing. We needed to start distributing our databases across multiple machines.

Turns out, it's hard to maintain transactionality in a distributed, heavily normalized SQL database. As such, a lot of NoSQL systems have emerged with simpler features, many promoting a model based around some kind of single row/document/value that can be looked up or inserted with a key. Transactionality for these systems is limited to a single key-value entry (a “row” in Cassandra/HBase or a “document” in Mongo/Couch; we'll just call them rows here). A row is easily stored on a single node, although it can be replicated to multiple nodes. Even with replication, it turns out that working transactionally with single rows in distributed NoSQL is easier than guaranteeing the transactionality of an SQL query that may visit many SQL tables in a distributed system.

There are deep design ramifications/limitations to the transactional nature of rows. First, you always try to cram a lot of data related to the row's key into a single row, ending up with massive rows of hierarchical or flat data that all relates to the row key. This lets you cover as much data as possible under the row-based transactionality guarantee. Second, as you only have a single key to use from the system, you must choose very wisely what your key will be. You may need to think hard about how your data will be looked up through its whole life, because it can be hard to go back. Additionally, if you need to look up on a secondary value, you'd better hope that your database is friendly enough to have a secondary key feature; otherwise you'll need to maintain a secondary row for storing the relationship. Then you have the problem of working across two rows, which doesn't fit in the transactionality guarantee. Third, you might lose the ability to perform a join across multiple rows. In most NoSQL data stores, joining is discouraged and denormalization into large rows is the encouraged best practice.
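To make the secondary-row problem concrete, here is a minimal sketch in plain Python. The RowStore class, its put/get methods, and the add_user helper are all hypothetical stand-ins for a store that is atomic only per row; none of them is a real database's API.

```python
class RowStore:
    """Hypothetical store: each put() is atomic, but only for one row."""

    def __init__(self):
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key):
        return self.rows.get(key)


def add_user(store, user_id, email, crash_between_writes=False):
    # Write 1: the primary row, keyed by user id.
    store.put(('user', user_id), {'email': email})
    if crash_between_writes:
        # Simulate the client dying before the second write happens.
        raise RuntimeError('client crashed')
    # Write 2: a secondary row so the user can be looked up by email.
    store.put(('user_by_email', email), user_id)


store = RowStore()
add_user(store, 1, 'ann@example.com')
try:
    add_user(store, 2, 'bob@example.com', crash_between_writes=True)
except RuntimeError:
    pass

# User 2 exists, but the email lookup cannot find them.
orphaned = (store.get(('user', 2)) is not None
            and store.get(('user_by_email', 'bob@example.com')) is None)
```

Each write is fine on its own; the trouble is that nothing ties the two together, so a failure in between leaves the secondary row out of step with the primary one.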

FoundationDB is different

FoundationDB is a distributed, sorted key-value store with support for arbitrary transactions across multiple key-values – multiple “rows” – in the database.
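The “sorted” part matters more than it might seem: keys that share a prefix sit next to each other, so "everything for one student" becomes a single range read. Here is a rough in-memory sketch of that idea using Python's bisect module. Plain tuples stand in for packed keys, and prefix_range is a hypothetical helper, not part of any FoundationDB API.

```python
import bisect

# Keys kept in sorted order, the way a sorted key-value store keeps them.
keys = sorted([
    ('attends', 'alice', 'algebra'),
    ('attends', 'alice', 'biology'),
    ('attends', 'bob', 'algebra'),
    ('class', 'algebra'),
    ('class', 'biology'),
])


def prefix_range(sorted_keys, prefix):
    """Return every key that starts with `prefix` (hypothetical helper)."""
    # Jump straight to the first key that is >= the prefix.
    start = bisect.bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if key[:len(prefix)] != prefix:
            break  # past the end of the prefix's contiguous run
        result.append(key)
    return result


alice_classes = prefix_range(keys, ('attends', 'alice'))
```

Because matching keys are contiguous, the scan stops at the first key that no longer matches instead of visiting the whole keyspace.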

To understand the distinction, let me pilfer an example from their tutorial. Their tutorial models a university class signup system. You know, the same system every CS major has had to implement in their programming 101 class. Anyway, to demonstrate the potential power here, I just want to share a single function with you, the class signup function:

import fdb
# Note: fdb.api_version(...) must be called once before the API is used.

def attendsKey(s, c):
    """Key for student (s) attending class (c)."""
    return fdb.tuple.pack(('attends', s, c))

def attendsKeys(s):
    """Key range for all 'attends' records of student (s).
    Assumed definition; this helper was missing from the original listing."""
    return fdb.tuple.range(('attends', s))

def classKey(c):
    """Key for the number of available seats in class (c)."""
    return fdb.tuple.pack(('class', c))

@fdb.transactional
def signup(tr, s, c):
    rec = attendsKey(s, c)  # key recording whether this student attends this class
    if tr[rec].present(): return  # already signed up (step 3)
    seatsLeft = int(tr[classKey(c)])  # get the number of seats left for the class
    if not seatsLeft: raise Exception('no remaining seats')  # (step 3)
    classes = tr[attendsKeys(s)]  # range read of this student's "attends" records
    if len(list(classes)) >= 5: raise Exception('too many classes')  # (step 4)
    tr[classKey(c)] = str(seatsLeft - 1)  # decrement the available seats
    tr[rec] = ''  # mark that this student attends this class

Ok, more than one function, but the other functions are just helpers to show you how keys are getting generated.

What matters here is that all work is done through signup's first argument, tr, which is the transaction object. First we check for the existence of a special key that indicates whether student s is attending class c. Then, in the same transaction, we work on a completely different “row”: the number of seats left in the class. If seats remain, we update that count and then create a row to store the fact that the student attends the class. More important than what is actually happening here, FoundationDB is able to attempt to perform this transaction atomically across the entire cluster.

If this were a more traditional NoSQL store, we would have to take a more awkward tack to do this atomically. We'd have to choose either the class or the student to be the row that we can work with atomically. Implicitly, our key would become either a lookup for a class or a lookup for a student. For the sake of discussion, let's say we made our rows classes and simply stored the ids of all the students attending that class in that row. It's trivial to work on classes to add/remove students. We simply look up a class and append the student id to sign them up.

Conceptually this model is pretty simple, but it's lacking if we suddenly want to look up students in the database. What would that query look like? Can you do it atomically? You'll need to have another type of row for students. Then you have two entities to work across, outside of the transactionality guarantees.
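Here is a small plain-Python sketch of that class-keyed model, with a dict standing in for the row store. The names class_rows, signup_row, and classes_for are illustrative assumptions, not any real store's API.

```python
# One row per class; the value is the list of enrolled student ids.
class_rows = {
    'algebra': ['alice', 'bob'],
    'biology': ['alice'],
    'chemistry': ['carol'],
}


def signup_row(rows, student, cls):
    """Easy case: touches a single row, so it fits a row-level guarantee."""
    if student not in rows[cls]:
        rows[cls].append(student)


def classes_for(rows, student):
    """Hard case: a per-student question has to visit every class row."""
    return sorted(cls for cls, students in rows.items() if student in students)


signup_row(class_rows, 'bob', 'biology')
bob_classes = classes_for(class_rows, 'bob')
```

Signing up is a one-row operation, but answering "which classes is Bob in?" reads every row, and in a real row-atomic store nothing guarantees those reads see one consistent snapshot.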

FoundationDB == unopinionated transactions

A big reason that many NoSQL stores were simplified to the atomic row architecture is to get away from the forced large-scale transactionality (and performance hit) of SQL transactions. The solution was to go back to making everything a map and to make accesses to each entry/row/document a transaction. So we all bought into that and began working our schemas into that model.

However, at the end of the day, SQL and traditional NoSQL are both very opinionated about what a transaction should be. Despite the transaction manifesto, Foundation is completely unopinionated when it comes to how you define transactions. The same signup code above could easily be implemented as two or three transactions if that was truly what was called for.

This power is expressed in how you access Foundation. Foundation is exposed more as a library for defining transactions on an arbitrary key-value store. This narrower aim lets you write code in your own language, without being constrained to a second query language or awkwardly fitting your code to an ORM. Instead, you write natural code expressing the transactions that you want to perform over the key-value store. Pretty exciting stuff.

Whoa, whoa, whoa, slow your roll, sparky. Looks cool and all, but can you prove this isn't a giant boondoggle?

OK, Foundation is new and unproven. There are plenty of unanswered questions about it. How does it perform vs {HBase/Cassandra/Mongo/Couch/…}? What is the cost of this transactionality? At what point does its transactional architecture stop scaling? What are the trade-offs? Etc., etc.

Yeah, yeah, so don't start rewriting all your database code to use Foundation; that would be pretty crazy. Nevertheless, the unopinionated, highly client-controlled notion of transactionality is groundbreaking and obviously useful, and I'm hopeful it can be successful.