No matter how long you’ve been programming, you have been troubled by the question. The world was a calm, pristine place with only relational databases around (sure, there was some din about object databases, but it could never rear its ugly head); all modeling was done in terms of tables, and joins were everybody’s best friends.
Then came the barrage of NoSQL products. These databases sacrifice some of the most important characteristics that databases have conventionally stood for, and as a result, have gathered a lot of attention.
But this post is not about extolling the virtues of NoSQL. It’s about answering the million-dollar question (or if you are a seeded startup, the $20k question, or whatever) — which NoSQL database is the best?
To approach this, we need to dive into the CAP theorem first. Sorry, but your conventional view of a database (a reliable, fast storage space) doesn’t apply anymore. Turns out you have to choose two of the three attributes (C = Consistency, A = Availability, P = Partition Tolerance) before you can even begin to talk about making a selection.
And here your troubles begin. You are not sure whether Consistency or Availability will suit you better. And you’re told that you just cannot sacrifice Partition Tolerance at all, which limits your choices to two. Now, if you sacrifice Availability, people will complain about your website being slow or timing out. And if you sacrifice Consistency, people will complain about not seeing the changes they saved, even after reloading the page. And just what is this Partition Tolerance they keep insisting on?
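If the trade-off still feels abstract, here is a toy sketch of it in Python. It is not how any real database works: `Replica`, `TinyCluster`, and `demo` are names made up for illustration, and the "network" is just a boolean flag. It only assumes two copies of one value and a link between them that can break (that break is the partition).

```python
class Replica:
    """One node holding its own copy of a single value."""

    def __init__(self, name):
        self.name = name
        self.value = None


class TinyCluster:
    """Two replicas. mode is 'CP' (refuse writes when cut off) or 'AP' (accept them)."""

    def __init__(self, mode):
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True means a and b cannot reach each other

    def write(self, replica, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: reject the write rather than let copies diverge.
                raise RuntimeError("unavailable: cannot reach the other replica")
            # Availability first: accept locally; the two copies now disagree.
            replica.value = value
            return
        # Healthy network: update both copies together.
        replica.value = value
        other.value = value

    def read(self, replica):
        # Reads are always served from the local copy in this toy model.
        return replica.value


def demo(mode):
    """Write once, cut the link, write again, then read from both sides."""
    cluster = TinyCluster(mode)
    cluster.write(cluster.a, "v1")
    cluster.partitioned = True
    try:
        cluster.write(cluster.a, "v2")
        outcome = "accepted"
    except RuntimeError:
        outcome = "rejected"
    return outcome, cluster.read(cluster.a), cluster.read(cluster.b)


print(demo("CP"))
print(demo("AP"))
```

In this sketch the CP cluster turns the second write away (users complain the site is down), while the AP cluster takes it and ends up with two replicas holding different values (users complain their change vanished). Partition Tolerance is simply the admission that the link will break sooner or later, so you have to pick one of those two complaints.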
Your immediate search leads you to MongoDB. Querying is a bit odd but it looks good otherwise. Just as you are ready to settle on it, you read about the horrors of MongoDB losing data and needing all the RAM in your neighborhood. You get apprehensive, and start looking around even more. Chances are you’ll land on CouchDB, a very nice database written in Erlang. Except that it’s focused on Availability and not Consistency. And you’re told that it’s good for archival data only, and that it has a nice REST interface but a horrid Map-Reduce style of querying.
Search a little more and you’ll see more variants of Couch to compound the confusion: CouchBase, BigCouch, Mobile Couch, and whatnot. And then there’s the very confusing merger of something and Membase into CouchBase or something else. Good luck sorting out this mess. And if you unfortunately happen to like CouchBase, you have to choose between the limited community edition and shelling out $5,000+ per node.
Then you hear about Cassandra, and apparently it’s awesome. Except that Facebook moved off it to customized MySQL (you wonder why). It’s also supposed to be write-heavy. Does that mean it sucks for reads? What kind of database gets used only for writes and never for reads? And what about HBase, Riak, Membase, CockroachDB, RethinkDB, RavenDB, LevelDB, Hypertable, OrientDB, Hadoop, Neo4j, InfiniteGraph, and managed solutions like DynamoDB, Object Store (Google)? How many of these are easily scalable? How many of these support multi-master replication (hint: MongoDB doesn’t)? Multi-datacenter replication? How many will perform well? How many have good communities to get your questions answered?
If your head has started reeling, join the club. I’ve practically gone nuts trying to decide on a stable, reliable NoSQL product that isn’t overly complicated to work with. Sadly, I’ve found none.
So which is the best NoSQL database? None, I say.
Is there no hope, then?
Before MariaDB 10.1, there wasn’t. But since the 10.1 version added dynamic columns, MariaDB is going to be my preferred database for new projects (unless I must store graphs). With multi-master replication built in, and years of RDBMS foundation, there’s little chance you can go wrong.
Dive into NoSQL and explore by all means, but only if you have nothing better to do.