Thursday, March 31, 2016

MongoDB is not a Javascript Database

Sometimes people refer to MongoDB as a Javascript/JSON database - it's not, its internal format is BSON - which is a serialised binary representation of objects. Unlike Javascript everything in MongoDB is stored strongly typed and quickly traversable.

That being said, people do access MongoDB with duck-typed Languages like PHP, Perl, Python and yes - Javascript so it can certainly seem that way - and the Mongo shell is is a Javascript REPL so it all seems very Javascript focussed.

Why does this matter and what's this Blog post about?

Well it matters - because one of the things I want from a database is to know that dates are dates, and integers are integers and floating point numbers aren't being stored as strings. Some 'Document' databases just store JSON strings and deal with it at runtime - ACID compliance is one debate but simply knowing you stored a date as a date and not a string is important too.

Actually I just wanted to get that off my chest before I get into he point of this post.

I want to talk about the fact you can Query MongoDB using Javascript - and like many things, this is a very useful piece of rope, you can secure things to things, you can tie things and pull things and climb things - but you can also hang yourself! so use the following information wisely.

In MongoDB, a normal query is sent to the database as an Object - basically query by example - you make an object in language of your choice and send it to a find function on a collection object to find matching Objects.

query = new Document()
cursor = connection.getDatabase("myApp").getCollection("myUsers").find(query)
while(cursor.hasNext()) {

Or in the shell (Javascript)

c = db.getSiblingDB("myApp").users.find({"name","john"})

This is the MongoDB approach - rather than a query language targeted at humans writing strings - which is how SQL was created, this starts with the premise queries will be constructed by code from some form of UI - it's actually a much better and safer fit than constructing SQL strings.

What many don't know is you can also do this.

c = db.getSiblingDB("myApp").users.find(' == "john" ')

Wait - what? Thats a query, as a string!; Is it code or some undocumented  query language? What's that all about.

Well that's the short version of 

c = db.getSiblingDB("myApp").usersfind({ $where: ' == "john"' })

Each record in the database is converted, server side from it's serialised form into a Javascript object 'this' and then the supplied Javascript is run against it and whether that returns true or false determines it it's returned. It's magic and so powerful and ... slow.

The reason it's slow is two fold

 (1) Javascript - this is an interpreted/JIT language and it's never fast despite what node.js fans tell you.

(2) It has to convert every record in the Collection - there are no indexes used here.

So IMPORTANT - DO NOT do this, don't use $where for your queries.

Except $where it makes sense (see what I did there)

What if you cannot express your query using the Mongo query language? The Javascript interpreter in MongoDB is very sandboxed - so you cannot call out to other services or even access other parts of the DB - you are limited to one record but it does let you express, for example WHERE a+b > 5 or even WHERE a > b and b < 5

Let's take that last one as an example and see what the speed is like.

First I'll make a million records in the shell.

for(x=0;x < 1000000 ; x++) {
db.mine.insert({ _id:x, a:Math.random()*1000, b:Math.random()*1000}) } 

Now let us see how long a simple unindexed normal search takes

var start = new Date().getTime();
db.mine.find({a:{$lt:1}}).count() //Count used to ensure it gets not just first 100 results
var end = new Date().getTime();
var time = end - start;
print('Execution time: ' + time);

384ms - Not bad in human terms but you wouldn't want to do very many of them.

Let's try that javascript query above now

var start = new Date().getTime();
db.mine.find("this.a > this.b && this.b < 5").count()
var end = new Date().getTime();
var time = end - start;
print('Execution time: ' + time);

38,226 ms - yes 40 Seconds. This is is why you just don't do this! That database is only 38MB.

But I don't like to leave you without some positive news. MongoDB also allows a hybrid mode, thanks to $where so I can do

c = db.users.find({b:{$lt;:5},$where:'this.a>this.b'}}

This will use an index for b - which I better check is there first

var start = new Date().getTime();
db.mine.find({b:{$lt:5},$where:"this.a > this.b"}).count()
var end = new Date().getTime();
var time = end - start;
print('Execution time: ' + time);

OK That's 263 ms, 150 times faster and I get to do comparisons between fields.

But is it the best option? No, Javascript is rarely the best option. MongoDB offers something much faster and just as capable, the Aggregation Pipeline.

With aggregation, you can write the above query like this. I have to add the last line to do the count

compareVals = { if: { $gt:["$a","$b"] }, then:true ,else:false}
addBool = {$project: {r:"$$ROOT",k:{$cond: compareVals} } }
filterNot = {$match: {k:true} }

var start = new Date().getTime();
var end = new Date().getTime();
var time = end - start;
print('Execution time: ' + time);

And that takes ... 27ms, over 1,500 times faster than the Javascript!!

The debate rages whether the Aggregation Framework is Turing complete - but if you are using MongoDB and running ANY in-database javascript, you probably aren't doing the right thing. MongoDB is NOT a Javascript database.