MongoDB - Hivefication/Hive GitHub Wiki

##Classification We can generally classify the databases in two categories, the relational (MySQL, PostgreSQL, Oracle, ...) and the NoSQL (Not Only SQL). However, a more comprehensive classification can be achieved : Classement

In the NoSQL world, many databases exist :

###NoSQL

  • MongoDB
  • Riak
  • Redis
  • CouchDB
  • Neo4J
  • Cassendra
  • Dynamo
  • SimpleDB

###Schema This is four different kind of schemas of the different NoSql DB. (Key-value, Column, Graph & Document). NoSql Schema

##CAP theorem - Brewer's theorem The theorem states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it was successful or failed)
Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)

Cap theorem for NoSQL systems

##Sharding and replication with MongoDB Sharding is MongoDB’s approach to scaling out. Sharding partitions a collection and stores the different portions on different machines. Sharding automatically balances data and load across machines.

Shards. A shard is a container that holds a subset of a collection’s data. Each shard is either a single mongod instance or a replica set. In production, all shards should be replica sets.
Config servers. Each config server is a mongod instance that holds metadata about the cluster. The metadata maps chunks to shards.
Mongos instances. The mongos instances route the reads and writes from applications to the shards. Applications do not access the shards directly.

Database replication ensures redundancy, backup, and automatic failover. Replication occurs through groups of servers known as replica sets.

Non sharded system Auto-sharded system Auto-sharded + replica set system

##Mongoose We used the Node.js plugin Mongoose to simulate some aspect of the relational databases. This plugins add an easy way to structure and validate (filter input) the datas that are stored in the MongoDB collections.

##Benchmark MongoDB vs MySQL We benchmarked few aspects of a simple MongoDB collection against a similar MySQL database. This was realised with Node.js and the plugins mysql v2.0.0 and mongodb v1.3.0 as driver for the database.

###MySQL tables used

CREATE TABLE IF NOT EXISTS T_User (  
    idUser INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(30) NULL,
    surname VARCHAR(45) NULL,
    PRIMARY KEY (idUser),
    INDEX flkNameUser (name ASC),
    INDEX flkSurnameUser (surname ASC) ) ENGINE = InnoDB;
CREATE  TABLE IF NOT EXISTS T_Badge (  
    idBadge INT NOT NULL AUTO_INCREMENT,  
    name VARCHAR(30) NULL,  
    PRIMARY KEY (idBadge),  
    INDEX flkName (name ASC) ) ENGINE = InnoDB;  
CREATE  TABLE IF NOT EXISTS T_BadgeUser (  
    idBadge INT NOT NULL ,  
    idUser INT NOT NULL ,  
    PRIMARY KEY (idBadge, idUser) ,  
    INDEX fk_T_Badge_has_T_User_T_User1 (idUser ASC) ,  
    INDEX fk_T_Badge_has_T_User_T_Badge (idBadge ASC) ,  
    CONSTRAINT fk_T_Badge_has_T_User_T_Badge  
        FOREIGN KEY (idBadge )  
        REFERENCES mango.T_Badge (idBadge )  
        ON DELETE NO ACTION  
        ON UPDATE NO ACTION,  
    CONSTRAINT fk_T_Badge_has_T_User_T_User1  
        FOREIGN KEY (idUser )  
        REFERENCES mango.T_User (idUser )  
        ON DELETE NO ACTION  
        ON UPDATE NO ACTION) ENGINE = InnoDB;  

###MongoDB Collection The collection has no indexed field.

{
    "idkey"   : 12, 
    "name"    : "Aubert", 
    "surname" : "Gregoire", 
    "badges"  : 
    [
        {
            "idbadge" : 1, 
            "name"    : "badge 1"
        },
        {
            "idbadge" : 3, 
            "name"    : "badge 3"
        }
    ]
}

##Results Here are the results of the benchmark for each query we tried. Each query are executed 20 time on differents size of data, exept for the insert queries that are executed one time for each data to insert.
For this benchmark we didn't try to shard the MongoDB database, which is one of it's main feature.
Warning : pay attention to the y axis, labeled as seconds, but in reality these are milliseconds for all the results.

###Insert The first result is for the insert operation of the number of user. We can see that MongoDB is way faster than SQL which verify the integrity of each row he is inserting. Benchmark for insert and select * from T_User where id = 1

###Lookup 1 SQL query for this result:

SELECT U.name as nameUser, U.surname as surname, B.name as namebadge 
FROM T_User U 
     INNER JOIN T_BadgeUser BU ON U.idUser = BU.idUser 
     INNER JOIN T_Badge B ON B.idBadge = BU.idBadge 
WHERE U.idUser = i;

MongoDB query:

Users.findOne({'idkey':i})

Benchmark for insert and select * from T_User where id = 1

###Lookup 2 SQL query for this result:

SELECT * 
FROM T_User U 
	INNER JOIN T_BadgeUser BU ON U.idUser = BU.idUser 
	INNER JOIN T_Badge B ON B.idBadge = BU.idBadge;

MongoDB query:

Users.find().toArray()

Benchmark for insert and select * from T_User where id = 1

###Lookup 3 SQL query for this result:

SELECT * 
FROM T_User U 
	INNER JOIN T_BadgeUser BU ON U.idUser = BU.idUser 
WHERE BU.idBadge = i;

MongoDB query:

Users.find({badges : {$elemMatch: {idbadge:i}}}).toArray();

Benchmark for insert and select * from T_User where id = 1

###Lookup 4 SQL query for this result:

SELECT U.surname, count(*) as countSurname 
FROM T_User U 
GROUP BY U.surname 
ORDER BY countSurname DESC;

MongoDB query:

Users.aggregate(
    { $group : {  _id : '$surname', 
    	          countSurname : { $sum : 1 }
    	       }
    },
    { $sort : { 'countSurname' : -1} }
);

Benchmark for insert and select * from T_User where id = 1

###Lookup 5 SQL query for this result:

SELECT U.name, count(*) as countBadges 
FROM T_User U 
	INNER JOIN T_BadgeUser BU ON U.idUser = BU.idUser 
WHERE BU.idBadge = i
GROUP BY U.name 
ORDER BY countBadges DESC;

MongoDB query:

Users.aggregate(
    { $match: { badges: { $elemMatch: { idbadge:i }}}},
    { $group : {  id : '$name', 
    	          countBadgesForName : { $sum : 1 }
    	       }
    },
    { $sort : { 'countBadgesForName' : -1} }
);

Benchmark for insert and select * from T_User where id = 1

##References MongoDB
Sharding with MongoDB
Replication with MongoDB
CAP theorem
Mongoose
Node-MongoDB nativ driver