NoSQL - ilya-khadykin/notes-outdated GitHub Wiki

NoSQL vs relational databases

NoSQL databases:

  • more flexible with respect to schema changes;
  • many NoSQL dbs allow fields to be defined when a record is created;
  • nested values are common;
  • fields are not standardized between records

Relational databases:

  • SQL was designed as a query language for relational databases;
  • usually table-based, almost like spreadsheets;
  • records are stored in rows; columns represent the fields of each row;
  • SQL queries run within or between tables

NoSQL databases types

document stores:

  • documents are stored in a structured format (XML, JSON, etc.);
  • usually organized into "collections" or "databases";
  • individual documents can each have their own structure;
  • each document usually has a unique key;
  • documents can be queried by their fields;
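To make "documents with their own structure, queried by field" concrete, here is a minimal sketch that uses plain Python dicts as a stand-in for a collection. The `users` collection, the `find` and `get_path` helpers and the dotted-path convention are hypothetical illustrations, not any real database's API.

```python
from typing import Any

# Hypothetical "collection": unique document key -> JSON-like document
users = {
    "u1": {"name": "Ann", "address": {"city": "Oslo"}},
    "u2": {"name": "Bob", "email": "bob@example.com"},  # different fields than u1
    "u3": {"name": "Cy", "address": {"city": "Rome"}, "tags": ["admin"]},
}


def get_path(doc: dict, path: str) -> Any:
    """Follow a dotted path such as 'address.city'; return None if it is missing."""
    value: Any = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def find(collection: dict, path: str, expected: Any) -> list:
    """Return the keys of documents whose field at `path` equals `expected`."""
    return [key for key, doc in collection.items() if get_path(doc, path) == expected]
```

For example, `find(users, "address.city", "Rome")` matches only `u3`, and documents that lack the field (like `u2`) are simply skipped rather than causing an error.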

key-value stores:

  • you have a key you can query by, and the value at that key (you usually can't query by anything other than the key);
  • some key-value stores let you define more than one key;
  • sometimes used alongside relational databases for caching

BigTable/tabular:

  • named after Google's proprietary "BigTable" implementation;
  • each row can have a different set of columns;
  • designed for a large number of columns;
  • rows are typically versioned
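Google's Bigtable is described as a sparse map indexed by row key, column key and timestamp. The sketch below imitates that shape with nested dicts: rows can have different columns, and every write adds a new version instead of overwriting. The `put`/`get` helpers and the integer counter standing in for a timestamp are assumptions made for illustration.

```python
import itertools

# row key -> column name -> list of (version, value), oldest first
table = {}
_clock = itertools.count(1)  # stand-in for a write timestamp


def put(row, column, value):
    """Write a new version of a cell; older versions are kept."""
    table.setdefault(row, {}).setdefault(column, []).append((next(_clock), value))


def get(row, column, as_of=None):
    """Return the newest value, or the newest one written at or before `as_of`."""
    for version, value in reversed(table.get(row, {}).get(column, [])):
        if as_of is None or version <= as_of:
            return value
    return None


put("row1", "name", "Ann")   # version 1
put("row1", "name", "Anna")  # version 2
put("row2", "phone", "555")  # version 3: row2 has a different set of columns
```

Here `get("row1", "name")` returns the latest value, `"Anna"`, while `get("row1", "name", as_of=1)` still returns the older `"Ann"`.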

graph databases:

  • designed for data best represented as interconnected nodes (e.g. a series of road intersections connected by roads);
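The road-intersection example maps naturally onto nodes and edges. This sketch stores a made-up road network as an adjacency list and finds the route with the fewest road segments using breadth-first search; it only illustrates the kind of traversal graph databases are built around and is not any graph database's query language.

```python
from collections import deque

# Hypothetical road network: intersection -> directly connected intersections
roads = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def fewest_hops(graph, start, goal):
    """Breadth-first search: the path with the fewest road segments, or None."""
    previous = {start: None}  # also serves as the "visited" set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None
```

With this data, `fewest_hops(roads, "A", "E")` returns `["A", "B", "D", "E"]`.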

object databases:

  • tightly integrated with the object-oriented programming language being used;
  • act as a persistence layer: store objects directly;
  • you can link objects directly through pointers
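Python's standard `shelve` module is not an object database, but it gives a feel for "storing objects directly": values are pickled, and references between objects stored in the same entry survive the round trip. The library data below is hypothetical; plain dicts are used so the sketch does not depend on class lookup during unpickling.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "library")
    ann = {"name": "Ann"}
    with shelve.open(path) as db:
        # Both books reference the same author object, not a copy or a foreign key
        db["books"] = [{"title": "First", "author": ann}, {"title": "Second", "author": ann}]
    with shelve.open(path) as db:
        loaded = db["books"]

# After loading, the two books still point at one shared author object
shared = loaded[0]["author"] is loaded[1]["author"]
```

A real object database goes further (identity across separately stored objects, transactions, queries), but the idea of persisting the object graph itself is the same.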

Popular NoSQL dbs

CouchDB

Document db written in Erlang

MongoDB

Document db that uses JavaScript for querying

Notes:

  • querying is not done over HTTP (unlike CouchDB)
  • native drivers for each language
  • does not support CouchDB-style views
  • master/slave replication only: only the master can accept writes
  • consistent, partition-tolerant db
    • all users always get the same data back from MongoDB
    • documents are partitioned using sharding
    • each partition holds a subset of the records
    • shards are created based on a shard key you choose (this lets you customize how MongoDB partitions the db)
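MongoDB supports both ranged and hashed shard keys. The sketch below shows only the hashed idea: the value of the chosen shard key decides which shard a document lands on, so documents sharing a key value always end up together. The fixed shard count and the "hash modulo N" rule are simplifications; real MongoDB splits key (or hashed-key) ranges into chunks and balances those chunks across shards.

```python
import hashlib

NUM_SHARDS = 3  # hypothetical cluster size


def shard_for(doc: dict, shard_key: str) -> int:
    """Pick a shard by hashing the value of the chosen shard key."""
    value = str(doc[shard_key]).encode("utf-8")
    return int(hashlib.sha256(value).hexdigest(), 16) % NUM_SHARDS


def partition(docs, shard_key):
    """Group documents by the shard their shard-key value hashes to."""
    shards = {i: [] for i in range(NUM_SHARDS)}
    for doc in docs:
        shards[shard_for(doc, shard_key)].append(doc)
    return shards
```

Choosing a different shard key (say, `"country"` instead of `"user_id"`) changes which documents are stored together, which is the customization the note refers to.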

structure and querying in MongoDB

  • structure: database/collection/record
  • JavaScript-based querying somewhat similar to SQL
  • still schema-free
  • can define MapReduce functions
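In MongoDB the map and reduce functions are written in JavaScript; the sketch below shows the same MapReduce idea in Python so the data flow is easy to follow: the map step emits key/value pairs per document, the pairs are grouped by key, and the reduce step combines each group. The `orders` data and function names are hypothetical, and this is not MongoDB's `mapReduce` API.

```python
from collections import defaultdict

orders = [
    {"customer": "ann", "total": 20},
    {"customer": "bob", "total": 5},
    {"customer": "ann", "total": 7},
]


def map_order(doc):
    """Map step: emit (key, value) pairs for one document."""
    yield doc["customer"], doc["total"]


def reduce_totals(key, values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def map_reduce(docs, mapper, reducer):
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


result = map_reduce(orders, map_order, reduce_totals)  # total spent per customer
```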

Cassandra

Originally developed by Facebook

Notes:

  • querying not over HTTP
  • native drivers for each language
  • cross between key/value store and tabular database
  • available, partition-tolerant db:
    • you should always be able to read from and write to Cassandra
    • hardware nodes can be added with no downtime
    • consistency is tunable, but requiring stronger consistency lowers availability
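The consistency/availability trade-off comes down to simple arithmetic. With replication factor N, if a write waits for W replicas and a read asks R replicas, the two sets must overlap in at least one replica whenever R + W > N, so the read sees the latest write. QUORUM means a majority of replicas (N // 2 + 1). The sketch below encodes those rules; the function names are illustrative, not part of any Cassandra driver.

```python
def quorum(n: int) -> int:
    """Replicas needed for a majority (QUORUM) with replication factor n."""
    return n // 2 + 1


def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """Read and write replica sets are guaranteed to overlap when r + w > n."""
    return r + w > n


def tolerated_failures(n: int, required: int) -> int:
    """Replicas that can be down while a request needing `required` replies still succeeds."""
    return n - required
```

For N = 3, reading and writing at QUORUM (2 + 2 > 3) is consistent and survives one replica being down, while ONE/ONE (1 + 1) survives two failures but may return stale data.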

structure and querying in Cassandra

  • each key maps to one or more columns
  • columns can be grouped into column families
  • Cassandra Query Language (CQL) is similar to SQL
  • CQL is designed specifically for column families and tunable consistency
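The "key maps to columns, columns grouped into column families" structure can be pictured as nested maps. This is a minimal in-memory sketch of that classic model; the `users`/`logins` families, their rows and the `get_columns` helper are hypothetical, and newer CQL presents the same data as tables with partition and clustering keys.

```python
# Hypothetical keyspace: column family -> row key -> {column name: value}
keyspace = {
    "users": {
        "ann": {"email": "ann@example.com", "city": "Oslo"},
        "bob": {"email": "bob@example.com"},  # rows need not share columns
    },
    "logins": {
        # column names can themselves carry data, e.g. a timestamp per login
        "ann": {"2024-01-01T10:00": "web", "2024-01-02T08:30": "mobile"},
    },
}


def get_columns(family, row_key, names=None):
    """Fetch one row from a column family, optionally only the named columns."""
    row = keyspace.get(family, {}).get(row_key, {})
    if names is None:
        return dict(row)
    return {name: row[name] for name in names if name in row}
```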

Riak

Key/value db written in Erlang, often used to store JSON documents

Notes:

  • MapReduce functions can be written in Erlang as well as JavaScript
  • designed primarily to work on Mac and Linux
  • available, partition-tolerant db:
    • you should always be able to read from and write to Riak
    • hardware nodes can be added easily
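One reason nodes can be added easily is that Riak, following Amazon's Dynamo design, places data on a consistent-hashing ring: adding a node only moves the keys that now fall into the new node's slices of the ring. The sketch below is a simplified ring with virtual nodes; Riak's actual scheme (a fixed number of partitions claimed by vnodes) differs, and the class and node names are made up.

```python
import bisect
import hashlib


def _hash(text: str) -> int:
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Simplified consistent-hashing ring with virtual nodes."""

    def __init__(self, nodes, vnodes=16):
        self.vnodes = vnodes
        self._points = []  # sorted list of (position on ring, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node occupies several points, spreading its load
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # First point clockwise from the key's position, wrapping at the end
        index = bisect.bisect(self._points, (_hash(key), "")) % len(self._points)
        return self._points[index][1]
```

After `ring.add_node("n4")`, every key that changes owner moves to `n4`; keys on the other nodes stay where they were.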

structure and querying in Riak

  • structure: bucket/key/value
  • query syntax is the same as that of the Lucene full-text search engine
  • can define MapReduce functions
  • key filters let you select records whose keys match certain criteria
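Key filters pay off when keys are designed to carry information, such as a date prefix. The sketch below shows the idea with ordinary Python predicates over a hypothetical `invoices` bucket; Riak's own key-filter syntax is not reproduced here.

```python
# Hypothetical bucket: key -> value, with a date and customer encoded in each key
invoices = {
    "2023-11-02-acme": {"amount": 120},
    "2024-01-15-acme": {"amount": 80},
    "2024-02-01-globex": {"amount": 200},
}


def filter_keys(bucket, *predicates):
    """Keep records whose key passes every predicate, without inspecting values."""
    return {k: v for k, v in bucket.items() if all(p(k) for p in predicates)}


in_2024 = filter_keys(invoices, lambda k: k.startswith("2024-"))
acme_2024 = filter_keys(
    invoices,
    lambda k: k.startswith("2024-"),
    lambda k: k.endswith("-acme"),
)
```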

Redis

key/value store

Notes:

  • querying not over HTTP
  • native drivers for each language
  • designed primarily to work on Mac and Linux (no official Windows support)
  • master/slave replication
  • consistent, partition-tolerant db:
    • each user should always get the same data back from Redis
    • writing directly to a slave is possible, but violates consistency
    • data can be replicated to multiple slaves
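Why does writing to a slave break consistency? Replication flows one way, from master to slaves. This minimal sketch uses hypothetical `Node`/`Master` classes; replication is synchronous here for simplicity, whereas real Redis replication is asynchronous.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}


class Master(Node):
    """Hypothetical master that pushes every write to its slaves."""

    def __init__(self, name, slaves):
        super().__init__(name)
        self.slaves = slaves

    def set(self, key, value):
        self.data[key] = value
        for slave in self.slaves:  # one-way replication: master -> slaves
            slave.data[key] = value


slaves = [Node("s1"), Node("s2")]
master = Master("m", slaves)
master.set("greeting", "hello")

# A write made directly on a slave is never sent back to the master or other
# slaves, so readers now get different answers depending on the node they ask.
slaves[0].data["greeting"] = "hi"
```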

structure and querying in Redis

  • queries primarily by key
  • specific values from hashes within records can be retrieved
  • unlike in many key/value stores, a value does not have to be a string
  • supported value types include lists, sets, and hashes of strings
    • lists are ordered lists of strings
    • hashes are further key/value pairs (fields mapped to values)
    • sets are unordered collections of non-repeating values
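To show how these value types behave, here is a small in-memory imitation of a few Redis commands (LPUSH, LRANGE, SADD, HSET, HGET), following their documented return values. `TinyRedis` is a hypothetical teaching class, not a Redis client, and it skips Redis features such as wrong-type errors and expiry.

```python
class TinyRedis:
    """In-memory imitation of a few Redis commands (not a Redis client)."""

    def __init__(self):
        self._store = {}

    def lpush(self, key, *values):
        lst = self._store.setdefault(key, [])
        for value in values:
            lst.insert(0, value)  # each value goes to the head, like LPUSH
        return len(lst)

    def lrange(self, key, start, stop):
        lst = self._store.get(key, [])
        if start < 0:
            start = max(len(lst) + start, 0)
        if stop < 0:
            stop += len(lst)  # negative indexes count from the end, as in Redis
        return lst[start:stop + 1]

    def sadd(self, key, *members):
        members_set = self._store.setdefault(key, set())
        before = len(members_set)
        members_set.update(members)  # duplicates are ignored
        return len(members_set) - before  # number of newly added members

    def hset(self, key, field, value):
        fields = self._store.setdefault(key, {})
        is_new = field not in fields
        fields[field] = value
        return int(is_new)  # 1 if the field was created, 0 if updated

    def hget(self, key, field):
        return self._store.get(key, {}).get(field)  # None plays the role of nil
```

For example, `lpush("queue", "a", "b")` leaves the list as `["b", "a"]`, and `hget("user:1", "name")` retrieves one field of a hash without fetching the whole record.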