Apache Cassandra - MacKittipat/note-developer GitHub Wiki

Intro

Distributed NoSQL database
Wide-column store
Peer-to-peer architecture
- No single point of failure
Supports partitioning (sharding) and replication

Model

Keyspace = Container of tables. Similar to a database in an RDBMS
Table = Table

Primary Key

The Primary Key identifies a row, so it must be unique.
The Partition Key is responsible for data distribution across nodes.
The Clustering Key is responsible for data sorting within the partition.
The Composite/Compound Key is a key made up of multiple columns.
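To make the role of the partition key concrete, here is a minimal sketch of how a key could be mapped to a node. It is a simplification under stated assumptions: the node names are hypothetical, and MD5 with a modulo stands in for Cassandra's real mechanism, which places tokens from its partitioner (Murmur3Partitioner by default) on a token ring.

```python
import hashlib

# Hypothetical three-node cluster; the names are for illustration only.
NODES = ["node-1", "node-2", "node-3"]


def node_for(partition_key: str) -> str:
    """Pick a node from a hash of the partition key.

    Simplified stand-in: hash-modulo placement instead of a token ring.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]


# Rows that share a partition key always map to the same node.
print(node_for("TX") == node_for("TX"))
```

The point of the sketch is only that placement depends on the partition key alone, so all rows of one partition live together.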

Example

PRIMARY KEY (a): The partition key is a.
PRIMARY KEY (a, b): The partition key is a, the clustering key is b.
PRIMARY KEY ((a, b)): The composite partition key is (a, b).
PRIMARY KEY (a, b, c): The partition key is a, the composite clustering key is (b, c).
PRIMARY KEY ((a, b), c): The composite partition key is (a, b), the clustering key is c.
PRIMARY KEY ((a, b), c, d): The composite partition key is (a, b), the composite clustering key is (c, d).
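The rule behind these examples can be sketched as a small parser: a nested pair of parentheses marks a composite partition key, otherwise the first column is the partition key, and everything after it is the clustering key. The function name `split_primary_key` is hypothetical, and the sketch only handles the simple forms listed above.

```python
def split_primary_key(definition: str):
    """Split a PRIMARY KEY clause into (partition_key, clustering_key) column lists."""
    # Take the text between the outermost parentheses.
    inner = definition[definition.index("(") + 1 : definition.rindex(")")].strip()
    if inner.startswith("("):
        # Composite partition key: the columns inside the nested parentheses.
        close = inner.index(")")
        partition = [c.strip() for c in inner[1:close].split(",")]
        rest = inner[close + 1 :]
    else:
        # Single-column partition key: the first column.
        first, _, rest = inner.partition(",")
        partition = [first.strip()]
    # Whatever follows the partition key is the clustering key.
    clustering = [c.strip() for c in rest.split(",") if c.strip()]
    return partition, clustering


print(split_primary_key("PRIMARY KEY (a, b, c)"))       # (['a'], ['b', 'c'])
print(split_primary_key("PRIMARY KEY ((a, b), c, d)"))  # (['a', 'b'], ['c', 'd'])
```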

Good for write-intensive apps

Example: a table with one partition key, the state name (TX), and one clustering key, the city name (Dallas, etc.)
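The state/city layout can be sketched in a few lines: rows are grouped into partitions by state, and inside each partition they are kept sorted by city (ascending, which is Cassandra's default clustering order). The sample rows and the helper name `build_partitions` are assumptions for illustration, not data from the original example.

```python
from collections import defaultdict


def build_partitions(rows):
    """Group (state, city) rows by partition key and sort each partition by clustering key."""
    partitions = defaultdict(list)
    for state, city in rows:
        partitions[state].append(city)
    for cities in partitions.values():
        cities.sort()  # clustering key order within the partition
    return dict(partitions)


# Illustrative rows: (state, city)
rows = [("TX", "Houston"), ("TX", "Dallas"), ("CA", "Fresno"), ("TX", "Austin")]
partitions = build_partitions(rows)
print(partitions["TX"])  # ['Austin', 'Dallas', 'Houston']
```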

Writes are very fast. Cassandra uses log-structured merge trees (LSM trees) for storage, which means all writes are done sequentially (the database is effectively an append-only log), resulting in lower write latency.
Each write goes to memory (MemTable) and to disk (Commit Log). When the MemTable is full, Cassandra flushes it to disk as an SSTable, and the corresponding Commit Log is then deleted.
- MemTable: sorts data by clustering key. Used for reads.
- Commit Log: append-only. Used to restore the MemTable if the node crashes.
- SSTable (Sorted String Table): same structure as the MemTable, but stored on disk. It is immutable.
https://www.youtube.com/watch?v=mDd4I-isodE
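The write path described above can be sketched as a toy in-memory model. This is a simplification under stated assumptions, not Cassandra's implementation: the class `ToyNode`, the `memtable_limit` parameter, and the use of plain Python lists and tuples in place of on-disk files are all hypothetical.

```python
class ToyNode:
    """Toy model of the write path: commit log + MemTable, flushed to SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []  # append-only; stands in for the on-disk commit log
        self.memtable = {}    # in-memory table
        self.sstables = []    # immutable sorted tuples; stand in for on-disk SSTables
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # sequential append
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the MemTable as a sorted, immutable SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable.clear()
        self.commit_log.clear()  # flushed data no longer needs the log

    def recover(self):
        # After a crash, rebuild the MemTable by replaying the commit log.
        self.memtable = dict(self.commit_log)

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            for k, v in sstable:
                if k == key:
                    return v
        return None


node = ToyNode(memtable_limit=2)
node.write("Dallas", 1)
node.write("Austin", 2)   # MemTable is full, so it is flushed to an SSTable
node.write("Houston", 3)  # stays in the MemTable and the commit log
node.memtable = {}        # simulate a crash that loses memory
node.recover()
print(node.read("Houston"), node.read("Austin"))  # 3 2
```

The sketch leaves out compaction, which in a real LSM-based store merges SSTables in the background.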

Reference

http://stackoverflow.com/questions/24949676/difference-between-partition-key-composite-key-and-clustering-key-in-cassandra