Apache Cassandra
Intro
- NoSQL distributed database
- Wide column
- Peer to peer architecture
- No single point of failure
- Supports partitioning (sharding) and replication
Model
- Keyspace = Container of tables. Similar to a database in an RDBMS
- Table = Table, as in an RDBMS
Primary Key
- The Primary Key identifies a row. It must be unique.
- The Partition Key is responsible for data distribution across nodes.
- The Clustering Key is responsible for data sorting within the partition.
- A Composite/Compound Key is a key made up of multiple columns.
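To make "data distribution across nodes" concrete, here is a minimal sketch of a token ring, assuming three hypothetical nodes (`node-a`, `node-b`, `node-c`) and using MD5 purely for illustration. Real Cassandra clusters use their own partitioner and virtual nodes, so treat the names and the hash choice as assumptions rather than Cassandra's implementation.

```python
import bisect
import hashlib

# Hypothetical node names, not taken from the notes above.
NODES = ["node-a", "node-b", "node-c"]

def token(partition_key: str) -> int:
    """Map a partition key to an integer token (MD5 is an illustrative choice)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

# In this sketch each node owns a single token derived from its name.
RING = sorted((token(name), name) for name in NODES)
RING_TOKENS = [t for t, _ in RING]

def node_for(partition_key: str) -> str:
    """Return the first node at or after the key's token, wrapping around the ring."""
    idx = bisect.bisect_left(RING_TOKENS, token(partition_key)) % len(RING)
    return RING[idx][1]

if __name__ == "__main__":
    for key in ["TX", "CA", "NY"]:
        print(key, "->", node_for(key))
```

Rows with the same partition key always hash to the same token, so they always land on the same node.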
Example
PRIMARY KEY (a): The partition key is a.
PRIMARY KEY (a, b): The partition key is a, the clustering key is b.
PRIMARY KEY ((a, b)): The composite partition key is (a, b).
PRIMARY KEY (a, b, c): The partition key is a, the composite clustering key is (b, c).
PRIMARY KEY ((a, b), c): The composite partition key is (a, b), the clustering key is c.
PRIMARY KEY ((a, b), c, d): The composite partition key is (a, b), the composite clustering key is (c, d).
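The rule behind these examples can be sketched in a few lines of Python: the first element of the clause (a single column or a parenthesized group) is the partition key, and everything after it is the clustering key. The helper name `split_primary_key` is hypothetical, and the parser only handles the simple shapes listed above.

```python
def split_primary_key(clause: str):
    """Return (partition_key_columns, clustering_key_columns) for a PRIMARY KEY clause."""
    # Keep only the text between the outermost parentheses.
    inner = clause[clause.index("(") + 1 : clause.rindex(")")].strip()
    if inner.startswith("("):
        # Composite partition key: the columns inside the inner parentheses.
        close = inner.index(")")
        partition = [c.strip() for c in inner[1:close].split(",")]
        rest = inner[close + 1 :]
    else:
        # Single-column partition key: the first column.
        first, _, rest = inner.partition(",")
        partition = [first.strip()]
    # Any remaining columns form the clustering key, in order.
    clustering = [c.strip() for c in rest.split(",") if c.strip()]
    return partition, clustering

if __name__ == "__main__":
    for clause in ["PRIMARY KEY (a, b)", "PRIMARY KEY ((a, b), c, d)"]:
        print(clause, "->", split_primary_key(clause))
```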
Good for write-intensive applications
Example: a table with one partition key, the state name (TX), and one clustering key, the city name (Dallas, etc.)
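A rough in-memory model of that layout, assuming a plain dictionary stands in for the set of partitions. Only TX and Dallas come from the notes; the other states and cities are made up for illustration.

```python
import bisect
from collections import defaultdict

# partition key (state) -> clustering keys (cities), kept in sorted order
partitions = defaultdict(list)

def insert_row(state: str, city: str) -> None:
    """Place a row in its partition, keeping cities sorted by clustering key."""
    cities = partitions[state]
    idx = bisect.bisect_left(cities, city)
    # A repeated primary key (state, city) does not create a second row.
    if idx == len(cities) or cities[idx] != city:
        cities.insert(idx, city)

# Illustrative rows, inserted in arbitrary order.
for state, city in [("TX", "Houston"), ("TX", "Dallas"), ("CA", "Fresno"), ("TX", "Austin")]:
    insert_row(state, city)

if __name__ == "__main__":
    for state, cities in partitions.items():
        print(state, cities)
```

All TX rows end up in one partition, and within it the cities are ordered by the clustering key regardless of insertion order.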
- Writes are very fast. Cassandra stores data in log-structured merge (LSM) trees, so all writes are sequential appends (the database is effectively an append-only log), which lowers write latency.
- Each write goes to memory (MemTable) and to disk (Commit Log). When the MemTable is full, Cassandra flushes it to disk as an SSTable, and the corresponding Commit Log data is then deleted.
- The MemTable keeps data sorted by clustering key. It is also used to serve reads.
- The Commit Log is append-only. It is used to rebuild the MemTable if the node crashes.
- An SSTable (Sorted String Table) has the same structure as the MemTable but is stored on disk. It is immutable.
- https://www.youtube.com/watch?v=mDd4I-isodE
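The write path described above can be sketched as a toy class. This is a simplified model, not Cassandra's storage engine: the `Node` class, the `memtable_limit` row-count threshold, and the newest-first SSTable lookup in `read` are assumptions made for illustration.

```python
class Node:
    """Toy write path: Commit Log plus MemTable, flushed to immutable SSTables."""

    def __init__(self, memtable_limit: int = 2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (row count)
        self.commit_log = []  # append-only record of writes not yet flushed
        self.memtable = {}    # in-memory data, sorted when flushed
        self.sstables = []    # each SSTable is a sorted, immutable tuple of rows

    def write(self, key, value):
        self.commit_log.append((key, value))  # sequential append for durability
        self.memtable[key] = value            # in-memory update
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the MemTable as a sorted, immutable SSTable, then clear the log.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []

    def recover(self):
        """Rebuild the MemTable by replaying the Commit Log after a crash."""
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value

    def read(self, key):
        # Check the MemTable first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None
```

Usage: after `node = Node()` and `node.write("Dallas", 1)`, wiping `node.memtable` and calling `node.recover()` restores the row from the Commit Log. A second write reaches the threshold and triggers a flush to an SSTable.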