Apache Cassandra - MacKittipat/note-developer GitHub Wiki

Intro

  • NoSQL distributed database
  • Wide-column store
  • Peer-to-peer architecture
    • No single point of failure
  • Supports partitioning (sharding) and replication
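Here is a minimal sketch of how partitioning and replication can work without a central coordinator. It assumes a simplified token ring. Each node owns one token. A row goes to the first node whose token is greater than or equal to the hash of its partition key, and copies go to the next nodes clockwise (similar in spirit to Cassandra's SimpleStrategy). MD5 stands in for Cassandra's default Murmur3 partitioner only because it ships with Python. The names `token`, `TokenRing` and `replicas` are hypothetical. Real clusters usually give each node many tokens (vnodes).

```python
import bisect
import hashlib


def token(partition_key: str) -> int:
    # Assumption: MD5 stands in for Cassandra's Murmur3 partitioner.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    """Hypothetical token ring: one token per node, SimpleStrategy-like replication."""

    def __init__(self, nodes: list[str], replication_factor: int) -> None:
        # A row cannot have more replicas than there are nodes.
        self.replication_factor = min(replication_factor, len(nodes))
        # Sort nodes by token so the ring can be searched with bisect.
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key: str) -> list[str]:
        # First node whose token is >= the key's token; wrap around past the end.
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        # Walk clockwise, taking the next replication_factor nodes.
        return [
            self.ring[(start + i) % len(self.ring)][1]
            for i in range(self.replication_factor)
        ]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("TX"))  # three distinct nodes; which ones depends on the hashes
```

Every node can compute the same key-to-replica mapping, so any node can coordinate a request. That is why the peer-to-peer design has no single point of failure.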

Model

  • Keyspace = Container of tables. Similar to a database in an RDBMS
  • Table = Similar to a table in an RDBMS: rows of columns, each row identified by its primary key

Primary Key

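  • The Primary Key uniquely identifies a row, so it must be unique. It is made up of the partition key plus optional clustering key columns.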
  • The Partition Key is responsible for data distribution across nodes.
  • The Clustering Key is responsible for data sorting within the partition.
  • A Composite/Compound Key is a key made up of multiple columns.

Example

PRIMARY KEY (a): The partition key is a.
PRIMARY KEY (a, b): The partition key is a, the clustering key is b.
PRIMARY KEY ((a, b)): The composite partition key is (a, b).
PRIMARY KEY (a, b, c): The partition key is a, the composite clustering key is (b, c).
PRIMARY KEY ((a, b), c): The composite partition key is (a, b), the clustering key is c.
PRIMARY KEY ((a, b), c, d): The composite partition key is (a, b), the composite clustering key is (c, d).
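The rules above can be checked with a small parser. This is a hypothetical helper (`split_primary_key` is not part of any Cassandra driver). It only handles plain, unquoted column names, and it returns the partition key columns and the clustering columns separately.

```python
import re


def split_primary_key(definition: str) -> tuple[list[str], list[str]]:
    """Hypothetical helper: return (partition key columns, clustering columns)."""
    match = re.fullmatch(r"\s*PRIMARY\s+KEY\s*\((.*)\)\s*", definition, re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a PRIMARY KEY clause: {definition!r}")
    body = match.group(1).strip()
    if body.startswith("("):
        # Composite partition key: the columns inside the inner parentheses.
        close = body.index(")")
        partition = [col.strip() for col in body[1:close].split(",")]
        rest = body[close + 1:]
    else:
        # Single-column partition key: the first column only.
        first, _, rest = body.partition(",")
        partition = [first.strip()]
    # Everything after the partition key is the (possibly empty) clustering key.
    clustering = [col.strip() for col in rest.split(",") if col.strip()]
    return partition, clustering


for clause in ["PRIMARY KEY (a, b, c)", "PRIMARY KEY ((a, b), c, d)"]:
    print(clause, "->", split_primary_key(clause))
```

The inner parentheses alone decide whether the partition key is composite. Every column after the partition key belongs to the clustering key, in order.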

Good for write-intensive apps

Example: a table whose partition key is the state name (e.g. TX) and whose clustering key is the city name (Dallas, etc.), i.e. PRIMARY KEY (state, city).
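Here is a rough in-memory model of that table, assuming PRIMARY KEY (state, city). Rows are grouped by the partition key and kept sorted by the clustering key. A query for one state therefore reads a single partition, and a city range is a contiguous slice. `PartitionedTable`, `upsert` and `cities` are hypothetical names. Cassandra treats an insert with an existing primary key as an overwrite (an upsert), and the sketch mimics that behavior.

```python
import bisect
from collections import defaultdict
from typing import Optional


class PartitionedTable:
    """Hypothetical model of PRIMARY KEY (state, city)."""

    def __init__(self) -> None:
        # partition key (state) -> list of (city, row), sorted by clustering key (city)
        self.partitions = defaultdict(list)

    def upsert(self, state: str, city: str, row: dict) -> None:
        rows = self.partitions[state]
        cities = [c for c, _ in rows]
        i = bisect.bisect_left(cities, city)
        if i < len(rows) and rows[i][0] == city:
            rows[i] = (city, row)  # same primary key: overwrite in place
        else:
            rows.insert(i, (city, row))  # keep the partition sorted by city

    def cities(self, state: str, start: str = "", end: Optional[str] = None) -> list[str]:
        # Only one partition is read; a clustering-key range is a contiguous slice.
        cities = [c for c, _ in self.partitions.get(state, [])]
        lo = bisect.bisect_left(cities, start)
        hi = len(cities) if end is None else bisect.bisect_right(cities, end)
        return cities[lo:hi]


table = PartitionedTable()
for city in ["Houston", "Austin", "Dallas"]:
    table.upsert("TX", city, {"city": city})
print(table.cities("TX"))  # ['Austin', 'Dallas', 'Houston']
```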

  • Writes are very fast. Cassandra's storage engine is based on log-structured merge trees: a write is a sequential append to the commit log plus an in-memory update, rather than an in-place update on disk, which results in low write latency.
  • Each write goes to memory (MemTable) and to disk (Commit Log). When the MemTable is full, Cassandra flushes it to disk as an SSTable, and the Commit Log data covered by that flush can then be deleted.
    • MemTable: keeps data sorted by clustering key within each partition and serves reads.
    • Commit Log: append-only. Used to rebuild the MemTable if the node crashes.
    • SSTable (Sorted String Table): same sorted structure as the MemTable, but stored on disk. SSTables are immutable.
  • https://www.youtube.com/watch?v=mDd4I-isodE
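The write path above can be sketched as a toy, single-process model. Everything here is a simplifying assumption:

  • The class and method names are hypothetical.
  • The MemTable flushes after a fixed number of keys instead of a memory threshold.
  • Each key holds a single value.
  • Reads check the MemTable first and then the newest SSTable. Cassandra instead merges cells by timestamp, and it also uses compaction and bloom filters.

```python
import bisect
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical, single-process sketch of an LSM-style write path."""

    memtable_limit: int = 3
    commit_log: list[tuple[str, str]] = field(default_factory=list)  # append-only
    memtable: dict[str, str] = field(default_factory=dict)
    sstables: list[tuple[tuple[str, str], ...]] = field(default_factory=list)  # oldest first

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. sequential append for durability
        self.memtable[key] = value  # 2. in-memory update
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Persist the MemTable as a sorted, immutable SSTable (a tuple),
        # then drop the commit log entries that the flush now covers.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()

    def read(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable wins in this sketch
            keys = [k for k, _ in table]
            i = bisect.bisect_left(keys, key)  # SSTables are sorted, so binary search
            if i < len(keys) and keys[i] == key:
                return table[i][1]
        return None

    def crash_and_recover(self) -> None:
        self.memtable = {}  # memory contents are lost in a crash...
        for key, value in self.commit_log:  # ...and rebuilt by replaying the log
            self.memtable[key] = value


node = Node(memtable_limit=2)
node.write("TX", "Austin")
node.write("CA", "Fresno")  # reaches the limit, so the MemTable is flushed
node.write("TX", "Dallas")
node.crash_and_recover()
print(node.read("TX"), node.read("CA"))  # Dallas Fresno
```

Only the commit log needs replaying after a crash. Everything older already sits in immutable SSTables on disk.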

Reference