Cassandra - kdwivedi1985/system-design GitHub Wiki

Cassandra is a distributed NoSQL wide-column DB database that stores data across the nodes. It is similar to Dynamodb. Wide column DBs have rows and columns but each row may have different columns.
These databases store data in tables, but unlike traditional relational databases, each row can have a different number of columns, and columns are grouped into column families.
Sample data for User_Activity:

Row Key	Activity:2024-06-01	Activity:2024-06-02	Activity:2024-06-03
user_123	{"login": true, "time": "08:15"}	{"login": true, "time": "09:10"}	{"login": false}

Query can be formed on partition key + cluster key. Clustering keys are defined as part of the table’s primary key (after the partition key).
- PRIMARY KEY ((user_id), activity_date)
- user_id is the partition key, and activity_date is the clustering key. All activities for a user are stored together, sorted by date.
The partition key determines how data is distributed across the cluster. It decides which node stores a particular row. e.g. user_id can be a partition key. Clustering Key determines the order in which rows are stored within a partition. All rows with the same partition key are sorted and stored on disk according to their clustering key values.
Apart for cluster key, an index (secondary index or storage-attached index) can be created which allows you to efficiently query data using columns that are not part of the primary key (partition or clustering key). Only columns which are indexed can be used for querying.
- Index- Use for applications that require fast read and write speeds.
It is considered eventually-consistent as reads and writes both can happen on any node.

Cassandra is masterless, all nodes are considered as slaves, it doesn't mean all data is replicated to all nodes.
There is a Replication Factor (RF), based on RF value data that will be replicated to a number of nodes. e.g. RF=3, means 3 replicas. so data will have 3 replicas for high availability.
All nodes communicate with each other and maintain the partition details(meta-data) about all the nodes. An incoming request can go to any node. That node acts as coordinator. It will figure out which node data needs to be replicated based on the partition key, and send requests to all those nodes. Coordinator replicates data to all the nodes. No peer-peer sync.In the following example coordinator 59 is relocating the data to node 0, 83 and 67.

Cassandra has different consistency levels (e.g., QUORUM, ALL, ONE,Two,Any), it waits for those many replicas to acknowledge read/write before it is considered as successful.
Default consistency level is- One, All- can make cassandra strongly consistent.
Quorum- Majority of replicas acknowledge the write.
Cassandra stores all the meta-data, nodes and token/partition mapping in memtable(in-memory) and also in disc in System Tables (SSTable). * Each node maintains the meta-data (memtable and SSTable) for all nodes.
Diagram showing data simultaneously moving to Commit log and Memtable. The Memtable is periodically flushed to SSTable for storage.

Cassandra supports auto-scaling using - Kubernetes with the Cass-Operator (DataStax) or K8ssandra.
Rebalancing needs to be done manually if node is added or removed.
Mongo and Cassandra supports regional clusters and needs to be created manually.
Cassandra doesn't support custom Sharding like Mongo. Cassandra supports only one type of automatic, hash-based sharding, controlled by the partition key and consistent hashing.
Zone-aware sharding can be achieved by Active-Active, Multi-Region Clusters:
- Deploy Cassandra clusters in each major region (e.g., US, EU, APAC) using the NetworkTopologyStrategy.
- This allows each region to have its own set of nodes and replicas, providing low-latency access for local users and resilience in case of regional failures.
  - CREATE KEYSPACE orders WITH REPLICATION = {class: 'NetworkTopologyStrategy','US-East': 3,'EU-West': 3}
  - Each write is replicated to 3 nodes in US-East and 3 nodes in EU-WestIf a whole region (e.g., US-East) goes down, EU-West can still serve requests. This provides resilience and high-availability and local read/writes.It reads/writes to the local DB and behind the seen data will be replicated. QUORUM will only check for local replicas.

Cassandra achieves per-row column flexibility by storing each row as a map of columns, where columns are key-value pairs that can be added or omitted dynamically.
- The row key (e.g., user_id) identifies the row.
- Rows- Internally rows are stored as map or key-value pairs. Keys are column names and the values are the corresponding cell data.

Primary Indexing: The partition key is always indexed, and lookups using the partition key are the fastest and most efficient.
Secondary Indexing (2i): Allows you to create indexes on non-partition columns, including clustering columns, static columns, and collection types (lists, sets, maps). These indexes are stored in separate, hidden tables (column families) on each node. After creating an index, you can query using that column, but the performance is best when used in conjunction with the partition key. Secondary indexes are local to each node and not globally distributed, so queries may need to contact all nodes, which can impact performance for high-cardinality columns or large clusters.
Storage-Attached Indexing (SAI): SAI is a newer, more flexible indexing method that attaches index data directly to SSTables, supporting a wide range of data types and enabling efficient queries on non-partition columns. It can replace Secondary Index.

Feature	MongoDB	Cassandra	Relational DB (SQL)
Data Model	Document (JSON/BSON)	Wide-column (rows with dynamic columns)	Relational (tables with fixed schema)
Schema	Flexible / schema-less	Flexible per row family	Strict schema (tables, columns)
Use Case	Operational apps with nested or semi-structured data	Write-heavy, high-availability systems	Transactional apps needing strong consistency
Joins	Not supported (use embedding or $lookup)	Not supported	Fully supported
Language	MongoDB Query Language (MQL)	CQL (Cassandra Query Language)	SQL (Structured Query Language)
Write Consistency	Tunable (eventual by default)	Eventual by default (tunable)	Strong (ACID)
Read Consistency	Tunable	Eventual by default (tunable)	Strong (unless configured otherwise)
Horizontal Scaling	Native sharding	Excellent (built-in, masterless)	Difficult (usually vertical scale)
Performance Goal	Balanced read/write	Write-optimized	Read-optimized with consistent writes
Indexing	Rich secondary indexing	Limited indexing support	Rich indexing support
Best For	Product catalogs, CMS, analytics	IoT, messaging, logging, real-time big data	Financial, ecommerce, ERP, CRM