CouchDB - kdwivedi1985/system-design GitHub Wiki
What is CouchDB?
- CouchDB is an open-source NoSQL document database that supports traditional Map/Reduce views and the Mango query language.
- Map/Reduce is CouchDB's core mechanism for querying and aggregating data. A view is predefined with JavaScript functions: the map function emits key-value pairs from each document, the resulting rows are sorted by key, and an optional reduce function aggregates them.
- Views are stored in design documents and updated incrementally, which keeps queries over large datasets efficient.
- Mango is a modern, easier-to-use query language within CouchDB for ad-hoc queries.
- Use case: CouchDB with Map/Reduce is best for predictable, high-performance queries on large datasets.
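The map, sort, and reduce steps described above can be illustrated with a small in-memory simulation. This is only a sketch of the idea, not how CouchDB executes views internally: the sample documents, the field names (`type`, `customer`, `total`), and the design document name `_design/orders` are all assumptions made for illustration. The `design_doc` dictionary shows roughly what the equivalent view definition could look like when stored in a design document, using the built-in `_sum` reduce.

```python
from collections import defaultdict

# Hypothetical sample documents; the field names are assumptions for illustration.
docs = [
    {"_id": "o1", "type": "order", "customer": "alice", "total": 30},
    {"_id": "o2", "type": "order", "customer": "bob", "total": 15},
    {"_id": "o3", "type": "order", "customer": "alice", "total": 20},
    {"_id": "u1", "type": "user", "name": "alice"},
]

# Roughly what the same view could look like inside a design document.
# The map function is JavaScript source stored as a string.
design_doc = {
    "_id": "_design/orders",
    "views": {
        "total_by_customer": {
            "map": "function (doc) { if (doc.type === 'order') { emit(doc.customer, doc.total); } }",
            "reduce": "_sum",
        }
    },
}


def map_fn(doc):
    """Emit (key, value) pairs for one document, mirroring emit(key, value)."""
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


def build_view(documents):
    """Run the map function over every document and sort the rows by key."""
    rows = [
        (key, value, doc["_id"])
        for doc in documents
        for key, value in map_fn(doc)
    ]
    return sorted(rows, key=lambda row: (row[0], row[2]))


def reduce_sum(rows):
    """Group rows by key and sum the values, similar in spirit to _sum."""
    totals = defaultdict(int)
    for key, value, _doc_id in rows:
        totals[key] += value
    return dict(totals)


view_rows = build_view(docs)
totals = reduce_sum(view_rows)
print(view_rows)
print(totals)
```

Because the rows are kept sorted by key, a real view can answer key and range lookups without rescanning every document, which is why this style suits queries that are known in advance.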
CouchDB Cluster Architecture
- CouchDB supports multi-master (active-active) replication, meaning every node in the cluster can accept writes independently.
- Changes made on one node are asynchronously replicated to others. Replication can be continuous or one-off.
- Replication is peer-coordinated; any node can initiate replication with any other node.
- The number of copies kept of each shard depends on the replication factor.
- Sharding is based on consistent hashing.
- When a client sends a request to any node in the cluster, that node acts as the coordinator. If it does not hold the shard containing the requested document, it proxies the request internally to a node that holds a replica of that shard.
- Each node maintains cluster proxies (Smartproxy/Dumbproxy), which hold metadata for the shards.
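The sharding and coordinator behaviour described above can be sketched as follows. This is a simplified model under stated assumptions, not CouchDB's actual implementation: the hash space is split into `q` equal ranges, CRC32 is used here as a stand-in hash for the document ID, and replicas are placed round-robin across nodes, which is an assumed placement policy. The node names and the function names (`shard_ranges`, `shard_for`, `build_shard_map`, `route`) are hypothetical.

```python
import zlib

HASH_SPACE = 2 ** 32  # CRC32 yields a 32-bit unsigned value


def shard_ranges(q):
    """Split the 32-bit hash space into q contiguous ranges."""
    step = HASH_SPACE // q
    ranges = []
    for i in range(q):
        low = i * step
        high = HASH_SPACE - 1 if i == q - 1 else (i + 1) * step - 1
        ranges.append((low, high))
    return ranges


def shard_for(doc_id, ranges):
    """Return the index of the range that contains the hash of doc_id."""
    h = zlib.crc32(doc_id.encode("utf-8"))
    for index, (low, high) in enumerate(ranges):
        if low <= h <= high:
            return index
    raise ValueError("hash fell outside the keyspace")


def build_shard_map(nodes, q, n):
    """Assign n replicas of each shard to nodes round-robin (assumed policy)."""
    return {
        shard: [nodes[(shard + replica) % len(nodes)] for replica in range(n)]
        for shard in range(q)
    }


def route(coordinator, doc_id, ranges, shard_map):
    """Serve locally if the coordinator holds a replica, else proxy to a holder."""
    holders = shard_map[shard_for(doc_id, ranges)]
    if coordinator in holders:
        return coordinator, "local"
    return holders[0], "proxied"


# Hypothetical four-node cluster with 8 shards and 2 replicas per shard.
nodes = ["node1", "node2", "node3", "node4"]
ranges = shard_ranges(q=8)
shard_map = build_shard_map(nodes, q=8, n=2)

for node in nodes:
    target, mode = route(node, "user:42", ranges, shard_map)
    print(f"request to {node}: handled by {target} ({mode})")
```

Every node computes the same shard for a given document ID, so a client can send its request to any node and still reach a replica in at most one internal hop.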
CouchDB vs MongoDB
Feature | CouchDB (Map/Reduce Views) | CouchDB (Mango) | MongoDB |
---|---|---|---|
Data Model | JSON documents | JSON documents | BSON documents |
Querying | Predefined JavaScript Map/Reduce views | Declarative JSON queries (ad-hoc) | Rich dynamic queries & aggregation |
Indexing | Precomputed views (B-tree indexes) | On-demand indexes | On-demand, can be less efficient with large data |
Consistency | Eventual consistency | Eventual consistency | Strong consistency (document-level) |
Replication | Multi-master, conflict detection/resolution | Multi-master | Primary-secondary replica sets |
Use case | Read-heavy workloads on large, stable datasets where queries are known in advance | Flexible, but falls back to a full scan if no index matches | High throughput, optimized storage |
- CouchDB with Map/Reduce is best for predictable, high-performance queries on large data.
- Mango is best for rapid development and flexible, simple queries.
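To make the contrast with predefined views concrete, the sketch below builds the JSON bodies for a Mango index and an ad-hoc Mango query. CouchDB accepts these at `POST /{db}/_index` and `POST /{db}/_find` respectively. The helper names (`mango_index`, `mango_query`), the index name, and the document fields (`customer`, `total`) are assumptions for illustration; the code only constructs the request bodies and does not contact a server.

```python
import json


def mango_index(fields, name):
    """Build a body for POST /{db}/_index describing a JSON index."""
    return {"index": {"fields": list(fields)}, "name": name, "type": "json"}


def mango_query(selector, fields=None, sort=None, limit=25):
    """Build a body for POST /{db}/_find."""
    body = {"selector": selector, "limit": limit}
    if fields:
        body["fields"] = list(fields)
    if sort:
        body["sort"] = list(sort)
    return body


# Hypothetical index and query over order documents.
index_body = mango_index(["customer", "total"], "customer-total-idx")
query_body = mango_query(
    selector={"customer": "alice", "total": {"$gt": 10}},
    fields=["_id", "customer", "total"],
    sort=[{"customer": "asc"}, {"total": "asc"}],
    limit=10,
)

print(json.dumps(index_body, indent=2))
print(json.dumps(query_body, indent=2))
```

The query is plain JSON with no JavaScript to write or deploy, which is what makes Mango convenient for rapid development. Without a matching index, such a query may have to scan the whole database.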
How to choose between CouchDB and MongoDB?
- CouchDB was released before MongoDB, but MongoDB is the more popular of the two.
- IBM's Cloudant is built on CouchDB.
- Choose CouchDB if resilience takes priority over consistency for you and you are comfortable with offline data sync and eventual consistency.
- CouchDB has carved out a niche, especially in environments where replication, offline-first sync, and resilience are critical.
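Offline sync with eventual consistency means two replicas can edit the same document independently, so conflicts have to be handled. The sketch below is a simplified model of deterministic winner selection, based on the rule described in the CouchDB documentation: the revision with the longer history wins, and ties are broken by comparing the revision strings. Preferring non-deleted revisions over deleted ones is included as an assumption. The function names and sample revisions are hypothetical, and a real application would usually still merge or resolve the losing revisions itself.

```python
def parse_rev(rev):
    """Split a revision string like '2-abc' into (generation, digest)."""
    generation, _, digest = rev.partition("-")
    return int(generation), digest


def pick_winner(revisions):
    """Pick one revision deterministically so every replica agrees on it.

    Ranking (simplified model): non-deleted beats deleted (assumption),
    then higher generation, then the higher digest in string order.
    """
    def rank(doc):
        generation, digest = parse_rev(doc["_rev"])
        return (not doc.get("_deleted", False), generation, digest)

    return max(revisions, key=rank)


# Two devices edited the same document while offline (hypothetical data).
conflicting = [
    {"_id": "note1", "_rev": "2-aaa", "text": "edited on phone"},
    {"_id": "note1", "_rev": "2-bbb", "text": "edited on laptop"},
]
winner = pick_winner(conflicting)
losers = [doc for doc in conflicting if doc is not winner]

print("winner:", winner["_rev"])
print("still stored as conflicts:", [doc["_rev"] for doc in losers])
```

Because the choice depends only on the revisions themselves, each replica reaches the same winner without coordinating with the others, which fits the resilience-over-consistency trade-off described above.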