CouchDB - kdwivedi1985/system-design GitHub Wiki

What is CouchDB?

  • CouchDB is an open-source NoSQL document database that supports two query mechanisms: traditional Map/Reduce views and the Mango query language.
  • Map/Reduce is CouchDB's core mechanism for querying and aggregating data. A view is defined by JavaScript functions that index and aggregate documents: the map function emits key-value pairs from each document, CouchDB sorts the rows by key, and an optional reduce function aggregates them.
  • Views are stored in design documents and updated incrementally (only changed documents are re-processed), which keeps queries efficient over large datasets.
  • Mango is a newer, easier-to-use declarative query language (selectors written in JSON) for ad-hoc queries.
  • Use case: CouchDB with Map/Reduce is best for predictable, high-performance queries on large datasets.
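To make the map, sort, and reduce steps concrete, here is a small Python simulation of a view index. It is a sketch under stated assumptions, not CouchDB's implementation: real views are JavaScript functions stored in a design document, and `map_by_type`, `sum_reduce`, and the `View` class are hypothetical names. The reducer plays the role of CouchDB's built-in `_sum`, and `apply_change` mimics incremental indexing by re-mapping only the changed document.

```python
from itertools import groupby
from typing import Any, Callable, Iterable, Optional

Doc = dict[str, Any]
Row = tuple[Any, Any]  # (key, value) pair emitted by a map function


def map_by_type(doc: Doc) -> Iterable[Row]:
    # Hypothetical map function, analogous to the JavaScript view
    # function (doc) { if (doc.type && doc.price) { emit(doc.type, doc.price); } }
    if "type" in doc and "price" in doc:
        yield (doc["type"], doc["price"])


def sum_reduce(values: list[Any]) -> Any:
    # Stands in for CouchDB's built-in _sum reduce function
    return sum(values)


class View:
    """Toy view index that keeps emitted rows per document id."""

    def __init__(self, mapper: Callable[[Doc], Iterable[Row]]) -> None:
        self.mapper = mapper
        self.rows_by_id: dict[str, list[Row]] = {}

    def apply_change(self, doc: Doc) -> None:
        # Incremental update: only the changed document is re-mapped;
        # its old rows are replaced (or dropped if it was deleted).
        if doc.get("_deleted"):
            self.rows_by_id.pop(doc["_id"], None)
        else:
            self.rows_by_id[doc["_id"]] = list(self.mapper(doc))

    def rows(self) -> list[Row]:
        # Sort by key, then by doc id for equal keys. Uses Python ordering,
        # not CouchDB's full collation rules for mixed key types.
        flat = [(k, doc_id, v) for doc_id, rs in self.rows_by_id.items() for k, v in rs]
        flat.sort(key=lambda r: (r[0], r[1]))
        return [(k, v) for k, _, v in flat]

    def query(self, reducer: Optional[Callable[[list[Any]], Any]] = None, group: bool = False) -> Any:
        rows = self.rows()
        if reducer is None:
            return rows  # raw sorted rows, like reduce=false
        if not group:
            return reducer([v for _, v in rows])  # single value for the whole view
        # group=true: one reduced value per distinct key
        return [(k, reducer([v for _, v in grp])) for k, grp in groupby(rows, key=lambda r: r[0])]


view = View(map_by_type)
for d in [
    {"_id": "a", "type": "book", "price": 10},
    {"_id": "b", "type": "pen", "price": 2},
    {"_id": "c", "type": "book", "price": 5},
    {"_id": "d", "note": "no type, emits nothing"},
]:
    view.apply_change(d)

print(view.query(sum_reduce, group=True))  # [('book', 15), ('pen', 2)]
view.apply_change({"_id": "b", "type": "pen", "price": 3})  # edit re-maps only "b"
print(view.query(sum_reduce, group=True))  # [('book', 15), ('pen', 3)]
```

Calling `query` with `group=True` corresponds roughly to the `group=true` view query parameter, and calling it without a reducer corresponds to `reduce=false`.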

CouchDB Cluster Architecture


  • CouchDB supports multi-master (active-active) replication, meaning every node in the cluster can accept writes independently.
  • Changes made on one node are asynchronously replicated to others. Replication can be continuous or one-off.
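  • Replication is peer-coordinated: any node can initiate replication with any other node.
  • The number of copies of each shard is set by the replication factor (n, 3 by default); reads and writes wait for a quorum of those replicas, by default a majority.
  • Sharding is based on consistent hashing of the document ID: each database is split into a fixed number of shards (q), and every document is assigned to exactly one shard.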
  • When a client sends a request to any node in the cluster, that node acts as a coordinator. If the node does not hold the shard containing the requested document, it proxies the request internally to one of the nodes that do hold the shard replica.
  • Each node maintains cluster proxies (Smartproxy/Dumbproxy), which hold the metadata describing which nodes store which shards.
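The routing and quorum behavior described above can be sketched in Python. Everything here is an assumption for illustration: the node names, q = 4, CRC32 as the hash, and the consecutive-node placement are hypothetical, and CouchDB manages its real shard map internally. The point is the shape of the logic: hash the document ID into a shard range, look up that shard's replicas in a map every node holds, send the write to all replicas, and report success once a majority acknowledges.

```python
import zlib
from typing import Callable

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical cluster members
Q = 4  # shards per database (hypothetical value of CouchDB's q)
N = 3  # copies of each shard (CouchDB's n; 3 by default)


def shard_for(doc_id: str, q: int = Q) -> int:
    # Hash the document id to a 32-bit integer and split the hash space
    # into q equal ranges; the range that contains the hash is the shard.
    h = zlib.crc32(doc_id.encode("utf-8"))  # 0 <= h < 2**32
    return h * q // 2**32


def replicas_for(shard: int, nodes: list[str] = NODES, n: int = N) -> list[str]:
    # Toy placement: n consecutive nodes starting from the shard number
    return [nodes[(shard + i) % len(nodes)] for i in range(n)]


# Shard map: shard number -> nodes holding a replica. Every node keeps a copy.
SHARD_MAP = {s: replicas_for(s) for s in range(Q)}


def coordinate_write(doc_id: str, send: Callable[[str, str], bool], n: int = N) -> bool:
    # Whichever node received the request acts as coordinator: it looks up
    # the replicas, sends the write to all of them, and succeeds once a
    # majority quorum (w = n // 2 + 1) has acknowledged.
    w = n // 2 + 1
    replicas = SHARD_MAP[shard_for(doc_id)]
    acks = sum(1 for node in replicas if send(node, doc_id))
    return acks >= w


# Usage: simulate one node being down; a 2-of-3 quorum still succeeds.
down = {"node2"}
ok = coordinate_write("order:42", lambda node, _id: node not in down)
print(shard_for("order:42"), SHARD_MAP[shard_for("order:42")], ok)
```

Because every node holds the same shard map, whichever node receives a request can coordinate it without first asking another node where the data lives.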

CouchDB VS MongoDB

| Feature | CouchDB (Map/Reduce views) | CouchDB (Mango) | MongoDB |
|---|---|---|---|
| Data model | JSON documents | JSON documents | BSON documents |
| Querying | Predefined JavaScript Map/Reduce views | Declarative JSON queries (ad hoc) | Rich dynamic queries and aggregation |
| Indexing | Precomputed views (B-tree indexes) | Indexes defined on demand | Secondary indexes defined on demand |
| Consistency | Eventual | Eventual | Strong for reads from the primary (document-level) |
| Replication | Multi-master, with conflict detection and resolution | Multi-master, with conflict detection and resolution | Primary-secondary replica sets |
| Use case | Read-heavy, large, stable datasets where queries are known in advance | Flexible, but may scan the whole database if no index matches | High throughput, optimized storage |
  • CouchDB with Map/Reduce is best for predictable, high-performance queries on large data.
  • Mango is best for rapid development and flexible, simple queries.
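Mango selectors are JSON objects, for example `{"type": "book", "price": {"$gt": 6}}`, sent in the body of a `POST /{db}/_find` request. The Python sketch below evaluates a small subset of selector operators locally to show what such a query means; it is a simplified model, not CouchDB's query engine, and `get_field`, `matches`, and `find` are hypothetical names. It tests every document, which is what happens when a Mango query has no usable index; an index created through `POST /{db}/_index` lets CouchDB avoid that scan.

```python
import json
import operator
from typing import Any, Callable, Optional

Doc = dict[str, Any]
MISSING = object()  # sentinel for "field not present"

# A small subset of Mango's condition operators (others raise KeyError here).
OPS: dict[str, Callable[[Any, Any], bool]] = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
}


def get_field(doc: Doc, path: str) -> Any:
    # Mango addresses nested fields with dot notation, e.g. "address.city"
    cur: Any = doc
    for part in path.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return MISSING
        cur = cur[part]
    return cur


def matches(doc: Doc, selector: dict[str, Any]) -> bool:
    for field, cond in selector.items():
        if field == "$and":
            if not all(matches(doc, sub) for sub in cond):
                return False
        elif field == "$or":
            if not any(matches(doc, sub) for sub in cond):
                return False
        else:
            value = get_field(doc, field)
            if value is MISSING:
                return False  # absent fields never match in this sketch
            if not isinstance(cond, dict):
                cond = {"$eq": cond}  # {"type": "book"} is shorthand for $eq
            for op, arg in cond.items():
                try:
                    if not OPS[op](value, arg):
                        return False
                except TypeError:
                    return False  # simplification: cross-type comparisons never match
    return True


def find(docs: list[Doc], selector: dict[str, Any],
         fields: Optional[list[str]] = None, limit: Optional[int] = None) -> list[Doc]:
    # Full scan: every document is tested, as happens when no index applies.
    results: list[Doc] = []
    for doc in docs:
        if limit is not None and len(results) >= limit:
            break
        if matches(doc, selector):
            # Project top-level fields only (a simplification)
            results.append({f: doc[f] for f in fields if f in doc} if fields else doc)
    return results


docs = [
    {"_id": "a", "type": "book", "price": 10},
    {"_id": "b", "type": "pen", "price": 2},
    {"_id": "c", "type": "book", "price": 5},
]
request = {"selector": {"type": "book", "price": {"$gt": 6}}, "fields": ["_id", "price"]}
print(json.dumps(request))  # body you would POST to /{db}/_find
print(find(docs, request["selector"], request["fields"]))  # [{'_id': 'a', 'price': 10}]
```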

How to choose between CouchDB and MongoDB?

  • CouchDB (first released in 2005) predates MongoDB (2009), but MongoDB is more widely adopted.
  • IBM's Cloudant is built on CouchDB.
  • Choose CouchDB if resilience and availability matter more to you than strong consistency, and you are comfortable with offline data sync and eventual consistency.
    • CouchDB has carved out a niche, especially in environments where replication, offline-first sync, and resilience are critical.
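Offline sync and multi-master writes mean the same document can be edited on two replicas before they sync. CouchDB detects this as a conflict and deterministically picks a winning revision, so every replica converges on the same answer without coordination. The sketch below models that rule as we understand it from CouchDB's replication documentation (non-deleted revisions beat deleted ones, then the longer revision history wins, then the higher revision ID in ASCII order); the `Leaf` type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    """A leaf revision of one document, as seen after replicas sync."""
    rev: str  # CouchDB-style revision id: "<generation>-<hash>"
    deleted: bool = False


def rank(leaf: Leaf) -> tuple[bool, int, str]:
    generation, _, rev_hash = leaf.rev.partition("-")
    # Non-deleted beats deleted, then the longer edit history (higher
    # generation), then the higher revision hash in ASCII order.
    return (not leaf.deleted, int(generation), rev_hash)


def winning_leaf(leaves: list[Leaf]) -> Leaf:
    # Deterministic: every replica picks the same winner from the same leaves
    return max(leaves, key=rank)


def conflicts(leaves: list[Leaf]) -> list[str]:
    # Live (non-deleted) losing revisions, kept for the application to merge
    winner = winning_leaf(leaves)
    return sorted(leaf.rev for leaf in leaves if leaf.rev != winner.rev and not leaf.deleted)


# Laptop and phone both edited revision 2 while offline, then synced.
leaves = [Leaf("3-a1b2c3"), Leaf("3-f9e8d7")]
print(winning_leaf(leaves).rev)  # 3-f9e8d7
print(conflicts(leaves))  # ['3-a1b2c3']
```

The losing revisions are not discarded: CouchDB keeps them, and reading a document with `?conflicts=true` adds a `_conflicts` field listing them so the application can merge or delete them.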