MongoDB - kdwivedi1985/system-design GitHub Wiki

MongoDB is a NoSQL, document-based data store. Its source code is publicly available (AGPL-licensed until 2018, SSPL since then).
Every document in MongoDB has a primary key called the _id field, which uniquely identifies the document within its collection. A document can also contain another document as the value of one of its fields; this is known as an embedded document. E.g. User and Address in one single JSON object: { "_id": 1, "name": "Alice", "address": { "city": "New York", "zip": "10001" } }.
MongoDB also supports references between documents (a normalized data model). In this approach related data is stored in separate documents, typically in separate collections, and linked by _id. E.g. User -> { "_id": 1, "name": "Alice", "address_id": 101 } and the Address is stored separately -> { "_id": 101, "city": "New York", "zip": "10001" }
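The difference between the two modelling styles can be sketched in plain Python. This is only an illustration with dictionaries standing in for collections; the helper names `city_embedded` and `city_referenced` are made up for the example and no MongoDB server or driver is involved.

```python
# Embedded: the address lives inside the user document.
embedded_user = {
    "_id": 1,
    "name": "Alice",
    "address": {"city": "New York", "zip": "10001"},
}

# Referenced: the user stores only the _id of the address document.
# Each dict below stands in for a collection keyed by _id.
users = {1: {"_id": 1, "name": "Alice", "address_id": 101}}
addresses = {101: {"_id": 101, "city": "New York", "zip": "10001"}}


def city_embedded(user):
    """One lookup: the address is already part of the document."""
    return user["address"]["city"]


def city_referenced(user_id):
    """Two lookups: fetch the user, then follow address_id."""
    user = users[user_id]
    address = addresses[user["address_id"]]
    return address["city"]


print(city_embedded(embedded_user))
print(city_referenced(1))
```

Embedding saves the second lookup; referencing avoids duplicating an address that is shared by many users.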
MongoDB supports sharding, which makes it useful in high-throughput scenarios. In CAP terms it is usually classed as CP: by default it favors consistency and partition tolerance. In the default configuration both reads and writes go through the primary node, which is what makes it strongly consistent.
NoSQL databases are usually considered non-ACID-compliant, but MongoDB 4.0 and later support multi-document ACID transactions. Earlier versions guaranteed atomicity only for operations on a single document.
Any field in MongoDB can be used for querying, but an index should be created on the fields used in filters; without one, the query has to scan the whole collection.
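Why filters want an index can be shown with a small in-memory sketch. The documents and the `find_by_scan`, `build_index`, and `find_by_index` helpers are hypothetical, and a dictionary is used as the index for brevity; MongoDB's own indexes are B-tree based, so this only illustrates the lookup-versus-scan idea.

```python
from collections import defaultdict

docs = [
    {"_id": 1, "name": "Alice", "city": "New York"},
    {"_id": 2, "name": "Bob", "city": "Boston"},
    {"_id": 3, "name": "Carol", "city": "New York"},
]


def find_by_scan(field, value):
    """Without an index: inspect every document (a collection scan)."""
    return [d["_id"] for d in docs if d.get(field) == value]


def build_index(field):
    """Map each field value to the _ids of the documents that hold it."""
    index = defaultdict(list)
    for d in docs:
        if field in d:
            index[d[field]].append(d["_id"])
    return index


city_index = build_index("city")


def find_by_index(value):
    """With an index: one dictionary lookup instead of a full scan."""
    return city_index.get(value, [])
```

Both paths return the same documents; the indexed path simply avoids touching documents that cannot match.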

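MongoDB replicates data with replica sets. Each replica set has a single primary (master) and one or more secondaries (followers). In a sharded cluster every shard is its own replica set, so the cluster as a whole has multiple primaries, one per shard [Primary + Secondaries].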
Replica sets are formed at both the controller (config server) level and the shard level. This means the config servers and the shards each have a primary and followers.
MongoDB has a config server that acts as the controller and maintains the metadata about shards, and the config server has its own replica set. MongoDB supports hash-based, range-based, and zone-based partitioning.
All writes happen through the primary, and the data is then copied to the followers for high availability.
In a MongoDB replica set, all writes go to the primary node. The primary records the operations in a special collection called the oplog (operations log). Secondary nodes replicate data by reading the oplog and applying those changes asynchronously.
By default reads also go to the primary for strong consistency, but the read preference can be changed so that reads are served by secondaries, which may return slightly stale data.
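The oplog flow and the staleness of secondary reads can be sketched with a toy model. The `Primary` and `Secondary` classes below are invented for illustration and only mimic the idea of replaying an ordered operations log; they are not how the server is implemented.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.oplog = []  # ordered list of (key, value) operations

    def write(self, key, value):
        """Apply the write locally and record it in the oplog."""
        self.data[key] = value
        self.oplog.append((key, value))


class Secondary:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of oplog entries applied so far

    def sync(self, primary):
        """Apply any oplog entries not yet seen (runs some time after the write)."""
        for key, value in primary.oplog[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.oplog)


primary, secondary = Primary(), Secondary()
primary.write("user:1", "Alice")
stale = secondary.data.get("user:1")  # secondary has not synced yet
secondary.sync(primary)
fresh = secondary.data.get("user:1")  # secondary has caught up
```

A read from the secondary before `sync` misses the write, which is the trade-off of moving reads off the primary.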
With proper sharding, MongoDB can scale to millions of write requests, because each shard's primary handles only its own slice of the data.
If sharding is enabled in MongoDB, each sharded collection must have a shard key: a field (or compound of fields) used to determine how data is distributed across shards. E.g., you might use a region field as the shard key to partition data by region.
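How a shard key value ends up on a shard can be sketched for the hash-based and range-based cases. The shard names, the `RANGE_BOUNDS` split points, and both routing functions are assumptions made for this example; MongoDB itself maps key ranges to chunks and moves chunks between shards, which this simplified modulo and bisect version does not model.

```python
import bisect
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]


def hashed_shard(key):
    """Hash-based: hash the shard key value, then map the hash to a shard."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


# Range-based: each shard owns a contiguous range of key values.
# These are exclusive upper bounds for shard0 and shard1; shard2 takes the rest.
RANGE_BOUNDS = ["h", "p"]


def ranged_shard(key):
    """Range-based: find the range that contains the key."""
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, key)]
```

Hashing spreads neighbouring keys across shards, which evens out writes; ranges keep neighbouring keys together, which helps range queries.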

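MongoDB 4.2+ supports distributed multi-document transactions across shards. One of the participating shards acts as the transaction coordinator and drives a two-phase commit (2PC) style protocol. The saga pattern is an application-level alternative, not something MongoDB runs internally.
Atomicity has always been guaranteed for writes to a single document; multi-document transactions extend it across documents and collections, and distributed transactions extend it across shards. With this change MongoDB becomes fully ACID compliant, but the trade-off of atomicity is performance. Transactions are good when they are small and quick; large transactions are more likely to run into conflicts. MongoDB is therefore not recommended where the data is heavily relational and strong ACID compliance is needed throughout, as it is meant for flexible, loosely structured data.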
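The two-phase commit idea behind distributed transactions can be sketched generically. The `Participant` class and `two_phase_commit` function are illustrative names, and the sketch leaves out everything a real coordinator needs (timeouts, durable logging, recovery); it is the textbook protocol, not MongoDB's implementation.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        """Phase 1: vote on whether the local changes can be committed."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit on every participant or on none of them."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if every vote was yes, otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

A single "no" vote aborts the whole transaction, which is why long transactions touching many shards are the ones most likely to fail.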
MongoDB Atlas, the managed cloud service, supports auto-scaling in the cloud; on-prem deployments are scaled manually or through scripts.

MongoDB supports hash-based, range-based, and zone-based sharding (zones can be used for geo-based partitioning).
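Zone-based placement can be pictured as a lookup from a shard key value to the shards allowed to hold it. The zone names, region codes, and shard names below are hypothetical, and sets of values are used for simplicity, whereas MongoDB defines zones as ranges of shard key values.

```python
# Hypothetical zones: each zone covers some shard key values and is
# pinned to the shards located in that geography.
ZONES = {
    "EU": {"regions": {"de", "fr"}, "shards": ["shard-eu-1"]},
    "US": {"regions": {"us-east", "us-west"}, "shards": ["shard-us-1", "shard-us-2"]},
}


def shards_for_region(region):
    """Return the shards allowed to store documents for this region."""
    for zone in ZONES.values():
        if region in zone["regions"]:
            return zone["shards"]
    # No zone covers this value, so this sketch applies no placement constraint.
    return []
```

Pinning European regions to European shards keeps that data physically close to its users, which is the usual motivation for geo-based partitioning.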
MongoDB's schema is more flexible than Cassandra's, and its querying capabilities are richer. Cassandra's querying is limited, since queries generally have to follow the table's primary key design.