MongoDB - keshavbaweja-git/guides GitHub Wiki

Run as Docker container

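# Create the user-defined network once so the containers can resolve each other by name
docker network create mongo-net

docker run \
--network mongo-net \
--name mongo1 \
-d \
mongo:bionic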

docker run \
--network mongo-net \
-ti \
--rm \
mongo:bionic \
mongo --host mongo1

Replication

  • What - Storing multiple copies of the same dataset on different machines.

  • Why

    • Increase Durability
    • Increase Availability
    • Increase Read Throughput
  • Replica Set - A group of mongod processes that maintain the same dataset, providing data durability and availability through replication.

  • Replica Set constituents

    • Exactly one Primary
    • One or more Secondaries
    • Optionally, one or more Arbiters
  • All writes are processed by Primary

  • Each Secondary keeps a copy of the dataset by asynchronously replicating the Primary's oplog entries

  • Arbiter does not hold any data itself, but offers a cheap mechanism to achieve quorum in replica set elections for Primary.
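
The quorum point above can be made concrete with a little arithmetic. This is a minimal sketch, assuming an election needs a strict majority of the voting members; the function names are illustrative and not part of any MongoDB API.

```python
def votes_needed(voting_members: int) -> int:
    """Strict majority of voting members required to elect a Primary."""
    if voting_members < 1:
        raise ValueError("a replica set needs at least one voting member")
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """Number of voting members that can be lost while a Primary can still be elected."""
    return voting_members - votes_needed(voting_members)


# Compare a two-member set with the same set plus one Arbiter (three voters).
for members in (2, 3, 4, 5):
    print(members, votes_needed(members), fault_tolerance(members))
```

With two data-bearing members alone, losing either one leaves no majority. Adding an Arbiter brings the set to three voters, so one member can fail and a Primary can still be elected, without storing a third copy of the data.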

Write Concern

  • Level of acknowledgment requested from MongoDB instances for a write operation.
  • Expressed as {w: <value>, j: <boolean>, wtimeout: <number>}
    • w - number of mongod instances that must acknowledge the write (a number, or "majority")
    • j - whether the write must be persisted to the on-disk journal before it is acknowledged
    • wtimeout - time limit in milliseconds, specified to avoid blocking indefinitely on the client side
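
As a rough illustration of the three fields, the helper below assembles a write concern document as a plain Python dict. The function name and its validation rules are assumptions made for this sketch; it does not talk to a server or use any driver.

```python
from typing import Optional, Union


def write_concern(w: Union[int, str] = 1,
                  j: Optional[bool] = None,
                  wtimeout_ms: Optional[int] = None) -> dict:
    """Build a write concern document of the form {w, j, wtimeout}."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(w, bool) or not isinstance(w, (int, str)):
        raise TypeError("w must be an int or a string such as 'majority'")
    if isinstance(w, int) and w < 0:
        raise ValueError("w must not be negative")
    if wtimeout_ms is not None and wtimeout_ms < 0:
        raise ValueError("wtimeout must not be negative")

    doc = {"w": w}
    if j is not None:
        doc["j"] = j                   # wait for the on-disk journal
    if wtimeout_ms is not None:
        doc["wtimeout"] = wtimeout_ms  # milliseconds
    return doc


# Acknowledged by a majority, journaled, giving up after 5 seconds.
print(write_concern("majority", j=True, wtimeout_ms=5000))
```

A document like this is what a client would attach to a write operation or set as a default on the connection.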

Replica Set - Read Preference

  • By default, all clients read from Primary instance.
  • A read preference can be specified to alter this behaviour.
  • Read Preference consists of
    • Read Preference mode
      • primary
      • primaryPreferred
      • nearest
      • secondary
      • secondaryPreferred
    • Tag Set
    • Max Staleness Seconds
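
One common place to set these three parts is the connection string, through the `readPreference`, `readPreferenceTags` and `maxStalenessSeconds` options. The sketch below only assembles such a string; the helper name, host names and the `rs0` replica set name are hypothetical, and the checks reflect my understanding that `primary` mode accepts neither tags nor a staleness bound and that the minimum staleness is 90 seconds.

```python
from typing import Dict, List, Optional
from urllib.parse import quote

VALID_MODES = {"primary", "primaryPreferred", "secondary",
               "secondaryPreferred", "nearest"}


def read_preference_uri(hosts: List[str], replica_set: str, mode: str,
                        tags: Optional[Dict[str, str]] = None,
                        max_staleness_seconds: Optional[int] = None) -> str:
    """Assemble a mongodb:// URI carrying a read preference."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown read preference mode: {mode}")
    if mode == "primary" and (tags or max_staleness_seconds is not None):
        raise ValueError("primary mode takes neither tags nor max staleness")
    if max_staleness_seconds is not None and max_staleness_seconds < 90:
        raise ValueError("maxStalenessSeconds must be at least 90")

    options = [("replicaSet", replica_set), ("readPreference", mode)]
    if tags:
        # One tag set, written as key:value pairs separated by commas.
        tag_text = ",".join(f"{key}:{value}" for key, value in tags.items())
        options.append(("readPreferenceTags", tag_text))
    if max_staleness_seconds is not None:
        options.append(("maxStalenessSeconds", str(max_staleness_seconds)))

    query = "&".join(f"{name}={quote(value, safe=':,')}" for name, value in options)
    return f"mongodb://{','.join(hosts)}/?{query}"


print(read_preference_uri(["mongo1:27017", "mongo2:27017"], "rs0",
                          "secondaryPreferred", {"dc": "east"}, 120))
```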

Client Communications

  • MongoDB drivers are designed to handle communications with both a standalone server and a replica set.
  • By default, drivers route all read/write traffic to the Primary server of a replica set, with no special configuration by the client application.
  • To establish a connection to a replica set, client application provides a list of seed nodes. This list does not need to include all members of the replica set.
  • MongoDB drivers adhere to the Server Discovery and Monitoring (SDAM) spec, and constantly monitor the replica set topology to keep track of primary and secondary servers.
  • Replica sets offer high availability in cases of network and server outages. If the primary server goes down, MongoDB drivers automatically route traffic to the new Primary once it is elected. However, while no Primary is known, operations that require a Primary will fail. If the application needs it, reads can be directed to secondary servers during this window (via Read Preferences).
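
The failover behaviour described above can be sketched as a toy server-selection function. This is a simplified model, not driver code: the member states, host names and the rule of taking the first candidate are assumptions for illustration (real drivers also weigh latency, tags and staleness).

```python
from typing import Dict, Optional


def select_server(members: Dict[str, str], operation: str,
                  read_mode: str = "primary") -> Optional[str]:
    """Pick a host for an operation, or None when no suitable member is known.

    members maps host -> state: "PRIMARY", "SECONDARY" or "DOWN".
    """
    if operation not in ("read", "write"):
        raise ValueError(f"unknown operation: {operation}")

    primary = [host for host, state in members.items() if state == "PRIMARY"]
    secondaries = [host for host, state in members.items() if state == "SECONDARY"]

    if operation == "write" or read_mode == "primary":
        candidates = primary                 # writes always need the Primary
    elif read_mode == "primaryPreferred":
        candidates = primary or secondaries
    elif read_mode == "secondary":
        candidates = secondaries
    elif read_mode == "secondaryPreferred":
        candidates = secondaries or primary
    elif read_mode == "nearest":
        candidates = primary + secondaries
    else:
        raise ValueError(f"unknown read preference mode: {read_mode}")

    return candidates[0] if candidates else None


# The old Primary has just failed and no new Primary has been elected yet.
during_election = {"mongo1": "DOWN", "mongo2": "SECONDARY", "mongo3": "SECONDARY"}
print(select_server(during_election, "write"))
print(select_server(during_election, "read", "secondaryPreferred"))
```

In this model a write, or a read with the default mode, finds no server while the election is in progress, whereas a read with `secondaryPreferred` can still be served.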

Sharding

  • Sharding or Partitioning is the process of splitting up a dataset across multiple machines.
  • Why
    • Increase scalability - increase read/write throughput with multiple machines serving requests
    • Reduce latency - by placing shard/partition on machines based on geo-location proximity

Components of a Sharded Cluster

Shard

A shard holds a subset/partition of the dataset. A shard can be deployed as a replica set.

Config Servers

Config servers hold cluster metadata. Config servers must be deployed as a replica set. Restrictions on a config server replica set:

  • No Arbiter
  • No Delayed member
  • Must build indexes

Mongos

Query router for client applications; it can be deployed alongside each client application instance. A mongos instance routes a query to a cluster by

  • Determining the list of shards that must receive the query
  • Establishing a cursor on all of the targeted shards
  • Receiving results from each shard and aggregating them before returning them to the client.
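
The three routing steps can be sketched with an in-memory stand-in for a cluster. Everything here is hypothetical: the chunk map, the shard names, the `user_id` shard key and the sample documents exist only to illustrate targeting versus scatter-gather, and the merge assumes each shard returns its results sorted by the shard key.

```python
import heapq
from typing import Dict, Iterator, List, Optional

# Hypothetical chunk map: (inclusive lower bound, exclusive upper bound, shard).
CHUNKS = [(0, 100, "shardA"), (100, 200, "shardB"), (200, 300, "shardC")]

# Hypothetical documents held by each shard.
SHARD_DATA: Dict[str, List[dict]] = {
    "shardA": [{"user_id": 42}, {"user_id": 5}],
    "shardB": [{"user_id": 150}, {"user_id": 101}],
    "shardC": [{"user_id": 250}],
}


def target_shards(lo: Optional[int], hi: Optional[int]) -> List[str]:
    """Step 1: shards whose chunks overlap [lo, hi); all shards without a key filter."""
    if lo is None or hi is None:
        return sorted({shard for _, _, shard in CHUNKS})
    return sorted({shard for c_lo, c_hi, shard in CHUNKS if c_lo < hi and lo < c_hi})


def open_cursor(shard: str, lo: Optional[int], hi: Optional[int]) -> Iterator[dict]:
    """Step 2: stand-in for a cursor on one shard, sorted by the shard key."""
    docs = [d for d in SHARD_DATA[shard]
            if lo is None or hi is None or lo <= d["user_id"] < hi]
    return iter(sorted(docs, key=lambda d: d["user_id"]))


def route_query(lo: Optional[int] = None, hi: Optional[int] = None) -> List[dict]:
    """Step 3: merge the per-shard results into one sorted result."""
    cursors = [open_cursor(shard, lo, hi) for shard in target_shards(lo, hi)]
    return list(heapq.merge(*cursors, key=lambda d: d["user_id"]))


print(target_shards(40, 160))   # targeted query: only shards owning that range
print(route_query(40, 160))
print(route_query())            # no shard key filter: every shard is queried
```

A query that includes the shard key touches only the shards that own the matching range, while a query without it has to be broadcast to every shard.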

Primary Shard

A primary shard for a database holds all of its unsharded collections. When creating a new database, mongos selects the primary shard by picking the shard that holds the least data.
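
The selection rule amounts to taking a minimum over per-shard data sizes. A minimal sketch, assuming the sizes are already known as a mapping from shard name to bytes; the function name and the alphabetical tie-break are assumptions, not documented mongos behaviour.

```python
from typing import Dict


def pick_primary_shard(data_size_by_shard: Dict[str, int]) -> str:
    """Return the shard holding the least data (ties broken by shard name)."""
    if not data_size_by_shard:
        raise ValueError("the cluster has no shards")
    # Sorting the names first makes the tie-break deterministic.
    return min(sorted(data_size_by_shard), key=lambda shard: data_size_by_shard[shard])


print(pick_primary_shard({"shardA": 500, "shardB": 120, "shardC": 300}))
```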

Sharding gotchas

  • Once sharded, a collection can't be unsharded
  • Once sharded, the shard key of a collection can't be changed (true before v4.4; v4.4 added refineCollectionShardKey and v5.0 added reshardCollection)
  • From v4.2 onwards, it is possible to change the value of shard key on a document