MongoDB - MacKittipat/note-developer GitHub Wiki

Key terms

  • Document : Basic unit of data. Displayed in JSON but stored in BSON (Binary JSON)
  • Collection : A grouping of document
  • Database : A container of collection

Characteristic

  • Schemaless
  • BSON support all JSON type and Date, Number, ObjectID, etc

Data Modelling

  • Data that is accessed together should be stored together.
  • Way to model relationship
    • Embedding
      • Usually use with 1:M and M:M
    • Referencing
  • Relationship
    • 1:1
    • 1:M
    • M:M
  • Tips
    • One-to-One - Prefer key value pairs within the document
    • One-to-Few - Prefer embedding
    • One-to-Many - Prefer embedding
    • One-to-Squillions - Prefer Referencing
    • Many-to-Many - Prefer Referencing

Schema Design

  • Rule 1: Favor embedding unless there is a compelling reason not to.
  • Rule 2: Needing to access an object on its own is a compelling reason not to embed it.
  • Rule 3: Avoid joins/lookups if possible, but don't be afraid if they can provide a better schema design.
  • Rule 4: Arrays should not grow without bound. If there are more than a couple of hundred documents on the "many" side, don't embed them; if there are more than a few thousand documents on the "many" side, don't use an array of ObjectID references. High-cardinality arrays are a compelling reason not to embed.
  • Rule 5: As always, with MongoDB, how you model your data depends – entirely – on your particular application's data access patterns. You want to structure your data to match the ways that your application queries and updates it.

Aggregation

Structure

db.collection.aggregate([
    {
        $stage1: {
            { expression1 },
            { expression2 }...
        },
        $stage2: {
            { expression1 }...
        }
    }
])
  • Aggregation : Collection and summary of data
  • Stage : An aggregation operation performed on the data
    • $match
    • $group
    • $sort
    • $limit
    • project : Select fields to return in the result. Same as projection in find(). Normally this will be use as last stage to format output.
    • $set : Creates new fields or changes the value of existing fields, and then outputs the documents with the new fields.
    • $count
    • $out : Write the documents that returned by an aggregation pipeline into a collection. Must be a last stage
  • Aggregation Pipeline : A series of stages completed on the data in order

Run in Docker

docker run -d -p 27017:27017 --name mac-mongo mongo:latest

Query Scripts

// list all db.
show dbs  

// list all collection in db
show collections 

// The command creates a new database if it doesn’t exist
use [DB_NAME] 

// Create collection
db.createCollection("[COLLECTION_NAME]") 

// Find one 
db.listingsAndReviews.findOne() 
db.listingsAndReviews.findOne({property_type: "Apartment"})

// Find
db.listingsAndReviews.find() 
db.listingsAndReviews.find({property_type: "Apartment"})
db.listingsAndReviews.find({property_type: "Apartment", bedrooms: 5})
db.listingsAndReviews.find({property_type: "Apartment", bedrooms: 5, beds: {$gt : 4}})
db.listingsAndReviews.find({property_type: "Apartment", bedrooms: 5, beds: {$gte : 4}})
db.listingsAndReviews.find({property_type: "Apartment", bedrooms: 5, beds: {$lt : 4}})
db.listingsAndReviews.find({property_type: "Apartment", bedrooms: 5, beds: {$lte : 4}})
db.listingsAndReviews.find({property_type: "Apartment", bedrooms: 5, beds: {$eq : 4}}) // Equal
db.listingsAndReviews.find({property_type: "Apartment", bedrooms: 5, beds: {$ne : 4}}) // Not equal
db.listingsAndReviews.find({property_type: "Apartment", bedrooms: 5, beds: {$in : [4, 5]}}) // In 
db.listingsAndReviews.find({property_type: "Apartment", bedrooms: 5, beds: {$nin : [4, 5]}}) // Not in

// Find sub document
db.listingsAndReviews.find({"address.country": "Portugal"})

// Find array : Find all documents that has reviewer name Bart
db.listingsAndReviews.find({"reviews.reviewer_name": "Bart"})

// Logical 
db.listingsAndReviews.find({property_type: "Apartment", $and: [ {bedrooms: 3}, {beds: 4} ]})
db.listingsAndReviews.find({property_type: "Apartment", $or: [ {bedrooms: 3}, {beds: 4} ]})

// Count 
db.listingsAndReviews.countDocuments()
db.listingsAndReviews.countDocuments({property_type: "Apartment"})

// Finds the distinct values for a specified field
db.listingsAndReviews.distinct("property_type")

// Sort & Limit
db.listingsAndReviews.find().sort({price: 1}) // Ascending
db.listingsAndReviews.find().sort({price: -1}) // Decending
db.listingsAndReviews.find().sort({price: -1}).limit(3)

// Projection : Select fields to return in the result
// Inclusion & Exclusion cannot be together except for _id field
db.listingsAndReviews.findOne({}, {name:1, property_type:1, room_type:1, bedrooms: 1, beds: 1}) // Include specific fields
db.listingsAndReviews.findOne({}, {reviews: 0}) // Exclude specific field
db.listingsAndReviews.findOne({}, {name:1, property_type:1, room_type:1, bedrooms: 1, beds: 1, _id: 0}) // Exclude _id field. 

// Aggregate

// === Count each room_type of Hotel. $room_type is reference to room_type field ===
db.listingsAndReviews.aggregate([
  {
    $match : { property_type : "Hotel"}
  },
  {
    $group: { _id: "$room_type", totalRoolTypes: { $count: {} } }
  }
])

// return
[
  { _id: 'Private room', totalRoolTypes: 25 },
  { _id: 'Shared room', totalRoolTypes: 1 }
]
// ======

Insert Scripts

db.listingsAndReviews.insertOne({...})
db.listingsAndReviews.insertMany([{...}, {...}, ...])

Update Scripts


// Replace whole document
db.books.replaceOne(
  {
    _id: ObjectId("6282afeb441a74a98dbbec4e"),
  },
  {
    title: "Data Science Fundamentals for Python and MongoDB",
    isbn: "1484235967",
    publishedDate: new Date("2018-5-10"),
    thumbnailUrl:
      "https://m.media-amazon.com/images/I/71opmUBc2wL._AC_UY218_.jpg",
    authors: ["David Paper"],
    categories: ["Data Science"],
  }
)

// The $set operator replaces the value of a field with the specified value, as shown in the following code:
db.podcasts.updateOne(
  {
    _id: ObjectId("5e8f8f8f8f8f8f8f8f8f8f8"),
  },

  {
    $set: {
      subscribers: 98562,
    },
  }
)

// The $push operator adds a new value to the hosts array field. Here's an example:
db.podcasts.updateOne(
  { _id: ObjectId("5e8f8f8f8f8f8f8f8f8f8f8") },
  { $push: { hosts: "Nic Raboy" } }
)

// Update multiple document
db.books.updateMany(
  { publishedDate: { $lt: new Date("2019-01-01") } },
  { $set: { status: "LEGACY" } }
)

// Similar to db.podcasts.updateOne() but will return updated document to avoid 2 round trips to server (updateOne() and findOne())
db.podcasts.findAndModify({
  query: { _id: ObjectId("6261a92dfee1ff300dc80bf1") },
  update: { $inc: { subscribers: 1 } },
  new: true,
})

Delete Scripts

db.podcasts.deleteOne({ _id: Objectid("6282c9862acb966e76bbf20a") })
db.podcasts.deleteMany({category: “crime”})

Index

  • Single field indexes
  • Compound indexes : For multiple fields.
    • Can be Multikey indexes if one of the field is Array.
    • The order of the fields matters when creating the index and the sort order. It is recommended to list the fields in the following order: Equality, Sort, and Range.
  • Multikey indexes : For Array field
    • Can have 1 Array field per index

Explain

Use explain() : Check if an index is being used on a query

  • The IXSCAN stage indicates the query is using an index and what index is being selected.
  • The COLLSCAN stage indicates a collection scan is perform, not using any indexes.
  • The FETCH stage indicates documents are being read from the collection.
  • The SORT stage indicates documents are being sorted in memory.

Index Script

// List all indexes
db.listingsAndReviews.getIndexes()

// Explain query 
db.listingsAndReviews.explain().find({property_type: "Apartment"})

// Create Single field indexes. Sort by property_type ascending
db.listingsAndReviews.createIndex({property_type: 1})

// Create Compound indexes. "address.country" will be prefix index
db.listingsAndReviews.createIndex({"address.country": 1, accommodates: 1})

// Query with address.country will use Compound indexes because address.country is prefix of the index
db.listingsAndReviews.find({"address.country": "Portugal"}, {reviews:0})

// Use Compound indexes
db.listingsAndReviews.find({"address.country": "Portugal", accommodates: 2}, {reviews:0})
db.listingsAndReviews.find({accommodates: 2, "address.country": "Portugal"}, {reviews:0})

Hide/Delete Index

  • By hiding an index from the planner, users can evaluate the potential impact of dropping an index without actually dropping the index. If the impact is negative, the user can unhide the index instead of having to recreate a dropped index.
db.listingsAndReviews.hideIndex('{INDEX_NAME}')
db.listingsAndReviews.unhideIndex('{INDEX_NAME}')
db.listingsAndReviews.dropIndex('{INDEX_NAME}')

Transaction

  • ACID
    • Atomic : All operation will either succeed or fail together
    • Consitency : All change made by operation are consistent with database constraints
    • Isolation : Multiple transaction can happen at the same time without affect outcome of the other transaction
    • Durability : All change that are made by operation in a transaction will persist, no matter what. Guarantees that data is never lost, even in the event of a power or hardware failure.
  • All single document operation such as update one document are atomic
  • Multi document operation such as update 2 document are not atomic
  • Transaction has 1 mins transaction runtime. Each transaction must be completed within 1 mins.

Cheatsheet

Reference