MongoDB - shaysalomon12/Data-Engineer GitHub Wiki

Configuration file

This will usually be at:

cat /etc/mongod.conf

Connnection String:

mongodb+srv://<username>:<password>@<host_ip_address>:27017/?authMechanism=DEFAULT&authSource=<default_database>

Log file locarion:

Inside MongoDB configuration file (/etc/mongod.conf) change the value of dbpath to the new location:

dbpath = /usr/local/var/mongodb

Monitor MongoDB's Performance:

Populating a collection called demo (mongodb cli):

for (let i = 1; i <= 100; ++i) {
    db.demo.insertOne({
        "number": "1000-" + i,
        "currency": i > 500000 ? "USD" : "EUR",
        "balance": Math.random() * 100000
    })
}

Show databases size (mongodb cli):

> show dbs
db1                   0.019GB
db2                  71.214GB
...                   0.000GB
local                 0.000GB
test                  0.007GB

Show process running curently and how long each is running (mongodb cli):

db.currentOp()

Look at:

  • "opid" --> the process_id
  • "secs_running" --> seconds this process is running
  • "microsecs_running" --> NumberLong()

Any process running more than 100ms is considered a slow running query

To kill a long running process (mongodb cli):

db.killOp(<opid>)

List top queries (bash):

mongotop

Statistics about MongoDB operatrions (bash):

mongostat

Profiling

  • Verify current profiling status
> db.getProfilingStatus()
  • There are 3 profiling levels: 0 - Disabled 1 - Enabled (100ms) 2 - Enabled for everything

  • Set as Profiling level for database CETServices to write to log commands with execution time of 500msec and above:

> use CETServices
switched to db CETServices

> db.setProfilingLevel(0, { slowms: 500 })
{ was: 0, slowms: 500, sampleRate: 1, ok: 1 }

To verify the settings:
> db.getProfilingStatus()
{ was: 0, slowms: 500, sampleRate: 1, ok: 1 }

To find long queries that the profiler is monitoring:
> db.system.profile.find() 

- Start monitoring all operations (level=2) with filter for a specific collection (not sure such filter works)
-- Set Monitoring level=2 (all operations) for current database:
db.setProfilingLevel( 2, { filter: { op: "query", ns : 'test.demo' } } )

-- Display all operations for collection 'test.demo':
db.system.profile.find( { ns : 'test.demo' } ).pretty()

Explain()

  • For example, let's create a collection named foo:
> db.foo.insert({x:123})

-- Now let's retreive tha data:
> db.foo.find({x:123})

-- Let's run .explain() on the query:
> db.foo.find({x:123}).explain()

-- We got some result. LEt's see the value of:
"winningPlan" : {
                        "stage" : "COLLSCAN",

It says  "stage" : "COLLSCAN" - this means a collection scan. Like a table full scan.

-- Let's create an index on "x":
> db.foo.creteIndex({x:1})

-- And now, when we run .explain() we see it is using the index:
"winningPlan" : {
                        "stage" : "FETCH",
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                        "x" : 1
                                },

It says  "stage" : "IXSCAN" - this means using index.

-- Try to avoid (full) scan + sort - this is a time consumer query.

We can see the the explain results:
"winningPlan" 	- the chosen plan
"rejectedPlans" - candidate plans that were rejected

How to Use the MongoDB Profiler and explain() to Find Slow Queries:

Handling Slow Queries in MongoDB - Part 1: Investigation:

https://rockset.com/blog/handling-slow-queries-in-mongodb-part-1-investigation/

Log Messages

Searching for errors inside log file (bash):

cat /var/log/mongodb/mongod.log | grep '** ERROR:'

-- Or
cat /var/log/mongodb/mongod.log | /run/mtools/jq

At the bottom of the page of the following link (Log Parsing Examples) you can see

Examples (bash):

-- Find all error messages in JSON attribute ".attr.error.errmsg"  that are "Connection reset by peer":
# /run/mtools/jq '. | select(.attr.error.errmsg=="Connection reset by peer")' /var/log/mongodb/mongod.log.2022-12-23T00-00-01

-- Count all types of messages in JSON attribute ".attr.error.errmsg":
/run/mtools/jq -cr '.attr.error.errmsg' /var/log/mongodb/mongod.log.2022-12-23T00-00-01 | grep -v null | sort | uniq -c | sort -rn

-- Count all types of messages in JSON attribute ".msg":
# /run/mtools/jq -cr '.msg' /var/log/mongodb/mongod.log.2022-12-23T00-00-01 | grep -v null | sort | uniq -c | sort -rn

-- Show Slow query: messages with .msg = "Slow query"
/run/mtools/jq '. | select(.msg=="Slow query")' /var/log/mongodb/mongod.log.2022-12-23T00-00-01

-- Show Slow operations that took more than 2000 milliseconds:
/run/mtools/jq '. | select(.attr.durationMillis>=2000)' /var/log/mongodb/mongod.log.2022-12-23T00-00-01

-- List all clients (.attr.client) that got msg=="Successful authentication" grouped and sorted
# /run/mtools/jq '. | select(.msg=="Successful authentication") | .attr.client' /var/log/mongodb/mongod.log | grep -v null | sort | uniq -c | sort -rn

-- List all clients (.attr.client) that got msg=="Successful authentication" between range of hours grouped and sorted
# /run/mtools/jq '. | select(.msg=="Successful authentication") | .attr.client' /var/log/mongodb/mongod.log | grep -v null | sort | uniq -c | sort -rn

-- Same as before, but this time extract only the IP without the port (172.17.31.94:56775):
# /run/mtools/jq '. | select(.t["$date"] >= "2023-05-11T16:30:00.000" and .t["$date"] <= "2023-05-11T07:00:00.000" and .msg=="Successful authentication") | .attr.client | split(":")[0]' /var/log/mongodb/mongod.log | grep -v null | sort | uniq -c | sort -rn

-- List all queries with duration longer than 20000ms and between range of hours 
/run/mtools/jq '. | select(.t["$date"] >= "2023-02-21T17:30:00.000" and .t["$date"] <= "2023-02-21T18:30:00.000" and .attr.durationMillis>=20000)' /var/log/mongodb/mongod.log > slow_queries_over_200ms_20230222.txt

/run/mtools/jq '. | select(.t["$date"] >= "2023-02-21T17:30:00.000" and .t["$date"] <= "2023-02-21T18:30:00.000" and .attr.durationMillis>=20000 and .c != "COMMAND")' /var/log/mongodb/mongod.log > slow_queries_over_2000ms_20230222.json
- Count all types of messages in JSON attribute ".msg":
# /run/mtools/jq -cr '.msg' /var/log/mongodb/mongod.log.2022-12-23T00-00-01 | grep -v null | sort | uniq -c | sort -rn

/run/mtools/jq '.[] | select(.msg=="Successful authentication") | .client' /var/log/mongodb/mongod.log.2022-12-23T00-00-01

Python script to access MongoDB

from pymongo import MongoClient
import pandas as pd     # Not sure Pandas can show in PyCharm
import pql              # Python-Query-Language: translates python expressions to MongoDB queries.

# MongoDB connection string
myClient = MongoClient(
    host='<server_ip_address>:27017',
    serverSelectionTimeoutMS=3000,
    username="<username>",
    password="<password>"
)

# Select Database and Collection
myDB = myClient["<db_name>"]
myCollection = myDB["<collection_name>"]

# Query collection test.demo for number = '1000-10'
myQuery = {
        "balance" : {
            "$gte" : 20,
            "$lte" : 30
        }
    }

# Run query (using sort and limit)
myDoc = myCollection.find(myQuery).sort("balance", -1).limit(5)  # -1 descending

# Print using Pandas
# entries = list(myDoc)
# df = pd.DataFrame(entries)
# df.head


# Print the results to the console
for x in myDoc:
    print(x)

# Show all Databases
# database_names = myClient.list_database_names()
# print ("\ndatabases:", database_names)

# PQL translates python expressions to MongoDB queries.
# print(pql.SchemaFreeParser("a > 1 and b == 'foo'))

List all records with "balance" between 20 and 30

> db.demo.find({"balance" : {"$gte" : 20, "$lte" : 30000}})
{ "_id" : ObjectId("63a44653246733a1f7983512"), "number" : "1000-3", "currency" : "EUR", "balance" : 29204.58508321875 }
{ "_id" : ObjectId("63a44653246733a1f7983515"), "number" : "1000-6", "currency" : "EUR", "balance" : 22560.731939550726 }
{ "_id" : ObjectId("63a44653246733a1f7983518"), "number" : "1000-9", "currency" : "EUR", "balance" : 19287.221738817094 }
{ "_id" : ObjectId("63a44653246733a1f798351b"), "number" : "1000-12", "currency" : "EUR", "balance" : 18988.781982718338 }
{ "_id" : ObjectId("63a44653246733a1f798351e"), "number" : "1000-15", "currency" : "EUR", "balance" : 13995.586306718444 }
{ "_id" : ObjectId("63a44653246733a1f7983520"), "number" : "1000-17", "currency" : "EUR", "balance" : 3746.23286360557 }
{ "_id" : ObjectId("63a44653246733a1f7983521"), "number" : "1000-18", "currency" : "EUR", "balance" : 22901.02314663751 }
{ "_id" : ObjectId("63a44653246733a1f7983526"), "number" : "1000-23", "currency" : "EUR", "balance" : 10354.78275290249 }
{ "_id" : ObjectId("63a44653246733a1f7983529"), "number" : "1000-26", "currency" : "EUR", "balance" : 1564.2116820206775 }
{ "_id" : ObjectId("63a44653246733a1f798352c"), "number" : "1000-29", "currency" : "EUR", "balance" : 22536.764836061317 }
{ "_id" : ObjectId("63a44653246733a1f798352f"), "number" : "1000-32", "currency" : "EUR", "balance" : 14256.549360591542 }
{ "_id" : ObjectId("63a44653246733a1f7983530"), "number" : "1000-33", "currency" : "EUR", "balance" : 15311.191319516327 }
{ "_id" : ObjectId("63a44653246733a1f7983531"), "number" : "1000-34", "currency" : "EUR", "balance" : 9059.96909712794 }
{ "_id" : ObjectId("63a44653246733a1f7983536"), "number" : "1000-39", "currency" : "EUR", "balance" : 15361.839827309375 }
{ "_id" : ObjectId("63a44653246733a1f798353a"), "number" : "1000-43", "currency" : "EUR", "balance" : 15841.774041872513 }
{ "_id" : ObjectId("63a44653246733a1f798353e"), "number" : "1000-47", "currency" : "EUR", "balance" : 19930.301421625718 }
{ "_id" : ObjectId("63a44653246733a1f7983540"), "number" : "1000-49", "currency" : "EUR", "balance" : 6827.21884127514 }
{ "_id" : ObjectId("63a44653246733a1f7983545"), "number" : "1000-54", "currency" : "EUR", "balance" : 28524.656601429833 }
{ "_id" : ObjectId("63a44653246733a1f7983546"), "number" : "1000-55", "currency" : "EUR", "balance" : 21508.737268560086 }
{ "_id" : ObjectId("63a44653246733a1f7983548"), "number" : "1000-57", "currency" : "EUR", "balance" : 18177.16463380504 }
Type "it" for more
>

Converting SQL to MongoDB query:

1. Online sites

2. NoSQLBooster for MongoDB

Click the "SQL" icon (left to the "Run" icon) to open a new SQL query tab:

⚠️ **GitHub.com Fallback** ⚠️