MongoDB - shaysalomon12/Data-Engineer GitHub Wiki
This will usually be at:
cat /etc/mongod.conf
mongodb+srv://<username>:<password>@<host_ip_address>:27017/?authMechanism=DEFAULT&authSource=<default_database>
Inside MongoDB configuration file (/etc/mongod.conf) change the value of dbpath to the new location:
dbpath = /usr/local/var/mongodb
for (let i = 1; i <= 100; ++i) {
db.demo.insertOne({
"number": "1000-" + i,
"currency": i > 500000 ? "USD" : "EUR",
"balance": Math.random() * 100000
})
}
> show dbs
db1 0.019GB
db2 71.214GB
... 0.000GB
local 0.000GB
test 0.007GB
db.currentOp()
Look at:
- "opid" --> the process_id
- "secs_running" --> seconds this process is running
- "microsecs_running" --> NumberLong()
Any process running more than 100ms is considered a slow running query
db.killOp(<opid>)
mongotop
mongostat
- Verify current profiling status
> db.getProfilingStatus()
-
There are 3 profiling levels: 0 - Disabled 1 - Enabled (100ms) 2 - Enabled for everything
-
Set as Profiling level for database CETServices to write to log commands with execution time of 500msec and above:
> use CETServices
switched to db CETServices
> db.setProfilingLevel(0, { slowms: 500 })
{ was: 0, slowms: 500, sampleRate: 1, ok: 1 }
To verify the settings:
> db.getProfilingStatus()
{ was: 0, slowms: 500, sampleRate: 1, ok: 1 }
To find long queries that the profiler is monitoring:
> db.system.profile.find()
- Start monitoring all operations (level=2) with filter for a specific collection (not sure such filter works)
-- Set Monitoring level=2 (all operations) for current database:
db.setProfilingLevel( 2, { filter: { op: "query", ns : 'test.demo' } } )
-- Display all operations for collection 'test.demo':
db.system.profile.find( { ns : 'test.demo' } ).pretty()
- For example, let's create a collection named foo:
> db.foo.insert({x:123})
-- Now let's retreive tha data:
> db.foo.find({x:123})
-- Let's run .explain() on the query:
> db.foo.find({x:123}).explain()
-- We got some result. LEt's see the value of:
"winningPlan" : {
"stage" : "COLLSCAN",
It says "stage" : "COLLSCAN" - this means a collection scan. Like a table full scan.
-- Let's create an index on "x":
> db.foo.creteIndex({x:1})
-- And now, when we run .explain() we see it is using the index:
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"x" : 1
},
It says "stage" : "IXSCAN" - this means using index.
-- Try to avoid (full) scan + sort - this is a time consumer query.
We can see the the explain results:
"winningPlan" - the chosen plan
"rejectedPlans" - candidate plans that were rejected
https://rockset.com/blog/handling-slow-queries-in-mongodb-part-1-investigation/
Searching for errors inside log file (bash):
cat /var/log/mongodb/mongod.log | grep '** ERROR:'
-- Or
cat /var/log/mongodb/mongod.log | /run/mtools/jq
- Explaining Log Messages codes:
- https://www.mongodb.com/docs/manual/reference/log-messages/
At the bottom of the page of the following link (Log Parsing Examples) you can see
- examples of querying the log:
- https://www.mongodb.com/docs/manual/reference/log-messages/#std-label-log-messages-configure-verbosity
Examples (bash):
-- Find all error messages in JSON attribute ".attr.error.errmsg" that are "Connection reset by peer":
# /run/mtools/jq '. | select(.attr.error.errmsg=="Connection reset by peer")' /var/log/mongodb/mongod.log.2022-12-23T00-00-01
-- Count all types of messages in JSON attribute ".attr.error.errmsg":
/run/mtools/jq -cr '.attr.error.errmsg' /var/log/mongodb/mongod.log.2022-12-23T00-00-01 | grep -v null | sort | uniq -c | sort -rn
-- Count all types of messages in JSON attribute ".msg":
# /run/mtools/jq -cr '.msg' /var/log/mongodb/mongod.log.2022-12-23T00-00-01 | grep -v null | sort | uniq -c | sort -rn
-- Show Slow query: messages with .msg = "Slow query"
/run/mtools/jq '. | select(.msg=="Slow query")' /var/log/mongodb/mongod.log.2022-12-23T00-00-01
-- Show Slow operations that took more than 2000 milliseconds:
/run/mtools/jq '. | select(.attr.durationMillis>=2000)' /var/log/mongodb/mongod.log.2022-12-23T00-00-01
-- List all clients (.attr.client) that got msg=="Successful authentication" grouped and sorted
# /run/mtools/jq '. | select(.msg=="Successful authentication") | .attr.client' /var/log/mongodb/mongod.log | grep -v null | sort | uniq -c | sort -rn
-- List all clients (.attr.client) that got msg=="Successful authentication" between range of hours grouped and sorted
# /run/mtools/jq '. | select(.msg=="Successful authentication") | .attr.client' /var/log/mongodb/mongod.log | grep -v null | sort | uniq -c | sort -rn
-- Same as before, but this time extract only the IP without the port (172.17.31.94:56775):
# /run/mtools/jq '. | select(.t["$date"] >= "2023-05-11T16:30:00.000" and .t["$date"] <= "2023-05-11T07:00:00.000" and .msg=="Successful authentication") | .attr.client | split(":")[0]' /var/log/mongodb/mongod.log | grep -v null | sort | uniq -c | sort -rn
-- List all queries with duration longer than 20000ms and between range of hours
/run/mtools/jq '. | select(.t["$date"] >= "2023-02-21T17:30:00.000" and .t["$date"] <= "2023-02-21T18:30:00.000" and .attr.durationMillis>=20000)' /var/log/mongodb/mongod.log > slow_queries_over_200ms_20230222.txt
/run/mtools/jq '. | select(.t["$date"] >= "2023-02-21T17:30:00.000" and .t["$date"] <= "2023-02-21T18:30:00.000" and .attr.durationMillis>=20000 and .c != "COMMAND")' /var/log/mongodb/mongod.log > slow_queries_over_2000ms_20230222.json
- More examples inside this link:
- https://www.mongodb.com/docs/manual/reference/log-messages/#std-label-log-messages-configure-verbosity
- Count all types of messages in JSON attribute ".msg":
# /run/mtools/jq -cr '.msg' /var/log/mongodb/mongod.log.2022-12-23T00-00-01 | grep -v null | sort | uniq -c | sort -rn
/run/mtools/jq '.[] | select(.msg=="Successful authentication") | .client' /var/log/mongodb/mongod.log.2022-12-23T00-00-01
from pymongo import MongoClient
import pandas as pd # Not sure Pandas can show in PyCharm
import pql # Python-Query-Language: translates python expressions to MongoDB queries.
# MongoDB connection string
myClient = MongoClient(
host='<server_ip_address>:27017',
serverSelectionTimeoutMS=3000,
username="<username>",
password="<password>"
)
# Select Database and Collection
myDB = myClient["<db_name>"]
myCollection = myDB["<collection_name>"]
# Query collection test.demo for number = '1000-10'
myQuery = {
"balance" : {
"$gte" : 20,
"$lte" : 30
}
}
# Run query (using sort and limit)
myDoc = myCollection.find(myQuery).sort("balance", -1).limit(5) # -1 descending
# Print using Pandas
# entries = list(myDoc)
# df = pd.DataFrame(entries)
# df.head
# Print the results to the console
for x in myDoc:
print(x)
# Show all Databases
# database_names = myClient.list_database_names()
# print ("\ndatabases:", database_names)
# PQL translates python expressions to MongoDB queries.
# print(pql.SchemaFreeParser("a > 1 and b == 'foo'))
> db.demo.find({"balance" : {"$gte" : 20, "$lte" : 30000}})
{ "_id" : ObjectId("63a44653246733a1f7983512"), "number" : "1000-3", "currency" : "EUR", "balance" : 29204.58508321875 }
{ "_id" : ObjectId("63a44653246733a1f7983515"), "number" : "1000-6", "currency" : "EUR", "balance" : 22560.731939550726 }
{ "_id" : ObjectId("63a44653246733a1f7983518"), "number" : "1000-9", "currency" : "EUR", "balance" : 19287.221738817094 }
{ "_id" : ObjectId("63a44653246733a1f798351b"), "number" : "1000-12", "currency" : "EUR", "balance" : 18988.781982718338 }
{ "_id" : ObjectId("63a44653246733a1f798351e"), "number" : "1000-15", "currency" : "EUR", "balance" : 13995.586306718444 }
{ "_id" : ObjectId("63a44653246733a1f7983520"), "number" : "1000-17", "currency" : "EUR", "balance" : 3746.23286360557 }
{ "_id" : ObjectId("63a44653246733a1f7983521"), "number" : "1000-18", "currency" : "EUR", "balance" : 22901.02314663751 }
{ "_id" : ObjectId("63a44653246733a1f7983526"), "number" : "1000-23", "currency" : "EUR", "balance" : 10354.78275290249 }
{ "_id" : ObjectId("63a44653246733a1f7983529"), "number" : "1000-26", "currency" : "EUR", "balance" : 1564.2116820206775 }
{ "_id" : ObjectId("63a44653246733a1f798352c"), "number" : "1000-29", "currency" : "EUR", "balance" : 22536.764836061317 }
{ "_id" : ObjectId("63a44653246733a1f798352f"), "number" : "1000-32", "currency" : "EUR", "balance" : 14256.549360591542 }
{ "_id" : ObjectId("63a44653246733a1f7983530"), "number" : "1000-33", "currency" : "EUR", "balance" : 15311.191319516327 }
{ "_id" : ObjectId("63a44653246733a1f7983531"), "number" : "1000-34", "currency" : "EUR", "balance" : 9059.96909712794 }
{ "_id" : ObjectId("63a44653246733a1f7983536"), "number" : "1000-39", "currency" : "EUR", "balance" : 15361.839827309375 }
{ "_id" : ObjectId("63a44653246733a1f798353a"), "number" : "1000-43", "currency" : "EUR", "balance" : 15841.774041872513 }
{ "_id" : ObjectId("63a44653246733a1f798353e"), "number" : "1000-47", "currency" : "EUR", "balance" : 19930.301421625718 }
{ "_id" : ObjectId("63a44653246733a1f7983540"), "number" : "1000-49", "currency" : "EUR", "balance" : 6827.21884127514 }
{ "_id" : ObjectId("63a44653246733a1f7983545"), "number" : "1000-54", "currency" : "EUR", "balance" : 28524.656601429833 }
{ "_id" : ObjectId("63a44653246733a1f7983546"), "number" : "1000-55", "currency" : "EUR", "balance" : 21508.737268560086 }
{ "_id" : ObjectId("63a44653246733a1f7983548"), "number" : "1000-57", "currency" : "EUR", "balance" : 18177.16463380504 }
Type "it" for more
>
Click the "SQL" icon (left to the "Run" icon) to open a new SQL query tab: