Lesson 04 Getting Started with MongoDB - adparker/GADSLA_1403 GitHub Wiki
Instructions for Installation of MongoDB. I really just want the MongoDB client, but I'm just going to install the whole thing (server and client), since I don't see an easy way to just get the MongoDB client.
The whole package is about 300MB. This is what I did on my Mac:
$ brew install mongodb
==> Downloading https://downloads.sf.net/project/machomebrew/Bottles/mongodb-2.4.9.mavericks.bottle.2.tar.gz
######################################################################## 100.0%
==> Pouring mongodb-2.4.9.mavericks.bottle.2.tar.gz
==> Caveats
To have launchd start mongodb at login:
ln -sfv /usr/local/opt/mongodb/*.plist ~/Library/LaunchAgents
Then to load mongodb now:
launchctl load ~/Library/LaunchAgents/homebrew.mxcl.mongodb.plist
Or, if you don't want/need launchctl, you can just run:
mongod --config /usr/local/etc/mongod.conf
==> Summary
🍺 /usr/local/Cellar/mongodb/2.4.9: 391 files, 302M
$
Remember, I'm not interested in running the server. I just want the client program, so you can ignore the bit about launchd and launchctl. Those are instructions for the server. If you want to run the server locally, you can. For fun, I'm going to instead use a free, managed MongoDB service as my server.
Launching a Free Managed MongoDB Server on MongoHQ
Sign up at MongoHQ and choose the "Sandbox" option if you don't want to run the server locally. You still need to install MongoDB to get the client software.
From the shell, you can use the mongo client to connect to your MongoDB server. In general, the command has this form (replace the UPPERCASE words with your own values):
$ mongo SERVER_NAME:PORT/DB -u USER -p PASSWORD
You can connect to my free instance here (username and password will be given in class). Run this command from your shell:
$ mongo oceanic.mongohq.com:10065/ds -u USER -p PASSWORD
MongoDB shell version: 2.4.9
connecting to: oceanic.mongohq.com:10065/ds
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
http://docs.mongodb.org/
Questions? Try the support group
http://groups.google.com/group/mongodb-user
>
The tutorial at try.mongodb.org seems broken. I'll recreate it here.
Log back in to the MongoDB server:
$ mongo oceanic.mongohq.com:10065/ds -u XXXX -p XXXX
MongoDB shell version: 2.4.9
connecting to: oceanic.mongohq.com:10065/ds
>
Now you're in the MongoDB client. You can look up some help information:
> help
db.help() help on db methods
db.mycoll.help() help on collection methods
sh.help() sharding helpers
rs.help() replica set helpers
help admin administrative help
help connect connecting to a db help
help keys key shortcuts
help misc misc things to know
help mr mapreduce
show dbs show database names
show collections show collections in current database
show users show users in current database
show profile show most recent system.profile entries with time >= 1ms
show logs show the accessible logger names
show log [name] prints out the last segment of log in memory, 'global' is default
use <db_name> set current database
db.foo.find() list objects in collection foo
db.foo.find( { a : 1 } ) list objects in foo where a == 1
it result of the last line evaluated; use to further iterate
DBQuery.shellBatchSize = x set default number of items to display on shell
exit quit the mongo shell
The MongoDB shell is a (limited) JavaScript interpreter, so any commands you are familiar with from JavaScript should work here. Try this out:
var a = 5;
a * 10;
for(i=0; i<10; i++) { print('hello'); };
MongoDB is a document database. This means that we store data as documents, which are similar to JavaScript objects. Here are a few sample JavaScript objects:
var a = {age: 25};
var n = {name: 'Ed', languages: ['c', 'ruby', 'js']};
var student = {name: 'Jim', scores: [75, 99, 87.2]};
Go ahead and create some documents. Here's how you save a document to MongoDB:
db.scores.save({a: 99});
This says, "save the document '{a: 99}' to the 'scores' collection." Unlike in Python, you don't have to put the key in quotes. The a above has no quotes, since all keys are assumed to be strings anyway. To confirm that it's been saved properly:
db.scores.find();
Try adding some documents to the scores collection:
for(i=0; i<20; i++) { db.scores.save({a: i, exam: 5}) };
Try that, then enter
db.scores.find();
to see if the save succeeded. Since the shell only displays one page of results at a time, you'll need to enter the it command to iterate over the rest.
You've already tried a few queries, but let's make them more specific.
Let's find all documents where a == 2:
db.scores.find({a: 2});
Or we could find all documents where a > 15:
db.scores.find({a: {'$gt': 15}})
$gt is one of many special query operators. Here are a few others:
{a: {$lt: 5}} // Less Than
{a: {$gte: 10}} // Greater than or equal to
{a: {$ne: 'b'}} // Not Equal To
{a: {$in: ['a', 'b', 'c']}} // Exists in array
Try out a few queries before moving on to the next step.
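To get a feel for what these operators mean, here is a minimal sketch in plain Python that imitates them on a list of dictionaries. It is only an illustration, not how MongoDB evaluates queries: the `matches` function and `OPERATORS` table are hypothetical names, and a document that lacks the queried field is simply treated as a non-match, which simplifies MongoDB's real rules.

```python
import operator

# Hypothetical lookup table: a few MongoDB operator names mapped to Python checks.
OPERATORS = {
    '$gt': operator.gt,
    '$gte': operator.ge,
    '$lt': operator.lt,
    '$ne': operator.ne,
    '$in': lambda value, choices: value in choices,
}


def matches(doc, query):
    """Return True if doc satisfies every condition in query (simplified)."""
    for field, condition in query.items():
        if field not in doc:
            # Simplification: a missing field never matches in this sketch.
            return False
        value = doc[field]
        if isinstance(condition, dict):
            # Operator form, e.g. {'a': {'$gt': 15}}
            for op, arg in condition.items():
                if not OPERATORS[op](value, arg):
                    return False
        elif value != condition:
            # Plain equality form, e.g. {'a': 2}
            return False
    return True


# The same documents the shell loop saved: a runs from 0 to 19.
scores = [{'a': i, 'exam': 5} for i in range(20)]
greater_than_15 = [d['a'] for d in scores if matches(d, {'a': {'$gt': 15}})]
print(greater_than_15)
```

The last two lines mirror db.scores.find({a: {'$gt': 15}}) from the shell session.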
Now create a couple documents like these for updating:
db.users.insert({name: 'Johnny', languages: ['ruby', 'c']});
db.users.insert({name: 'Sue', languages: ['scala', 'lisp']});
Confirm they were saved - with our favorite:
db.users.find()
Update Johnny's name and languages:
db.users.update({name: 'Johnny'}, {name: 'Cash', languages: ['English']});
Use our favorite find query to inspect the resulting documents. Notice that the array update overwrote Johnny's languages! Play with some more updates, before continuing on.
Update has the sometimes unexpected behavior of replacing the entire document. However, we can use update operators to only modify parts of our documents. Update Sue's languages without overwriting them:
db.users.update({name: 'Sue'}, { $addToSet: {languages: 'ruby'}});
Or we can add a new field to Cash:
db.users.update({name: 'Cash'}, {'$set': {'age': 50} });
You can also push and pull items from arrays:
db.users.update({name: 'Sue'}, {'$push': {'languages': 'ruby'} });
db.users.update({name: 'Sue'}, {'$pull': {'languages': 'scala'} });
Give these a try and check the results.
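The difference between replacing a document and modifying it with operators can be sketched in plain Python. This is a rough imitation under simplifying assumptions, not MongoDB's implementation: `apply_update` is a hypothetical helper, it covers only the four operators used above, and the `_id` values are made up for the example.

```python
import copy


def apply_update(doc, update):
    """Return a copy of doc showing roughly what update() would store."""
    if not any(key.startswith('$') for key in update):
        # No operators: the whole document is replaced; only _id is kept.
        new_doc = copy.deepcopy(update)
        if '_id' in doc:
            new_doc['_id'] = doc['_id']
        return new_doc

    new_doc = copy.deepcopy(doc)
    for op, fields in update.items():
        for field, value in fields.items():
            if op == '$set':
                new_doc[field] = value
            elif op == '$push':
                # Always appends, even if the value is already present.
                new_doc.setdefault(field, []).append(value)
            elif op == '$addToSet':
                # Appends only if the value is not already present.
                items = new_doc.setdefault(field, [])
                if value not in items:
                    items.append(value)
            elif op == '$pull':
                # Removes every occurrence of the value.
                if field in new_doc:
                    new_doc[field] = [x for x in new_doc[field] if x != value]
            else:
                raise ValueError('operator not covered by this sketch: ' + op)
    return new_doc


johnny = {'_id': 1, 'name': 'Johnny', 'languages': ['ruby', 'c']}
cash = apply_update(johnny, {'name': 'Cash', 'languages': ['English']})
cash = apply_update(cash, {'$set': {'age': 50}})

sue = {'_id': 2, 'name': 'Sue', 'languages': ['scala', 'lisp']}
sue = apply_update(sue, {'$addToSet': {'languages': 'ruby'}})
sue = apply_update(sue, {'$push': {'languages': 'ruby'}})
sue = apply_update(sue, {'$pull': {'languages': 'scala'}})
print(cash)
print(sue)
```

Note that $push adds a second 'ruby' to Sue's languages, while $addToSet would have left the array unchanged.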
To delete matching documents only, add a query selector to the remove method:
db.users.remove({name: 'Sue'});
To delete everything from a collection:
db.scores.remove();
Hit control-d or control-c to exit out of the mongo client program.
There's official documentation on importing files into MongoDB. One can import CSV, TSV, JSON, and other formats. In this example, I'm going to import movies.small.json, which is a list of objects that looks like this:
[
{"helpful": 7,
"helpfulness": 1.0,
"productId": "B003AI2VGA",
"profileName": "Brian E. Erland \"Rainbow Sphinx\"",
"score": 3.0,
"summary": "\"There Is So Much Darkness Now ~ Come For The Miracle\"",
"text": "Synopsis: On the daily trek from Juarez, Mexico to El Paso, Texas an ...",
"time": "2007-06-24T17:00:00",
"total": 7,
"userId": "A141HP4LYPWMSR"},
{ ... },
{ ... }
]
From the Official JSON page:
JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.
A JSON string basically looks like a Python data structure, although it is slightly more restricted. For instance, object (or dictionary) keys can only be strings, whereas in Python they can be integers, tuples of strings, etc.
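The key restriction is easy to see with Python's standard json module. This short sketch uses made-up sample values and a hypothetical variable name, tuple_key_error, to record the failure:

```python
import json

# Integer keys are fine in a Python dict, but JSON object keys are strings,
# so json.dumps converts them on the way out.
encoded = json.dumps({1: 'one'})
decoded = json.loads(encoded)
print(decoded)

# Tuple keys have no JSON equivalent, so encoding raises TypeError.
try:
    json.dumps({('a', 'b'): 1})
    tuple_key_error = None
except TypeError as exc:
    tuple_key_error = exc
print(type(tuple_key_error))
```

After the round trip, the integer key 1 comes back as the string '1'.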
Now we need to download the movies.small.json file. It contains every field of each review, in the format shown above.
- movies.small.json
curl -L 'https://github.com/adparker/GADSLA_1403/blob/master/src/lesson04/movies.small.json?raw=true' -o movies.small.json
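Before importing, it can be worth checking that the download really is a JSON array of objects, since that is the shape shown earlier. The sketch below is one way to do that in Python. The function name count_json_array_documents is hypothetical, and the sample string is a made-up two-record stand-in for the real file.

```python
import json


def count_json_array_documents(text):
    """Return the number of objects in text if it is a JSON array of objects."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError('expected a JSON array at the top level')
    for item in data:
        if not isinstance(item, dict):
            raise ValueError('expected every array element to be an object')
    return len(data)


# Made-up stand-in for the contents of movies.small.json.
sample = ('[{"productId": "B003AI2VGA", "score": 3.0},'
          ' {"productId": "B00006HAXW", "score": 4.0}]')
sample_count = count_json_array_documents(sample)
print(sample_count)
```

For the real file, you could pass in open('movies.small.json').read() instead of the sample string. If the download returned an HTML page rather than JSON, json.loads would raise an error here rather than during the import.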
Next, import (upload) the JSON file to MongoDB. You have to specify the HOST, the DB, and the COLLECTION. If the COLLECTION doesn't exist in the DB already, it'll be created when you do the import. But the DB needs to exist.
$ mongoimport --host HOST --port PORT --username USERNAME --password PASSWORD --collection COLLECTION --db DB --jsonArray --file movies.small.json
If you're going to upload to my server, please change the COLLECTION to your name. Don't use reviews as your collection.
This is what I did to upload to my hosted MongoDB instance at MongoHQ, to the DB called ds and the collection called reviews. Run this from your command-line shell (username and password will be given in class):
$ mongoimport --host oceanic.mongohq.com --port 10065 --username XXXX --password XXXX --collection reviews --db ds --jsonArray --file movies.small.json
connected to: oceanic.mongohq.com:10065
Thu Mar 13 01:57:47.203 Progress: 42582912/6653067 640%
Thu Mar 13 01:57:47.204 300 100/second
Thu Mar 13 01:57:49.495 check 9 5554
Thu Mar 13 01:57:49.644 imported 5554 objects
$
Now log in to the MongoDB server and take a look around.
$ mongo oceanic.mongohq.com:10065/ds -u XXXX -p XXXX
MongoDB shell version: 2.4.9
connecting to: oceanic.mongohq.com:10065/ds
>
What collections are on this DB?
> show collections
reviews
scores
system.indexes
system.users
users
How many documents are in the reviews collection?
> db.reviews.count()
5554
Look at one of the records:
> db.reviews.findOne()
{
"_id" : ObjectId("53223a5a1f3a633f4ccdff1d"),
"helpful" : 7,
"text" : "Synopsis: On the daily trek from ... ",
"userId" : "A141HP4LYPWMSR",
"summary" : "\"There Is So Much Darkness Now ~ Come For The Miracle\"",
"score" : 3,
"helpfulness" : 1,
"time" : "2007-06-24T17:00:00",
"profileName" : "Brian E. Erland \"Rainbow Sphinx\"",
"total" : 7,
"productId" : "B003AI2VGA"
}
What are the distinct productIds?
> db.reviews.distinct("productId")
[
"B003AI2VGA",
"B00006HAXW",
"B00004CQT3",
"B00004CQT4",
...
]
How do we get the count? You can't call count() on the result, because distinct() returns a plain JavaScript array rather than a cursor. Use the array's .length property instead:
> db.reviews.distinct("productId").length;
156
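The same idea can be sketched in plain Python: collect the distinct values into a list, then measure the list. This is an illustration only. The distinct_values helper is a hypothetical name and the documents are made-up samples, not the real reviews collection.

```python
def distinct_values(docs, field):
    """Return the distinct values of field, in first-seen order."""
    seen = []
    for doc in docs:
        if field in doc and doc[field] not in seen:
            seen.append(doc[field])
    return seen


# Made-up sample documents standing in for the reviews collection.
sample_reviews = [
    {'productId': 'B003AI2VGA', 'score': 3.0},
    {'productId': 'B00006HAXW', 'score': 4.0},
    {'productId': 'B00006HAXW', 'score': 5.0},
]
product_ids = distinct_values(sample_reviews, 'productId')
print(len(product_ids))
```

The count comes from the length of the returned list, just as .length does for the JavaScript array in the shell.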
Install the MongoDB Driver for Python
This is what I did on my Mac. I used pip to install the pymongo package. pip is a popular Python package manager. Hopefully you already have pip installed on your computer.
$ pip install pymongo
Downloading/unpacking pymongo
Downloading pymongo-2.6.3.tar.gz (324kB): 324kB downloaded
Running setup.py (path:/private/var/folders/ny/qz6m17nd08d51p40f5x8xsx40000gn/T/pip_build_andrew/pymongo/setup.py) egg_info for package pymongo
<skip a bunch of messages>
Successfully installed pymongo
Cleaning up...
...
The following is a subset of the Official Pymongo Tutorial.
Get into your Python shell (or IPython if you have it).
$ python
Python 2.7.6 |Anaconda 1.9.0 (x86_64)| (default, Jan 10 2014, 11:23:15)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
The >>> indicates you're in the Python shell. Now import the pymongo library.
>>> import pymongo
If that worked, then we're good to go. Time to create a client. You can connect to my MongoDB instance like this, where you substitute USER and PASSWORD for the real values:
>>> client = pymongo.MongoClient('mongodb://USER:PASSWORD@oceanic.mongohq.com:10065/ds')
>>> type(client)
<class 'pymongo.mongo_client.MongoClient'>
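If your username or password contains characters such as @ or /, they need to be percent-escaped before they go into the connection string. Here is a minimal sketch of building the URI that way. The build_mongo_uri helper is a hypothetical name, and the password in the example is made up.

```python
try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2


def build_mongo_uri(user, password, host, port, db):
    """Assemble a mongodb:// URI, escaping the user name and password."""
    return 'mongodb://%s:%s@%s:%d/%s' % (
        quote_plus(user), quote_plus(password), host, port, db)


# Made-up credentials; the host, port, and database come from the lesson.
uri = build_mongo_uri('USER', 'p@ss/word', 'oceanic.mongohq.com', 10065, 'ds')
print(uri)
```

The resulting string can be passed to pymongo.MongoClient in place of the hand-written one.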
A MongoDB server contains several databases. Get a handle on the database called ds:
>>> ds = client.ds
>>> type(ds)
<class 'pymongo.database.Database'>
A database has a number of collections.
>>> ds.collection_names()
[u'system.indexes', u'system.users', u'scores', u'users', u'reviews']
Get a handle on the reviews collection.
>>> reviews = ds.reviews
>>> type(reviews)
<class 'pymongo.collection.Collection'>
Finally, let's find one document where the score is 4.
>>> onedoc = reviews.find_one({'score': 4})
>>> type(onedoc)
<type 'dict'>
>>> import pprint
>>> pprint.pprint(onedoc)
{u'_id': ObjectId('53223a5c1f3a633f4ccdff27'),
u'helpful': 22,
u'helpfulness': 0.9565217391304348,
u'productId': u'B00006HAXW',
u'profileName': u'Henrique Peirano',
u'score': 4.0,
u'summary': u'I expected more.',
u'text': u"I have the Doo Wop 50 and 51 DVDs, and ...",
u'time': u'2002-12-10T16:00:00',
u'total': 23,
u'userId': u'A2TX99AZKDK0V7'}
This is a common problem people run into. We can retrieve a document by its ObjectId, but we have to pass an ObjectId instance, not the string representation of the ObjectId.
>>> objectid = onedoc['_id']
>>> type(objectid)
<class 'bson.objectid.ObjectId'>
>>> objectid
ObjectId('53223a5c1f3a633f4ccdff27')
>>> str(objectid)
'53223a5c1f3a633f4ccdff27'
>>> str_of_objectid = str(objectid)
Let's try to find the document again by ObjectId:
>>> reviews.find_one({"_id": objectid}) ## This works as expected
{u'total': 23, u'helpful': 22, u'text': u"I have the Doo Wop 50 and 51 DVDs, and was anxiously waiting for Rock, Rhythm and Doo Wop to arrive. From the first video, which featured the crème de ... }
>>>
>>> reviews.find_one({"_id": '53223a5c1f3a633f4ccdff27'}) ## Returns nothing
>>>
>>> from bson.objectid import ObjectId ## ObjectId must be imported before we can construct one
>>> myobjectid = ObjectId('53223a5c1f3a633f4ccdff27') ## Try again, but create an ObjectId instance.
>>>
>>> reviews.find_one({"_id": myobjectid}) ## This works!
{u'total': 23, u'helpful': 22, u'text': u"I have the Doo Wop 50 and 51 DVDs, and was anxiously waiti ... }
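When an id reaches your code as a string (say, from a URL or a file), it can help to check its shape before wrapping it in ObjectId. The string form shown above is 24 hexadecimal characters, so a rough check might look like the sketch below. The looks_like_objectid helper is a hypothetical name, and this version only tests the shape of the string, not whether any document has that id.

```python
import re

# 24 hexadecimal characters, matching the string form printed by str(objectid).
_HEX24 = re.compile(r'^[0-9a-fA-F]{24}\Z')


def looks_like_objectid(value):
    """Return True if value is a string shaped like an ObjectId."""
    return isinstance(value, str) and bool(_HEX24.match(value))


print(looks_like_objectid('53223a5c1f3a633f4ccdff27'))
print(looks_like_objectid('reviews'))
```

A string that passes the check still has to be converted with ObjectId(value) before it is used in a query such as find_one.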
You can iterate over multiple documents easily:
>>> for review in reviews.find({'productId': 'B00006HAXW'}): print review['score']
...
5.0
5.0
5.0
4.0
5.0
5.0
5.0
4.0
5.0
5.0
5.0
5.0
5.0
>>>
Writing to MongoDB is super easy:
>>> new_reviews = [{'userId': '123', 'profileName': 'Andrew', 'productId': 'ABC', 'summary': 'Hello'}, {'userId': '543', 'profileName': 'Susan', 'productId': 'XYZ', 'summary': 'Blah'}]
>>> reviews.insert(new_reviews)
[ObjectId('53224fe303fec10f418e6d21'), ObjectId('53224fe303fec10f418e6d22')]
Of course, you can perform all of the complex queries that were available to you in the MongoDB client. Now you have access to a scalable, persistent, fast data store that meshes nicely with Python!