Lesson 03 MySQL 5 Tutorial 01 - adparker/GADSLA_1403 GitHub Wiki
Make sure brew is up to date and install mysql:
$ brew update
$ brew doctor
$ brew upgrade
$ brew install mysql
Connect to the MySQL server:
$ mysql --verbose --user=yellowpencil1 --password=XXXX --host=datascience.c4bsfxlekv2v.us-west-1.rds.amazonaws.com mydb
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 31
Server version: 5.6.13-log MySQL Community Server (GPL)
Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Reading history-file /Users/andrew/.mysql_history
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
And it shows:
mysql> show databases;
--------------
show databases
--------------
+--------------------+
| Database |
+--------------------+
| information_schema |
| innodb |
| mydb |
| mysql |
| performance_schema |
+--------------------+
5 rows in set (0.03 sec)
mysql>
Create your own database.
mysql> create database adparker;
--------------
create database adparker
--------------
Query OK, 1 row affected (0.04 sec)
mysql> show databases;
--------------
show databases
--------------
+--------------------+
| Database |
+--------------------+
| information_schema |
| adparker |
| innodb |
| mydb |
| mysql |
| performance_schema |
+--------------------+
6 rows in set (0.02 sec)
If you make a mistake and need to delete the database (very dangerous!!!) do this:
mysql> drop database adparker # Please do not actually run this!
Switch to using that database.
mysql> use adparker;
Database changed
A database contains a set of tables. We don't have any yet.
mysql> show tables;
--------------
show tables
--------------
Empty set (0.03 sec)
Remember the file movies.small.txt
? The records look like this:
product/productId: B003AI2VGA
review/userId: A141HP4LYPWMSR
review/profileName: Brian E. Erland "Rainbow Sphinx"
review/helpfulness: 7/7
review/score: 3.0
review/time: 1182729600
review/summary: "There Is So Much Darkness Now ~ Come For The Miracle"
review/text: Synopsis: On the daily trek from Juarez, Mexico to El Paso, Texas an ever
Let's put this into normal form. What tables might we create?
...
The productId is the primary key. It would be nice to have the product title and other info, but that's not given to us.
Product Table |
---|
productId |
The User Table has a little more info.
User Table |
---|
userId |
profileName |
The Review Table has the foreign keys into User and Product Tables.
Review Table |
---|
reviewId |
userId |
productId |
helpfulness |
score |
time |
summary |
text |
CREATE TABLE `products` (
`productID` char(10) NOT NULL,
PRIMARY KEY (`productID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `users` (
`userId` varchar(20) NOT NULL,
`profileName` varchar(255) NOT NULL,
PRIMARY KEY (`userID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `reviews` (
`reviewId` int(10) unsigned NOT NULL AUTO_INCREMENT,
`userId` char(14) NOT NULL,
`productId` char(10) NOT NULL,
`helpfulness` float DEFAULT NULL,
`helpful` int(3) DEFAULT NULL,
`total` int(3) DEFAULT NULL,
`score` int(1) DEFAULT NULL,
`ts` timestamp DEFAULT 0,
`summary` varchar(255) DEFAULT NULL,
`text` text DEFAULT NULL,
PRIMARY KEY (`reviewId`),
UNIQUE KEY `user_product` (`userId`, `productId`),
KEY `productId` (`productId`),
FOREIGN KEY (`productId`) REFERENCES `products`(`productId`),
FOREIGN KEY (`userId`) REFERENCES `users`(`userId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
If you make a mistake, and want to drop a table called <table_name>, then do this:
drop table <table_name>
Show the tables in our DB!
mysql> show tables;
--------------
show tables
--------------
+--------------------+
| Tables_in_adparker |
+--------------------+
| products |
| reviews |
| users |
+--------------------+
3 rows in set (0.03 sec)
First, let's create the TSV files:
$ python make_tsv.py productId | sort | uniq > products.tsv
$ python make_tsv.py userId profileName | sort | uniq > users.tsv
sort: string comparison failed: Illegal byte sequence
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were `A2UM50F7MUAT9P\t"¤*\260Ms JuiCy\256\260*¤ ""H. A. M"""\r' and `A2FS6OGMZMALTD\tMojo Jojo\r'.
# Try again
$ export LC_ALL='C'
$ python make_tsv.py userId profileName | sort | uniq > users.tsv
$ python make_tsv.py userId productId helpfulness helpful total score ts summary text > reviews.tsv
If you have a TSV file, you can easily populate the tables from the MySQL shell like this: NOTE You have to be running this in the same directory that contains the TSV file that you want to use:
mysql> LOAD DATA LOCAL INFILE 'products.tsv'
INTO TABLE products
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\r\n';
--------------
LOAD DATA LOCAL INFILE 'products.tsv'
INTO TABLE products
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\r\n'
--------------
Query OK, 156 rows affected (0.07 sec)
Records: 156 Deleted: 0 Skipped: 0 Warnings: 0
mysql>
If something messes up, you can just delete all items from a table, without deleting a table, like this:
mysql> delete from products;
Now load the users:
LOAD DATA LOCAL INFILE 'users.tsv'
INTO TABLE users
FIELDS TERMINATED BY '\t'
ESCAPED BY '\\'
LINES TERMINATED BY '\r\n'
(userId, profileName);
Finally, load the reviews:
LOAD DATA LOCAL INFILE 'reviews.tsv'
INTO TABLE reviews
FIELDS TERMINATED BY '\t'
ESCAPED BY '\\'
LINES TERMINATED BY '\r\n'
(userId, productId, helpfulness, helpful, total, score, ts, summary, text);
--------------
LOAD DATA LOCAL INFILE 'reviews.tsv'
INTO TABLE reviews
FIELDS TERMINATED BY '\t'
ESCAPED BY '\\'
LINES TERMINATED BY '\r\n'
(userId, productId, helpfulness, helpful, total, score, ts, summary, text)
--------------
Query OK, 5507 rows affected (2.83 sec)
Records: 5554 Deleted: 0 Skipped: 47 Warnings: 0
Not sure why 47 reviews were skipped! Badly encoded characters?
Let's take at a couple of the ones that made it:
mysql> select * from reviews limit 1000,2 \G
--------------
select * from reviews limit 1000,2
--------------
*************************** 1. row ***************************
reviewId: 1001
userId: A1YQ6QB2127AJ4
productId: B0001G6PZC
helpfulness: 1
helpful: 7
total: 7
score: 5
ts: 2004-03-23 16:00:00
summary: Tom Cruise Triumphs Again In This Great Movie
text: "The Last Samarai" is one of the greatest films released in 2003, starring Tom Cruise and Ken Watanabe. Its hardhitting impact is intense enough to keep audiences' eyes wide open through the whole movie. Its written plot of a US soldier changing his life for the better in Japan is brilliant. Despite the war themes, there are some great heartwarming scenes, namely when the US soldier starts healing his pains from his US war battles. As he begins learning Japanese traditions, he begins learning more about himself, which plays a crutial part in the film. As this develops, the drama also develops, which keeps audiences interested. This allows the conflicts between the two sides to be brought out as brilliantly as it was. His friends' life stories add an additional unique flavor. The war scenes are uniquely intense. Few other movies have thought of such original war scenery. This teaches audiences that there is no one way to fight a war or to solve a conflict. <p>Its Oscar-nominated 1870's Japanese setting is flawless. They prove that they did their research to construct such an elaborate set. The war scenes are desplicted as brutal as they should thanks to the special effects. The costumes are accurate from those times. Such great background offers an improved sense of this masterpiece film.<p>Tom Cruise's underrated role as US soldier Nathan Algren is his best in years. His two years of training to prepare for his role paid off. He never loses his intense sense of emotion for a second. The pain and agony expressed are as pure as if he actually lived it. Ken Watanabe's Oscar-nominated role as Samarai leader Katsumoto rightfully places him in the spotlight. This performance makes acting appear easy when it's actually the opposite. He proves that he will become a big-name actor in the next following years. All other actors also performed their roles beautifully.<p>"The Last Samarai" is great for those looking for a unique war movie with emotion. This is sure to please many audiences. This is destined to be a classic in the following years as it deserves.
*************************** 2. row ***************************
reviewId: 1002
userId: A3MJMDJ8N9S8Q8
productId: B0001G6PZC
helpfulness: 1
helpful: 7
total: 7
score: 5
ts: 2003-12-26 16:00:00
summary: Dances with Wolves, but not in a "slap in the face" way
text: �The path to enlightenment is littered with the bodies of the ignorant.�<p>This movie was simply incredible. The acting second to none, the story moving and flawless, the characters deep and passionate, a must see, again and again.<p>Hired to train the army of a Japan drowning in western influence, Nathan Algren (Tom Cruise), becomes captured by a band of �savage� samurai warriors led by Katsumoto (Ken Watanabe). In the months to come Algren embraces the code of the samurai and befriending his �savage� enemies to soon discover there is nothing savage about honor, duty, and respect.
2 rows in set (0.02 sec)
For fun, we can do a join query:
mysql> select * from users join reviews using(userId) where profileName = 'scifi'\G
--------------
select * from users join reviews using(userId) where profileName = 'scifi'
--------------
*************************** 1. row ***************************
userId: A100Y8WSLFJN7Q
profileName: scifi
reviewId: 128
productId: B000063W1R
helpfulness: 0
helpful: 0
total: 1
score: 5
ts: 2012-08-17 17:00:00
summary: Entertaining
text: We were highly entertained. It is a very well made movie with good acting for the most part. Can't go wrong with it.
1 row in set (0.02 sec)
Trying to delete the user results in an error because of the foreign key constraint:
mysql> delete from users where profileName = 'scifi';
--------------
delete from users where profileName = 'scifi'
--------------
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`adparker`.`reviews`, CONSTRAINT `reviews_ibfk_2` FOREIGN KEY (`userId`) REFERENCES `users` (`userId`))
Choose "Platform Independent" -> "TAR" -> "Save File". http://dev.mysql.com/downloads/connector/python/#downloads
http://dev.mysql.com/doc/connector-python/en/connector-python-installation-source-unix.html
After downloading this TAR file, this is what I did:
host-93:data_science andrew$ cd ~/Downloads/
host-93:Downloads andrew$ gunzip mysql-connector-python-1.1.6.tar.gz
host-93:Downloads andrew$ tar xf mysql-connector-python-1.1.6.tar
host-93:Downloads andrew$ cd mysql-connector-python-1.1.6
host-93:mysql-connector-python-1.1.6 andrew$ sudo python setup.py install
Password:
running install
running build
running build_py
...
Let's do a simple query:
>>> import datetime
>>> import mysql.connector
>>> cnx = mysql.connector.connect(user='XXXX', password='XXXX', host='XXXX', database='adparker')
>>> cursor = cnx.cursor()
>>> query = "select count(1) from reviews"
>>> query = "select count(1) from reviews"
>>> cursor.execute(query)
>>> list(cursor)
[(5507,)]
Continuing, with another query:
>>> import pprint
>>> query = "select profileName, productId, score, summary from users join reviews using(userId) where summary like %s and score >= %s limit 3"
>>> cursor.execute(query, ['%great%', 4])
>>> pprint.pprint(list_of_tups)
[(u'Adriana', u'B003QWVPT4', 5, u'Great movie..came quick!'),
(u'GoonMoon', u'B001GBPZRU', 4, u'Great fun B-movie'),
(u'Chris Jordan', u'B000063W82', 5, u'Great DVD!')]