Lesson 03 MySQL 5 Tutorial 01 - adparker/GADSLA_1403 GitHub Wiki

Make sure brew is up to date and install mysql:

$ brew update
$ brew doctor
$ brew upgrade
$ brew install mysql

Connect to the MySQL server:

$ mysql --verbose --user=yellowpencil1 --password=XXXX --host=datascience.c4bsfxlekv2v.us-west-1.rds.amazonaws.com mydb
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 31
Server version: 5.6.13-log MySQL Community Server (GPL)

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Reading history-file /Users/andrew/.mysql_history
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

And it shows:

mysql> show databases;
--------------
show databases
--------------

+--------------------+
| Database           |
+--------------------+
| information_schema |
| innodb             |
| mydb               |
| mysql              |
| performance_schema |
+--------------------+
5 rows in set (0.03 sec)

mysql>

Create your own database.

mysql> create database adparker;
--------------
create database adparker
--------------

Query OK, 1 row affected (0.04 sec)
mysql> show databases;
--------------
show databases
--------------

+--------------------+
| Database           |
+--------------------+
| information_schema |
| adparker           |
| innodb             |
| mydb               |
| mysql              |
| performance_schema |
+--------------------+
6 rows in set (0.02 sec)

If you make a mistake and need to delete the database (very dangerous!!!) do this:

mysql> drop database adparker # Please do not actually run this!

Switch to using that database.

mysql> use adparker;
Database changed

A database contains a set of tables. We don't have any yet.

mysql> show tables;
--------------
show tables
--------------

Empty set (0.03 sec)

Normal Form

Remember the file movies.small.txt? The records look like this:

product/productId: B003AI2VGA
review/userId: A141HP4LYPWMSR
review/profileName: Brian E. Erland "Rainbow Sphinx"
review/helpfulness: 7/7
review/score: 3.0
review/time: 1182729600
review/summary: "There Is So Much Darkness Now ~ Come For The Miracle"
review/text: Synopsis: On the daily trek from Juarez, Mexico to El Paso, Texas an ever

Let's put this into normal form. What tables might we create?

...

The productId is the primary key. It would be nice to have the product title and other info, but that's not given to us.

Product Table
productId

The User Table has a little more info.

User Table
userId
profileName

The Review Table has the foreign keys into User and Product Tables.

Review Table
reviewId
userId
productId
helpfulness
score
time
summary
text

Let's create the SQL tables!

CREATE TABLE `products` (
`productID` char(10) NOT NULL,
PRIMARY KEY (`productID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `users` (
`userId` varchar(20) NOT NULL,
`profileName` varchar(255) NOT NULL,
PRIMARY KEY (`userID`)
)  ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `reviews` (
`reviewId` int(10) unsigned NOT NULL AUTO_INCREMENT,
`userId` char(14) NOT NULL,
`productId` char(10) NOT NULL,
`helpfulness` float DEFAULT NULL,
`helpful` int(3) DEFAULT NULL,
`total`	int(3) DEFAULT NULL,
`score`	int(1) DEFAULT NULL,
`ts` timestamp DEFAULT 0,
`summary` varchar(255) DEFAULT NULL,
`text` text DEFAULT NULL,
PRIMARY	KEY (`reviewId`),
UNIQUE KEY `user_product` (`userId`, `productId`),
KEY `productId`	(`productId`),
FOREIGN KEY (`productId`) REFERENCES `products`(`productId`),
FOREIGN KEY (`userId`) REFERENCES `users`(`userId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

If you make a mistake, and want to drop a table called <table_name>, then do this:

drop table <table_name>

Show the tables in our DB!

mysql> show tables;
--------------
show tables
--------------

+--------------------+
| Tables_in_adparker |
+--------------------+
| products           |
| reviews            |
| users              |
+--------------------+
3 rows in set (0.03 sec)

Loading the Tables

First, let's create the TSV files:

$ python make_tsv.py productId | sort | uniq > products.tsv

$ python make_tsv.py userId profileName | sort | uniq > users.tsv
sort: string comparison failed: Illegal byte sequence
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were `A2UM50F7MUAT9P\t"&#164;*\260Ms JuiCy\256\260*&#164; ""H. A. M"""\r' and `A2FS6OGMZMALTD\tMojo Jojo\r'.

# Try again
$ export LC_ALL='C'
$ python make_tsv.py userId profileName | sort | uniq > users.tsv

$ python make_tsv.py userId productId helpfulness helpful total score ts summary text > reviews.tsv

If you have a TSV file, you can easily populate the tables from the MySQL shell like this: NOTE You have to be running this in the same directory that contains the TSV file that you want to use:

mysql> LOAD DATA LOCAL INFILE 'products.tsv'
INTO TABLE products
FIELDS TERMINATED BY '\t' 
LINES TERMINATED BY '\r\n';

--------------
LOAD DATA LOCAL INFILE 'products.tsv'
INTO TABLE products
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\r\n'
--------------

Query OK, 156 rows affected (0.07 sec)
Records: 156  Deleted: 0  Skipped: 0  Warnings: 0

mysql>

If something messes up, you can just delete all items from a table, without deleting a table, like this:

mysql> delete from products;

Now load the users:

LOAD DATA LOCAL INFILE 'users.tsv' 
INTO TABLE users 
FIELDS TERMINATED BY '\t' 
ESCAPED BY '\\'  
LINES TERMINATED BY '\r\n' 
(userId, profileName);

Finally, load the reviews:

LOAD DATA LOCAL INFILE 'reviews.tsv' 
INTO TABLE reviews 
FIELDS TERMINATED BY '\t' 
ESCAPED BY '\\'  
LINES TERMINATED BY '\r\n' 
(userId, productId, helpfulness, helpful, total, score, ts, summary, text);
--------------
LOAD DATA LOCAL INFILE 'reviews.tsv'
INTO TABLE reviews
FIELDS TERMINATED BY '\t'
ESCAPED BY '\\'
LINES TERMINATED BY '\r\n'
(userId, productId, helpfulness, helpful, total, score, ts, summary, text)
--------------

Query OK, 5507 rows affected (2.83 sec)
Records: 5554  Deleted: 0  Skipped: 47  Warnings: 0

Not sure why 47 reviews were skipped! Badly encoded characters?

Let's take at a couple of the ones that made it:

mysql> select * from reviews limit 1000,2 \G
--------------
select * from reviews limit 1000,2
--------------

*************************** 1. row ***************************
   reviewId: 1001
     userId: A1YQ6QB2127AJ4
  productId: B0001G6PZC
helpfulness: 1
    helpful: 7
      total: 7
      score: 5
         ts: 2004-03-23 16:00:00
    summary: Tom Cruise Triumphs Again In This Great Movie
       text: "The Last Samarai" is one of the greatest films released in 2003, starring Tom Cruise and Ken Watanabe.  Its hardhitting impact is intense enough to keep audiences' eyes wide open through the whole movie.  Its written plot of a US soldier changing his life for the better in Japan is brilliant.  Despite the war themes, there are some great heartwarming scenes, namely when the US soldier starts healing his pains from his US war battles.  As he begins learning Japanese traditions, he begins learning more about himself, which plays a crutial part in the film.  As this develops, the drama also develops, which keeps audiences interested.  This allows the conflicts between the two sides to be brought out as brilliantly as it was.  His friends' life stories add an additional unique flavor.  The war scenes are uniquely intense.  Few other movies have thought of such original war scenery.  This teaches audiences that there is no one way to fight a war or to solve a conflict. <p>Its Oscar-nominated 1870's Japanese setting is flawless.  They prove that they did their research to construct such an elaborate set.  The war scenes are desplicted as brutal as they should thanks to the special effects.  The costumes are accurate from those times.  Such great background offers an improved sense of this masterpiece film.<p>Tom Cruise's underrated role as US soldier Nathan Algren is his best in years.  His two years of training to prepare for his role paid off.  He never loses his intense sense of emotion for a second.  The pain and agony expressed are as pure as if he actually lived it.  Ken Watanabe's Oscar-nominated role as Samarai leader Katsumoto rightfully places him in the spotlight.  This performance makes acting appear easy when it's actually the opposite.  He proves that he will become a big-name actor in the next following years.  All other actors also performed their roles beautifully.<p>"The Last Samarai" is great for those looking for a unique war movie with emotion.  This is sure to please many audiences.  This is destined to be a classic in the following years as it deserves.
*************************** 2. row ***************************
   reviewId: 1002
     userId: A3MJMDJ8N9S8Q8
  productId: B0001G6PZC
helpfulness: 1
    helpful: 7
      total: 7
      score: 5
         ts: 2003-12-26 16:00:00
    summary: Dances with Wolves, but not in a "slap in the face" way
       text: &#65533;The path to enlightenment is littered with the bodies of the ignorant.&#65533;<p>This movie was simply incredible.  The acting second to none, the story moving and flawless, the characters deep and passionate, a must see, again and again.<p>Hired to train the army of a Japan drowning in western influence, Nathan Algren (Tom Cruise), becomes captured by a band of &#65533;savage&#65533; samurai warriors led by Katsumoto (Ken Watanabe). In the months to come Algren embraces the code of the samurai and befriending his &#65533;savage&#65533; enemies to soon discover there is nothing savage about honor, duty, and respect.
2 rows in set (0.02 sec)

For fun, we can do a join query:

mysql> select * from users join reviews using(userId) where profileName = 'scifi'\G
--------------
select * from users join reviews using(userId) where profileName = 'scifi'
--------------

*************************** 1. row ***************************
     userId: A100Y8WSLFJN7Q
profileName: scifi
   reviewId: 128
  productId: B000063W1R
helpfulness: 0
    helpful: 0
      total: 1
      score: 5
         ts: 2012-08-17 17:00:00
    summary: Entertaining
       text: We were highly entertained. It is a very well made movie with good acting for the most part. Can't go wrong with it.
1 row in set (0.02 sec)

Trying to delete the user results in an error because of the foreign key constraint:

mysql> delete from users where profileName = 'scifi';
--------------
delete from users where profileName = 'scifi'
--------------

ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`adparker`.`reviews`, CONSTRAINT `reviews_ibfk_2` FOREIGN KEY (`userId`) REFERENCES `users` (`userId`))

Install Python MySQL Connector

Download from here:

Choose "Platform Independent" -> "TAR" -> "Save File". http://dev.mysql.com/downloads/connector/python/#downloads

Installation instructions here:

http://dev.mysql.com/doc/connector-python/en/connector-python-installation-source-unix.html

After downloading this TAR file, this is what I did:

host-93:data_science andrew$ cd ~/Downloads/
host-93:Downloads andrew$ gunzip mysql-connector-python-1.1.6.tar.gz
host-93:Downloads andrew$ tar xf mysql-connector-python-1.1.6.tar
host-93:Downloads andrew$ cd mysql-connector-python-1.1.6
host-93:mysql-connector-python-1.1.6 andrew$ sudo python setup.py install
Password:
running install
running build
running build_py
...

Let's do a simple query:

>>> import datetime
>>> import mysql.connector
>>> cnx = mysql.connector.connect(user='XXXX', password='XXXX', host='XXXX', database='adparker')
>>> cursor = cnx.cursor()
>>> query = "select count(1) from reviews"
>>> query = "select count(1) from reviews"
>>> cursor.execute(query)
>>> list(cursor)
[(5507,)]

Continuing, with another query:

>>> import pprint
>>> query = "select profileName, productId, score, summary from users join reviews using(userId) where summary like %s and score >= %s limit 3"
>>> cursor.execute(query, ['%great%', 4])
>>> pprint.pprint(list_of_tups)
[(u'Adriana', u'B003QWVPT4', 5, u'Great movie..came quick!'),
 (u'GoonMoon', u'B001GBPZRU', 4, u'Great fun B-movie'),
 (u'Chris Jordan', u'B000063W82', 5, u'Great DVD!')]
⚠️ **GitHub.com Fallback** ⚠️