SQL Basics - caitlinmallen/TechWiki GitHub Wiki
Introduction to Structured Query Language
Installing MAMP on Windows 10
- Install from mamp.info/en/downloads
- Once MAMP is installed, click Start Servers
- Allow mysqld and Apache HTTP through firewall
- Click Open WebStart Page
- To confirm successful install, click phpinfo and then under Tools > phpMyAdmin
Writing our first PHP Application
- Create a webpage such as:
- <h1> Hello from a webpage</h1>
- Save this under C:/MAMP/htdocs and create a new folder, save the file as index.php
- VSCode should now have syntax highlighted the file and you can now navigate to localhost/first/index.php
- We can now write in PHP to execute code using the following below the header:
<?php
Echo"Hi there\n";
?>
- This will execute the code when the webpage is loaded
- When PHP is added to the webpage's code, it will swap reading HTML to PHP and then back to HTML
- <h1> Hello from a webpage</h1>
- <p>
- <?php
- echo"Hi there\n";
- $x = 6 * 7;
- echo"The answer is ".$x." what was the question\n";
- ?>
- </p>
- <p> Antoher paragraph</p>
- Syntax errors
- Webpage cannot be displayed and if there is an error in hundreds of lines of code, trying to find it can be hard
- Identifying them can be done by opening C:\MAMP\conf\<phpversion>\php.ini
- Change display_errors = off" to "on"
- Save and then restart the servers
- Page can now be reloaded and it will show the syntax error
Installing XAMPP
- Download from apachefriends.org and install
- Install into default location of C:\xampp for ease, in practice do not actually do this
- Start up the control panel by opening xampp-control in the xampp folder
- Start Apache and MySQL
Overview of Databases
- Index- A table of contents for where things are in data
- SQL came out of a government/industry partnership
- SQL is the language we use to issue commands to a database
- Create/insert data
- Read/select some data
- Update data
- Delete data
- CRUD
- Terminology
- Database – contains one or more tables
- Relation/table – contains tuples and attributes
- Tuple/row – a set of fields which generally represent an "object" like a person or music track
- Attribute/column/field – one of possibly many elements of data corresponding to the object represented by the row
- Common database systems
- Three major database management systems in wide use
- Oracle – large, commercial, enterprise-scale, very tweakable
- MySQL – simpler but very fast and scalable, commercial or open-source
- SqlServer – From Microsoft
- Other small projects that are free and open-source
- HSQL, SQLite, PostgresSQL
- Three major database management systems in wide use
Basic SQL Operations
- In MyPHPAdmin > SQL > Opens SQL query box
- Commands
- show DATABASES;
- Will show databases in mySQL
- CREATE DATABASE People DEFAULT CHARACTER SET utf8;
- Will create database
- Default character set means you can have non-Latin characters
- CREATE TABLE Users (
- show DATABASES;
name VARCHAR(128),
email VARCHAR(128)
);
- Creates columns of the table
- Maximum of 129 characters for each field
- Efficiency
-
DESCRIBE Users;
- Shows information about the Users table
-
INSERT
- Insert statements into the table
- Example from Coursera
INSERT INTO Users (name,email) VALUES ('Chuck', '[email protected]');
INSERT INTO Users (name,email) VALUES ('Sally','[email protected]');
INSERT INTO Users (name,email) VALUES ('Somesh', '[email protected]');
INSERT INTO Users (name,email) VALUES ('Caitlin', '[email protected]');
INSERT INTO Users (name,email) VALUES ('Ted', '[email protected]');
- Inserts rows with name and email column
- DELETE
- Deletes a row in a table based on selection criteria
- DELETE FROM Users WHERE email='[email protected]';
- Deletes a row in a table based on selection criteria
- UPDATE
- Allows the updating of a field with a WHERE clause
- UPDATE Users SET name='Charles' WHERE email='[email protected]'
- Allows the updating of a field with a WHERE clause
- SELECT
- Retrieves a group of records, you can retrieve all records or a subset with WHERE
- SELECT * FROM Users -> Wildcard will select all
- SELECT * FROM Users WHERE email='[email protected]' -> Selects all users with the email [email protected]
- Retrieves a group of records, you can retrieve all records or a subset with WHERE
- ORDER BY
- Add an ORDER BY to a SELECT statement to get results sorted in ascending or descending order
- SELECT * FROM Users ORDER BY email
- LIKE
- We can do a wildcard matching in a WHERE clause using LIKE
- SELECT * FROM Users WHERE name LIKE '%e%'
- Somebody with e in their email
- % is a wildcard to use in a field
- LIMIT
- Can request the first "n" rows or the first "n" rows after starting a starting row
- First row is 0
- WHERE and ORDER BY clauses happen BEFORE the LIMIT is applied
- The limit can be a count or a starting row and count
- Can request the first "n" rows or the first "n" rows after starting a starting row
SELECT * FROM Users ORDER BY email DESC LIMIT 2;
SELECT * FROM Users ORDER BY email LIMIT 1,2;
- Counting rows with SELECT
- You can request to receive the count of the rows that would be retrieved instead of the rows
SELECT COUNT(*) FROM Users;
SELECT COUNT(*) FROM Users WHERE email='[email protected]'
- Would return 4
Data Types in SQL
- Text fields (small and large)
- Have a character set – paragraphs or HTML pages
- TINYTEXT up to 255 characters
- TEXT up to 65k
- MEDIUMTEXT up to 16m
- LONGTEXT up to 4g
- Generally not used with indexing or sorting, only limited to a prefix if used
- Have a character set – paragraphs or HTML pages
- Binary Fields (small and large)
- Character = 8-32 bits of information depending on character set
- Byte – 8 bits of information
- BYTE(n) up to 255 bytes
- VARBINARY(n) up to 65K bytes
- Small images – data
- Not indexed or sorted
- Binary Large Object (BLOB)
- Large raw data, files, images, word documents, PDFs, movies, etc
- No translation, indexing, or character set
- TINYBLOB(n) – up to 255
- BLOB(n) up to 65K
- MEDIUMBLOB(n) up to 16M
- LONGBLOB(n) up to 4g
- Integer numbers
- Can be very efficient, take little storage, and are easy to process due to CPUs comparing them with a single instruction
- TINYINT (-128, 128)
- SMALLINT (-32768, 32768)
- INT or INTEGER (2 billion)
- BIGINT –(10**18ish)
- Can be very efficient, take little storage, and are easy to process due to CPUs comparing them with a single instruction
- Floating point numbers
- Can represent a wide range of values, but accuracy is limited
- FLOAT (32-bit) 10**38 with seven digits of accuracy
- DOUBLE (64-bit) 10**308 with 14 digits of accuracy
- Can represent a wide range of values, but accuracy is limited
- Dates
- TIMESTAMP – 'YYYY-MM-DD HH:MM:SS' (1970,2037)
- DATETIME – 'YYYY-MM-DD HH:MM:SS'
- DATE – 'YYYY'MM'DD'
- TIME – 'HH:MM:SS'
- Built in MySQL function called NOW()
Database Keys and Indexes
- Often as we make multiple tables and need to JOIN them together, we need an integer primary key for each row so we can efficiently add a reference to a row in some other table as a foreign key
- DROP TABLE Users; (
- Create a brand new table instead
- Creating a table
CREATE TABLE Users
user_id INT UNSIGNED NOT NULL
AUTO_INCREMENT
email VARCHAR(128),
name VARCHAR(228).
PRIMARY KEY (user_id),
INDEX(email)
)
- Creating table
- User_id is an unsigned (positive only) not null (required) integer
- Auto_increment will supply the number if the creator does not
- Two fields being added
- PRIMARY KEY is a column used a lot and needs to be looked at quickly
- Will be a number
- INDEX will be looked up with WHERE clauses a lot and changes how the field will be used
- Hint to tell the database what you will be doing with the data
- MySQL Functions
- Has many built-in functions like NOW()
- Indexes
- As tables get large, scanning all the data to find a single row becomes costly
- When [email protected] logs into Facebook, they must find the passwords amongst billions of users
- There are techniques to greatly shorten the scan as long as you create and maintain data structures, like shortcuts
- Hashes or trees
- Hashes – exact matches
- Trees – sorting and prefix matches
- MySQL Index types
- PRIMARY KEY – Very little space, exact match, requires no duplicates, extremely fast for integer fields
- Index – good for individual row lookup and sorting/grouping results
- Works best for exact matches or prefix lookups
- B-Trees
- A tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic amortized time
- Best optimized for systems that read and write large blocks of data
- Commonly used in databases and filesystems
- A tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic amortized time
- Hashes
- Any algorithm or subroutine that maps large data sets to smaller data sets called keys
- The values returned by a hash function are called hash values, hash codes, hash sums, checksums, or hashes
- Most commonly used to accelerate table lookup or data comparison tasks such as finding items in a database
- Specifying Indexes without primary key or index
- ALTER TABLE Users ADD INDEX (email ) USING BTRTEE
- Don't always want to say USING HASH or USING BTREE since database can decide
- ALTER TABLE Users ADD INDEX (email ) USING BTRTEE
Relational Database Design
- Database design
- An art form with own particular skills and experience
- Our goal is to avoid really bad mistakes and design clean and easily understood databases
- Others may performance tune things later
- Starts with a picture
- Building a data model
- Drawing a picture of the data objects for our application and then figuring out how to represent the objects and their relationships
- Basic rule: Don't put the same string data in twice, use a relationship instead
- When there is one thing in the real world, there should be one copy of that in the database
Normalization and Foreign Keys
- Database Normalization (3NF)
- Theres tons of database theories, way too much to understand without calculus
- Do not replicate data, instead reference/point at data
- Use integers for keys and references
- Add a special key column to each table, where you'll make references to
- Theres tons of database theories, way too much to understand without calculus
- Integer Reference Pattern
- We use integer columns in one table to reference/look up rows in another table

- Both tables will have artist_id
- We use integer columns in one table to reference/look up rows in another table
- Three Kinds of keys
- Primary key – generally an integer auto-increment field
- Handle for a row
- Logical key – what the outside world uses for lookup
- Way humans look the row up
- Generally a string
- Foreign key – generally an integer key pointing to a row in another table
- Pointing to a row in another table
- Primary key – generally an integer auto-increment field
- Primary key rules
- Best practices
- Never use your logical key as the primary key
- Logical keys can and do change
- Relationships that are based on matching string fields are less efficient than integers
- Best practices
- Foreign keys
- A foreign key is when a table has a column containing a key that points to the primary key of another table
- When all primary keys are integers, then all foreign keys are integers
Building a Physical Data Schema
- Creating a database
CREATE DATABASE Music
DEFAULT CHARACTER SET utf8;
USE Music;
- Creating statements for creating tables
CREATE TABLE Artist (
artist_id INTEGER NOT NULL AUTO_INCREMENT,
name VARCHAR(255),
PRIMARY KEY(artist_id)
) ENGINE = InnoDB;
CREATE TABLE Album (
album_id INTEGER NOT NULL AUTO_INCREMENT,
title VARCHAR(255),
artist id INTEGER,
PRIMARY KEY (album_id),
INDEX USING BTREE (title),
CONSTRAINT FOREIGN KEY (artist_id)
REFERENCES Artist (artist_id)
ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = InnoDB;
- Establishing artist table and then album table which will point to artist table
- Working inwards due to how they depend on one another
- Primary key set for artist table and referenced in Album
- CONSTRAINT FOREIGN KEY is keywords
- Artist_id references a column in Artist called artist_id
- ON DELETE CASCADE ON UPDATE CASCADE
- Performance tuning
- Integer that has a special number in it that points to another table
- Creating more tables
CREATE TABLE Genre (
genre_id INTEGER NOT NULL AUTO_INCREMENT,
name VARCHAR(255),
PRIMARY KEY(genre_id)
) ENGINE = InnoDB;
CREATE TABLE Track (
track_id INTEGER NOT NULL AUTO_INCREMENT,
title VARCHAR(255),
len INTEGER,
rating INTEGER,
count INTEGER,
album_id INTEGER,
genre_id INTEGER,
PRIMARY KEY(track_id),
INDEX USING BTREE (title),
CONSTRAINT FOREIGN KEY (album_id) REFERENCES Album (album_id)
ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT FOREIGN KEY (genre_id) REFERENCES Genre (genre_id)
ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = InnoDB;
- Tons of referencing to one another
- Textual way to draw a picture show what is pointing where
- When inserting data, you need to insert from the "leaves in" and establish rows before inserting ones that depend on others
INSERT INTO Artist (name) VALUES ('Led Zepplin');
INSERT INTO Artist (name) VALUES ('AC/DC');
- Pointing to one another using numbers
- We are responsible for knowing the foreign key
- Picture we created
Relational Database Design
- Relational Power
- By removing the replicated data and replacing it with references to a single copy of each bit of data, we build a web
- Information that the relational database can ready through quickly, even for large amounts of data
- Often when you want some data it comes in a number of tables linked by foreign keys
- By removing the replicated data and replacing it with references to a single copy of each bit of data, we build a web
- The JOIN Operation
- The JOIN operation links across several tables as a part of a SELECT oeperation
- You must tell the JOIN how to use the keys that make the connection between the tables using ON
SELECT Album.title, Artist.name FROM Album JOIN Artist ON
Album.artist_id = Artist.artist_id
- SELECT Album.title, Artist.name – what we want to see
- artist_id = Artist.artist_id = how the tables are linked
- ON is where in the table they will connect
Many to Many Relationships
- Sometimes we need to model a relationship that is many to many with a connection with two foreign keys
- There is usually no separate primary key
