SQL Basics - caitlinmallen/TechWiki GitHub Wiki

Introduction to Structured Query Language

Installing MAMP on Windows 10

  • Install from mamp.info/en/downloads
  • Once MAMP is installed, click Start Servers
    • Allow mysqld and Apache HTTP through firewall
  • Click Open WebStart Page
    • To confirm successful install, click phpinfo and then under Tools > phpMyAdmin

Writing our first PHP Application

  • Create a webpage such as:
    • <h1> Hello from a webpage</h1>
  • Save this under C:/MAMP/htdocs and create a new folder, save the file as index.php
  • VSCode should now have syntax highlighted the file and you can now navigate to localhost/first/index.php
  • We can now write in PHP to execute code using the following below the header:

<?php

Echo"Hi there\n";

?>

  • This will execute the code when the webpage is loaded
    • When PHP is added to the webpage's code, it will swap reading HTML to PHP and then back to HTML
  • <h1> Hello from a webpage</h1>
  • <p>
  • <?php
  • echo"Hi there\n";
  • $x = 6 * 7;
  • echo"The answer is ".$x." what was the question\n";
  • ?>
  • </p>
  • <p> Antoher paragraph</p>
  • Syntax errors
    • Webpage cannot be displayed and if there is an error in hundreds of lines of code, trying to find it can be hard
    • Identifying them can be done by opening C:\MAMP\conf\<phpversion&gt;\php.ini
    • Change display_errors = off" to "on"
    • Save and then restart the servers
    • Page can now be reloaded and it will show the syntax error

Installing XAMPP

  • Download from apachefriends.org and install
  • Install into default location of C:\xampp for ease, in practice do not actually do this
  • Start up the control panel by opening xampp-control in the xampp folder
    • Start Apache and MySQL

Overview of Databases

  • Index- A table of contents for where things are in data
  • SQL came out of a government/industry partnership
  • SQL is the language we use to issue commands to a database
    • Create/insert data
    • Read/select some data
    • Update data
    • Delete data
    • CRUD
  • Terminology
    • Database – contains one or more tables
    • Relation/table – contains tuples and attributes
    • Tuple/row – a set of fields which generally represent an "object" like a person or music track
    • Attribute/column/field – one of possibly many elements of data corresponding to the object represented by the row
  • Common database systems
    • Three major database management systems in wide use
      • Oracle – large, commercial, enterprise-scale, very tweakable
      • MySQL – simpler but very fast and scalable, commercial or open-source
      • SqlServer – From Microsoft
    • Other small projects that are free and open-source
      • HSQL, SQLite, PostgresSQL

Basic SQL Operations

  • In MyPHPAdmin &gt; SQL &gt; Opens SQL query box
  • Commands
    • show DATABASES;
      • Will show databases in mySQL
    • CREATE DATABASE People DEFAULT CHARACTER SET utf8;
      • Will create database
      • Default character set means you can have non-Latin characters
    • CREATE TABLE Users (

name VARCHAR(128),

email VARCHAR(128)

);

- Creates columns of the table
- Maximum of 129 characters for each field
  - Efficiency
  • DESCRIBE Users;

    • Shows information about the Users table
  • INSERT

    • Insert statements into the table
    • Example from Coursera

INSERT INTO Users (name,email) VALUES ('Chuck', '[email protected]');

INSERT INTO Users (name,email) VALUES ('Sally','[email protected]');

INSERT INTO Users (name,email) VALUES ('Somesh', '[email protected]');

INSERT INTO Users (name,email) VALUES ('Caitlin', '[email protected]');

INSERT INTO Users (name,email) VALUES ('Ted', '[email protected]');

- Inserts rows with name and email column
  • DELETE
    • Deletes a row in a table based on selection criteria
  • UPDATE
    • Allows the updating of a field with a WHERE clause
  • SELECT
    • Retrieves a group of records, you can retrieve all records or a subset with WHERE
  • ORDER BY
    • Add an ORDER BY to a SELECT statement to get results sorted in ascending or descending order
    • SELECT * FROM Users ORDER BY email
  • LIKE
    • We can do a wildcard matching in a WHERE clause using LIKE
    • SELECT * FROM Users WHERE name LIKE '%e%'
      • Somebody with e in their email
      • % is a wildcard to use in a field
  • LIMIT
    • Can request the first "n" rows or the first "n" rows after starting a starting row
      • First row is 0
    • WHERE and ORDER BY clauses happen BEFORE the LIMIT is applied
    • The limit can be a count or a starting row and count

SELECT * FROM Users ORDER BY email DESC LIMIT 2;

SELECT * FROM Users ORDER BY email LIMIT 1,2;

  • Counting rows with SELECT
    • You can request to receive the count of the rows that would be retrieved instead of the rows

SELECT COUNT(*) FROM Users;

SELECT COUNT(*) FROM Users WHERE email='[email protected]'

- Would return 4

Data Types in SQL

  • Text fields (small and large)
    • Have a character set – paragraphs or HTML pages
      • TINYTEXT up to 255 characters
      • TEXT up to 65k
      • MEDIUMTEXT up to 16m
      • LONGTEXT up to 4g
    • Generally not used with indexing or sorting, only limited to a prefix if used
  • Binary Fields (small and large)
    • Character = 8-32 bits of information depending on character set
    • Byte – 8 bits of information
      • BYTE(n) up to 255 bytes
      • VARBINARY(n) up to 65K bytes
    • Small images – data
    • Not indexed or sorted
  • Binary Large Object (BLOB)
    • Large raw data, files, images, word documents, PDFs, movies, etc
    • No translation, indexing, or character set
      • TINYBLOB(n) – up to 255
      • BLOB(n) up to 65K
      • MEDIUMBLOB(n) up to 16M
      • LONGBLOB(n) up to 4g
  • Integer numbers
    • Can be very efficient, take little storage, and are easy to process due to CPUs comparing them with a single instruction
      • TINYINT (-128, 128)
      • SMALLINT (-32768, 32768)
      • INT or INTEGER (2 billion)
      • BIGINT –(10**18ish)
  • Floating point numbers
    • Can represent a wide range of values, but accuracy is limited
      • FLOAT (32-bit) 10**38 with seven digits of accuracy
      • DOUBLE (64-bit) 10**308 with 14 digits of accuracy
  • Dates
    • TIMESTAMP – 'YYYY-MM-DD HH:MM:SS' (1970,2037)
    • DATETIME – 'YYYY-MM-DD HH:MM:SS'
    • DATE – 'YYYY'MM'DD'
    • TIME – 'HH:MM:SS'
    • Built in MySQL function called NOW()

Database Keys and Indexes

  • Often as we make multiple tables and need to JOIN them together, we need an integer primary key for each row so we can efficiently add a reference to a row in some other table as a foreign key
  • DROP TABLE Users; (
    • Create a brand new table instead
  • Creating a table

CREATE TABLE Users

user_id INT UNSIGNED NOT NULL

AUTO_INCREMENT

email VARCHAR(128),

name VARCHAR(228).

PRIMARY KEY (user_id),

INDEX(email)

)

  • Creating table
    • User_id is an unsigned (positive only) not null (required) integer
    • Auto_increment will supply the number if the creator does not
    • Two fields being added
    • PRIMARY KEY is a column used a lot and needs to be looked at quickly
      • Will be a number
    • INDEX will be looked up with WHERE clauses a lot and changes how the field will be used
      • Hint to tell the database what you will be doing with the data
  • MySQL Functions
    • Has many built-in functions like NOW()
  • Indexes
    • As tables get large, scanning all the data to find a single row becomes costly
    • When [email protected] logs into Facebook, they must find the passwords amongst billions of users
    • There are techniques to greatly shorten the scan as long as you create and maintain data structures, like shortcuts
    • Hashes or trees
      • Hashes – exact matches
      • Trees – sorting and prefix matches
  • MySQL Index types
    • PRIMARY KEY – Very little space, exact match, requires no duplicates, extremely fast for integer fields
    • Index – good for individual row lookup and sorting/grouping results
      • Works best for exact matches or prefix lookups
  • B-Trees
    • A tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic amortized time
      • Best optimized for systems that read and write large blocks of data
      • Commonly used in databases and filesystems
  • Hashes
    • Any algorithm or subroutine that maps large data sets to smaller data sets called keys
    • The values returned by a hash function are called hash values, hash codes, hash sums, checksums, or hashes
    • Most commonly used to accelerate table lookup or data comparison tasks such as finding items in a database
  • Specifying Indexes without primary key or index
    • ALTER TABLE Users ADD INDEX (email ) USING BTRTEE
      • Don't always want to say USING HASH or USING BTREE since database can decide

Relational Database Design

  • Database design
    • An art form with own particular skills and experience
    • Our goal is to avoid really bad mistakes and design clean and easily understood databases
    • Others may performance tune things later
    • Starts with a picture
  • Building a data model
    • Drawing a picture of the data objects for our application and then figuring out how to represent the objects and their relationships
    • Basic rule: Don't put the same string data in twice, use a relationship instead
    • When there is one thing in the real world, there should be one copy of that in the database

Normalization and Foreign Keys

  • Database Normalization (3NF)
    • Theres tons of database theories, way too much to understand without calculus
      • Do not replicate data, instead reference/point at data
      • Use integers for keys and references
      • Add a special key column to each table, where you'll make references to
  • Integer Reference Pattern
    • We use integer columns in one table to reference/look up rows in another table
      • Both tables will have artist_id
  • Three Kinds of keys
    • Primary key – generally an integer auto-increment field
      • Handle for a row
    • Logical key – what the outside world uses for lookup
      • Way humans look the row up
      • Generally a string
    • Foreign key – generally an integer key pointing to a row in another table
      • Pointing to a row in another table
  • Primary key rules
    • Best practices
      • Never use your logical key as the primary key
      • Logical keys can and do change
      • Relationships that are based on matching string fields are less efficient than integers
  • Foreign keys
    • A foreign key is when a table has a column containing a key that points to the primary key of another table
    • When all primary keys are integers, then all foreign keys are integers

Building a Physical Data Schema

  • Creating a database

CREATE DATABASE Music

DEFAULT CHARACTER SET utf8;

USE Music;

  • Creating statements for creating tables

CREATE TABLE Artist (

artist_id INTEGER NOT NULL AUTO_INCREMENT,

name VARCHAR(255),

PRIMARY KEY(artist_id)

) ENGINE = InnoDB;

CREATE TABLE Album (

album_id INTEGER NOT NULL AUTO_INCREMENT,

title VARCHAR(255),

artist id INTEGER,

PRIMARY KEY (album_id),

INDEX USING BTREE (title),

CONSTRAINT FOREIGN KEY (artist_id)

REFERENCES Artist (artist_id)

ON DELETE CASCADE ON UPDATE CASCADE

) ENGINE = InnoDB;

  • Establishing artist table and then album table which will point to artist table
    • Working inwards due to how they depend on one another
  • Primary key set for artist table and referenced in Album
  • CONSTRAINT FOREIGN KEY is keywords
    • Artist_id references a column in Artist called artist_id
  • ON DELETE CASCADE ON UPDATE CASCADE
    • Performance tuning
    • Integer that has a special number in it that points to another table
  • Creating more tables

CREATE TABLE Genre (

genre_id INTEGER NOT NULL AUTO_INCREMENT,

name VARCHAR(255),

PRIMARY KEY(genre_id)

) ENGINE = InnoDB;

CREATE TABLE Track (

track_id INTEGER NOT NULL AUTO_INCREMENT,

title VARCHAR(255),

len INTEGER,

rating INTEGER,

count INTEGER,

album_id INTEGER,

genre_id INTEGER,

PRIMARY KEY(track_id),

INDEX USING BTREE (title),

CONSTRAINT FOREIGN KEY (album_id) REFERENCES Album (album_id)

ON DELETE CASCADE ON UPDATE CASCADE,

CONSTRAINT FOREIGN KEY (genre_id) REFERENCES Genre (genre_id)

ON DELETE CASCADE ON UPDATE CASCADE

) ENGINE = InnoDB;

  • Tons of referencing to one another
    • Textual way to draw a picture show what is pointing where
  • When inserting data, you need to insert from the "leaves in" and establish rows before inserting ones that depend on others

INSERT INTO Artist (name) VALUES ('Led Zepplin');

INSERT INTO Artist (name) VALUES ('AC/DC');

  • Pointing to one another using numbers
  • We are responsible for knowing the foreign key
  • Picture we created

Relational Database Design

  • Relational Power
    • By removing the replicated data and replacing it with references to a single copy of each bit of data, we build a web
      • Information that the relational database can ready through quickly, even for large amounts of data
    • Often when you want some data it comes in a number of tables linked by foreign keys
  • The JOIN Operation
    • The JOIN operation links across several tables as a part of a SELECT oeperation
    • You must tell the JOIN how to use the keys that make the connection between the tables using ON

SELECT Album.title, Artist.name FROM Album JOIN Artist ON

Album.artist_id = Artist.artist_id

  • SELECT Album.title, Artist.name – what we want to see
  • artist_id = Artist.artist_id = how the tables are linked
  • ON is where in the table they will connect

Many to Many Relationships

  • Sometimes we need to model a relationship that is many to many with a connection with two foreign keys
  • There is usually no separate primary key