SQL - kamialie/knowledge_corner GitHub Wiki
Data Definition Language (DDL) - includes commands for creating, droping tables, etc.
Data Manipulation Language (DML) - includes commands to query and modify database.
General
Table variables can be used to make the query more readable. Variables are set in the FROM clause (right after table name), and once set, can be used in all other parts.
SELECT A.title, A.name FROM Album A JOIN A ON A.artist_id=A.artist_id;
NULL
NULL represents unknown or undefined value. NULL value field is always excluded from queries with a condition that checks that field, unless condition is explicitly set.
SELECT sID, sName, GPA
FROM Student
WHERE GPA < 3.5 or GPA is NULL;
Database
CREATE
CREATE DATABASE - creates a new database named people and sets wider available letter set via next command with argument utf8.
```sql
CREATE DATABASE people;
```
IF NOT EXISTS - check if given name is not in conflict, otherwise error will be thrown.
```sql
CREATE DATABASE IF NOT EXISTS people;
```
Collation is set of rules used in comparison. Different character sets have multiple collations to choose. Best practice is using utf-8 with default collation utf8-general-ci.
```sql
CREATE DATABASE IF NOT EXISTS movies CHARACTER SET latin1 COLLATE latin1_swedish_ci;
CREATE DATABASE IF NOT EXISTS people DEFAULT CHARACTER SET utf8;
```
CREATE TABLE - creates table with two fields and set the maximum length;
```sql
CREATE TABLE Users (
name VARCHAR(128),
email VARCHAR(128)
);
```
Another example when establishing relation to a field in another table:
```sql
CREATE TABLE Album (
album_id INTEGER NOT NULL AUTO_INCREMENT,
title VARCHAR(255),
artist_id INTEGER,
PRIMARY KEY(album_id),
CONSTRANT FOREIGN KEY (artist_id)
REFERENCES Artist (artist_id)
ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = InnoDB;
```
ON DELETE choices:
- Default / RESTRICT - don't allow changes that break the constraint
- CASCADE - adjust child rows by deleting or updating them
- SET NULL - set the foreign key in the child rows to null
Connector table example for many-to-many relationship(combination of two foreigh keys will be unique for each entry, thus, can act as primary key):
CREATE TABLE Member (
account_id INTEGER,
course_id INTEGER,
role INTEGER,
CONSTRANT FOREIGN KEY (account_id)
REFERENCES Account (accound_id)
ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRANT FOREIGN KEY (course_id)
REFERENCES Account (course_id)
ON DELETE CASCADE ON UPDATE CASCADE,
PRIMARY KEY (account_id, course_id)
) ENGINE = InnoDB CHARACTER SET=utf-8;
DROP
Removes tables or databases.
Operators
-
Union operator returns the combined data from two relations and by default elimites duplicates. To include duplicates use UNION ALL.
SELECT cName as name FROM College UNION SELECT sName as name FROM Student;
-
Intersect operator returns the data present in both relations.
SELECT sID FROM Apply WHERE major = 'CS' INTERSECT SELECT sID FROM Apply WHERE major = 'EE';
Some dbs do not support INTERSECT operator, thus, the following example does the same.
SELECT DISTINCT A1.sID FROM Apply A1, Apply A2 WHERE A1.sID = A2.sID and A1.major = 'CS' and A2.major = 'EE';
-
Except operator returns the data that appears in one relation, but not the other.
SELECT sID FROM Apply WHERE major = 'CS' EXCEPT SELECT sID FROM Apply WHERE major = 'EE';
Once again some dbs do not support EXCEPT operator, the folowing does the same. [not working yet]
SELECT sID, sName FROM Student WHERE sID in (SELECT siD FROM Apply WHERE major = 'CS') and sID not in (SELECT siD FROM Apply WHERE major = 'EE');
-
WHERE clause can use sub-query.
SELECT GPA FROM Student WHERE sID in (SELECT sID FROM Apply WHERE major = 'CS');
-
EXISTS (checks if rows exists) can be used in sub-query with WHERE. The following example finds colleges that have some other college in the same state.
SELECT cName, state FROM College C1 WHERE exists (SELECT * FROM College C2 WHERE C1.state = C2.state and C1.cName <> C2.cName);
-
ALL applies to all entries of the relation.
SELECT sNAME, GPA FROM Student S1 WHERE GPA >= ALL (SELECT GPA FROM Student);
ANY acts similar to ALL but returns true, if at least one is true.
JOIN family of operators:
-
INNER JOIN (is an abbreviation to simply JOIN) on a condition (query below is equivalent); WHERE clause can still be used with JOIN, actually being equivalient with ON (both can be use in one query)
SELECT DISTINCT sName, major FROM Student INNER JOIN Apply ON Student.sID = Apply.sID; SELECT DISTINCT sName, major FROM Student, Apply WHERE Student.sID = Apply.sID:
-
NATURAL JOIN (performs a cross product of 2 relation with column names in common and keeps the rows with matching values, removing duplicates attribute names); two queries are identical
SELECT DISTINCT sName, major FROM Student INNER JOIN Apply ON Student.sID = Apply.sID; SELECT DISTINCT sName, major FROM Student NATURAL JOIN Apply;
-
inner JOIN USING(attr) (specify desired attributes that are present on both related); most systems do not allow USING to be used with ON, so substitute the latter with WHERE
SELECT DISTINCT sName, major FROM Student INNER JOIN Apply USING(sID);
-
left | right | full OUTER JOIN (even non matching rows are added with NULL padding) - depending on prefix adds entries from left/right/both relation(s) that doesn't have matching entry in another relation and pads them with NULL values; prefix implicitly means outer, so it could be dropped in a query; outer joins of more than 2 relations are not associative - different order results in different outcome
SELECT sName, sID, cName, major FROM Student LEFT OUTER JOIN Apply using(sID);
Some DBSs require JOIN operator to be binary. In this case parenthesis can be used to do the same thing in multiple steps (query below is an equivalent). Usually in practice engine performes the operation in the same order as they are writte, thus, different orders can result in different performance.
SELECT Apply.sID, sNAME
FROM (Apply JOIN Student ON Apply.sID = Student.sID) JOIN College
ON Apply.cName = College.cName;
SELECT Apply.sID, sNAME
FROM Apply, Student, College
WHERE Apply.sID = Student.sID, Apply.cName = College.cName;
Data query
SELECT [DISTINCT|ALL ] { * | [fieldExpression [AS newName]} FROM tableName [alias] [WHERE condition][GROUP BY fieldName(s)] [HAVING condition] ORDER BY fieldName(s)
- DISTINCT, ALL - areoptional keywords that can be used to fine tune the results returned from the SELECT statement. If nothing is specified then ALL is assumed as the default
- *, fieldExpression AS newName - at least one part must be specified; * selects all the fields from the specified table name, fieldExpression performs some computations on the specified fields such as adding numbers or putting together two string fields into one
- FROM tableName - is mandatory and must contain at least one table, multiple tables must be separated using commas or joined using the JOIN
- WHERE - condition is optional; it can be used to specify criteria in the result set returned from the query
- GROUP BY - is used to put together records that have the same field values
- HAVING condition - is used to specify criteria when working using the GROUP BY keyword
- ORDER BY - is used to specify the sort order of the result set;
SELECT can also use sub-query (as long as result returns single value).
Sorting
By default, records are returned in the same order as they were added to database. ASC and DESC keywords are used to sort result in ascending or descending order. Default is ascending. When working on date data types, the earliest date is shown on top; on numeric data types the lowest is on top; string - A to Z from top to bottom.
ASC and DESC can also be used with WHERE and LIMIT.
Grouping
Summarizes data by grouping rows that have same values. Referred as group query
and only returns a single row for every group item. In result returns
unique values for a group or tuple of groups.
SELECT statements... GROUP BY column_name1[,column_name2,...] [HAVING condition];
Grouping is done on column_name1, optionally can do on multiple columns. Also possible to supply condition (similar to WHERE).
Examples
Reads and outputs data from specified table, which also satisfies WHERE
clause, *
means all rows.
```sql
SELECT * FROM Users WHERE email='[email protected]'
```
SELECT ... FROM ... JOIN ... ON - JOIN puts together two tables, creating long rows with all possible combintaions, while ON filters out only those, where primary key and foreign key match.
```sql
SELECT Album.title,Artist.name FROM Album JOIN Artist ON Album.artist_id=Artist.artist_id;
```
Sort results.
```sql
SELECT * FROM Users ORDER BY email;
```
Another example with wildcard matching (%
means some letters):
```sql
SELECT * FROM Users WHERE name LIKE '%e%';
```
Limit the output of SELECT(can take a form of just number of entries or starting row and number of entries; row numbering starts from 0):
```sql
SELECT * FROM Users ORDER BY email DESC LIMIT 2;
SELECT * FROM Users ORDER BY email LIMIT 1,2;
```
Get number of rows instead of actual rows using COUNT function:
```sql
SELECT COUNT(*) FROM Users ORDER BY email;
```
Run expressions on result - f.e. concatenate two fields into one add do some formatting.
```sql
SELECT Concat(`title`, ' (', `director`, ')') , `year_released` FROM `movies`;
```
Above example setting alias for the field name, instead of expression.
```sql
SELECT Concat(`title`, ' (', `director`, ')') AS 'Concat', `year_released` FROM `movies`;
```
Sorting results based on two columns - first ascending, second descending
```sql
SELECT * FROM members ORDER BY gender, date_of_birth DESC;
```
Grouping example that Returns just two values, effectively showing unique values.
```sql
SELECT `gender` FROM `members` GROUP BY `gender`;
```
Grouping with aggregate function, which returns total number of males and females
```sql
SELECT `gender`,COUNT(`membership_number`) FROM `members` GROUP BY `gender`;
```
Data modification
INSERT
INSERT INTO Table VALUES(A1, A2, A3)
INSERT INTO Table Select-Statement
INSERT INTO Users (name, email) VALUES ('Kamil', [email protected]);
- all the string values should be enclosed in single quotes
- all numeric values should be supplied directly without enclosing them in single or double-quotes
- enclose date values in single quotes in the format 'YYYY-MM-DD'
Missed fields will be filled with NULL value. If all data is supplied, column specification can be ommitted.
Insert data from another table:
```sql
INSERT INTO table_1 SELECT * FROM table_2;
INSERT INTO table_1 (column1, column2) SELECT column1, column2 FROM table_2;
```
UPDATE
UPDATE Table SET Attr1 = Expres1,Attr2=Expres2 WHERE Condition
UPDATE Users SET name='Kamil' WHERE email='[email protected]'
UPDATE Apply SET decision = 'Y', major = 'economics' WHERE cName = 'Carnegie Mellon' and sID in (SELECT sID FROM Student WHERE GPA < 3.6)
WHERE clause is optional
DELETE
DELETE FROM Table WHERE Condition
DELETE FROM Users WHERE email='[email protected]'
DELETE FROM College WHERE cName not in (SELECT cName FROM Apply WHERE major = 'CS')
WHERE clause is optional
Patterns
Regex
SELECT statements... WHERE fieldname REGEXP 'pattern';
RLIKE is a synonym to REGEXP. Regular expressions are not case sensitive.
Wildcards
%
(percentage) - zero or more character_
(underscore) - exactly one character
NOT LIKE keyword can be used with wildcards to negate the result.
To escape wildcards in data precede them with symbol specified after ESCAPE keyword.
Examples
Returns all movies that contain code in the name.
```sql
SELECT * FROM movies WHERE title LIKE '%code%';
```
Negates the result of underscore wildcard returning all years except 2000-2010
```sql
SELECT * FROM movies WHERE year_released NOT LIKE '200_';
```
Escaping wildcards
```sql
SELECT * FROM movies WHERE title LIKE '67#%%' ESCAPE '#';
```
Functions
Built-in function operate on different data types, such as strings, numbers and date. It is also possible to add user-defined functions.
ISO standard defines 5 aggregate (perform calculations on multiple rows, on a single column, returning single result) functions:
-
COUNT - returns number of rows; to avoid duplicates add DISTINCT and specify column name
SELECT COUNT(DISTINCT sID) FROM Apply WHERE cName = 'Cornel';
-
SUM
-
AVG
-
MIN
-
MAX
By default all aggregate functions exclude null values before working on the
data. COUNT (*)
is a special implementation of the COUNT function that
returns all rows including null and duplicates.
Aggregations add 2 new clauses to SELECT FROM statement:
-
GROUP BY column - partitions relations into groups to compute aggregate function independently over each group; replace GROUP BY by ORDER BY to see all the data; example below divides all values into groups based on college name and counts number of entries for each group
SELECT cName, COUNT(*) FROM Apply GROUP BY cName;
-
HAVING condition - tests filters on the results of aggregate values; applies to groups (WHERE is applied on individual rows); example below outputs college names that have less than 5 applications
SELECT cName FROM Apply GROUP BY cName HAVING count(*) < 5;