SQL - kamialie/knowledge_corner GitHub Wiki

Data Definition Language (DDL) - includes commands for creating, droping tables, etc.

Data Manipulation Language (DML) - includes commands to query and modify database.

General

Table variables can be used to make the query more readable. Variables are set in the FROM clause (right after table name), and once set, can be used in all other parts.

SELECT A.title, A.name FROM Album A JOIN A ON A.artist_id=A.artist_id;

NULL

NULL represents unknown or undefined value. NULL value field is always excluded from queries with a condition that checks that field, unless condition is explicitly set.

SELECT sID, sName, GPA
FROM Student
WHERE GPA < 3.5 or GPA is NULL;

Database

CREATE

CREATE DATABASE - creates a new database named people and sets wider available letter set via next command with argument utf8.

```sql
CREATE DATABASE people;
```

IF NOT EXISTS - check if given name is not in conflict, otherwise error will be thrown.

```sql
CREATE DATABASE IF NOT EXISTS people;
```

Collation is set of rules used in comparison. Different character sets have multiple collations to choose. Best practice is using utf-8 with default collation utf8-general-ci.

```sql
CREATE DATABASE IF NOT EXISTS movies CHARACTER SET latin1 COLLATE latin1_swedish_ci;
CREATE DATABASE IF NOT EXISTS people DEFAULT CHARACTER SET utf8;
```

CREATE TABLE - creates table with two fields and set the maximum length;

```sql
CREATE TABLE Users (
	name VARCHAR(128),
	email VARCHAR(128)
);
```

Another example when establishing relation to a field in another table:

```sql
CREATE TABLE Album (
	album_id INTEGER NOT NULL AUTO_INCREMENT,
	title VARCHAR(255),
	artist_id INTEGER,
	PRIMARY KEY(album_id),

	CONSTRANT FOREIGN KEY (artist_id)
		REFERENCES Artist (artist_id)
		ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = InnoDB;
```

ON DELETE choices:

  • Default / RESTRICT - don't allow changes that break the constraint
  • CASCADE - adjust child rows by deleting or updating them
  • SET NULL - set the foreign key in the child rows to null

Connector table example for many-to-many relationship(combination of two foreigh keys will be unique for each entry, thus, can act as primary key):

CREATE TABLE Member (
	account_id INTEGER,
	course_id INTEGER,
	role INTEGER,

	CONSTRANT FOREIGN KEY (account_id)
		REFERENCES Account (accound_id)
		ON DELETE CASCADE ON UPDATE CASCADE,
	CONSTRANT FOREIGN KEY (course_id)
		REFERENCES Account (course_id)
		ON DELETE CASCADE ON UPDATE CASCADE,
	PRIMARY KEY (account_id, course_id)
) ENGINE = InnoDB CHARACTER SET=utf-8;

DROP

Removes tables or databases.

Operators

  • Union operator returns the combined data from two relations and by default elimites duplicates. To include duplicates use UNION ALL.

     SELECT cName as name FROM College
     UNION
     SELECT sName as name FROM Student;
    
  • Intersect operator returns the data present in both relations.

     SELECT sID FROM Apply WHERE major = 'CS'
     INTERSECT
     SELECT sID FROM Apply WHERE major = 'EE';
    

    Some dbs do not support INTERSECT operator, thus, the following example does the same.

     SELECT DISTINCT A1.sID
     FROM Apply A1, Apply A2
     WHERE A1.sID = A2.sID and A1.major = 'CS' and A2.major = 'EE';
    
  • Except operator returns the data that appears in one relation, but not the other.

     SELECT sID FROM Apply WHERE major = 'CS'
     EXCEPT
     SELECT sID FROM Apply WHERE major = 'EE';
    

    Once again some dbs do not support EXCEPT operator, the folowing does the same. [not working yet]

     SELECT sID, sName
     FROM Student
     WHERE sID in (SELECT siD FROM Apply WHERE major = 'CS')
     	and sID not in (SELECT siD FROM Apply WHERE major = 'EE');
    

  • WHERE clause can use sub-query.

     SELECT GPA
     FROM Student
     WHERE sID in (SELECT sID FROM Apply WHERE major = 'CS');
    
  • EXISTS (checks if rows exists) can be used in sub-query with WHERE. The following example finds colleges that have some other college in the same state.

     SELECT cName, state
     FROM College C1
     WHERE exists (SELECT * FROM College C2
                   WHERE C1.state = C2.state and C1.cName <> C2.cName);
    
  • ALL applies to all entries of the relation.

     SELECT sNAME, GPA
     FROM Student S1
     WHERE GPA >= ALL (SELECT GPA FROM Student);
    

    ANY acts similar to ALL but returns true, if at least one is true.


JOIN family of operators:

  • INNER JOIN (is an abbreviation to simply JOIN) on a condition (query below is equivalent); WHERE clause can still be used with JOIN, actually being equivalient with ON (both can be use in one query)

     SELECT DISTINCT sName, major
     FROM Student INNER JOIN Apply
     ON Student.sID = Apply.sID;
    
     SELECT DISTINCT sName, major
     FROM Student, Apply
     WHERE Student.sID = Apply.sID:
    
  • NATURAL JOIN (performs a cross product of 2 relation with column names in common and keeps the rows with matching values, removing duplicates attribute names); two queries are identical

     SELECT DISTINCT sName, major
     FROM Student INNER JOIN Apply
     ON Student.sID = Apply.sID;
    
     SELECT DISTINCT sName, major
     FROM Student NATURAL JOIN Apply;
    
  • inner JOIN USING(attr) (specify desired attributes that are present on both related); most systems do not allow USING to be used with ON, so substitute the latter with WHERE

     SELECT DISTINCT sName, major
     FROM Student INNER JOIN Apply USING(sID);
    
  • left | right | full OUTER JOIN (even non matching rows are added with NULL padding) - depending on prefix adds entries from left/right/both relation(s) that doesn't have matching entry in another relation and pads them with NULL values; prefix implicitly means outer, so it could be dropped in a query; outer joins of more than 2 relations are not associative - different order results in different outcome

     SELECT sName, sID, cName, major
     FROM Student LEFT OUTER JOIN Apply using(sID);
    

Some DBSs require JOIN operator to be binary. In this case parenthesis can be used to do the same thing in multiple steps (query below is an equivalent). Usually in practice engine performes the operation in the same order as they are writte, thus, different orders can result in different performance.

SELECT Apply.sID, sNAME
FROM (Apply JOIN Student ON Apply.sID = Student.sID) JOIN College
ON Apply.cName = College.cName;

SELECT Apply.sID, sNAME
FROM Apply, Student, College
WHERE Apply.sID = Student.sID, Apply.cName = College.cName;

Data query

SELECT [DISTINCT|ALL ] { * | [fieldExpression [AS newName]} FROM tableName [alias] [WHERE condition][GROUP BY fieldName(s)]  [HAVING condition] ORDER BY fieldName(s)
  • DISTINCT, ALL - areoptional keywords that can be used to fine tune the results returned from the SELECT statement. If nothing is specified then ALL is assumed as the default
  • *, fieldExpression AS newName - at least one part must be specified; * selects all the fields from the specified table name, fieldExpression performs some computations on the specified fields such as adding numbers or putting together two string fields into one
  • FROM tableName - is mandatory and must contain at least one table, multiple tables must be separated using commas or joined using the JOIN
  • WHERE - condition is optional; it can be used to specify criteria in the result set returned from the query
  • GROUP BY - is used to put together records that have the same field values
  • HAVING condition - is used to specify criteria when working using the GROUP BY keyword
  • ORDER BY - is used to specify the sort order of the result set;

SELECT can also use sub-query (as long as result returns single value).


Sorting

By default, records are returned in the same order as they were added to database. ASC and DESC keywords are used to sort result in ascending or descending order. Default is ascending. When working on date data types, the earliest date is shown on top; on numeric data types the lowest is on top; string - A to Z from top to bottom.

ASC and DESC can also be used with WHERE and LIMIT.


Grouping

Summarizes data by grouping rows that have same values. Referred as group query and only returns a single row for every group item. In result returns unique values for a group or tuple of groups.

SELECT statements... GROUP BY column_name1[,column_name2,...] [HAVING condition];

Grouping is done on column_name1, optionally can do on multiple columns. Also possible to supply condition (similar to WHERE).


Examples

Reads and outputs data from specified table, which also satisfies WHERE clause, * means all rows.

```sql
SELECT * FROM Users WHERE email='[email protected]'
```

SELECT ... FROM ... JOIN ... ON - JOIN puts together two tables, creating long rows with all possible combintaions, while ON filters out only those, where primary key and foreign key match.

```sql
SELECT Album.title,Artist.name FROM Album JOIN Artist ON Album.artist_id=Artist.artist_id;
```

Sort results.

```sql
SELECT * FROM Users ORDER BY email;
```

Another example with wildcard matching (% means some letters):

```sql
SELECT * FROM Users WHERE name LIKE '%e%';
```

Limit the output of SELECT(can take a form of just number of entries or starting row and number of entries; row numbering starts from 0):

```sql
SELECT * FROM Users ORDER BY email DESC LIMIT 2;
SELECT * FROM Users ORDER BY email LIMIT 1,2;
```

Get number of rows instead of actual rows using COUNT function:

```sql
SELECT COUNT(*) FROM Users ORDER BY email;
```

Run expressions on result - f.e. concatenate two fields into one add do some formatting.

```sql
SELECT Concat(`title`, ' (', `director`, ')') , `year_released` FROM `movies`;
```

Above example setting alias for the field name, instead of expression.

```sql
SELECT Concat(`title`, ' (', `director`, ')') AS 'Concat', `year_released` FROM `movies`;
```

Sorting results based on two columns - first ascending, second descending

```sql
SELECT * FROM members ORDER BY gender, date_of_birth DESC;
```

Grouping example that Returns just two values, effectively showing unique values.

```sql
SELECT `gender` FROM `members` GROUP BY `gender`;
```

Grouping with aggregate function, which returns total number of males and females

```sql
SELECT `gender`,COUNT(`membership_number`)  FROM `members` GROUP BY `gender`;
```

Data modification

INSERT

INSERT INTO Table VALUES(A1, A2, A3)

INSERT INTO Table Select-Statement

INSERT INTO Users (name, email) VALUES ('Kamil', [email protected]);
  • all the string values should be enclosed in single quotes
  • all numeric values should be supplied directly without enclosing them in single or double-quotes
  • enclose date values in single quotes in the format 'YYYY-MM-DD'

Missed fields will be filled with NULL value. If all data is supplied, column specification can be ommitted.

Insert data from another table:

```sql
INSERT INTO table_1 SELECT * FROM table_2;
INSERT INTO table_1 (column1, column2) SELECT column1, column2 FROM table_2;
```

UPDATE

UPDATE Table SET Attr1 = Expres1,Attr2=Expres2 WHERE Condition

UPDATE Users SET name='Kamil' WHERE email='[email protected]'
UPDATE Apply SET decision = 'Y', major = 'economics' WHERE cName = 'Carnegie Mellon' and sID in (SELECT sID FROM Student WHERE GPA < 3.6)

WHERE clause is optional

DELETE

DELETE FROM Table WHERE Condition

DELETE FROM Users WHERE email='[email protected]'
DELETE FROM College WHERE cName not in (SELECT cName FROM Apply WHERE major = 'CS')

WHERE clause is optional

Patterns

Regex

SELECT statements... WHERE fieldname REGEXP 'pattern';

RLIKE is a synonym to REGEXP. Regular expressions are not case sensitive.

Wildcards

  • % (percentage) - zero or more character
  • _ (underscore) - exactly one character

NOT LIKE keyword can be used with wildcards to negate the result.

To escape wildcards in data precede them with symbol specified after ESCAPE keyword.

Examples

Returns all movies that contain code in the name.

```sql
SELECT * FROM movies WHERE title LIKE '%code%';
```

Negates the result of underscore wildcard returning all years except 2000-2010

```sql
SELECT * FROM movies WHERE year_released NOT LIKE '200_';
```

Escaping wildcards

```sql
SELECT * FROM movies WHERE title LIKE '67#%%' ESCAPE '#';
```

Functions

Built-in function operate on different data types, such as strings, numbers and date. It is also possible to add user-defined functions.

ISO standard defines 5 aggregate (perform calculations on multiple rows, on a single column, returning single result) functions:

  • COUNT - returns number of rows; to avoid duplicates add DISTINCT and specify column name

     SELECT COUNT(DISTINCT sID)
     FROM Apply
     WHERE cName = 'Cornel';
    
  • SUM

  • AVG

  • MIN

  • MAX

By default all aggregate functions exclude null values before working on the data. COUNT (*) is a special implementation of the COUNT function that returns all rows including null and duplicates.

Aggregations add 2 new clauses to SELECT FROM statement:

  • GROUP BY column - partitions relations into groups to compute aggregate function independently over each group; replace GROUP BY by ORDER BY to see all the data; example below divides all values into groups based on college name and counts number of entries for each group

     SELECT cName, COUNT(*)
     FROM Apply
     GROUP BY cName;
    
  • HAVING condition - tests filters on the results of aggregate values; applies to groups (WHERE is applied on individual rows); example below outputs college names that have less than 5 applications

    SELECT cName
    FROM Apply
    GROUP BY cName
    HAVING count(*) < 5;
    

Resources