SQL - robbiehume/CS-Notes GitHub Wiki

Links

SQL software

Squirrel, Toad

Look into

Indexes
Stored procedures
VIEWs: virtual tables
Window functions
Correlated subqueries
Scalar subqueries
Recursive CTEs
Pivot tables

General

Always use single quotes (') for strings to ensure compatibility across different SQL dialects
- To include a single quote inside a string, escape it by doubling the quote (''): SELECT 'It''s a great day!' AS message;
Use double quotes (") for identifiers if needed (e.g., column names with spaces or special characters)

Tips and tricks

Return the number of desired results + 1 so you know if there's more to get on the next page or not
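
A minimal sketch of the "fetch one extra row" trick, using Python's built-in sqlite3 with an in-memory database. The `items` table, the `fetch_page` helper, and the page size are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)",
                 [(f"item {i}",) for i in range(1, 8)])  # 7 rows

def fetch_page(conn, page, page_size):
    """Return (rows, has_more) for a zero-based page number (hypothetical helper)."""
    rows = conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (page_size + 1, page * page_size),  # ask for one row more than needed
    ).fetchall()
    has_more = len(rows) > page_size        # the extra row only signals "more exists"
    return rows[:page_size], has_more

first_rows, first_has_more = fetch_page(conn, 0, 5)
last_rows, last_has_more = fetch_page(conn, 1, 5)
print(len(first_rows), first_has_more)  # 5 True
print(len(last_rows), last_has_more)    # 2 False
```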

Table data: `INSERT`, `UPDATE`, and `DELETE`

Inserting data: `INSERT`

Insert data: all fields

INSERT INTO user 
VALUES (5, 'Alice', 9);

Insert partial data:

INSERT INTO user (id, name)
VALUES (5, 'Alice');

Insert multiple rows:

INSERT INTO user (id, name)
VALUES 
  (5, 'Alice'),
  (6, 'Bob');

Modifying data: `UPDATE`

Modify column field:

UPDATE user 
SET name = 'Robert'
WHERE id = 6;

Modify multiple columns:

UPDATE user 
SET name = 'Robert', age = 18
WHERE id = 6;

Modify multiple rows:
- ```
UPDATE user 
SET age = 18
WHERE id < 4;
```

Modify column field (with arithmetic operations):

UPDATE user 
SET age = 1 + age
WHERE id = 6;

Deleting data: `DELETE`

Delete row:
- ```
DELETE FROM user 
WHERE id = 6;
```
Delete all rows:
- ```
DELETE FROM user;
```

Joins (link)

The JOIN operator lets you combine related information from multiple tables into a new table
Inner join vs outer join
- Inner join: JOIN; keeps only the rows from each table that have a match in the other table
  - In MySQL, a plain JOIN without an ON is treated as a CROSS JOIN
- Outer joins: will also keep rows that are not related to the other table and missing data will be filled with NULL
  - Left (outer) join: LEFT JOIN; keeps the unrelated data from the left (the first) table
    - Keeps any rows from the first table that don't have a match with the second table
  - Right (outer) join: RIGHT JOIN; keeps the unrelated data from the right (the second) table
    - Keeps any rows from the second table that don't have a match with the first table
  - Full (outer) join: FULL JOIN; keeps all rows from both tables
- The ON clause is similar to WHERE for SELECT
- Syntax:
  - ```
     SELECT pets.name AS pet_name, owners.name AS owner
     FROM pets
     JOIN owners
     ON pets.owner_id = owners.id;  
```
- Should use table aliases if joining a table with itself
- Can also join implicitly by listing multiple tables in the FROM clause and putting the join condition in WHERE
Cross Join: produces a Cartesian product, combining each row from the first table with every row from the second table
- Use: you need every combination of rows from two tables
- In MySQL, a plain JOIN without an ON is treated as a CROSS JOIN
Self Join: joins a table to itself; same as an inner join with itself
- Use: you need to compare rows within the same table
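
A small sketch comparing inner, left, and cross joins, run through Python's sqlite3. It reuses the `pets` / `owners` names from the syntax example above; the rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owners (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT, owner_id INTEGER);
    INSERT INTO owners VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO pets VALUES (1, 'Rex', 1), (2, 'Tom', NULL);  -- Tom has no owner
""")

# Inner join: only pets that have a matching owner
inner_rows = conn.execute("""
    SELECT pets.name, owners.name
    FROM pets
    JOIN owners ON pets.owner_id = owners.id
    ORDER BY pets.id
""").fetchall()

# Left join: every pet, with NULL (None) where no owner matches
left_rows = conn.execute("""
    SELECT pets.name, owners.name
    FROM pets
    LEFT JOIN owners ON pets.owner_id = owners.id
    ORDER BY pets.id
""").fetchall()

# Cross join: every pet paired with every owner (2 x 2 rows)
cross_count = conn.execute("SELECT COUNT(*) FROM pets CROSS JOIN owners").fetchone()[0]

print(inner_rows)   # [('Rex', 'Ann')]
print(left_rows)    # [('Rex', 'Ann'), ('Tom', None)]
print(cross_count)  # 4
```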

Table aliases:

   SELECT t.id
   FROM table_name t    -- or FROM table_name AS t

Types of keys:

NOTE: only 3 types of keys are actually used in a database (primary, unique, and foreign). The rest are only concepts of RDBMS
Super key: a set of one or more columns that can be used to identify a record uniquely in a table
- Primary key, unique key, and alternate key are a subset of super keys
- A super key can contain multiple attributes that might not be able to identify tuples in a table independently, but when grouped with certain keys, they can identify tuples uniquely
Candidate key: a set of one or more fields/columns that can identify a record uniquely in a table
- There can be multiple candidate keys in one table and each candidate key can work as a primary key
Primary key: a set of one or more fields/columns of a table that can uniquely identify a record in a table
- There is only one chosen primary key
Alternate key: a candidate key that currently is not a primary key
Composite key: a combination of more than one field/column of a table
Unique key: a set of one or more fields/columns of a table that uniquely identify a record in a table
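- It's like a primary key in that it can't have duplicate values, but it can accept NULL, while a PK doesn't allow any NULL or duplicate values
  - How many NULLs are allowed depends on the database: SQL Server allows only one, while MySQL, PostgreSQL, and SQLite allow multiple by default
Foreign key: a field/column (or set of columns) in one table that references the primary key (or a unique key) of another table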
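
A rough sketch of the three keys that are actually enforced (primary, unique, foreign), using Python's sqlite3. The `department` / `employee` schema and the `violates` helper are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript("""
    CREATE TABLE department (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE employee (
        id            INTEGER PRIMARY KEY,                  -- primary key
        email         TEXT UNIQUE,                          -- unique key
        department_id INTEGER REFERENCES department (id)    -- foreign key
    );
    INSERT INTO department VALUES (1, 'Engineering');
    INSERT INTO employee VALUES (1, 'a@example.com', 1);
""")

def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_primary = violates("INSERT INTO employee VALUES (1, 'b@example.com', 1)")   # id 1 exists
dup_unique = violates("INSERT INTO employee VALUES (2, 'a@example.com', 1)")    # email exists
bad_foreign = violates("INSERT INTO employee VALUES (3, 'c@example.com', 99)")  # no department 99
print(dup_primary, dup_unique, bad_foreign)  # True True True
```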

General Notes

EXPLAIN: In MySQL / MariaDB and PostgreSQL, you can add EXPLAIN to the beginning of a query to see the execution plan (each step it takes); SQLite uses EXPLAIN QUERY PLAN
- If any of the steps is a full table scan (type ALL in MySQL's output, Seq Scan in PostgreSQL's), it's potentially making your query slower
\G: in the MySQL / MariaDB command-line client, you can end a statement with \G in place of the semicolon to display each row vertically, which is easier to read for wide results

Built-in Utility Functions

Comparison Table

| Category | Feature | PostgreSQL | MySQL | SQLite |
| --- | --- | --- | --- | --- |
| Date/Time | `CURRENT_DATE` | ✅ | ✅ | ✅ |
| Date/Time | `CURRENT_TIMESTAMP` | ✅ | ✅ | ✅ |
| Date/Time | `NOW()` | ✅ | ✅ | ❌ (Use `datetime('now')`, `CURRENT_TIMESTAMP`) |
| Date/Time | `EXTRACT(<YEAR, MONTH, DAY> FROM <date>)` | ✅ | ✅ | ❌ (Use `strftime('%Y', date)`, etc.) |
| Date/Time | `YEAR(<date>), MONTH(<date>), DAY(<date>)` | ❌ (Use `EXTRACT`) | ✅ | ❌ (Use `strftime('%Y', date)`, etc.) |
| Date/Time | `DATE_ADD()` | ❌ (Use interval arithmetic +) | ✅ | ❌ (Use `datetime(date, '+X units')`) |
| Date/Time | `SYSDATE` | ❌ (Core) / ✅ (with `orafce` ext) | ✅ (Note behavior difference vs `NOW()`) | ❌ |
| Strings | `CONCAT()` | ✅ | ✅ | ✅ (Since v3.44.0; use `\|\|` in older versions) |
| Strings | `\|\|` for concatenation | ✅ | ❌ (`\|\|` is logical OR unless `PIPES_AS_CONCAT` mode is enabled) | ✅ |
| Strings | `STRING_AGG()` | ✅ | ❌ (Use `GROUP_CONCAT`) | ✅ (Since v3.44.0; use `GROUP_CONCAT` in older versions) |
| Strings | `GROUP_CONCAT()` | ❌ (Use `STRING_AGG`) | ✅ | ✅ |
| Strings | `TRIM()` | ✅ | ✅ | ✅ |
| Strings | `INITCAP()` | ✅ | ❌ | ❌ |
| Logic/Null | `COALESCE()`, `NULLIF()` | ✅ | ✅ | ✅ |
| Other | `RETURNING` | ✅ | ❌ | ✅ (Since v3.35.0) |

Can combine multiple functions

Ex: SELECT ROUND(AVG(LENGTH(COALESCE(summary, ''))), 2) AS avg_summary_length

`COALESCE(col_name, 'default value')`

Returns the first non-NULL argument, so NULL values are replaced with 'default value' (empty strings are not NULL and are left as-is)

`LENGTH(col_name)`

Returns the number of characters in a string (in MySQL, LENGTH() counts bytes; use CHAR_LENGTH() to count characters)

`ROUND(col_name, <decimal_places>)`

Rounds the column value or the result of another function to the specified number of decimal places
Ex: ROUND(AVG(price), 2); rounds to two decimal places (e.g. 3.25)
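
A quick sketch of combining these functions, run through Python's sqlite3 (the `books` table and its rows are made up). It also shows why the COALESCE matters: AVG skips NULLs, so without it the NULL summary is left out of the average instead of counting as length 0:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, summary TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("A", "abcd"), ("B", None), ("C", "abcdefg")],  # summary lengths: 4, NULL, 7
)

# NULL summary is treated as '' (length 0): (4 + 0 + 7) / 3
avg_len = conn.execute(
    "SELECT ROUND(AVG(LENGTH(COALESCE(summary, ''))), 2) FROM books"
).fetchone()[0]

# Without COALESCE, LENGTH(NULL) is NULL and AVG ignores it: (4 + 7) / 2
avg_ignoring_null = conn.execute(
    "SELECT ROUND(AVG(LENGTH(summary)), 2) FROM books"
).fetchone()[0]

print(avg_len)            # 3.67
print(avg_ignoring_null)  # 5.5
```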

Operators

`BETWEEN`:

Check if a date column is between two dates (both endpoints are inclusive)

SELECT *
FROM events
WHERE event_date BETWEEN '2024-11-01' AND '2024-11-30';

Aggregate functions

SUM(), AVG(), COUNT(), MIN(), MAX()

| Scenario | Example | Requires `GROUP BY`? |
| --- | --- | --- |
| Standard Aggregate | `SELECT SUM(sales) FROM table` | No |
| Grouped Aggregate | `SELECT region, SUM(sales) FROM table GROUP BY region` | Yes |
| Window Function | `SELECT SUM(sales) OVER (...) FROM table` | No (uses `OVER`) |

`GROUP BY`:

It's used with aggregate functions in a SELECT statement to arrange rows with identical values into groups
Can use column numbers (1, 2, etc.) instead of names. The numbers correspond to the order of the columns in the SELECT statement
- This is especially beneficial when a column in the select is an expression
- Only available in MySQL, PostgreSQL, and some other databases
- Ex: 1 refers to CONCAT(publisher, ' Books'):
```
SELECT CONCAT(publisher, ' Books'), COUNT(*)
FROM Books
GROUP BY 1;
```
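- A column alias is more readable than a position number, although aliases in GROUP BY are also not accepted everywhere (MySQL, PostgreSQL, and SQLite accept them; SQL Server requires repeating the expression):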
```
SELECT CONCAT(publisher, ' Books') AS publisher_books, COUNT(*)
FROM Books
GROUP BY publisher_books;
```
When you use a GROUP BY, all columns in the SELECT statement must either:
- Be part of the GROUP BY clause, or
- Be aggregated using an aggregate function (e.g., COUNT, MAX, MIN).
When doing a COUNT() on a query with a GROUP BY, it will give a count for each of the groups

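To filter groups by the result of an aggregate function, you need to use HAVING instead of WHERE:

SELECT MAX(number) AS largest_single_number
FROM (
  SELECT number
  FROM Numbers
  GROUP BY number
  HAVING COUNT(*) = 1
) AS single_numbers;  -- MySQL and older PostgreSQL versions require an alias for a subquery in FROM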
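
A minimal sketch of WHERE vs HAVING in the same query, run with Python's sqlite3 (the `sales` table and its rows are hypothetical). WHERE drops rows before grouping; HAVING drops whole groups afterwards:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 30), ("west", 5), ("north", 50)],
)

rows = conn.execute("""
    SELECT region, COUNT(*) AS n, SUM(amount) AS total
    FROM sales
    WHERE amount >= 10          -- row filter: removes the single 'west' row
    GROUP BY region
    HAVING SUM(amount) > 20     -- group filter: applied to each region's total
    ORDER BY region
""").fetchall()
print(rows)  # [('east', 2, 40), ('north', 1, 50)]
```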

`ORDER BY`:

You don't need to SELECT the column you're applying the ORDER BY to
Can order by multiple columns (will order by the first, and use the second column to order any rows that have the same first column value)

`LIMIT` (and offsets):

Can provide a limit on how many rows to return: LIMIT 5
Can also add an offset:
- MySQL: LIMIT offset, count;
  - Ex: LIMIT 10, 5; skips 10 rows, returns the next 5
- PostgreSQL: LIMIT count OFFSET offset;
  - Ex: LIMIT 5 OFFSET 10; skips 10 rows, returns the next 5

`LIKE`:

On some RDBMS, LIKE is case-sensitive; on others it's not

Wildcards:

_: matches exactly one character

WHERE title LIKE '_arm'  -- Matches four-character titles that end with "arm" (e.g., "Farm", "Harm")

%: matches zero or more characters

WHERE title LIKE '%_ _%'  -- Ensure the title has more than one word (a space with at least one character on each side; a plain '% %' would also match a leading or trailing space)
WHERE LOWER(title) NOT LIKE '%z%'  -- Exclude titles containing 'z' (case-insensitive)

Common use cases:

-- Search for substrings
SELECT * FROM users WHERE username LIKE '%john%';  -- Finds usernames containing "john" anywhere in the string

-- Starts with
SELECT * FROM employees WHERE name LIKE 'A%';  -- Finds names starting with "A"

-- Ends with
SELECT * FROM products WHERE category LIKE '%tools';  -- Finds categories ending with "tools"

-- Search for a specific pattern
SELECT * FROM files WHERE filename LIKE 'report_2023_%.pdf';  -- Matches filenames like "report_2023_q1.pdf"; each _ is itself a one-character wildcard, so escape it if it must match a literal underscore

Performance Tips:
- Avoid LIKE patterns with a leading wildcard ('%pattern%') on large datasets: a regular index can't be used for them, which can slow down queries
- Consider using full-text search or indexed columns for better performance
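
A small sketch showing that `_` inside a LIKE pattern is a wildcard, and how an ESCAPE clause makes it literal. It runs on Python's sqlite3; the `files` table and filenames are invented, and the backslash is just the escape character chosen here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (filename TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?)",
    [("report_2023_q1.pdf",), ("reportX2023Yq1.pdf",), ("summary.pdf",)],
)

# Unescaped: each _ matches any single character, so both report files match
loose = [r[0] for r in conn.execute(
    "SELECT filename FROM files WHERE filename LIKE ? ORDER BY filename",
    ("report_2023_%.pdf",),
)]

# Escaped: \_ matches only a literal underscore
strict = [r[0] for r in conn.execute(
    "SELECT filename FROM files WHERE filename LIKE ? ESCAPE '\\' ORDER BY filename",
    (r"report\_2023\_%.pdf",),
)]

print(loose)   # ['reportX2023Yq1.pdf', 'report_2023_q1.pdf']
print(strict)  # ['report_2023_q1.pdf']
```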

Regex:

| Feature | MySQL | PostgreSQL |
| --- | --- | --- |
| Matching Op | `REGEXP`, `RLIKE` | `~` (case-sens), `~*` (case-insens) |
| Negated Op | `NOT REGEXP`, `NOT RLIKE` | `!~` (case-sens), `!~*` (case-insens) |
| Case Default | Insensitive | Sensitive (depends on `~` vs `~*`) |
| Case Control | `BINARY` keyword or `REGEXP_LIKE` flags | Use `~` vs `~*`, or flags in functions |
| Extraction | `REGEXP_SUBSTR()` | `substring()`, `regexp_match[es]()` |
| Replacement | `REGEXP_REPLACE()` | `regexp_replace()` |
| Usage Example (all that start with 'A') | `WHERE col REGEXP '^A'` | `WHERE col ~ '^A'` |

`CASE`:

It's a conditional expression that allows you to perform logical checks and return different values based on the conditions
- It's similar to an if/else statement
It's typically used in the SELECT, WHERE, ORDER BY, or HAVING clauses to introduce conditional logic into your queries

Structure (ELSE is optional)

-- Simple CASE Expression
CASE expression
  WHEN value1 THEN result1
  WHEN value2 THEN result2
  ...
  ELSE default_result
END

-- Searched CASE Expression
CASE 
  WHEN condition1 THEN result1
  WHEN condition2 THEN result2
  ...
  ELSE default_result
END

Ex: check if there are any items

SELECT 
  CASE 
    WHEN item_count > 0 THEN 'True'
    ELSE 'False'
  END AS contains_items
FROM inventory;

`COUNT()`:

By default, COUNT(col_name) includes duplicate values in the count: it counts every row that has a non-NULL value in the column
To count only the unique values in a column, use COUNT(DISTINCT <col_name>)
Two ways to handle NULL values:
- SELECT COUNT(col_name): skips rows where col_name is NULL
- SELECT COUNT(*): counts all rows, regardless of NULL values
When doing a COUNT() on a query with a GROUP BY, it will give a count for each of the groups
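
A short sketch of the three COUNT variants side by side, using Python's sqlite3 (the `books` table and its rows are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, publisher TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("A", "Acme"), ("B", "Acme"), ("C", None), ("D", "Zenith")],
)

all_rows, non_null, distinct = conn.execute("""
    SELECT COUNT(*),                  -- every row
           COUNT(publisher),          -- rows where publisher is not NULL
           COUNT(DISTINCT publisher)  -- unique non-NULL publishers
    FROM books
""").fetchone()
print(all_rows, non_null, distinct)  # 4 3 2
```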

`SUM`:

`UNION`: combines the results of two or more SELECT queries into a single result set

All duplicate rows are removed by default, unless you do UNION ALL
It requires all SELECT queries to have:
- The same number of columns
- Matching column data types in corresponding positions

Ex:

SELECT name FROM Customers
UNION
SELECT name FROM Suppliers;
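
A quick sketch of UNION vs UNION ALL on the example above, run with Python's sqlite3 (the rows are made up; "Bob" appears in both tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (name TEXT);
    CREATE TABLE Suppliers (name TEXT);
    INSERT INTO Customers VALUES ('Ann'), ('Bob');
    INSERT INTO Suppliers VALUES ('Bob'), ('Cy');
""")

union_rows = [r[0] for r in conn.execute(
    "SELECT name FROM Customers UNION SELECT name FROM Suppliers ORDER BY name"
)]
union_all_rows = [r[0] for r in conn.execute(
    "SELECT name FROM Customers UNION ALL SELECT name FROM Suppliers ORDER BY name"
)]

print(union_rows)      # ['Ann', 'Bob', 'Cy']         (duplicate removed)
print(union_all_rows)  # ['Ann', 'Bob', 'Bob', 'Cy']  (duplicate kept)
```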

`IN`:

If you want to get rows that have certain values you can use IN instead of multiple = statements
Ex: WHERE color = 'red' OR color = 'blue' --> WHERE color IN ('red', 'blue')

Window Functions

Overview of components

📊 Window Function: The calculation (e.g., SUM(), ROW_NUMBER())
🪟 OVER() Clause: The window (which rows to include)
🧩 PARTITION BY: How to group windows

<window_function>() OVER (PARTITION BY column_name ORDER BY column_name)

Function Types

Ranking Functions:
- ROW_NUMBER() – Assigns a unique number to each row within a partition
- RANK() – Assigns a rank with gaps for ties
- DENSE_RANK() – Assigns a rank without gaps for ties
Analytic Functions:
- LAG() – Finds the value of a previous row
- LEAD() – Finds the value of the next row
- FIRST_VALUE() – Finds the first value in the window
- LAST_VALUE() – Finds the last value in the window
Aggregate Functions that can be used as Window Functions (but don't have to be):
- SUM() – Running totals
- AVG() – Moving averages
- COUNT() – Cumulative counts
- MIN()/MAX() – Sliding minimum/maximum

Window Function Clauses: `OVER()` and `PARTITION BY`

OVER() Clause:
- Defines the window (set of rows) for the window function
- It tells SQL:
  - How to partition the data (group it)
  - How to order the data (sequence it)
- Required whenever a function is used as a window function; an aggregate function without OVER() behaves as a regular aggregate
PARTITION BY Clause
- Similar to GROUP BY but without reducing rows
- It splits the result set into partitions (subgroups) before performing calculations
- Each partition acts like a mini-table for the window function to operate on
- It's optional
  - Without PARTITION BY:
    - The window function acts on all rows together.
  - With PARTITION BY:
    - The window function resets for each partition (group)
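
A sketch of the ranking functions and a running total over partitions, run with Python's sqlite3 (assumes the bundled SQLite is 3.25 or newer, which is when window functions were added). The `scores` table and its rows are invented; note the tie at 30 points in the red team:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (team TEXT, player TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [("red", "a", 30), ("red", "b", 30), ("red", "c", 10),
     ("blue", "d", 20), ("blue", "e", 5)],
)

rows = conn.execute("""
    SELECT team, player, points,
           ROW_NUMBER() OVER (PARTITION BY team ORDER BY points DESC, player) AS row_num,
           RANK()       OVER (PARTITION BY team ORDER BY points DESC)         AS rnk,
           DENSE_RANK() OVER (PARTITION BY team ORDER BY points DESC)         AS dense_rnk,
           SUM(points)  OVER (PARTITION BY team ORDER BY points DESC, player) AS running_total
    FROM scores
    ORDER BY team, row_num
""").fetchall()

for row in rows:
    print(row)
# ('blue', 'd', 20, 1, 1, 1, 20)
# ('blue', 'e', 5, 2, 2, 2, 25)
# ('red', 'a', 30, 1, 1, 1, 30)
# ('red', 'b', 30, 2, 1, 1, 60)
# ('red', 'c', 10, 3, 3, 2, 70)
```

Every input row is still present in the output (unlike GROUP BY), and all four calculations restart for each team.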

How to speed up (optimize) a query to improve performance:

Indexing
Optimizing JOINs
Avoiding SELECT * (retrieve only necessary data)
Query caching for repeated queries
Use LIMIT to restrict the number of rows returned
Try to avoid subqueries and use JOINs where possible
Stored procedures?
Can use EXPLAIN ANALYZE

Stored procedures

Stored procedures are precompiled collections of SQL statements that are saved and stored in a database for repeated use
Best situations for stored procedures:
- Complex business logic that can be handled entirely within the database
- Reducing application-database interaction when multiple queries would otherwise be required

Database indexes:

https://www.codecademy.com/paths/analyze-data-with-sql/tracks/analyze-data-sql-get-started-with-sql/modules/analyze-data-sql-learn-manipulation-c4b/articles/sql-indexes
Indexes help speed up querying by providing a method to quickly lookup requested data
Simply put, an index is a pointer to data in a table. It's very similar to an index in the back of a book
Indexes serve as lookup tables that efficiently store data for quicker retrieval
A table can have multiple indexes
Updating a table that has indexes takes more time than updating one without (b/c the indexes also need an update)
- So only create indexes on columns that will be frequently searched against
Ex: getting records in the past 24 hours
- By indexing a timestamp column, the database can jump straight to the rows from the past 24 hours and stop as soon as it reaches one that is older
- Without indexing, it would have to look through ALL the records to check each timestamp, which takes far longer on large tables
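
A rough sketch of how an index changes the query plan, using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3. The `events` table, the index name, and the `plan` helper are made up, and the exact plan wording varies between SQLite versions, so it is printed rather than quoted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT, payload TEXT)")

query = "SELECT id FROM events WHERE created_at >= '2024-11-01'"

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(query)  # expected to be a full scan of the table
conn.execute("CREATE INDEX idx_events_created_at ON events (created_at)")
plan_after = plan(query)   # expected to search via the new index

print("before:", plan_before)
print("after: ", plan_after)
```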

Subqueries (AKA nested query)

https://learnsql.com/blog/sql-subquery-types/
https://learnsql.com/blog/sql-subquery-examples/
https://learnsql.com/blog/sql-subqueries/
A subquery is a query placed within another SQL query
- They can be included in the WHERE, FROM, or SELECT clauses of the main query
Can use them with the ANY or ALL keywords if the subquery can return multiple rows
Can also use them with the IN operator

Ex of subquery in FROM (find most number of cats in one breed)

SELECT MAX(number_of_cats)
FROM (SELECT breed, COUNT(*) AS number_of_cats
      FROM cat
      GROUP BY breed) breed_count   -- need this subquery table alias

Operators:

`IN`:

If you want to get rows that have certain values you can use IN instead of multiple = statements
Ex: WHERE color = 'red' OR color = 'blue' --> WHERE color IN ('red', 'blue')

Can also use it in subqueries

Ex: get the price of trips to cities with populations greater than 100,000

SELECT price
FROM trip
WHERE city_id IN (
  SELECT id
  FROM city
  WHERE population > 100000
);

`ALL`:

True if the comparison is true for ALL of the values of the subquery
Can be used with comparison operators: = ALL, != ALL, > ALL, >= ALL, < ALL, <= ALL

Ex: find countries that have a population greater than every city

SELECT * FROM country
WHERE population > ALL (
  SELECT population FROM city
);

`ANY`:

True if the comparison is true for ANY of the values of the subquery
Same syntax and logical operators as ALL, just different logic

`EXISTS`:

True if the subquery returns at least one row; the row from the outer query is then added to the result set
- If the subquery returns no rows, the outer row is skipped (a returned row counts even if its values are NULL)
Can use the NOT operator to inverse the EXISTS clause

Scalar subqueries:

Return a single value: exactly one row and exactly one column

Ex: get paintings with a price higher than the average painting

    SELECT name, listed_price
    FROM paintings
    WHERE listed_price > (
        SELECT AVG(listed_price)
        FROM paintings
    );

Correlated subqueries:

Overview:
- A correlated subquery is a subquery that depends on the outer query for its values
- It's evaluated once per row of the outer query, making it useful for row-wise comparisons and context-aware filtering
🔍 What Makes It "Correlated"?
- The subquery references a column from the outer query
- It cannot be run independently of the outer query
- It’s evaluated repeatedly—once for each row in the outer query
📌 Where can it be used?
- In the WHERE clause: to filter rows based on dynamic comparisons
- In the SELECT clause: to return a value computed per row
- Also usable in HAVING and FROM (less common)

In SELECT:

Ex: for each cat, computes the average age of all cats sharing the same name

SELECT 
   name, 
   (SELECT AVG(age) 
    FROM cats c2 
    WHERE c2.name = c1.name) AS avg_age_for_name
FROM cats c1;

In WHERE:
- Ex: find all countries whose area is smaller than or equal to the smallest city area within that country
- ```
SELECT * 
FROM country
WHERE area <= (
  SELECT MIN(area) 
  FROM city
  WHERE city.country_id = country.id
);
```
- Why would we use such a query? It can be very convenient if we want to check whether there are any errors in our database
  - If this query returned any rows at all, we would know that something fishy is going on in our records

Multirow subqueries:

Return either one column with multiple rows (i.e. a list of values) or multiple columns with multiple rows (i.e. tables)

CTEs (common table expressions) aka WITH

https://learnsql.com/blog/what-is-sql-cte/
A CTE, also referred to as a WITH clause, is a temporary named result set that you can reference anywhere in your query
CTEs can help organize and simplify long, complex hierarchical queries and improve readability by breaking them down into smaller blocks
CTEs are often easier to read and maintain than nested subqueries, so prefer them where they fit
Can be used in SELECT, INSERT, UPDATE, DELETE, and MERGE statements

Ex:

--CTE
WITH cte_sales AS (
    SELECT EmployeeID, COUNT(*) AS order_count
    FROM Orders
    GROUP BY EmployeeID
    )
--Query using CTE
SELECT AVG(order_count) AS average_orders_per_employee
FROM cte_sales;

This is a simple example, but you can define multiple CTEs in one query, which is when you really start to see the advantages

Recursive CTE
- https://builtin.com/data-science/recursive-sql
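
A small sketch of a recursive CTE walking a manager hierarchy, run with Python's sqlite3. The `employee` table, the names, and the `chain` CTE are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employee VALUES
        (1, 'Ada', NULL),  -- top of the hierarchy
        (2, 'Bo', 1),
        (3, 'Cy', 2),
        (4, 'Di', 1);
""")

rows = conn.execute("""
    WITH RECURSIVE chain (id, name, depth) AS (
        -- Anchor member: start from employees with no manager
        SELECT id, name, 0 FROM employee WHERE manager_id IS NULL
        UNION ALL
        -- Recursive member: add the direct reports of rows found so far
        SELECT e.id, e.name, chain.depth + 1
        FROM employee e
        JOIN chain ON e.manager_id = chain.id
    )
    SELECT name, depth FROM chain ORDER BY depth, name
""").fetchall()
print(rows)  # [('Ada', 0), ('Bo', 1), ('Di', 1), ('Cy', 2)]
```

The recursion stops on its own once the recursive member produces no new rows.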

Subqueries vs CTEs

https://learnsql.com/blog/sql-subquery-cte-difference/
https://learnsql.com/blog/reasons-to-use-ctes/
https://towardsdatascience.com/sql-for-data-analysis-subquery-vs-cte-699ef629d9eb
There are many cases where CTEs are better to use than subqueries
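From a performance standpoint, there's often not much difference because many databases plan a CTE like a subquery, though some materialize CTEs, which can help or hurt depending on the query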
Main differences / benefits
- CTEs can be recursive
- CTEs are reusable
- CTEs can be more readable
- CTEs must always have a name
Benefits of subqueries:
- Can be used in the WHERE clause
- Can do correlated subqueries

Comparison

| Feature | Subqueries | CTEs |
| --- | --- | --- |
| Readability | Good for simple queries | Better for complex or multi-step queries |
| Reusability | Limited to the specific clause | Can be reused multiple times in the query |
| Recursion | No | Yes (recursive CTEs) |
| Optimization | May be executed multiple times (per row) | Typically optimized better, but may materialize |
| Performance | Can be slower for nested queries | Can perform better, but large CTEs may use more memory |
| Use Cases | Simple filters, one-time calculations | Complex logic, recursive queries, reusability |

Subqueries vs JOINs

https://learnsql.com/blog/subquery-vs-join/
https://learnsql.com/blog/converting-subqueries-joins/
JOINs usually perform faster than subqueries, but subqueries can sometimes be more intuitive

Syntax differences between MySQL (MariaDB), PostgreSQL, and SQLite

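MySQL: string comparisons are not case-sensitive with the default collations; strings can use either "" or '' (unless ANSI_QUOTES mode is enabled)
PostgreSQL: string comparisons are case-sensitive; strings can only use '' ("" is reserved for identifiers)
SQLite: string comparisons with = are case-sensitive (LIKE ignores case for ASCII letters by default); strings can use either "" or '', but "" is only accepted as a legacy fallback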

Complex Queries

Start with the data model: Which tables store which data and the nature of relations b/w these tables
First step is to determine which tables will be used:
- All tables containing data needed to display the result
- If any of these tables are not directly related, also include all tables between them and figure out how to join them correctly
Start with smaller/simpler queries and put them together, checking the results along the way

Approach to Complex SQL Queries

0.1. look at the question, what will I need?
- What tables?
- GROUP BY?
- Subquery?
- Outer joins?
- Big expressions?
0.2. get some working code
Get the joins right
- Add one join at a time and test/verify along the way
Write any expressions, and verify them
- For verification, do them in a select clause even if you may use it elsewhere
Write any where clause expressions, and verify them
Arrange the columns and rows to visualize how the grouping will take place
Add the grouping, visually verify
Add the having clause
Add the order by
Tidy up the code

pymysql

Can setup a connection and then get the cursor and do:
- cursor.execute('sql with %(substitution)s', {'substitution': 'sub'})
In order to build a string with substitutions

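def insert_keywords(cursor, r_id, keywords):
    """Insert any new keywords into phrases, then link all of them to r_id."""
    sql_dict = {}
    for keyword in keywords.split(','):
        kw_trimmed = ' '.join(keyword.strip().split())  # collapse runs of whitespace
        if len(kw_trimmed) > 255 or not kw_trimmed:
            continue
        # Generated placeholder names (kw0, kw1, ...): raw keyword text is not a safe placeholder name
        sql_dict[f'kw{len(sql_dict)}'] = kw_trimmed
    if not sql_dict:
        return  # no valid keywords, so there is nothing to insert

    placeholders = [f'%({key})s' for key in sql_dict]
    selects = ' UNION '.join(f'(SELECT {ph} AS txt)' for ph in placeholders)
    # Insert only the keywords that don't already exist (case-insensitive match)
    sql = ("INSERT INTO phrases (p_id, phrase_text) SELECT uuid(), kw.txt "
           f"FROM phrases p RIGHT OUTER JOIN ({selects}) kw "
           "ON lower(p.phrase_text) = lower(kw.txt) WHERE p.p_id IS NULL")
    cursor.execute(sql, sql_dict)

    cursor.execute(f''' INSERT INTO r_phrases (r_id, p_id) 
                            SELECT %(r_id)s, p_id 
                            FROM phrases WHERE phrase_text IN ({', '.join(placeholders)}) ''',
                   {'r_id': r_id, **sql_dict})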

Common problem types

Gaps and islands

Identifying and grouping consecutive rows (islands) that share a common characteristic, while recognizing the breaks (gaps) between these sequences
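
One common way to solve this is the "value minus ROW_NUMBER()" trick: consecutive values share the same difference. A sketch with Python's sqlite3 (assumes SQLite 3.25+ for window functions; the `logins` table and the day numbers are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (day INTEGER)")
conn.executemany("INSERT INTO logins VALUES (?)", [(d,) for d in (1, 2, 3, 5, 6, 9)])

islands = conn.execute("""
    WITH numbered AS (
        -- For consecutive days, day - row number stays constant
        SELECT day, day - ROW_NUMBER() OVER (ORDER BY day) AS grp
        FROM logins
    )
    SELECT MIN(day) AS start_day, MAX(day) AS end_day, COUNT(*) AS length
    FROM numbered
    GROUP BY grp
    ORDER BY start_day
""").fetchall()
print(islands)  # [(1, 3, 3), (5, 6, 2), (9, 9, 1)]
```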

Top-N per Group

Selecting the top N rows for each category or group, which often involves ranking functions or subqueries
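
A sketch of top-N per group using ROW_NUMBER() in a subquery, run with Python's sqlite3 (assumes SQLite 3.25+; the `products` table and prices are hypothetical, and N is 2 here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (category TEXT, name TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("fruit", "apple", 3), ("fruit", "pear", 5), ("fruit", "kiwi", 4),
     ("veg", "leek", 2), ("veg", "kale", 6)],
)

top_two = conn.execute("""
    SELECT category, name, price
    FROM (
        SELECT category, name, price,
               ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS rn
        FROM products
    ) ranked
    WHERE rn <= 2   -- keep the two most expensive products per category
    ORDER BY category, rn
""").fetchall()
print(top_two)
# [('fruit', 'pear', 5), ('fruit', 'kiwi', 4), ('veg', 'kale', 6), ('veg', 'leek', 2)]
```

The window function has to live in a subquery (or CTE) because its result can't be referenced in the WHERE clause of the same SELECT.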

Running Totals / Cumulative Sums

Calculating an accumulated total or sum over a set of rows, usually in a time series or ordered dataset

Recursive / Hierarchical Queries

Querying data that has a recursive structure (such as organizational charts or bill-of-materials) using techniques like recursive common table expressions (CTEs)

Pivot and Unpivot

Transforming rows into columns (pivot) or columns into rows (unpivot) to reshape data for reporting or analysis.
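
A portable way to pivot is conditional aggregation: one SUM(CASE ...) per output column. A sketch with Python's sqlite3 (the `sales` table, regions, and quarters are made up; some databases also offer a dedicated PIVOT syntax, which isn't shown here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", "Q1", 10), ("east", "Q1", 1), ("east", "Q2", 20),
     ("west", "Q1", 5), ("west", "Q2", 7)],
)

pivoted = conn.execute("""
    SELECT region,
           SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1,
           SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()
print(pivoted)  # [('east', 11, 20), ('west', 5, 7)]
```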

Specific scenarios:

Find all rows that have the max value of a column

SELECT *
FROM table_name
WHERE col_name = (SELECT MAX(col_name) FROM table_name);

Date clause

BETWEEN or comparison operators:

SELECT *
FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';

-- Other options
WHERE order_date >= '2024-01-01' AND order_date <= '2024-12-31';
WHERE order_date > '2024-01-01';

Count the number of rows for each unique value in a column:
- Use COUNT() and GROUP BY clause
```
SELECT publisher, COUNT(*) AS book_count
FROM Books
GROUP BY publisher;
```

Split string on delimiter:

MySQL: SUBSTRING_INDEX()

SUBSTRING_INDEX(str, delim, n)

str: the full string to split
delim: the delimiter (e.g. '.')
n (count): how many parts to include; if:
- positive: returns everything before the nth occurrence of delim
- negative: returns everything after the nth-to-last occurrence of delim

Ex:

SELECT SUBSTRING_INDEX('www.example.com', '.', 1);   -- returns 'www'
SELECT SUBSTRING_INDEX('www.example.com', '.', 2);   -- returns 'www.example'
SELECT SUBSTRING_INDEX('www.example.com', '.', -1);  -- returns 'com'
SELECT SUBSTRING_INDEX('www.example.com', '.', -2);  -- returns 'example.com'

PostgreSQL: split_part(), string_to_array()

split_part(str, delim, index)

str: the full string to split
delim: the delimiter (e.g. '.')
index: the 1-based index of the part you want

Ex:

   SELECT split_part('www.example.com', '.', 1);  -- 'www'
   SELECT split_part('www.example.com', '.', 2);  -- 'example'
   SELECT split_part('www.example.com', '.', 3);  -- 'com'

string_to_array(str, delim)

str: the full string to split
delim: the delimiter (e.g. '.')

Ex:

   SELECT (string_to_array('www.example.com', '.'))[1];  -- 'www'
   SELECT (string_to_array('www.example.com', '.'))[2];  -- 'example'
   SELECT (string_to_array('www.example.com', '.'))[3];  -- 'com'
