SQL - robbiehume/CS-Notes GitHub Wiki

Links

SQL software

  • SQuirreL SQL, Toad

Tips and tricks

  • When paginating, fetch one more row than the page size (e.g. LIMIT page_size + 1) so you know whether there's more to get on the next page or not
  • The JOIN operator lets you combine related information from multiple tables into a new table
  • Inner join vs outer join
    • Inner join: JOIN; keeps only the rows from both tables that are related to each other (in the resulting table)
    • Outer joins: also keep rows that have no match in the other table; the missing data is filled with NULL
      • Left (outer) join: LEFT JOIN; keeps the unrelated data from the left (the first) table
        • Keeps any rows from the first table that don't have a match with the second table
      • Right (outer) join: RIGHT JOIN; keeps the unrelated data from the right (the second) table
        • Keeps any rows from the second table that don't have a match with the first table
      • Full (outer) join: FULL JOIN; keeps all rows from both tables
    • The ON clause specifies the join condition, much like WHERE filters rows in a SELECT
    • Syntax:
      •    SELECT pets.name AS pet_name, owners.name AS owner
           FROM pets
           JOIN owners
           ON pets.owner_id = owners.id;  
    • Need to use table aliases when joining a table with itself (a self join), so the two copies can be told apart
    • Can also do joins by listing multiple tables in the FROM clause and putting the join condition in WHERE
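The difference between an inner and a left join can be tried out with a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The pets/owners tables follow the syntax example above; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owners (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT, owner_id INTEGER);
    INSERT INTO owners VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO pets VALUES (1, 'Rex', 1), (2, 'Tom', NULL);
""")

# Inner join: only pets that have a matching owner
inner_rows = conn.execute("""
    SELECT pets.name, owners.name
    FROM pets
    JOIN owners ON pets.owner_id = owners.id
    ORDER BY pets.id
""").fetchall()

# Left join: every pet is kept; a missing owner comes back as NULL (None)
left_rows = conn.execute("""
    SELECT pets.name, owners.name
    FROM pets
    LEFT JOIN owners ON pets.owner_id = owners.id
    ORDER BY pets.id
""").fetchall()

print(inner_rows)  # [('Rex', 'Ann')]
print(left_rows)   # [('Rex', 'Ann'), ('Tom', None)]
```

RIGHT JOIN and FULL JOIN are left out because older SQLite versions do not support them.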

Table aliases:

  •    SELECT t.id
       FROM table_name t    -- or FROM table_name AS t
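Aliases matter most in a self join, where the same table appears twice. A small sketch with sqlite3, assuming a hypothetical employees table whose manager_id points at another row of the same table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES (1, 'Dana', NULL), (2, 'Eli', 1), (3, 'Fay', 1);
""")

# e and m are two aliases for the same table, so their columns can be told apart
rows = conn.execute("""
    SELECT e.name AS employee, m.name AS manager
    FROM employees e
    JOIN employees AS m ON e.manager_id = m.id
    ORDER BY e.id
""").fetchall()

print(rows)  # [('Eli', 'Dana'), ('Fay', 'Dana')]
```

Dana has no manager, so the inner join drops that row.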

Key types:

  • NOTE: only 3 types of keys are actually declared in a database (primary, unique, and foreign). The rest are relational-theory concepts
  • Super key: a set of one or more columns that can uniquely identify a record in a table
    • Primary keys, unique keys, and alternate keys are all super keys
    • A super key can contain attributes that can't identify tuples on their own, as long as the whole set identifies tuples uniquely
  • Candidate key: a minimal set of one or more fields/columns that can identify a record uniquely in a table
    • There can be multiple candidate keys in one table and each candidate key can work as a primary key
  • Primary key: a set of one or more fields/columns of a table that can uniquely identify a record in a table
    • There is only one chosen primary key
  • Alternate key: a candidate key that currently is not a primary key
  • Composite key: a key made up of more than one field/column of a table
  • Unique key: a set of one or more fields/columns of a table that uniquely identify a record in a table
    • It's like a primary key in that it can't have duplicate values, but it can accept NULL, while a PK doesn't allow any NULL or duplicate values. How many NULLs are allowed depends on the RDBMS (SQL Server allows only one; MySQL, PostgreSQL, and SQLite allow several)
  • Foreign key: a field/column in the table that references the primary key (or a unique key) of another table
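The three key types that are actually declared (primary, unique, foreign) can be tried out with sqlite3. This is a sketch under SQLite's rules only: SQLite allows several NULLs in a UNIQUE column and enforces foreign keys only after `PRAGMA foreign_keys = ON`; other systems may behave differently. The table names and the try_insert helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE owners (
        id    INTEGER PRIMARY KEY,              -- primary key
        email TEXT UNIQUE                       -- unique key
    );
    CREATE TABLE pets (
        id       INTEGER PRIMARY KEY,
        owner_id INTEGER REFERENCES owners(id)  -- foreign key
    );
    INSERT INTO owners VALUES (1, 'ann@example.com');
""")

def try_insert(sql, params):
    """Return True if the insert succeeds, False on a constraint violation."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

dup_email = try_insert("INSERT INTO owners VALUES (?, ?)", (2, "ann@example.com"))
null_email_1 = try_insert("INSERT INTO owners VALUES (?, ?)", (3, None))
null_email_2 = try_insert("INSERT INTO owners VALUES (?, ?)", (4, None))
bad_fk = try_insert("INSERT INTO pets VALUES (?, ?)", (1, 99))   # no owner 99
good_fk = try_insert("INSERT INTO pets VALUES (?, ?)", (2, 1))

print(dup_email, null_email_1, null_email_2, bad_fk, good_fk)
# False True True False True
```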

General Notes

  • explain: In MySQL / MariaDB, you can add EXPLAIN to the beginning of the query to see each step it takes (PostgreSQL and SQLite also have EXPLAIN, but the output differs)
    • If any of the query steps shows a full table scan (type ALL in MySQL), then it's potentially making your query slower
  • \G: in the MySQL / MariaDB command-line client, you can end a statement with \G instead of the semicolon to display each result row vertically (one column per line)

Operators

GROUP BY:

  • It's used with aggregate functions in a SELECT statement to arrange rows with identical values into groups
  • Can use column numbers (1, 2, etc.) instead of names. The numbers correspond to the order of the columns in the select statement
    • This is especially beneficial when a column in the select is an expression
  • When doing a COUNT() on a query with a GROUP BY, it will give a count for each of the groups
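A small sqlite3 sketch of grouping by name and by column number; the cats table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cats (name TEXT, breed TEXT, age INTEGER);
    INSERT INTO cats VALUES
        ('Tom', 'Siamese', 3), ('Kit', 'Siamese', 5), ('Max', 'Persian', 2);
""")

# COUNT(*) is computed once per breed group
by_name = conn.execute(
    "SELECT breed, COUNT(*) FROM cats GROUP BY breed ORDER BY breed"
).fetchall()

# GROUP BY 1 refers to the first SELECT column, which here is an expression
by_position = conn.execute(
    "SELECT age >= 3 AS is_adult, COUNT(*) FROM cats GROUP BY 1 ORDER BY 1"
).fetchall()

print(by_name)      # [('Persian', 1), ('Siamese', 2)]
print(by_position)  # [(0, 1), (1, 2)]
```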

ORDER BY:

  • You don't need to SELECT the column you're applying the ORDER BY to
  • Can order by multiple columns (will order by the first, and use the second column to order any rows that have the same first column value)
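Both points can be checked with a short sqlite3 sketch (sample data invented): the query sorts by breed, then by age descending within each breed, while selecting only the name column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cats (name TEXT, breed TEXT, age INTEGER);
    INSERT INTO cats VALUES
        ('Tom', 'Siamese', 3), ('Kit', 'Siamese', 5),
        ('Max', 'Persian', 2), ('Zed', 'Persian', 4);
""")

# breed and age drive the ordering even though neither is selected
names = [name for (name,) in conn.execute(
    "SELECT name FROM cats ORDER BY breed, age DESC"
)]

print(names)  # ['Zed', 'Max', 'Kit', 'Tom']
```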

LIKE:

  • On some RDBMS, LIKE is case-sensitive; on others it's not
  • Wildcard:
    • _: exactly one character
    • %: any number of characters
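The two wildcards can be compared side by side in sqlite3. The words table and the matching helper are made up for the example; the last query reflects SQLite's default behavior, where LIKE ignores case for ASCII letters, and other systems may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words (w TEXT);
    INSERT INTO words VALUES ('cat'), ('chat'), ('at'), ('cart');
""")

def matching(pattern):
    """Return the words matching a LIKE pattern, sorted alphabetically."""
    rows = conn.execute("SELECT w FROM words WHERE w LIKE ? ORDER BY w", (pattern,))
    return [w for (w,) in rows]

one_char = matching("_at")    # exactly one character before 'at'
any_chars = matching("%at")   # zero or more characters before 'at'
upper = matching("CAT")       # SQLite: case-insensitive for ASCII by default

print(one_char)   # ['cat']
print(any_chars)  # ['at', 'cat', 'chat']
print(upper)      # ['cat']
```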

CASE: *

COUNT():

  • By default, COUNT will include duplicate values in the count. COUNT(<col_name>) counts all rows for which there is a non-null value in the column, while COUNT(*) counts all rows
  • To count only the unique values in a column, use COUNT(DISTINCT <col_name>)
  • When doing a COUNT() on a query with a GROUP BY, it will give a count for each of the groups
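The three COUNT forms give different numbers once a column has duplicates and a NULL. A sqlite3 sketch with invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cats (name TEXT, breed TEXT);
    INSERT INTO cats VALUES
        ('Tom', 'Siamese'), ('Kit', 'Siamese'), ('Max', 'Persian'), ('Zed', NULL);
""")

# all rows, rows with a non-NULL breed, distinct non-NULL breeds
total, non_null, distinct = conn.execute(
    "SELECT COUNT(*), COUNT(breed), COUNT(DISTINCT breed) FROM cats"
).fetchone()

print(total, non_null, distinct)  # 4 3 2
```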

UNION:

IN:

  • If you want to get rows that have any of several values, you can use IN instead of multiple = comparisons joined with OR
  • Ex: WHERE color = 'red' OR color = 'blue' --> WHERE color IN ('red', 'blue')
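A quick sqlite3 check that IN returns the same rows as a chain of OR comparisons; the items table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (name TEXT, color TEXT);
    INSERT INTO items VALUES ('apple', 'red'), ('sky', 'blue'), ('leaf', 'green');
""")

with_or = conn.execute(
    "SELECT name FROM items WHERE color = 'red' OR color = 'blue' ORDER BY name"
).fetchall()
with_in = conn.execute(
    "SELECT name FROM items WHERE color IN ('red', 'blue') ORDER BY name"
).fetchall()

print(with_in)  # [('apple',), ('sky',)]
```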

EXISTS:

  • If the subquery returns at least one row, then the row from the outer query is added to the result set
    • If it returns no rows, then the outer row is not added (skipped)
  • Can use the NOT operator to invert the EXISTS clause
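EXISTS and NOT EXISTS can be sketched with sqlite3 as follows, assuming hypothetical owners and pets tables where only one owner has a pet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owners (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT, owner_id INTEGER);
    INSERT INTO owners VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO pets VALUES (1, 'Rex', 1);
""")

# Owners for whom the subquery finds at least one pet
with_pets = conn.execute("""
    SELECT o.name FROM owners o
    WHERE EXISTS (SELECT 1 FROM pets p WHERE p.owner_id = o.id)
""").fetchall()

# NOT inverts the test: owners for whom the subquery finds no rows
without_pets = conn.execute("""
    SELECT o.name FROM owners o
    WHERE NOT EXISTS (SELECT 1 FROM pets p WHERE p.owner_id = o.id)
""").fetchall()

print(with_pets)     # [('Ann',)]
print(without_pets)  # [('Bob',)]
```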

ANY

  • True if the comparison is true for ANY of the values of the subquery

ALL

  • True if the comparison is true for ALL of the values of the subquery

Database indexes:

  • https://www.codecademy.com/paths/analyze-data-with-sql/tracks/analyze-data-sql-get-started-with-sql/modules/analyze-data-sql-learn-manipulation-c4b/articles/sql-indexes
  • Indexes help speed up querying by providing a method to quickly look up requested data
  • Simply put, an index is a pointer to data in a table. It's very similar to an index in the back of a book
  • Indexes serve as lookup tables that efficiently store data for quicker retrieval
  • A table can have multiple indexes
  • Updating a table that has indexes takes more time than updating one without (b/c the indexes also need an update)
    • So only create indexes on columns that will be frequently searched against
  • Ex: getting records in the past 24 hours
    • By indexing a timestamp column, the database can walk the timestamps in sorted order, and once it reaches one that is > 24 hours ago, it knows it can stop looking
    • Without indexing, it would have to look through ALL the records to check each timestamp, which greatly increases the time complexity
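The effect of an index on the query plan can be observed in SQLite with EXPLAIN QUERY PLAN. This is a sketch: the events table, index name, and plan helper are invented, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, created_at INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (created_at, payload) VALUES (?, ?)",
    [(ts, "x") for ts in range(1000)],
)

query = "SELECT payload FROM events WHERE created_at > ?"

def plan(sql, params):
    """Return the human-readable plan steps reported by SQLite."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

plan_before = plan(query, (990,))
conn.execute("CREATE INDEX idx_events_created_at ON events (created_at)")
plan_after = plan(query, (990,))

print(plan_before)  # a full scan of events
print(plan_after)   # a search that uses idx_events_created_at
```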

Subqueries (AKA nested query)

  • https://learnsql.com/blog/sql-subquery-types/
  • https://learnsql.com/blog/sql-subquery-examples/
  • https://learnsql.com/blog/sql-subqueries/
  • A subquery is a query placed within another SQL query
    • They can be included in the WHERE, FROM, or SELECT clauses of the main query
  • Can use them with the ANY or ALL keywords if the subquery can return multiple rows
  • Can also use them with the IN operator
  • Scalar subqueries: return a single value, i.e. exactly one row and exactly one column
    • Ex:
          SELECT name, listed_price
          FROM paintings
          WHERE listed_price > (
              SELECT AVG(listed_price)
              FROM paintings
          );
  • Multirow subqueries: return either one column with multiple rows (i.e. a list of values) or multiple columns with multiple rows (i.e. tables)
  • Correlated subqueries: where the inner query relies on information from the outer query
    • They refer to the table from the outer query
    • The subquery is evaluated once for each row of the outer query
    • You can even use them in the SELECT statement of the outer query
      • SELECT name, (SELECT AVG(age) FROM cats c2 WHERE c2.name = c1.name) FROM cats c1
  • Ex of subquery in FROM (find the largest number of cats in one breed)
    • SELECT MAX(number_of_cats) FROM (SELECT breed, COUNT(*) AS number_of_cats FROM cat GROUP BY breed) breed_count -- the subquery needs this table alias

CTEs (common table expressions)

  • A CTE, defined with a WITH clause, is a temporary named result set that you can reference anywhere in the query that follows it
  • CTEs are often easier to read and reuse than nested subqueries, so prefer them when possible
  • Recursive CTE
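A sketch of both forms in sqlite3. The first query expresses "largest number of cats in one breed" as a CTE; the second is a recursive CTE that counts from 1 to 5. Table contents and CTE names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cats (name TEXT, breed TEXT);
    INSERT INTO cats VALUES
        ('Tom', 'Siamese'), ('Kit', 'Siamese'), ('Max', 'Persian');
""")

# Plain CTE: name the per-breed counts, then query them
(max_count,) = conn.execute("""
    WITH breed_count AS (
        SELECT breed, COUNT(*) AS number_of_cats
        FROM cats
        GROUP BY breed
    )
    SELECT MAX(number_of_cats) FROM breed_count
""").fetchone()

# Recursive CTE: the second SELECT refers back to the CTE itself
numbers = [n for (n,) in conn.execute("""
    WITH RECURSIVE counter(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM counter WHERE n < 5
    )
    SELECT n FROM counter
""")]

print(max_count)  # 2
print(numbers)    # [1, 2, 3, 4, 5]
```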

Subqueries vs CTEs

Subqueries vs JOINs

Syntax differences between MySQL (MariaDB), PostgreSQL, and SQLite

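  • MySQL: string comparisons are not case-sensitive by default; string literals can use either "" or ''
  • PostgreSQL: string comparisons are case-sensitive; string literals can only use '' ("" is for identifiers)
  • SQLite: string comparisons with = are case-sensitive by default; string literals can use either "" or '' ('' is the standard form)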

Complex Queries

  • Start with the data model: Which tables store which data and the nature of relations b/w these tables
  • First step is to determine which tables will be used:
    • All tables containing data needed to display the result
    • If any of these tables are not directly related, also include all tables between them and figure out how to join them correctly
  • Start with smaller/simpler queries and put them together, checking the results along the way
  • 0.1. look at the question, what will I need?
    • What tables?
    • GROUP BY?
    • Subquery?
    • Outer joins?
    • Big expressions?
  • 0.2. get some working code
  • Get the joins right
    • Add one join at a time and test/verify along the way
  • Write any expressions, and verify them
    • For verification, put them in a SELECT clause even if you'll end up using them elsewhere
  • Write any where clause expressions, and verify them
  • Arrange the columns and rows to visualize how the grouping will take place
  • Add the grouping, visually verify
  • Add the having clause
  • Add the order by
  • Tidy up the code

Scenarios:

  • Find all rows that have the max value of a column
    •    SELECT *
         FROM table_name
         WHERE col_name = (SELECT MAX(col_name) FROM table_name);
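A runnable version of this scenario with sqlite3, using a hypothetical paintings table in which two rows tie for the maximum; both are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE paintings (name TEXT, listed_price INTEGER);
    INSERT INTO paintings VALUES ('A', 100), ('B', 300), ('C', 300);
""")

# The scalar subquery yields the max once; the outer query keeps every match
rows = conn.execute("""
    SELECT name
    FROM paintings
    WHERE listed_price = (SELECT MAX(listed_price) FROM paintings)
    ORDER BY name
""").fetchall()

print(rows)  # [('B',), ('C',)]
```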

pymysql

  • Can set up a connection, get the cursor, and then do:
    • `cursor.execute('sql with %(substitution)s', {'substitution': 'sub'})`
  • To build a query string with a variable number of substitutions:
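def insert_keywords(cursor, r_id, keywords):
    """Insert any new comma-separated keywords into phrases, then link them to r_id."""
    # Normalize whitespace; skip empty or over-long (> 255 chars) keywords
    cleaned = [' '.join(kw.split()) for kw in keywords.split(',')]
    cleaned = [kw for kw in cleaned if kw and len(kw) <= 255]
    if not cleaned:
        return  # nothing to insert; avoids executing a malformed query

    # Use generated placeholder names (kw0, kw1, ...) instead of the keyword
    # text itself, which could contain characters that break %(name)s parsing
    params = {f'kw{i}': kw for i, kw in enumerate(cleaned)}
    placeholders = ', '.join(f'%({name})s' for name in params)
    selects = ' UNION '.join(f'(SELECT %({name})s AS txt)' for name in params)

    # Insert only the keywords that don't exist yet (case-insensitive match)
    cursor.execute(
        "INSERT INTO phrases (p_id, phrase_text) SELECT uuid(), kw.txt "
        f"FROM phrases p RIGHT OUTER JOIN ({selects}) kw "
        "ON lower(p.phrase_text) = lower(kw.txt) WHERE p.p_id IS NULL",
        params)

    # Link the phrase row of every keyword to r_id
    cursor.execute(
        "INSERT INTO r_phrases (r_id, p_id) SELECT %(r_id)s, p_id "
        f"FROM phrases WHERE phrase_text IN ({placeholders})",
        {'r_id': r_id, **params})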