SQL Cheatsheet for Data Science - ElleCoding/Data_Science_Cheatsheets GitHub Wiki

SQL Cheatsheet for Data Science

Hi, I'm Elle (ellecoding). Here's a SQL cheatsheet for Data Science and Machine Learning I've made. Hope it helps!

Basic Concepts

Database Basics

  • Databases: Collections of structured data organized in tables.
  • Tables: Structures that hold data in rows and columns.
  • Schemas: Logical containers for database objects like tables and views.

Basic SQL Commands

  • SELECT: Use the SELECT statement to fetch data from one or more tables.

    sqlCopy code
    SELECT * FROM employees;  -- Selects all columns from the employees table
    
  • WHERE: Use the WHERE clause to specify conditions for filtering records.

    sqlCopy code
    SELECT * FROM employees WHERE salary > 50000;  -- Selects records where salary is greater than 50000
    
  • ORDER BY: Use the ORDER BY clause to sort the results in ascending or descending order.

    sqlCopy code
    SELECT * FROM employees ORDER BY last_name ASC;  -- Selects all columns from employees and sorts by last_name
    

Data Manipulation

  • INSERT INTO: Use the INSERT INTO statement to add new rows to a table.

    sqlCopy code
    INSERT INTO employees (first_name, last_name, salary)
    VALUES ('John', 'Doe', 60000);  -- Inserts a new record into the employees table
    
  • UPDATE: Use the UPDATE statement to modify existing rows in a table.

    sqlCopy code
    UPDATE employees SET salary = salary * 1.10 WHERE department_id = 1;  -- Updates the salary of employees in department 1
    
  • DELETE: Use the DELETE statement to remove rows from a table.

    sqlCopy code
    DELETE FROM employees WHERE salary < 30000;  -- Deletes records of employees with salary less than 30000
    

Advanced SQL Commands

Aggregation

  • GROUP BY: Use the GROUP BY clause to aggregate data based on one or more columns.

    sqlCopy code
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id;  -- Groups employees by department and calculates the average salary
    
  • HAVING: Use the HAVING clause to filter groups created by the GROUP BY clause.

    sqlCopy code
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
    HAVING AVG(salary) > 50000;  -- Selects departments with an average salary greater than 50000
    

Joins

  • INNER JOIN: Use INNER JOIN to combine rows from two tables where there is a match.

    sqlCopy code
    SELECT e.first_name, e.last_name, d.department_name
    FROM employees e
    INNER JOIN departments d ON e.department_id = d.department_id;  -- Joins employees and departments on department_id
    
  • LEFT JOIN: Use LEFT JOIN to include all rows from the left table and matched rows from the right table.

    sqlCopy code
    SELECT e.first_name, e.last_name, d.department_name
    FROM employees e
    LEFT JOIN departments d ON e.department_id = d.department_id;  -- Selects all employees and their departments, if available
    
  • RIGHT JOIN: Use RIGHT JOIN to include all rows from the right table and matched rows from the left table.

    sqlCopy code
    SELECT e.first_name, e.last_name, d.department_name
    FROM employees e
    RIGHT JOIN departments d ON e.department_id = d.department_id;  -- Selects all departments and their employees, if available
    
  • FULL OUTER JOIN: Use FULL OUTER JOIN to include all rows when there is a match in either table.

    sqlCopy code
    SELECT e.first_name, e.last_name, d.department_name
    FROM employees e
    FULL OUTER JOIN departments d ON e.department_id = d.department_id;  -- Selects all employees and departments, with matches where possible
    

Subqueries

  • Subquery in SELECT: A subquery is a query nested inside another query to provide intermediate results.

    sqlCopy code
    SELECT e.first_name, e.last_name,
           (SELECT AVG(salary)
            FROM employees
            WHERE department_id = e.department_id) AS avg_dept_salary
    FROM employees e;  -- Selects employees and their average department salary
    
  • Subquery in WHERE: Use subqueries in the WHERE clause to filter based on another query's result.

    sqlCopy code
    SELECT first_name, last_name
    FROM employees
    WHERE department_id IN (SELECT department_id
                            FROM employees
                            GROUP BY department_id
                            HAVING COUNT(*) > 5);  -- Selects employees in departments with more than 5 employees
    

Window Functions

  • OVER(): Use window functions to perform calculations across a specified range of rows.

    sqlCopy code
    SELECT first_name, last_name, salary,
           RANK() OVER(PARTITION BY department_id ORDER BY salary DESC) AS dept_rank
    FROM employees;  -- Selects employees and calculates their rank by salary within their department
    

Common Table Expressions (CTE)

  • WITH: Use CTEs to simplify complex queries by breaking them into simpler parts.

    sqlCopy code
    WITH avg_salary AS (
      SELECT department_id, AVG(salary) AS avg_dept_salary
      FROM employees
      GROUP BY department_id
    )
    SELECT e.first_name, e.last_name, e.salary
    FROM employees e
    JOIN avg_salary a ON e.department_id = a.department_id
    WHERE e.salary > a.avg_dept_salary;  -- Defines a CTE to calculate the average salary and then selects employees with a higher salary
    

Advanced Data Manipulation

  • CASE: Use the CASE statement to create conditional logic in SQL queries.

    sqlCopy code
    SELECT first_name, last_name, salary,
           CASE
             WHEN salary < 50000 THEN 'Low'
             WHEN salary BETWEEN 50000 AND 100000 THEN 'Medium'
             ELSE 'High'
           END AS salary_category
    FROM employees;  -- Categorizes employees based on their salary
    

Indexes

  • CREATE INDEX: Use indexes to speed up the retrieval of records from a table.

    sqlCopy code
    CREATE INDEX idx_last_name ON employees(last_name);  -- Creates an index on the last_name column of the employees table
    

Transactions

  • BEGIN, COMMIT, ROLLBACK: Use transactions to group a set of operations into a single unit of work.

    sqlCopy code
    BEGIN;
    
    UPDATE employees SET salary = salary * 1.10 WHERE department_id = 1;
    
    COMMIT;  -- Use ROLLBACK; to undo changes  -- Demonstrates a simple transaction
    

Views

  • CREATE VIEW: Use views to simplify complex queries and present data in a specific format.

    sqlCopy code
    CREATE VIEW high_salary_employees AS
    SELECT first_name, last_name, salary
    FROM employees
    WHERE salary > 100000;  -- Creates a view for employees with high salaries
    

Stored Procedures and Functions

  • Stored Procedure: Use stored procedures to encapsulate and reuse SQL code.

    sqlCopy code
    CREATE PROCEDURE UpdateSalaries(IN department INT, IN increase DECIMAL)
    BEGIN
      UPDATE employees
      SET salary = salary + increase
      WHERE department_id = department;
    END;  -- Creates a stored procedure to update employee salaries
    
  • User-Defined Function: Use user-defined functions to perform repetitive tasks and calculations.

    sqlCopy code
    CREATE FUNCTION YearlySalary(monthly_salary DECIMAL) RETURNS DECIMAL
    BEGIN
      RETURN monthly_salary * 12;
    END;  -- Creates a function to calculate the yearly salary
    

Helpful SQL Tips

  • Normalization: Organize data to reduce redundancy.
  • Denormalization: Optimize read performance by adding redundancy.
  • Query Optimization: Use EXPLAIN to analyze and optimize queries.
  • Security: Use parameterized queries to prevent SQL injection.

SQL Cheatsheet for Data Science - Part 2 (Intermediate/Advanced)

Intermediate SQL Commands

Advanced Joins

  • Self Join: Join a table to itself to compare rows within the same table.

    sqlCopy code
    SELECT e1.first_name AS emp1, e2.first_name AS emp2
    FROM employees e1
    JOIN employees e2 ON e1.manager_id = e2.employee_id;  -- Joins the employees table to itself to find employees and their managers
    
  • Cross Join: Combine all rows from two or more tables, resulting in the Cartesian product.

    sqlCopy code
    SELECT e.first_name, d.department_name
    FROM employees e
    CROSS JOIN departments d;  -- Selects all combinations of employees and departments
    

Complex Subqueries

  • Correlated Subquery: A subquery that references columns from the outer query.

    sqlCopy code
    SELECT e.first_name, e.last_name, e.salary
    FROM employees e
    WHERE e.salary > (SELECT AVG(salary)
                      FROM employees
                      WHERE department_id = e.department_id);  -- Selects employees whose salary is above the department average
    

Window Functions

  • ROW_NUMBER(): Assign a unique sequential integer to rows within a partition.

    sqlCopy code
    SELECT first_name, last_name, salary,
           ROW_NUMBER() OVER(PARTITION BY department_id ORDER BY salary DESC) AS row_num
    FROM employees;  -- Assigns a unique row number to each employee within their department based on salary
    
  • RANK() and DENSE_RANK(): Calculate the rank of a row within a partition, with gaps for ties (RANK) or without gaps (DENSE_RANK).

    sqlCopy code
    SELECT first_name, last_name, salary,
           RANK() OVER(PARTITION BY department_id ORDER BY salary DESC) AS rank,
           DENSE_RANK() OVER(PARTITION BY department_id ORDER BY salary DESC) AS dense_rank
    FROM employees;  -- Calculates the rank and dense rank of employees within their department based on salary
    
  • LEAD() and LAG(): Access data from the following or preceding row within the same result set.

    sqlCopy code
    SELECT first_name, last_name, salary,
           LAG(salary, 1) OVER(PARTITION BY department_id ORDER BY salary) AS prev_salary,
           LEAD(salary, 1) OVER(PARTITION BY department_id ORDER BY salary) AS next_salary
    FROM employees;  -- Accesses the previous and next salary within the same department
    

Advanced Data Manipulation

  • MERGE: Insert, update, or delete rows in a table based on a condition.

    sqlCopy code
    MERGE INTO employees AS target
    USING (SELECT employee_id, salary FROM new_salaries) AS source
    ON target.employee_id = source.employee_id
    WHEN MATCHED THEN
      UPDATE SET target.salary = source.salary
    WHEN NOT MATCHED THEN
      INSERT (employee_id, salary) VALUES (source.employee_id, source.salary);  -- Merges new salary data into the employees table
    

Recursive Queries

  • WITH RECURSIVE: Create recursive queries to handle hierarchical data.

    sqlCopy code
    WITH RECURSIVE employee_hierarchy AS (
      SELECT employee_id, manager_id, first_name, last_name, 1 AS level
      FROM employees
      WHERE manager_id IS NULL
      UNION ALL
      SELECT e.employee_id, e.manager_id, e.first_name, e.last_name, eh.level + 1
      FROM employees e
      JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
    )
    SELECT * FROM employee_hierarchy;  -- Generates a hierarchy of employees and their levels
    

Advanced Aggregation

  • GROUPING SETS: Generate multiple groupings in a single query.

    sqlCopy code
    SELECT department_id, manager_id, COUNT(*) AS employee_count
    FROM employees
    GROUP BY GROUPING SETS ((department_id), (manager_id), (department_id, manager_id));  -- Generates counts for different groupings
    
  • ROLLUP: Create subtotals that roll up from the most detailed level to a grand total.

    sqlCopy code
    SELECT department_id, manager_id, COUNT(*) AS employee_count
    FROM employees
    GROUP BY ROLLUP (department_id, manager_id);  -- Generates subtotals for departments and managers
    
  • CUBE: Generate subtotals for all combinations of groupings.

    sqlCopy code
    SELECT department_id, manager_id, COUNT(*) AS employee_count
    FROM employees
    GROUP BY CUBE (department_id, manager_id);  -- Generates subtotals for all combinations of departments and managers
    

Performance Tuning

  • Indexes: Create indexes to improve query performance.

    sqlCopy code
    CREATE INDEX idx_salary ON employees(salary);  -- Creates an index on the salary column of the employees table
    
  • EXPLAIN: Analyze and optimize query performance.

    sqlCopy code
    EXPLAIN SELECT * FROM employees WHERE salary > 50000;  -- Provides information about how the query will be executed
    
  • Partitioning: Divide a table into smaller, more manageable pieces.

    sqlCopy code
    CREATE TABLE employees_part (
      employee_id INT,
      first_name VARCHAR(50),
      last_name VARCHAR(50),
      salary DECIMAL(10, 2),
      department_id INT
    ) PARTITION BY RANGE (salary) (
      PARTITION low_salaries VALUES LESS THAN (50000),
      PARTITION mid_salaries VALUES BETWEEN 50000 AND 100000,
      PARTITION high_salaries VALUES GREATER THAN (100000)
    );  -- Creates a partitioned table based on salary ranges
    

Advanced Transactions

  • Savepoints: Set points within a transaction to which you can roll back.

    sqlCopy code
    BEGIN;
    
    UPDATE employees SET salary = salary * 1.10 WHERE department_id = 1;
    
    SAVEPOINT sp1;
    
    UPDATE employees SET salary = salary * 1.05 WHERE department_id = 2;
    
    ROLLBACK TO sp1;  -- Rolls back to the savepoint sp1
    
    COMMIT;  -- Commits the transaction
    

Advanced Views

  • Materialized Views: Create a view that stores the result set physically.

    sqlCopy code
    CREATE MATERIALIZED VIEW high_salary_employees AS
    SELECT first_name, last_name, salary
    FROM employees
    WHERE salary > 100000
    WITH DATA;  -- Creates a materialized view for employees with high salaries
    
  • Updating Materialized Views: Refresh the data in a materialized view.

    sqlCopy code
    REFRESH MATERIALIZED VIEW high_salary_employees;  -- Refreshes the materialized view to update the data
    

User-Defined Types and Functions

  • User-Defined Types: Create custom data types.

    sqlCopy code
    CREATE TYPE address AS (
      street VARCHAR(50),
      city VARCHAR(50),
      state CHAR(2),
      zip CHAR(5)
    );  -- Creates a custom data type for addresses
    
  • Advanced User-Defined Functions: Create more complex functions with multiple parameters and logic.

    sqlCopy code
    CREATE FUNCTION EmployeeBonus(salary DECIMAL, performance_rating DECIMAL) RETURNS DECIMAL
    BEGIN
      DECLARE bonus DECIMAL;
      IF performance_rating >= 4 THEN
        SET bonus = salary * 0.20;
      ELSE
        SET bonus = salary * 0.10;
      END IF;
      RETURN bonus;
    END;  -- Creates a function to calculate an employee's bonus based on salary and performance rating
    

Security and Access Control

  • Roles and Privileges: Manage user access and permissions.

    sqlCopy code
    CREATE ROLE manager;
    GRANT SELECT, INSERT, UPDATE ON employees TO manager;  -- Creates a role and grants privileges to it
    
  • Row-Level Security: Implement row-level security policies.

    sqlCopy code
    CREATE POLICY department_policy ON employees
    USING (department_id = current_setting('myapp.current_department')::INT);  -- Creates a policy to restrict access to rows based on the current department
    

JSON and XML Data Handling

  • JSON Functions: Store and query JSON data.

    sqlCopy code
    CREATE TABLE employees_json (
      employee_id INT PRIMARY KEY,
      employee_data JSON
    );
    
    INSERT INTO employees_json VALUES (1, '{"first_name": "John", "last_name": "Doe", "salary": 60000}');
    
    SELECT employee_data->>'first_name' AS first_name
    FROM employees_json
    WHERE employee_data->>'salary' > '50000';  -- Selects the first name from JSON data where salary is greater than 50000
    
  • XML Functions: Store and query XML data.

    sqlCopy code
    CREATE TABLE employees_xml (
      employee_id INT PRIMARY KEY,
      employee_data XML
    );
    
    INSERT INTO employees_xml VALUES (1, '<employee><first_name>John</first_name><last_name>Doe</last_name><salary>60000</salary></employee>');
    
    SELECT employee_data.query('employee/first_name/text()') AS first_name
    FROM employees_xml
    WHERE employee_data.value('(/employee/salary)[1]', 'INT') > 50000;  -- Selects the first name from XML data where salary is greater than 50000
    

Temporal Data Handling

  • Temporal Tables: Manage time-sensitive data changes.

    sqlCopy code
    CREATE TABLE employees_temporal (
      employee_id INT PRIMARY KEY,
      first_name VARCHAR(50),
      last_name VARCHAR(50),
      salary DECIMAL(10, 2),
      department_id INT,
      PERIOD FOR SYSTEM_TIME
    ) WITH (SYSTEM_VERSIONING = ON);  -- Creates a temporal table to track historical changes
    
⚠️ **GitHub.com Fallback** ⚠️