SQL Cheatsheet for Data Science - ElleCoding/Data_Science_Cheatsheets GitHub Wiki
Hi, I'm Elle (ellecoding). Here's a SQL cheatsheet for Data Science and Machine Learning I've made. Hope it helps!
- Databases: Collections of structured data organized in tables.
- Tables: Structures that hold data in rows and columns.
- Schemas: Logical containers for database objects like tables and views.
-
SELECT: Use the
SELECT
statement to fetch data from one or more tables.sqlCopy code SELECT * FROM employees; -- Selects all columns from the employees table
-
WHERE: Use the
WHERE
clause to specify conditions for filtering records.sqlCopy code SELECT * FROM employees WHERE salary > 50000; -- Selects records where salary is greater than 50000
-
ORDER BY: Use the
ORDER BY
clause to sort the results in ascending or descending order.sqlCopy code SELECT * FROM employees ORDER BY last_name ASC; -- Selects all columns from employees and sorts by last_name
-
INSERT INTO: Use the
INSERT INTO
statement to add new rows to a table.sqlCopy code INSERT INTO employees (first_name, last_name, salary) VALUES ('John', 'Doe', 60000); -- Inserts a new record into the employees table
-
UPDATE: Use the
UPDATE
statement to modify existing rows in a table.sqlCopy code UPDATE employees SET salary = salary * 1.10 WHERE department_id = 1; -- Updates the salary of employees in department 1
-
DELETE: Use the
DELETE
statement to remove rows from a table.sqlCopy code DELETE FROM employees WHERE salary < 30000; -- Deletes records of employees with salary less than 30000
-
GROUP BY: Use the
GROUP BY
clause to aggregate data based on one or more columns.sqlCopy code SELECT department_id, AVG(salary) AS avg_salary FROM employees GROUP BY department_id; -- Groups employees by department and calculates the average salary
-
HAVING: Use the
HAVING
clause to filter groups created by theGROUP BY
clause.sqlCopy code SELECT department_id, AVG(salary) AS avg_salary FROM employees GROUP BY department_id HAVING AVG(salary) > 50000; -- Selects departments with an average salary greater than 50000
-
INNER JOIN: Use
INNER JOIN
to combine rows from two tables where there is a match.sqlCopy code SELECT e.first_name, e.last_name, d.department_name FROM employees e INNER JOIN departments d ON e.department_id = d.department_id; -- Joins employees and departments on department_id
-
LEFT JOIN: Use
LEFT JOIN
to include all rows from the left table and matched rows from the right table.sqlCopy code SELECT e.first_name, e.last_name, d.department_name FROM employees e LEFT JOIN departments d ON e.department_id = d.department_id; -- Selects all employees and their departments, if available
-
RIGHT JOIN: Use
RIGHT JOIN
to include all rows from the right table and matched rows from the left table.sqlCopy code SELECT e.first_name, e.last_name, d.department_name FROM employees e RIGHT JOIN departments d ON e.department_id = d.department_id; -- Selects all departments and their employees, if available
-
FULL OUTER JOIN: Use
FULL OUTER JOIN
to include all rows when there is a match in either table.sqlCopy code SELECT e.first_name, e.last_name, d.department_name FROM employees e FULL OUTER JOIN departments d ON e.department_id = d.department_id; -- Selects all employees and departments, with matches where possible
-
Subquery in SELECT: A subquery is a query nested inside another query to provide intermediate results.
sqlCopy code SELECT e.first_name, e.last_name, (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id) AS avg_dept_salary FROM employees e; -- Selects employees and their average department salary
-
Subquery in WHERE: Use subqueries in the
WHERE
clause to filter based on another query's result.sqlCopy code SELECT first_name, last_name FROM employees WHERE department_id IN (SELECT department_id FROM employees GROUP BY department_id HAVING COUNT(*) > 5); -- Selects employees in departments with more than 5 employees
-
OVER(): Use window functions to perform calculations across a specified range of rows.
sqlCopy code SELECT first_name, last_name, salary, RANK() OVER(PARTITION BY department_id ORDER BY salary DESC) AS dept_rank FROM employees; -- Selects employees and calculates their rank by salary within their department
-
WITH: Use CTEs to simplify complex queries by breaking them into simpler parts.
sqlCopy code WITH avg_salary AS ( SELECT department_id, AVG(salary) AS avg_dept_salary FROM employees GROUP BY department_id ) SELECT e.first_name, e.last_name, e.salary FROM employees e JOIN avg_salary a ON e.department_id = a.department_id WHERE e.salary > a.avg_dept_salary; -- Defines a CTE to calculate the average salary and then selects employees with a higher salary
-
CASE: Use the
CASE
statement to create conditional logic in SQL queries.sqlCopy code SELECT first_name, last_name, salary, CASE WHEN salary < 50000 THEN 'Low' WHEN salary BETWEEN 50000 AND 100000 THEN 'Medium' ELSE 'High' END AS salary_category FROM employees; -- Categorizes employees based on their salary
-
CREATE INDEX: Use indexes to speed up the retrieval of records from a table.
sqlCopy code CREATE INDEX idx_last_name ON employees(last_name); -- Creates an index on the last_name column of the employees table
-
BEGIN, COMMIT, ROLLBACK: Use transactions to group a set of operations into a single unit of work.
sqlCopy code BEGIN; UPDATE employees SET salary = salary * 1.10 WHERE department_id = 1; COMMIT; -- Use ROLLBACK; to undo changes -- Demonstrates a simple transaction
-
CREATE VIEW: Use views to simplify complex queries and present data in a specific format.
sqlCopy code CREATE VIEW high_salary_employees AS SELECT first_name, last_name, salary FROM employees WHERE salary > 100000; -- Creates a view for employees with high salaries
-
Stored Procedure: Use stored procedures to encapsulate and reuse SQL code.
sqlCopy code CREATE PROCEDURE UpdateSalaries(IN department INT, IN increase DECIMAL) BEGIN UPDATE employees SET salary = salary + increase WHERE department_id = department; END; -- Creates a stored procedure to update employee salaries
-
User-Defined Function: Use user-defined functions to perform repetitive tasks and calculations.
sqlCopy code CREATE FUNCTION YearlySalary(monthly_salary DECIMAL) RETURNS DECIMAL BEGIN RETURN monthly_salary * 12; END; -- Creates a function to calculate the yearly salary
- Normalization: Organize data to reduce redundancy.
- Denormalization: Optimize read performance by adding redundancy.
- Query Optimization: Use EXPLAIN to analyze and optimize queries.
- Security: Use parameterized queries to prevent SQL injection.
-
Self Join: Join a table to itself to compare rows within the same table.
sqlCopy code SELECT e1.first_name AS emp1, e2.first_name AS emp2 FROM employees e1 JOIN employees e2 ON e1.manager_id = e2.employee_id; -- Joins the employees table to itself to find employees and their managers
-
Cross Join: Combine all rows from two or more tables, resulting in the Cartesian product.
sqlCopy code SELECT e.first_name, d.department_name FROM employees e CROSS JOIN departments d; -- Selects all combinations of employees and departments
-
Correlated Subquery: A subquery that references columns from the outer query.
sqlCopy code SELECT e.first_name, e.last_name, e.salary FROM employees e WHERE e.salary > (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id); -- Selects employees whose salary is above the department average
-
ROW_NUMBER(): Assign a unique sequential integer to rows within a partition.
sqlCopy code SELECT first_name, last_name, salary, ROW_NUMBER() OVER(PARTITION BY department_id ORDER BY salary DESC) AS row_num FROM employees; -- Assigns a unique row number to each employee within their department based on salary
-
RANK() and DENSE_RANK(): Calculate the rank of a row within a partition, with gaps for ties (RANK) or without gaps (DENSE_RANK).
sqlCopy code SELECT first_name, last_name, salary, RANK() OVER(PARTITION BY department_id ORDER BY salary DESC) AS rank, DENSE_RANK() OVER(PARTITION BY department_id ORDER BY salary DESC) AS dense_rank FROM employees; -- Calculates the rank and dense rank of employees within their department based on salary
-
LEAD() and LAG(): Access data from the following or preceding row within the same result set.
sqlCopy code SELECT first_name, last_name, salary, LAG(salary, 1) OVER(PARTITION BY department_id ORDER BY salary) AS prev_salary, LEAD(salary, 1) OVER(PARTITION BY department_id ORDER BY salary) AS next_salary FROM employees; -- Accesses the previous and next salary within the same department
-
MERGE: Insert, update, or delete rows in a table based on a condition.
sqlCopy code MERGE INTO employees AS target USING (SELECT employee_id, salary FROM new_salaries) AS source ON target.employee_id = source.employee_id WHEN MATCHED THEN UPDATE SET target.salary = source.salary WHEN NOT MATCHED THEN INSERT (employee_id, salary) VALUES (source.employee_id, source.salary); -- Merges new salary data into the employees table
-
WITH RECURSIVE: Create recursive queries to handle hierarchical data.
sqlCopy code WITH RECURSIVE employee_hierarchy AS ( SELECT employee_id, manager_id, first_name, last_name, 1 AS level FROM employees WHERE manager_id IS NULL UNION ALL SELECT e.employee_id, e.manager_id, e.first_name, e.last_name, eh.level + 1 FROM employees e JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id ) SELECT * FROM employee_hierarchy; -- Generates a hierarchy of employees and their levels
-
GROUPING SETS: Generate multiple groupings in a single query.
sqlCopy code SELECT department_id, manager_id, COUNT(*) AS employee_count FROM employees GROUP BY GROUPING SETS ((department_id), (manager_id), (department_id, manager_id)); -- Generates counts for different groupings
-
ROLLUP: Create subtotals that roll up from the most detailed level to a grand total.
sqlCopy code SELECT department_id, manager_id, COUNT(*) AS employee_count FROM employees GROUP BY ROLLUP (department_id, manager_id); -- Generates subtotals for departments and managers
-
CUBE: Generate subtotals for all combinations of groupings.
sqlCopy code SELECT department_id, manager_id, COUNT(*) AS employee_count FROM employees GROUP BY CUBE (department_id, manager_id); -- Generates subtotals for all combinations of departments and managers
-
Indexes: Create indexes to improve query performance.
sqlCopy code CREATE INDEX idx_salary ON employees(salary); -- Creates an index on the salary column of the employees table
-
EXPLAIN: Analyze and optimize query performance.
sqlCopy code EXPLAIN SELECT * FROM employees WHERE salary > 50000; -- Provides information about how the query will be executed
-
Partitioning: Divide a table into smaller, more manageable pieces.
sqlCopy code CREATE TABLE employees_part ( employee_id INT, first_name VARCHAR(50), last_name VARCHAR(50), salary DECIMAL(10, 2), department_id INT ) PARTITION BY RANGE (salary) ( PARTITION low_salaries VALUES LESS THAN (50000), PARTITION mid_salaries VALUES BETWEEN 50000 AND 100000, PARTITION high_salaries VALUES GREATER THAN (100000) ); -- Creates a partitioned table based on salary ranges
-
Savepoints: Set points within a transaction to which you can roll back.
sqlCopy code BEGIN; UPDATE employees SET salary = salary * 1.10 WHERE department_id = 1; SAVEPOINT sp1; UPDATE employees SET salary = salary * 1.05 WHERE department_id = 2; ROLLBACK TO sp1; -- Rolls back to the savepoint sp1 COMMIT; -- Commits the transaction
-
Materialized Views: Create a view that stores the result set physically.
sqlCopy code CREATE MATERIALIZED VIEW high_salary_employees AS SELECT first_name, last_name, salary FROM employees WHERE salary > 100000 WITH DATA; -- Creates a materialized view for employees with high salaries
-
Updating Materialized Views: Refresh the data in a materialized view.
sqlCopy code REFRESH MATERIALIZED VIEW high_salary_employees; -- Refreshes the materialized view to update the data
-
User-Defined Types: Create custom data types.
sqlCopy code CREATE TYPE address AS ( street VARCHAR(50), city VARCHAR(50), state CHAR(2), zip CHAR(5) ); -- Creates a custom data type for addresses
-
Advanced User-Defined Functions: Create more complex functions with multiple parameters and logic.
sqlCopy code CREATE FUNCTION EmployeeBonus(salary DECIMAL, performance_rating DECIMAL) RETURNS DECIMAL BEGIN DECLARE bonus DECIMAL; IF performance_rating >= 4 THEN SET bonus = salary * 0.20; ELSE SET bonus = salary * 0.10; END IF; RETURN bonus; END; -- Creates a function to calculate an employee's bonus based on salary and performance rating
-
Roles and Privileges: Manage user access and permissions.
sqlCopy code CREATE ROLE manager; GRANT SELECT, INSERT, UPDATE ON employees TO manager; -- Creates a role and grants privileges to it
-
Row-Level Security: Implement row-level security policies.
sqlCopy code CREATE POLICY department_policy ON employees USING (department_id = current_setting('myapp.current_department')::INT); -- Creates a policy to restrict access to rows based on the current department
-
JSON Functions: Store and query JSON data.
sqlCopy code CREATE TABLE employees_json ( employee_id INT PRIMARY KEY, employee_data JSON ); INSERT INTO employees_json VALUES (1, '{"first_name": "John", "last_name": "Doe", "salary": 60000}'); SELECT employee_data->>'first_name' AS first_name FROM employees_json WHERE employee_data->>'salary' > '50000'; -- Selects the first name from JSON data where salary is greater than 50000
-
XML Functions: Store and query XML data.
sqlCopy code CREATE TABLE employees_xml ( employee_id INT PRIMARY KEY, employee_data XML ); INSERT INTO employees_xml VALUES (1, '<employee><first_name>John</first_name><last_name>Doe</last_name><salary>60000</salary></employee>'); SELECT employee_data.query('employee/first_name/text()') AS first_name FROM employees_xml WHERE employee_data.value('(/employee/salary)[1]', 'INT') > 50000; -- Selects the first name from XML data where salary is greater than 50000
-
Temporal Tables: Manage time-sensitive data changes.
sqlCopy code CREATE TABLE employees_temporal ( employee_id INT PRIMARY KEY, first_name VARCHAR(50), last_name VARCHAR(50), salary DECIMAL(10, 2), department_id INT, PERIOD FOR SYSTEM_TIME ) WITH (SYSTEM_VERSIONING = ON); -- Creates a temporal table to track historical changes