The Universal View: All Joins as Filtered Cross Joins - maxxts7/Techbook GitHub Wiki
The Universal View: All Joins as Filtered Cross Joins
After exploring various types of joins, including inner joins, non-equi joins, and self joins, we can now take a step back and look at joins from a more universal perspective. In essence, all of these joins can be conceptualized as filtered cross joins.
What is a Cross Join?
A cross join, also known as a Cartesian product, combines each row from the first table with every row from the second table. If table A has n rows and table B has m rows, the result of a cross join will have n * m rows.
For our Employees table with 5 rows, a cross join with itself would look like this:
SELECT e1.name AS employee1, e1.salary AS salary1,
e2.name AS employee2, e2.salary AS salary2
FROM Employees e1
CROSS JOIN Employees e2;
This would result in 25 rows (5 * 5), pairing each employee with every employee (including themselves).
Joins as Filtered Cross Joins
Now, let's revisit our previous examples and see how they relate to cross joins:
-
Inner Join: Our initial inner join between Employees and Departments can be seen as a cross join where we filter out all rows where the department IDs don't match.
-
Non-Equi Join: The salary grade example is a cross join between Employees and Salary_Grades, filtered to keep only the rows where the employee's salary falls within the grade's range.
-
Self Join for Pairs: Our last example of finding employee pairs based on salary differences is particularly illustrative. It's essentially a cross join of the Employees table with itself, with the following filters applied:
- Keep only rows where salary1 > salary2
- Keep only rows where salary1 <= salary2 + 10000
- Remove rows where employee1 is the same as employee2
Let's make this explicit by rewriting our last query as a filtered cross join:
SELECT e1.name AS employee1, e1.salary AS salary1,
e2.name AS employee2, e2.salary AS salary2
FROM Employees e1
CROSS JOIN Employees e2
WHERE e1.salary > e2.salary
AND e1.salary <= e2.salary + 10000
AND e1.id != e2.id
ORDER BY e1.salary DESC, e2.salary DESC;
This query will produce the exact same result as our previous self-join example. The only difference is that it explicitly uses a CROSS JOIN and then applies the filtering conditions in the WHERE clause, rather than incorporating them into the JOIN condition.
Key points about this perspective:
- Every join can be conceptualized as a filtered cross join.
- The JOIN conditions (whether equality-based or not) act as filters on the cross join result.
- This view can help in understanding complex joins by breaking them down into two steps: generate all possible combinations, then filter to keep only the desired rows.
- While conceptually useful, databases typically don't actually perform a full cross join and then filter. Query optimizers use more efficient methods to achieve the same result.
Understanding joins as filtered cross joins provides a unifying concept that can make complex join operations more intuitive. It also highlights the power of SQL in allowing us to express complex data relationships through a combination of generating all possibilities (cross join) and then applying specific criteria (filtering).
Beyond Inner Joins: Left, Right, and Full Outer Joins
So far, we've focused on inner joins and their variations. Now, let's explore other types of joins that handle unmatched rows differently: left joins, right joins, and full outer joins.
Let's use our Employees and Departments tables again, but with a slight modification to the Departments table:
Employees | ||
---|---|---|
id | name | department_id |
1 | Alice | 1 |
2 | Bob | 2 |
3 | Charlie | 1 |
4 | Diana | 3 |
5 | Eva | 2 |
6 | Frank | 4 |
Departments | |
---|---|
id | name |
1 | HR |
2 | IT |
3 | Finance |
5 | Marketing |
Left Join
A left join returns all rows from the left table (the first table mentioned in the query), and the matched rows from the right table. If there's no match, the result will contain NULL for columns from the right table.
SELECT e.name AS employee, d.name AS department
FROM Employees e
LEFT JOIN Departments d ON e.department_id = d.id;
Result:
employee | department |
---|---|
Alice | HR |
Bob | IT |
Charlie | HR |
Diana | Finance |
Eva | IT |
Frank | NULL |
Notice that Frank appears in the result with a NULL department, unlike in an inner join.
Right Join
A right join is similar to a left join, but returns all rows from the right table, and the matched rows from the left table. If there's no match, the result will contain NULL for columns from the left table.
SELECT e.name AS employee, d.name AS department
FROM Employees e
RIGHT JOIN Departments d ON e.department_id = d.id;
Result:
employee | department |
---|---|
Alice | HR |
Charlie | HR |
Bob | IT |
Eva | IT |
Diana | Finance |
NULL | Marketing |
Here, Marketing appears in the result with a NULL employee.
Full Outer Join
A full outer join returns all rows when there is a match in either the left or right table. If there's no match, it will still include the row, filling in NULL for columns from the table without a match.
SELECT e.name AS employee, d.name AS department
FROM Employees e
FULL OUTER JOIN Departments d ON e.department_id = d.id;
Result:
employee | department |
---|---|
Alice | HR |
Charlie | HR |
Bob | IT |
Eva | IT |
Diana | Finance |
Frank | NULL |
NULL | Marketing |
This result includes both Frank (who has no department) and Marketing (which has no employees).
Key points about these joins:
- Left and right joins are asymmetrical - the order of tables matters.
- A left join will always return at least all rows from the left table, a right join from the right table.
- Full outer joins are symmetrical - the order of tables doesn't affect which rows are returned.
- These joins are particularly useful when you need to see data that doesn't have a match, such as employees without departments or departments without employees.
- Like inner joins, these can also be conceptualized as filtered cross joins, but instead of eliminating unmatched rows, they fill them with NULLs.
Understanding these join types allows for more flexible querying, especially when dealing with datasets where not all records have matching entries in related tables.