The Universal View: All Joins as Filtered Cross Joins - maxxts7/Techbook GitHub Wiki

The Universal View: All Joins as Filtered Cross Joins

After exploring various types of joins, including inner joins, non-equi joins, and self joins, we can now take a step back and look at joins from a more universal perspective. In essence, all of these joins can be conceptualized as filtered cross joins.

What is a Cross Join?

A cross join, also known as a Cartesian product, combines each row from the first table with every row from the second table. If table A has n rows and table B has m rows, the result of a cross join will have n * m rows.

For our Employees table with 5 rows, a cross join with itself would look like this:

SELECT e1.name AS employee1, e1.salary AS salary1, 
       e2.name AS employee2, e2.salary AS salary2
FROM Employees e1
CROSS JOIN Employees e2;

This would result in 25 rows (5 * 5), pairing each employee with every employee (including themselves).

Joins as Filtered Cross Joins

Now, let's revisit our previous examples and see how they relate to cross joins:

  1. Inner Join: Our initial inner join between Employees and Departments can be seen as a cross join where we filter out all rows where the department IDs don't match.

  2. Non-Equi Join: The salary grade example is a cross join between Employees and Salary_Grades, filtered to keep only the rows where the employee's salary falls within the grade's range.

  3. Self Join for Pairs: Our last example of finding employee pairs based on salary differences is particularly illustrative. It's essentially a cross join of the Employees table with itself, with the following filters applied:

    • Keep only rows where salary1 > salary2
    • Keep only rows where salary1 <= salary2 + 10000
    • Remove rows where employee1 is the same as employee2

Let's make this explicit by rewriting our last query as a filtered cross join:

SELECT e1.name AS employee1, e1.salary AS salary1, 
       e2.name AS employee2, e2.salary AS salary2
FROM Employees e1
CROSS JOIN Employees e2
WHERE e1.salary > e2.salary 
  AND e1.salary <= e2.salary + 10000
  AND e1.id != e2.id
ORDER BY e1.salary DESC, e2.salary DESC;

This query will produce the exact same result as our previous self-join example. The only difference is that it explicitly uses a CROSS JOIN and then applies the filtering conditions in the WHERE clause, rather than incorporating them into the JOIN condition.

Key points about this perspective:

  1. Every join can be conceptualized as a filtered cross join.
  2. The JOIN conditions (whether equality-based or not) act as filters on the cross join result.
  3. This view can help in understanding complex joins by breaking them down into two steps: generate all possible combinations, then filter to keep only the desired rows.
  4. While conceptually useful, databases typically don't actually perform a full cross join and then filter. Query optimizers use more efficient methods to achieve the same result.

Understanding joins as filtered cross joins provides a unifying concept that can make complex join operations more intuitive. It also highlights the power of SQL in allowing us to express complex data relationships through a combination of generating all possibilities (cross join) and then applying specific criteria (filtering).

Beyond Inner Joins: Left, Right, and Full Outer Joins

So far, we've focused on inner joins and their variations. Now, let's explore other types of joins that handle unmatched rows differently: left joins, right joins, and full outer joins.

Let's use our Employees and Departments tables again, but with a slight modification to the Departments table:

Employees
id name department_id
1 Alice 1
2 Bob 2
3 Charlie 1
4 Diana 3
5 Eva 2
6 Frank 4
Departments
id name
1 HR
2 IT
3 Finance
5 Marketing

Left Join

A left join returns all rows from the left table (the first table mentioned in the query), and the matched rows from the right table. If there's no match, the result will contain NULL for columns from the right table.

SELECT e.name AS employee, d.name AS department
FROM Employees e
LEFT JOIN Departments d ON e.department_id = d.id;

Result:

employee department
Alice HR
Bob IT
Charlie HR
Diana Finance
Eva IT
Frank NULL

Notice that Frank appears in the result with a NULL department, unlike in an inner join.

Right Join

A right join is similar to a left join, but returns all rows from the right table, and the matched rows from the left table. If there's no match, the result will contain NULL for columns from the left table.

SELECT e.name AS employee, d.name AS department
FROM Employees e
RIGHT JOIN Departments d ON e.department_id = d.id;

Result:

employee department
Alice HR
Charlie HR
Bob IT
Eva IT
Diana Finance
NULL Marketing

Here, Marketing appears in the result with a NULL employee.

Full Outer Join

A full outer join returns all rows when there is a match in either the left or right table. If there's no match, it will still include the row, filling in NULL for columns from the table without a match.

SELECT e.name AS employee, d.name AS department
FROM Employees e
FULL OUTER JOIN Departments d ON e.department_id = d.id;

Result:

employee department
Alice HR
Charlie HR
Bob IT
Eva IT
Diana Finance
Frank NULL
NULL Marketing

This result includes both Frank (who has no department) and Marketing (which has no employees).

Key points about these joins:

  1. Left and right joins are asymmetrical - the order of tables matters.
  2. A left join will always return at least all rows from the left table, a right join from the right table.
  3. Full outer joins are symmetrical - the order of tables doesn't affect which rows are returned.
  4. These joins are particularly useful when you need to see data that doesn't have a match, such as employees without departments or departments without employees.
  5. Like inner joins, these can also be conceptualized as filtered cross joins, but instead of eliminating unmatched rows, they fill them with NULLs.

Understanding these join types allows for more flexible querying, especially when dealing with datasets where not all records have matching entries in related tables.