sql lateral join - ghdrako/doc_snipets GitHub Wiki
- https://jonmce.medium.com/what-the-heck-is-a-lateral-join-anyway-4c3345b94a63
- https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-LATERAL
select *
from table1 t1
cross join lateral
(
select *
from table2 t2
where t1.col1 = t2.col1 -- Only allowed because of lateral
) sub
A lateral join is a type of join in SQL that lets you join a table with a subquery that can reference columns of the tables to its left. Conceptually, the subquery is evaluated once for each row of the main table, and the rows it returns are joined to that row. With this join mode, you can use information from one table to filter or process data from another table.
A LATERAL join is more like a correlated subquery, not a plain subquery, in that expressions to the right of a LATERAL join are evaluated once for each row left of it - just like a correlated subquery - while a plain subquery (table expression) is evaluated once only. (The query planner has ways to optimize performance for either, though.)
SELECT id,
email,
name
FROM users u
INNER JOIN LATERAL
-- below is the lateral join subquery
( SELECT name
FROM profiles p
WHERE u.id = p.user_id) profiles ON true;
- notice that the join condition that combines each table is described in the subquery's WHERE clause
The rows are joined when u.id (from the outer query’s users table) matches p.user_id (from the subquery’s profiles table). Normally, a subquery can’t reference outer query columns, but this is exactly what the lateral keyword enables.
You can rewrite a lateral join without using the JOIN keyword, which allows you to omit the ON true altogether.
SELECT id,
email,
name
FROM users u,
LATERAL
-- below is the lateral join subquery
( SELECT name
FROM profiles p
WHERE u.id = p.user_id) profiles;
This syntax implies that you are using an inner join, so the previous syntax is required if you’d like to use a left join.
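For comparison, here is a minimal sketch of the LEFT JOIN form, reusing the users and profiles tables from the queries above. Users without a matching profile should still appear, with name set to NULL, which is the case the comma syntax cannot express.

```sql
-- LEFT JOIN LATERAL keeps every row of users, even when the
-- subquery returns no rows for that user (name is then NULL).
SELECT u.id,
       u.email,
       profiles.name
FROM users u
LEFT JOIN LATERAL
  ( SELECT p.name
    FROM profiles p
    WHERE u.id = p.user_id ) profiles ON true;
```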
Top-N Queries
In cases of one-to-many relationships, lateral joins can produce results similar to window functions, often with a more compact syntax and, depending on the data and the available indexes, better performance.
Consider a table relationship where every user can have zero or more blog posts, and you need to write a query that returns only two posts per user.
For this query, you could write a verbose window function or a syntactically brief lateral join.
Here’s the window function:
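SELECT id,
title
FROM users u
LEFT JOIN (
SELECT user_id,
title,
-- no ORDER BY in the window, so which posts get numbers 1 and 2 is arbitrary
ROW_NUMBER() OVER(PARTITION BY user_id) as blog_number
FROM blog_posts ) bp
ON u.id = bp.user_id
-- filter in ON rather than WHERE so that users without posts are kept
AND bp.blog_number < 3;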
And here’s the left lateral join, where we can use a LIMIT clause:
SELECT id,
title
FROM users u
LEFT JOIN LATERAL
-- below is the lateral join subquery
(
SELECT title
FROM blog_posts bp
WHERE u.id = bp.user_id
LIMIT 2) blogs ON true;
For each row in users, the subquery after the LATERAL keyword is executed, and its result is capped by LIMIT 2. The results of all these subquery executions are appended to form the final output. The join clause is written as ON true because the join condition is already part of the subquery, as its WHERE clause.
Limiting the window function results requires first ranking each blog post by user, then later limiting the result set (based on the ranking) from within the WHERE clause of the outer query. By contrast, in the lateral join, you limit the result set directly in the subquery:
WHERE u.id = bp.user_id LIMIT 2
Lateral joins are often described as for loops for SQL tables. That’s because the underlying logic of a lateral join follows this pattern: For each row in the left-hand table, perform the right-hand subquery, which is a correlated subquery that can cross-reference field values in the outer query.
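Neither query above orders the posts, so which two posts come back for each user is not defined. A minimal sketch of a deterministic variant follows; it assumes blog_posts has a created_at timestamp column, which the original tables do not show.

```sql
-- Two most recent posts per user; users without posts are kept.
SELECT u.id,
       blogs.title
FROM users u
LEFT JOIN LATERAL
  ( SELECT bp.title
    FROM blog_posts bp
    WHERE bp.user_id = u.id
    ORDER BY bp.created_at DESC  -- created_at is an assumed column
    LIMIT 2 ) blogs ON true;
```

Because the ORDER BY sits inside the subquery, it is applied separately for each user, right before that user's LIMIT 2.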
Find the best benefit for each employee:
select e.empno,e.deptno,b.bonus
from emp e,
LATERAL
( select bonus
from dept_benefits d
where d.deptno = e.deptno
order by bonus desc
fetch first 1 row only
) b
order by 1,3
select * from company
DEPTNO EMPS
10 CLARK,KING
20 ADAM,FORD,JOHN
select c.deptno,
regexp_substr(c.emps,'[^,]+', 1,indices.idx) as ename
from
company c,
lateral (
-- one row per name: number of commas in emps plus one
select level idx from dual
connect by level <=
length(regexp_replace(c.emps,'[^,]+'))+1
) indices
DEPTNO ENAME
10 CLARK
10 KING
20 ADAM
20 FORD
20 JOHN
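The query above relies on Oracle-specific features (dual, CONNECT BY LEVEL). In PostgreSQL, a comparable result could be sketched with a set-returning function as the lateral item. This assumes a company table with the same deptno and emps columns exists in a PostgreSQL database.

```sql
-- Split the comma-separated emps list into one row per name.
-- string_to_array and unnest are built-in PostgreSQL functions.
SELECT c.deptno,
       e.ename
FROM company c,
     LATERAL unnest(string_to_array(c.emps, ',')) AS e(ename)
ORDER BY c.deptno, e.ename;
```

For function calls in the FROM clause, PostgreSQL treats the LATERAL keyword as optional; it is spelled out here for clarity.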
Example
CREATE TABLE t_product AS
SELECT id AS product_id,
id * 10 * random() AS price,
'product ' || id AS product
FROM generate_series(1, 1000) AS id;
CREATE TABLE t_wishlist
(
wishlist_id int,
username text,
desired_price numeric
);
INSERT INTO t_wishlist VALUES
(1, 'hans', '450'),
(2, 'joe', '60'),
(3, 'jane', '1500')
;
Suppose we wanted to find the top three products for every wish, in pseudo-code:
for x in wishlist
loop
found = 0
for y in products where price < x.desired_price order by price desc
loop
found++
if found <= 3
then
return row
else
jump to next wish
end
end loop
end loop
The same logic, written as a LATERAL join:
SELECT *
FROM t_wishlist AS w,
LATERAL (SELECT *
FROM t_product AS p
WHERE p.price < w.desired_price
ORDER BY p.price DESC LIMIT 3 ) AS x
ORDER BY wishlist_id, price DESC;
The FROM-clause is the “outer loop” in our pseudo code and the LATERAL can be seen as the “inner loop”.
Example
Suppose we want to find all users who have posts with more than 2 likes. A query that solves this problem is:
forumdb=> select u.* from users u where exists (select 1 from posts p
where u.pk=p.author and likes > 2 ) ;
Now suppose we want the value of the likes field too. A simple way to get it is a lateral join:
forumdb=> select u.username,q.* from users u join lateral (select author,
title,likes from posts p where u.pk=p.author and likes > 2 ) as q on true;
This query is very similar to the EXISTS query, except that the main query can see all the values returned by the subquery and use them. Note that, unlike the EXISTS version, it returns one row per matching post, so a user with several such posts appears several times.
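As a sketch of using subquery values in the main query, the lateral subquery can also aggregate per user. The example reuses the users and posts tables and their columns from the queries above; the aliases post_count and max_likes are introduced here for illustration only.

```sql
-- One row per user that has at least one post with more than 2 likes,
-- together with how many such posts there are and the highest like count.
SELECT u.username,
       q.post_count,
       q.max_likes
FROM users u
JOIN LATERAL (
    SELECT count(*)     AS post_count,
           max(p.likes) AS max_likes
    FROM posts p
    WHERE p.author = u.pk
      AND p.likes > 2
) AS q ON q.post_count > 0;
```

An aggregate without GROUP BY always returns exactly one row, so the ON condition is what filters out users with no qualifying posts.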
CREATE TABLE events (
id int8 GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
event_start DATE NOT NULL,
event_end DATE NOT NULL,
CHECK (event_end >= event_start)
);
WITH event_starts AS (
SELECT now() - '2 weeks'::INTERVAL * random() AS START
FROM generate_series(1,5) i
)
INSERT INTO events (event_start, event_end)
SELECT
START,
START + '3 days'::INTERVAL + random() * '4 days'::INTERVAL
FROM
event_starts;
SELECT d::DATE AS DAY
FROM generate_series('2022-09-01', '2022-09-30', '1 day'::INTERVAL) d;
DAY
------------
2022-09-01
2022-09-02
2022-09-03
2022-09-04
...
2022-09-30
# Number of events on every day, including days with 0 events, hence the LEFT JOIN
SELECT
d::DATE AS DAY,
COUNT(l.id) AS events
FROM
generate_series('2022-09-01', '2022-09-30', '1 day'::INTERVAL) d
LEFT JOIN lateral (
SELECT * FROM events e
WHERE d::DATE BETWEEN e.event_start AND e.event_end
) AS l ON (TRUE)
GROUP BY d::DATE
ORDER BY d::DATE;
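The sample events above are generated relative to now(), so a fixed September 2022 calendar will usually show zero events on every day. A sketch that builds the calendar around the current date instead (the window of 21 days back and 7 days ahead is an arbitrary choice that covers the generated data):

```sql
-- Count events per day for a window around today.
-- The LEFT JOIN keeps days on which no event is running.
SELECT d::date     AS day,
       count(l.id) AS events
FROM generate_series(current_date - 21,
                     current_date + 7,
                     '1 day'::interval) d
LEFT JOIN LATERAL (
    SELECT e.id
    FROM events e
    WHERE d::date BETWEEN e.event_start AND e.event_end
) AS l ON true
GROUP BY d::date
ORDER BY d::date;
```

count(l.id) counts only non-NULL values, so days without a matching event report 0 rather than 1.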