sql lateral join - ghdrako/doc_snipets GitHub Wiki

LATERAL is commonly used for querying against an array or JSON data, as well as a replacement for the DISTINCT ON syntax.I would also double check performance when using LATERAL - its generally not as good as other join options.

When joining multiple tables, the rows of both tables are linked together based on some conditions. However, when the result should e.g. be limited to the last three bought products for every customer, the standard join clause will not work: The joined bought products table can not be limited to only include three rows for every customer. That is not how joins do work, all rows matching the join criteria are included.

In some situations, a non-constant join is required to LIMIT the rows of the joined table for every row of the source table instead of limiting the complete result.

Joining a table to a subquery will not help solve the requirement either. Any joined subquery is executed as an independent subquery; it is executed once for the complete join operation. The database is just transforming the subquery to a table-like structure:

  • The subquery is executed first.
  • The results are saved temporarily in memory or on disk (a “virtual table” is created).
  • The join algorithm is executed with the source table and the virtual target table.

The big difference and great feature of LATERAL joins is the change to the execution model: Instead of executing the subqueries once for all rows, it is now executed once for every row the subquery will be joined to. The procedure mimics a for-each loop in SQL, which iterates over the source table and executes the lateral join for every row with the source table's row as input.

With lateral joins, the execution of a subquery in a join changes from an independent subquery to a dependent/derived subquery. The execution once for every row can have some performance implications. However, these performance implications are no reason never to use them. Without lateral joins the application is implementing the for-each loop in programming code: The subquery is sent to the database multiple times and needs to be parsed, planned, and executed repeatedly. By utilizing a lateral join the database can choose a more optimal execution plan and do the parsing and planning step only once. So, it is not like the lateral join is a slow database feature that should be avoided. On the contrary, it is a powerful feature that moves the application's for-each loop to the database, which can execute it much more efficiently.

LATERAL joins allow a subquery to reference columns from a preceding table in the same FROM clause, enabling  more dynamic and powerful query constructs.

select  *
from    table1 t1
cross join lateral
        (
        select  *
        from    t2
        where   t1.col1 = t2.col1 -- Only allowed because of lateral
        ) sub

A lateral join is a type of join in SQL that allows you to join a table with a subquery, where the subquery is run for each row of the main table. The subquery is executed before joining the rows and the result is used to join the rows. With this join mode, you can use information from one table to filter or process data from another table.

A LATERAL join is more like a correlated subquery, not a plain subquery, in that expressions to the right of a LATERAL join are evaluated once for each row left of it - just like a correlated subquery - while a plain subquery (table expression) is evaluated once only. (The query planner has ways to optimize performance for either, though.)

SELECT id,
       email,
       name
FROM   users u
INNER JOIN LATERAL
--      below is the lateral join subquery 
       ( SELECT name
         FROM   profiles p
         WHERE  u.id = p.user_id) profiles ON true;
  • notice that the join condition that combines each table is described in the subquery's WHERE clause

The rows are joined when u.id (from the outer query’s users table) matches p.user_id (from the subquery’s profiles table). Normally, a subquery can’t reference outer query columns, but this is exactly what the lateral keyword enables.

You can rewrite a lateral join without using the JOIN keyword, which allows you to omit the ON true altogether.

SELECT id,
       email,
       NAME
FROM   users u,
       LATERAL
--     below is the lateral join subquery 
       ( SELECT name
         FROM   profiles p
         WHERE  u.id = p.user_id) profiles;

This syntax implies that you are using an inner join, so the previous syntax is required if you’d like to use a left join.

Top-N Queries

In cases of one-to-many relationships, lateral joins can produce similar results to window functions, with the advantages of a cleaner syntax and improved performance.

Consider a table relationship where every user can have zero or more blog posts, and you need to write a query that returns only two posts per user.

For this query, you could write a verbose window function or a syntactically brief lateral join.

Here’s the window function:

SELECT id,
       title
FROM users u
   LEFT JOIN (
      SELECT user_id,
             title,
             ROW_NUMBER() OVER(PARTITION BY user_id) as blog_number
      FROM blog_posts ) bp
      ON u.id = bp.user_id
WHERE blog_number < 3;

And here’s the left lateral join wbere we can use Limit clause:

SELECT id,
       title
FROM   users u
       LEFT JOIN LATERAL
--     below is the lateral join subquery
       (
              SELECT title
              FROM   blog_posts bp
              WHERE  u.id = bp.user_id 
              LIMIT 2) blogs 
   ON true;

For each row in users execute subquery after LATERAL keyword and limit the result of each subquery by logic LIMIT 2. And append the result of each subquery to create the final output

Clause is then written:on true because we inject the join condition right into the subquery as a where clause.

Limiting the window function results requires first ranking each blog post by user, then later limiting the result set (based on the ranking) from within the WHERE clause of the outer query. By contrast, in the lateral join, you limit the result set directly in the subquery:

WHERE  u.id = bp.user_id LIMIT 2

Lateral joins are often described as for loops for SQL tables. That’s because the underlying logic of a lateral join follows this pattern: For each row in the left-hand table, perform the right-hand subquery, which is a correlated subquery that can cross-reference field values in the outer query.

sql lateral join

Znajdz dla pracownika najlepszy z jego benefitow

select e.empno,e.depno,b.bonus
from emp e,
     LATERAL
     ( select bonus
        from dept_benefits d
       where d.deptno = e.deptno
      order by bonus desc
      fetch first 1 row only
    ) b
order by 1,3
select * from company
DEPTNO EMPS
10     CLARK,KING
20     ADAM,FORD,JOHN     



select c.deptno
       regex_substr(c.emps,'[^,]+', 1,indices.idx) as ename
  from
     company c,
     lateral (
        select level idx from dual
        connect by  leve <=
          length(regexp_replace(c.emps,'[^,]+'))+1
    ) indices

DEPTNO ENAME
10     CLARK
10     KING
20     ADAM
20     FORD
20     JOHN

Example

CREATE TABLE t_product AS
    SELECT   id AS product_id,
             id * 10 * random() AS price,
             'product ' || id AS product
    FROM generate_series(1, 1000) AS id;
 
CREATE TABLE t_wishlist
(
    wishlist_id        int,
    username           text,
    desired_price      numeric
);
 
INSERT INTO t_wishlist VALUES
    (1, 'hans', '450'),
    (2, 'joe', '60'),
    (3, 'jane', '1500')
;

Suppose we wanted to find the top three products for every wish, in pseudo-code:

for x in wishlist
loop      
  for y in products order by price desc      
  loop           
    found++           
    if found <= 3           
    then               
      return row           
    else               
     jump to next wish           
    end      
  end loop
end loop

done using a LATERAL-join:

SELECT        *
FROM      t_wishlist AS w,    
LATERAL  (SELECT      *        
          FROM       t_product AS p        
          WHERE       p.price < w.desired_price        
          ORDER BY p.price DESC        LIMIT 3       ) AS x
ORDER BY wishlist_id, price DESC;

The FROM-clause is the “outer loop” in our pseudo code and the LATERAL can be seen as the “inner loop”.

Example

we want to search for all users that have posts with likes greater than 2; a query that solves this problem is:

forumdb=> select u.* from users u where exists (select 1 from posts p
where u.pk=p.author and likes > 2 ) ;

we want the value of the likes field too. A simple way to solve this problem is using the lateral join:

forumdb=> select u.username,q.* from users u join lateral (select author,
title,likes from posts p where u.pk=p.author and likes > 2 ) as q on true;

This query is very similar to the EXISTS query, except the fact that, in the main query, we can have all the values that are in the subquery and we can use them in the main part of the query.

CREATE TABLE events (
    id int8 GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    event_start DATE NOT NULL,
    event_end DATE NOT NULL,
    CHECK (event_end >= event_start)
);

WITH event_starts AS (
    SELECT now() - '2 weeks'::INTERVAL * random() AS START
    FROM generate_series(1,5) i
)
INSERT INTO events (event_start, event_end)
SELECT
    START,
    START + '3 days'::INTERVAL + random() * '4 days'::INTERVAL
FROM
    event_starts;


SELECT d::DATE AS DAY
FROM generate_series('2022-09-01', '2022-09-30', '1 day'::INTERVAL) d;
    DAY     
------------
 2022-09-01
 2022-09-02
 2022-09-03
 2022-09-04
...
 2022-09-30


# Number of events on every day even are 0 so letf Join
SELECT
    d::DATE AS DAY,
    COUNT(l.id) AS events
FROM
    generate_series('2022-09-01', '2022-09-30', '1 day'::INTERVAL) d
    LEFT JOIN lateral (
        SELECT * FROM events e
        WHERE d::DATE BETWEEN e.event_start AND e.event_end
    ) AS l ON (TRUE)
GROUP BY d.date
ORDER BY d.date

Example

'LATERAL' join is used to find the highest-paid employee in each department

SELECT 
    dept.department_name,
    emp.name,
    emp.salary
FROM 
    departments dept
LEFT JOIN LATERAL (
    SELECT 
        name,
        salary
    FROM 
        employees emp
    WHERE 
        emp.department_id = dept.department_id
    ORDER BY 
        salary DESC
    LIMIT 
        1
) emp ON true;

Example

Bez lateral

explain analyze
select
    to_char(date_trunc('month', o.purchase_at), 'YYYY-MM') as transaction_month,
    count(o.id) as order_count,
    sum(o.purchase_amount) as order_total,
    sum(paid.amount_paid) as amount_paid
from orders o
left outer join (
    select order_id, sum(paid_amount) as amount_paid
    from payments
    group by order_id
) paid on o.id = paid.order_id
where
    o.account_id = 1769 and
    o.course_id = 88872 and
    o.purchase_at is not null
group by 1
order by 1;

z lateral

explain analyze
select
    to_char(date_trunc('month', o.purchase_at), 'YYYY-MM') as transaction_month,
    count(o.id) as order_count,
    sum(o.purchase_amount) as order_total,
    sum(paid.amount_paid) as amount_paid
from orders o
left outer join lateral (
    select order_id, sum(paid_amount) as amount_paid
    from payments
    where o.id = payments.order_id
    group by order_id
) paid on true
where
    o.account_id = 1769 and
    o.course_id = 88872 and
    o.purchase_at is not null
group by 1
order by 1;

bez lateral zapytanie grupujace wykona sie raz dla calej tabeli payments - set based query. Z lateral zapytanie grupujace wykona sie wiele razy dla order_id ktore zwoci pierwsza czesc z tabeli orders - a row-by-row calculation. Jesli malo orders bedzie efektywniejsze z lateral.

Example

SELECT
  accounts.id,
  accounts.name,
  last_purchase.*
FROM
  accounts
  INNER JOIN LATERAL (
    SELECT *
    FROM purchases
    WHERE account_id = accounts.id
    ORDER BY created_at DESC
    LIMIT 1
  ) AS last_purchase ON true;

Tutaj LATERAL pozwala, by podzapytanie:

(SELECT * FROM purchases WHERE account_id = accounts.id ORDER BY created_at DESC LIMIT 1)

odnosiło się do accounts.id — czyli dla każdego wiersza z tabeli accounts wykonujemy to podzapytanie, znajdując najnowszy zakup dla tego konta. Bez LATERAL nie byłoby możliwe odwołanie się do accounts.id wewnątrz podzapytania w tej pozycji.

W praktyce, LATERAL jest jak pętla: dla każdego wiersza z tabeli głównej (np. accounts), uruchamiasz podzapytanie, które może używać wartości tego wiersza.

Jednak dla dużych zbiorów danych to podejście może być kosztowne — każde podzapytanie wykonuje się osobno dla każdego wiersza z tabeli zewnętrznej.

Alternatywy / metody zastąpienia LATERAL

Group by
WITH latest_purchase_per_account AS (
  SELECT
    account_id,
    MAX(purchases.created_at) AS created_at
  FROM purchases
  GROUP BY account_id
)
SELECT
  accounts.id,
  purchases.*
FROM latest_purchase_per_account
  JOIN accounts ON latest_purchase_per_account.account_id = accounts.id
  JOIN purchases ON latest_purchase_per_account.created_at = purchases.created_at
    AND latest_purchase_per_account.account_id = purchases.account_id;

W wielu przypadkach bardziej wydajne, bo operacje agregacyjne są wykonywane zbiorczo. Może nie oddać, który dokładny wiersz „szczegóły zakupu” jeżeli są konflikty (np. dwa zakupy mają tę samą maksymalną datę). Wymaga, by dało się zapisać to zagregowanie.

Window functions (funkcje okienkowe, np. ROW_NUMBER() lub RANK())

Przypisz wierszom ranking w ramach podziału (np. wg account_id) według created_at DESC, i wybierz tylko te z rzędu 1

SELECT *
FROM (
  SELECT
    accounts.id AS account_id,
    accounts.name,
    purchases.*,
    ROW_NUMBER() OVER (PARTITION BY purchases.account_id ORDER BY purchases.created_at DESC) AS rn
  FROM accounts
  JOIN purchases ON purchases.account_id = accounts.id
) sub
WHERE rn = 1;
Podzapytania skorelowane (bez słowa LATERAL, tam gdzie system to pozwala)**

W niektórych dialektach SQL(oracle,postgres,mysql...) możesz użyć skorelowanego podzapytania w SELECT,WHERE np:

SELECT
  a.id,
  a.name,
  (
    SELECT p.id
    FROM purchases p
    WHERE p.account_id = a.id
    ORDER BY p.created_at DESC
    LIMIT 1
  ) AS latest_purchase_id
FROM accounts a;

Dla zapytan skorelowanych we FROM zawsze wymagane jest LATERAL lub CROSS/OUTER APPLY (SQL Server). Wyjątkiem są funkcje tabelaryczne w Oracle i SQL Server, które mogą przyjmować parametry z zewnętrznej tabeli.

DISTINCT ON — „PostgreSQL way”

W PostgreSQL można często uniknąć LATERAL, gdy chcesz pobrać „najlepszy wiersz per grupa”, używając DISTINCT ON:

SELECT
  a.id,
  a.name,
  p.*
FROM accounts a
JOIN (
  SELECT DISTINCT ON (account_id) *
  FROM purchases
  ORDER BY account_id, created_at DESC
) p ON p.account_id = a.id;
  • DISTINCT ON (account_id) powoduje, że dla każdego account_id pozostaje tylko pierwszy wiersz według ORDER BY account_id, created_at DESC — czyli najnowszy zakup per konto.
  • To podejście (w PostgreSQL) często daje dobrą wydajność i prosty zapis.