postgres recursive cte - ghdrako/doc_snipets GitHub Wiki

Recursive CTE

A recursive CTE is a special construct that allows an auxiliary statement to reference itself and, therefore, join itself onto previously computed results. This is particularly useful when we need to join a table an unknown number of times, typically to “explode” a fl at tree structure. The traditional solution would involve some kind of iteration, probably by means of a cursor that iterates one tuple at a time over the whole resultset. However, with recursive CTEs, we can use a much cleaner and simpler approach. A recursive CTE is made by an auxiliary statement that is built on top of the following:

A non-recursive statement, which works as a bootstrap statement and is executed when the auxiliary term is fi rst evaluated
A recursive statement, which can either reference the bootstrap statement or itself These

select * from tags order by pk;
 pk | tag | parent 
----+-------------------+-------- 
 1 | Database |  
 2 | Operating Systems |  
 3 | PostgreSQL | 1

WITH RECURSIVE tags_tree AS (
 -- non recursive statement 
 SELECT tag, pk, 1 AS level FROM tags WHERE parent IS NULL UNION 
-- recursive statement 
SELECT tt.tag|| ' -> ' || ct.tag, ct.pk , tt.level + 1 FROM tags ct JOIN tags_tree tt ON tt.pk = ct.parent ) SELECT level,tag FROM tags_tree order by level;
 level | tag
-------+------------------------ 
1 | Database  
1 | Operating Systems  
2 | Database -> PostgreSQL

Recursive joins, often used in hierarchical or tree-structured data, are enabled by recur- sive CTEs. These techniques allow traversal of data with parent-child relationships.

WITH RECURSIVE subordinate_hierarchy AS (
    SELECT employee_id, name, manager_id, 1 AS level
    FROM employees
    WHERE manager_id IS NULL
    UNION
    SELECT e.employee_id, e.name, e.manager_id, sh.level + 1
    FROM employees e
    INNER JOIN subordinate_hierarchy sh
    ON e.manager_id = sh.employee_id
)
SELECT employee_id, name, manager_id, level
FROM subordinate_hierarchy;

The recursive CTE subordinate_hierarchy starts with top-level employees (where manager_id is NULL). It recur- sively joins the employees table with itself to traverse the hierarchy, assigning a level to each employee according to their depth in the hierarchy.

Example

WITH RECURSIVE x(n) AS (
  SELECT  1 AS n, 'a'::text AS dummy
  UNION ALL
  SELECT  n + 1, dummy || 'a'
  FROM x
  WHERE   n < 5
)
SELECT  *
FROM x;
 n | dummy
---+------
 1 | a
 2 | aa
 3 | aaa
 4 | aaaa
 5 | aaaaa
 (5 rows)

The goal of this query is to recursively return numbers and compile a string at the end. Basically, the query consists of two parts: the WITH RECURSIVE part and the SELECT statement at the end starting the recursion. While the SELECT part at the end is trivial, the WITH RECURSIVE part requires a deeper inspection. If we look closely, the WITH statement contains UNION ALL. This is really important: the SELECT statement before UNION ALL represents the start condition of the recursion. In our case, we start with 1 and a. Two columns are produced by the first statement. Then comes the second SQL statement. The important thing here is the FROM clause. It recursively calls x. Each iteration will increment the number by one and add a character to the end of the string. We abort when n reaches 5. Note that the last iteration will already display n + 1 so the last value returned is 5 and not 4. All basic components of recursions are therefore to be found in the query: an init condition, a recursive call, and a condition to terminate.

UNION versus UNION ALL

In any recursion, loops can happen. The problem is that if the loop is infinite, your query will not terminate and will run forever. This is not desirable. UNION prevents such loops by preventing repeated calls using the same parameters.

This difference is really important because it can protect us from bugs in the data by just skipping over instead of entering an infinite loop.

WITH RECURSIVE x(n) AS (
    SELECT  1 AS n
    UNION ALL
    SELECT  n
    FROM    x
    WHERE   n < 5
 )
 SELECT  *
 FROM    x;
 ^Cancel request sent
 ERROR:  canceling statement due to user request
         runs forever, we have to quit

WITH RECURSIVE x(n) AS (
    SELECT  1 AS n
    UNION
    SELECT  n
    FROM    x
    WHERE   n < 5
 )
 SELECT  *
 FROM    x;
 -
 1
 (1 row)

The first query never returns because we did not increment n, which leads to an identical recursive call. The second query exits quickly and returns just one row because PostgreSQL figures that it has seen those values before and can therefore terminate the recursion

Example

The standard example is the organization of a company. All employees have a boss, and if we want to find out who is whose boss, recursion is a natural thing to use.

 CREATE TABLE t_manager
 (
     id      serial,
     person  text,
     manager text,
     UNIQUE (person, manager)
 );

Often, hierarchical data is represented as a “slave/master” relationship. We simply store pairs of who is on top and who is below. The nodes on top of our tree have no “master” and are therefore set to NULL. Here is some sample data:

 test=# INSERT INTO t_manager (person, manager)
 VALUES ('eliza', NULL),
    ('ronald', 'eliza'),
    ('carlos', 'eliza'),
    ('manuel', 'ronald'),
    ('mike', 'ronald'),
    ('joe', 'carlos'),
    ('augustin', 'carlos'),
    ('jane', 'carlos')
 ;

The goal of the next query is to create a tree showing us exactly who is in which position:

WITH RECURSIVE x AS (
    SELECT  person, manager, person AS hierarchy
    FROM    t_manager
    WHERE   manager IS NULL
    UNION ALL
    SELECT  t_manager.person, t_manager.manager,
            hierarchy || ' --> ' || t_manager.person
    FROM    t_manager, x
    WHERE   t_manager.manager = x.person
 )
 SELECT * FROM x;
  person  | manager |           hierarchy
----------+---------+------------------------------
 eliza    |         | eliza
 ronald   | eliza   | eliza --> ronald
 carlos   | eliza   | eliza --> carlos
 manuel   | ronald  | eliza --> ronald --> manuel
 mike     | ronald  | eliza --> ronald --> mike
 joe      | carlos  | eliza --> carlos --> joe
 augustin | carlos  | eliza --> carlos --> augustin
 jane     | carlos  | eliza --> carlos --> jane
 (8 rows)