postgres recursive cte - ghdrako/doc_snipets GitHub Wiki
Recursive CTE
A recursive CTE is a special construct that allows an auxiliary statement to reference itself and, therefore, join itself onto previously computed results. This is particularly useful when we need to join a table an unknown number of times, typically to “explode” a flat tree structure. The traditional solution would involve some kind of iteration, probably by means of a cursor that iterates one tuple at a time over the whole result set. With recursive CTEs, we can use a much cleaner and simpler approach. A recursive CTE is an auxiliary statement built from the following two parts, which are combined with UNION (or UNION ALL):
- A non-recursive statement, which works as a bootstrap statement and is executed when the auxiliary term is first evaluated
- A recursive statement, which can either reference the bootstrap statement or itself
select * from tags order by pk;
pk | tag | parent
----+-------------------+--------
1 | Database |
2 | Operating Systems |
3 | PostgreSQL | 1
WITH RECURSIVE tags_tree AS (
-- non recursive statement
SELECT tag, pk, 1 AS level
FROM tags
WHERE parent IS NULL
UNION
-- recursive statement
SELECT tt.tag || ' -> ' || ct.tag, ct.pk, tt.level + 1
FROM tags ct
JOIN tags_tree tt
ON tt.pk = ct.parent
)
SELECT level, tag
FROM tags_tree
ORDER BY level;
level | tag
-------+------------------------
1 | Database
1 | Operating Systems
2 | Database -> PostgreSQL
Recursive joins, often used in hierarchical or tree-structured data, are enabled by recursive CTEs. These techniques allow traversal of data with parent-child relationships.
WITH RECURSIVE subordinate_hierarchy AS (
SELECT employee_id, name, manager_id, 1 AS level
FROM employees
WHERE manager_id IS NULL
UNION
SELECT e.employee_id, e.name, e.manager_id, sh.level + 1
FROM employees e
INNER JOIN subordinate_hierarchy sh
ON e.manager_id = sh.employee_id
)
SELECT employee_id, name, manager_id, level
FROM subordinate_hierarchy;
The recursive CTE subordinate_hierarchy starts with top-level employees (where manager_id is NULL). It recursively joins the employees table with itself to traverse the hierarchy, assigning a level to each employee according to their depth in the hierarchy.
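To try the subordinate_hierarchy query, some data is needed. The following is a minimal sketch: the column names follow the query above, but the table definition and the four rows are assumptions made up for illustration, not part of the original notes. An ORDER BY is added so that the output order is deterministic.

```sql
-- Hypothetical sample table: only the column names come from the query above.
CREATE TEMP TABLE employees (
    employee_id integer PRIMARY KEY,
    name        text NOT NULL,
    manager_id  integer            -- NULL marks a top-level employee
);

-- Invented rows: Alice manages Bob and Carol, Bob manages Dave.
INSERT INTO employees (employee_id, name, manager_id)
VALUES (1, 'Alice', NULL),
       (2, 'Bob',   1),
       (3, 'Carol', 1),
       (4, 'Dave',  2);

WITH RECURSIVE subordinate_hierarchy AS (
    -- bootstrap: employees without a manager are level 1
    SELECT employee_id, name, manager_id, 1 AS level
    FROM employees
    WHERE manager_id IS NULL
    UNION
    -- recursive step: direct reports of the rows found so far
    SELECT e.employee_id, e.name, e.manager_id, sh.level + 1
    FROM employees e
    INNER JOIN subordinate_hierarchy sh
        ON e.manager_id = sh.employee_id
)
SELECT employee_id, name, manager_id, level
FROM subordinate_hierarchy
ORDER BY level, employee_id;

--  employee_id | name  | manager_id | level
-- -------------+-------+------------+-------
--            1 | Alice |            |     1
--            2 | Bob   |          1 |     2
--            3 | Carol |          1 |     2
--            4 | Dave  |          2 |     3
```

With this data, each pass of the recursive step descends one level: the first pass finds Bob and Carol, the second finds Dave, and the third finds nothing, which ends the recursion.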
Example
WITH RECURSIVE x(n) AS (
SELECT 1 AS n, 'a'::text AS dummy
UNION ALL
SELECT n + 1, dummy || 'a'
FROM x
WHERE n < 5
)
SELECT *
FROM x;
n | dummy
---+------
1 | a
2 | aa
3 | aaa
4 | aaaa
5 | aaaaa
(5 rows)
The goal of this query is to recursively return numbers while building up a string. The query consists of two parts: the WITH RECURSIVE part and the final SELECT statement that reads from it. While the final SELECT is trivial, the WITH RECURSIVE part requires a closer look. It contains UNION ALL, and this is important: the SELECT statement before UNION ALL is the start condition of the recursion. In our case, we start with 1 and a, so the first statement produces two columns. Then comes the second SQL statement. The important thing here is the FROM clause, which references x itself. Each iteration increments the number by one and appends a character to the string. The recursion stops once n reaches 5. Note that WHERE n < 5 filters the rows fed into an iteration, not the rows it produces: the row with n = 4 still passes the filter and yields n + 1, so the last value returned is 5 and not 4. All basic components of a recursion are therefore present in the query: an initial condition, a recursive call, and a termination condition.
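The same three components (initial condition, recursive call, termination condition) can be seen in a second, smaller sketch that sums the integers from 1 to 100. The CTE name t is arbitrary and chosen only for this illustration.

```sql
WITH RECURSIVE t(n) AS (
    VALUES (1)              -- initial condition: start at 1
    UNION ALL
    SELECT n + 1            -- recursive call: reads from t itself
    FROM t
    WHERE n < 100           -- termination: n = 100 produces no further row
)
SELECT sum(n) FROM t;

--  sum
-- ------
--  5050
```

As in the example above, the filter applies to the input row, so the row with n = 99 still produces 100, and 100 is the last value included in the sum.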
UNION versus UNION ALL
In any recursion, loops can happen. If the loop is infinite, the query never terminates. UNION guards against such loops by discarding rows that duplicate rows already produced, so the recursion stops as soon as an iteration yields nothing new.
This difference is really important because it can protect us from bugs in the data: repeated rows are simply skipped instead of driving the query into an infinite loop.
WITH RECURSIVE x(n) AS (
SELECT 1 AS n
UNION ALL
SELECT n
FROM x
WHERE n < 5
)
SELECT *
FROM x;
^Cancel request sent
ERROR: canceling statement due to user request
The query runs forever, so we have to cancel it.
WITH RECURSIVE x(n) AS (
SELECT 1 AS n
UNION
SELECT n
FROM x
WHERE n < 5
)
SELECT *
FROM x;
n
---
1
(1 row)
The first query never returns because we did not increment n, which leads to an identical recursive call over and over. The second query exits quickly and returns just one row because PostgreSQL detects that it has already seen that row and can therefore terminate the recursion.
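UNION only helps when an entire row repeats. If the recursive step carries a column that changes on every iteration (a level counter or a growing path, for example), no two rows are ever identical and a cycle in the data still loops forever. A common workaround is to collect the visited nodes in an array and refuse to revisit them. The following is a minimal sketch of that idea; the table t_edge and its three rows are hypothetical and deliberately contain the cycle a -> b -> c -> a.

```sql
-- Hypothetical table with a cycle in the data: a -> b -> c -> a
CREATE TEMP TABLE t_edge (src text, dst text);

INSERT INTO t_edge (src, dst)
VALUES ('a', 'b'),
       ('b', 'c'),
       ('c', 'a');

WITH RECURSIVE walk AS (
    -- start at node 'a' and remember both ends of the first edge
    SELECT src, dst, ARRAY[src, dst] AS path
    FROM t_edge
    WHERE src = 'a'
    UNION ALL
    -- follow outgoing edges, appending the new node to the path
    SELECT e.src, e.dst, w.path || e.dst
    FROM t_edge e
    JOIN walk w ON e.src = w.dst
    WHERE e.dst <> ALL (w.path)   -- skip edges that lead back to a visited node
)
SELECT path FROM walk;

--   path
-- ---------
--  {a,b}
--  {a,b,c}
```

The edge c -> a is rejected because a is already in the path, so the third iteration produces no rows and the recursion ends. Without the path check, this query would run until cancelled, with UNION as well as with UNION ALL, because the path column makes every row distinct.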
Example
The standard example is the organization of a company. All employees have a boss, and if we want to find out who is whose boss, recursion is a natural thing to use.
CREATE TABLE t_manager
(
id serial,
person text,
manager text,
UNIQUE (person, manager)
);
Often, hierarchical data is represented as a “slave/master” relationship. We simply store pairs of who is on top and who is below. The nodes at the top of our tree have no “master”, so their manager column is NULL. Here is some sample data:
INSERT INTO t_manager (person, manager)
VALUES ('eliza', NULL),
('ronald', 'eliza'),
('carlos', 'eliza'),
('manuel', 'ronald'),
('mike', 'ronald'),
('joe', 'carlos'),
('augustin', 'carlos'),
('jane', 'carlos')
;
The goal of the next query is to create a tree showing us exactly who is in which position:
WITH RECURSIVE x AS (
SELECT person, manager, person AS hierarchy
FROM t_manager
WHERE manager IS NULL
UNION ALL
SELECT t_manager.person, t_manager.manager,
hierarchy || ' --> ' || t_manager.person
FROM t_manager, x
WHERE t_manager.manager = x.person
)
SELECT * FROM x;
person | manager | hierarchy
----------+---------+------------------------------
eliza | | eliza
ronald | eliza | eliza --> ronald
carlos | eliza | eliza --> carlos
manuel | ronald | eliza --> ronald --> manuel
mike | ronald | eliza --> ronald --> mike
joe | carlos | eliza --> carlos --> joe
augustin | carlos | eliza --> carlos --> augustin
jane | carlos | eliza --> carlos --> jane
(8 rows)
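The same table can also be walked in the opposite direction, from one person up to the top of the tree. The following is a sketch under the sample data above; the CTE name chain and the steps column are introduced here only for illustration.

```sql
WITH RECURSIVE chain AS (
    -- start with a single person
    SELECT person, manager, 0 AS steps
    FROM t_manager
    WHERE person = 'joe'
    UNION ALL
    -- move up: find the row describing the current row's manager
    SELECT m.person, m.manager, c.steps + 1
    FROM t_manager m
    JOIN chain c ON m.person = c.manager
)
SELECT person, steps
FROM chain
ORDER BY steps;

--  person | steps
-- --------+-------
--  joe    |     0
--  carlos |     1
--  eliza  |     2
```

The recursion stops at eliza on its own: her manager is NULL, and a join condition comparing against NULL never matches, so the next iteration returns no rows.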