postgres with recurtion graph - ghdrako/doc_snipets GitHub Wiki

simple recursion

Recursively return numbers and a string concatenation

WITH RECURSIVE x(n) AS (
    SELECT  1 AS n, 'a'::text AS dummy
    UNION ALL
    SELECT  n + 1, dummy || 'a'
    FROM    x
    WHERE   n < 5
)
SELECT
FROM    x;
 n | dummy
---+-------
 1 | a
 2 | aa
 3 | aaa
 4 | aaaa
 5 | aaaaa
(5 rows)

The SELECT statement before UNION ALL represents the start/init condition of the recursion. The part after contains recursive call, and a condition to terminate.

UNION versus UNION ALL

In any recursion, loops can happen. The problem is that if the loop is infinite, your query will not terminate and will run forever.

UNION prevents such loops by preventing repeated calls using the same parameters. This difference is really important because it can protect us from bugs in the data by just skipping over instead of entering an infinite loop.

test=# WITH RECURSIVE x(n) AS (
    SELECT  1 AS n
    UNION ALL
    SELECT  n
    FROM    x
    WHERE   n < 5
)
SELECT
FROM    x;
^Cancel request sent
ERROR:  canceling statement due to user request
    ß runs forever, we have to quit


test=# WITH RECURSIVE x(n) AS (
    SELECT  1 AS n
    UNION
    SELECT  n
    FROM    x
    WHERE   n < 5
)
SELECT
FROM    x;
 --
 1
(1 row)

The first query never returns because we did not increment n, which leads to an identical recursive call. The second query exits quickly and returns just one row because PostgreSQL figures that it has seen those values before and can therefore terminate the recursion.

Postgres Graph Database

CREATE TABLE nodes (
  id SERIAL PRIMARY KEY,
  data VARCHAR(255)
);
CREATE TABLE edges (
  previous_node INTEGER REFERENCES nodes(id),
  next_node INTEGER REFERENCES nodes(id),
  PRIMARY KEY (previous_node, next_node)
);

INSERT INTO nodes (data) VALUES ('Bob');
INSERT INTO nodes (data) VALUES ('Hank');
INSERT INTO nodes (data) VALUES ('Jeff');

INSERT INTO edges (previous_node, next_node) VALUES (1, 2);
INSERT INTO edges (previous_node, next_node) VALUES (1, 3);

INSERT INTO nodes (data) VALUES ('Sally');
INSERT INTO nodes (data) VALUES ('Sue');
INSERT INTO nodes (data) VALUES ('Sam');

INSERT INTO edges (previous_node, next_node) VALUES (2, 4);
INSERT INTO edges (previous_node, next_node) VALUES (3, 4);
INSERT INTO edges (previous_node, next_node) VALUES (4, 5);

Postgres has a built-in feature that allows us to query graph data without having to know exactly how many joins we need: [recursive queries](https://www.postgresql.org/docs/current/queries-with.html#QUERIES-WITH-RECURSIVE). Recursive queries allow us to traverse the graph starting from a specific node and following its edges until some determined endpoint.

Writing a recursive query to find all Bob’s friends and their friends we would write the following SQL:

WITH RECURSIVE friend_of_friend AS (
  SELECT edges.next_node
  FROM edges
  WHERE edges.previous_node = 1
  UNION
  SELECT edges.next_node
  FROM edges
  JOIN friend_of_friend ON edges.previous_node = friend_of_friend.next_node
)
SELECT nodes.data
FROM nodes
JOIN friend_of_friend ON nodes.id = friend_of_friend.next_node;

In our example we want to start our query with Bob’s friends, so we find the edges where Bob (id: 1) is the previous_node. Then in the recursive case we continually join the edges table to itself until we reach the end of Bob’s graph (eg. when we reach friend_of_friend.next_node = NULL). Finally outside our recursive query we bring it all together. We need to query the nodes that are associated with the edges from the recursive query so we can get each of Bob’s friends’ names.

Example PostgreSQL (employee and manager)

SHOW search_path;
SET search_path = cte_rec;
SHOW search_path;

CREATE TABLE Employees
(
	EmployeeID smallint NOT NULL,
	FirstName varchar(30)  NOT NULL,
	LastName  varchar(40) NOT NULL,
	Title varchar(50) NOT NULL,
	DeptID smallint NOT NULL,
	ManagerID int NULL,
    CONSTRAINT PK_EmployeeID PRIMARY KEY (EmployeeID)
);

INSERT INTO Employees VALUES 
 (1, 'Ken', 'Sánchez', 'Chief Executive Officer',16,NULL)
,(273, 'Brian', 'Welcker', 'Vice President of Sales',3,1)
,(274, 'Stephen', 'Jiang', 'North American Sales Manager',3,273)
,(275, 'Michael', 'Blythe', 'Sales Representative',3,274)
,(276, 'Linda', 'Mitchell', 'Sales Representative',3,274)
,(285, 'Syed', 'Abbas', 'Pacific Sales Manager',3,273)
,(286, 'Lynn', 'Tsoflias', 'Sales Representative',3,285)
,(16,  'David','Bradley', 'Marketing Manager', 4, 273)
,(23,  'Mary', 'Gibson', 'Marketing Specialist', 4, 16);

select * from 

WITH RECURSIVE DirectReports (ManagerID, EmployeeID, Title,  Level)
AS
(
-- Anchor member definition
    SELECT e.ManagerID, e.EmployeeID, e.Title,0 AS Level
    FROM Employees AS e
    WHERE ManagerID IS NULL
    UNION ALL
-- Recursive member definition
    SELECT e.ManagerID, e.EmployeeID, e.Title,Level + 1
    FROM Employees AS e
    INNER JOIN DirectReports AS d
        ON e.ManagerID = d.EmployeeID
)
-- Statement that executes the CTE
SELECT ManagerID, EmployeeID, Title, Level
FROM DirectReports;
SET search_path ="$user", public

Explanation:

  1. The recursive CTE, DirectReports, defines one anchor member and one recursive member. The anchor member returns the base result set T0. This is the highest ranking employee in the company; that is, an employee who does not report to a manager. Here is the result set returned by the anchor member:
ManagerID EmployeeID Title                         Level
--------- ---------- ----------------------------- ------
NULL      1          Chief Executive Officer        0
  1. The recursive member returns the direct subordinate(s) of the employee in the anchor member result set. This is achieved by a join operation between the Employee table and the DirectReports CTE. It is this reference to the CTE itself that establishes the recursive invocation. Based on the employee in the CTE DirectReports as input (Ti), the join (MyEmployees.ManagerID = DirectReports.EmployeeID) returns as output (Ti+1), the employees who have (Ti) as their manager. Therefore, the first iteration of the recursive member returns this result set:
ManagerID EmployeeID Title                         Level
--------- ---------- ----------------------------- ------
1         273        Vice President of Sales       1
  1. The recursive member is activated repeatedly. The second iteration of the recursive member uses the single-row result set in step 3 (containing EmployeeID 273) as the input value, and returns this result set:
ManagerID EmployeeID Title                         Level
--------- ---------- ----------------------------- ------
273       16         Marketing Manager             2
273       274        North American Sales Manager  2
273       285        Pacific Sales Manager         2
  1. The third iteration of the recursive member uses the result set above as the input value, and returns this result set:
ManagerID EmployeeID Title                         Level
--------- ---------- ----------------------------- ------
16        23         Marketing Specialist          3
274       275        Sales Representative          3
274       276        Sales Representative          3
285       286        Sales Representative          3

The final result set returned by the running query is the union of all result sets generated by the anchor and recursive members.

Example 2 employee and manager

CREATE TABLE t_manager
(
     id      serial,
     person  text,
     manager text,
     UNIQUE (person, manager)
);
INSERT INTO t_manager (person, manager)
VALUES ('eliza', NULL),
    ('ronald', 'eliza'),
    ('carlos', 'eliza'),
    ('manuel', 'ronald'),
    ('mike', 'ronald'),
    ('joe', 'carlos'),
    ('augustin', 'carlos'),
    ('jane', 'carlos')
;
WITH RECURSIVE x AS (
    SELECT  person, manager, person AS hierarchy
    FROM    t_manager
    WHERE   manager IS NULL
    UNION ALL
    SELECT  t_manager.person, t_manager.manager,
    hierarchy || ' --> ' || t_manager.person
    FROM    t_manager, x
    WHERE   t_manager.manager = x.person
)
SELECT * FROM x;
  person  | manager |           hierarchy
----------+---------+-------------------------------
 eliza    |         | eliza
 ronald   | eliza   | eliza --> ronald
 carlos   | eliza   | eliza --> carlos
 manuel   | ronald  | eliza --> ronald --> Emanuel 
 mike     | ronald  | eliza --> ronald --> mike
 joe      | carlos  | eliza --> carlos --> joe
 augustin | carlos  | eliza --> carlos --> augustin
 jane     | carlos  | eliza --> carlos --> jane
(8 rows)