postgres with recurtion graph - ghdrako/doc_snipets GitHub Wiki
simple recursion
Recursively return numbers and a string concatenation
WITH RECURSIVE x(n) AS (
SELECT 1 AS n, 'a'::text AS dummy
UNION ALL
SELECT n + 1, dummy || 'a'
FROM x
WHERE n < 5
)
SELECT
FROM x;
n | dummy
---+-------
1 | a
2 | aa
3 | aaa
4 | aaaa
5 | aaaaa
(5 rows)
The SELECT statement before UNION ALL represents the start/init condition of the recursion. The part after contains recursive call, and a condition to terminate.
UNION versus UNION ALL
In any recursion, loops can happen. The problem is that if the loop is infinite, your query will not terminate and will run forever.
UNION prevents such loops by preventing repeated calls using the same parameters. This difference is really important because it can protect us from bugs in the data by just skipping over instead of entering an infinite loop.
test=# WITH RECURSIVE x(n) AS (
SELECT 1 AS n
UNION ALL
SELECT n
FROM x
WHERE n < 5
)
SELECT
FROM x;
^Cancel request sent
ERROR: canceling statement due to user request
ß runs forever, we have to quit
test=# WITH RECURSIVE x(n) AS (
SELECT 1 AS n
UNION
SELECT n
FROM x
WHERE n < 5
)
SELECT
FROM x;
--
1
(1 row)
The first query never returns because we did not increment n, which leads to an identical recursive call. The second query exits quickly and returns just one row because PostgreSQL figures that it has seen those values before and can therefore terminate the recursion.
Postgres Graph Database
- https://www.dylanpaulus.com/posts/postgres-is-a-graph-database/ To store a graph in Postgres we only need to create two tables: nodes and edges. The nodes table will store information about each entity, while the edges table will store information about the relationships between entities.
CREATE TABLE nodes (
id SERIAL PRIMARY KEY,
data VARCHAR(255)
);
CREATE TABLE edges (
previous_node INTEGER REFERENCES nodes(id),
next_node INTEGER REFERENCES nodes(id),
PRIMARY KEY (previous_node, next_node)
);
INSERT INTO nodes (data) VALUES ('Bob');
INSERT INTO nodes (data) VALUES ('Hank');
INSERT INTO nodes (data) VALUES ('Jeff');
INSERT INTO edges (previous_node, next_node) VALUES (1, 2);
INSERT INTO edges (previous_node, next_node) VALUES (1, 3);
INSERT INTO nodes (data) VALUES ('Sally');
INSERT INTO nodes (data) VALUES ('Sue');
INSERT INTO nodes (data) VALUES ('Sam');
INSERT INTO edges (previous_node, next_node) VALUES (2, 4);
INSERT INTO edges (previous_node, next_node) VALUES (3, 4);
INSERT INTO edges (previous_node, next_node) VALUES (4, 5);
Postgres has a built-in feature that allows us to query graph data without having to know exactly how many joins we need: [recursive queries](https://www.postgresql.org/docs/current/queries-with.html#QUERIES-WITH-RECURSIVE). Recursive queries allow us to traverse the graph starting from a specific node and following its edges until some determined endpoint.
Writing a recursive query to find all Bob’s friends and their friends we would write the following SQL:
WITH RECURSIVE friend_of_friend AS (
SELECT edges.next_node
FROM edges
WHERE edges.previous_node = 1
UNION
SELECT edges.next_node
FROM edges
JOIN friend_of_friend ON edges.previous_node = friend_of_friend.next_node
)
SELECT nodes.data
FROM nodes
JOIN friend_of_friend ON nodes.id = friend_of_friend.next_node;
In our example we want to start our query with Bob’s friends, so we find the edges where Bob (id: 1) is the previous_node. Then in the recursive case we continually join the edges table to itself until we reach the end of Bob’s graph (eg. when we reach friend_of_friend.next_node = NULL). Finally outside our recursive query we bring it all together. We need to query the nodes that are associated with the edges from the recursive query so we can get each of Bob’s friends’ names.
Example PostgreSQL (employee and manager)
SHOW search_path;
SET search_path = cte_rec;
SHOW search_path;
CREATE TABLE Employees
(
EmployeeID smallint NOT NULL,
FirstName varchar(30) NOT NULL,
LastName varchar(40) NOT NULL,
Title varchar(50) NOT NULL,
DeptID smallint NOT NULL,
ManagerID int NULL,
CONSTRAINT PK_EmployeeID PRIMARY KEY (EmployeeID)
);
INSERT INTO Employees VALUES
(1, 'Ken', 'Sánchez', 'Chief Executive Officer',16,NULL)
,(273, 'Brian', 'Welcker', 'Vice President of Sales',3,1)
,(274, 'Stephen', 'Jiang', 'North American Sales Manager',3,273)
,(275, 'Michael', 'Blythe', 'Sales Representative',3,274)
,(276, 'Linda', 'Mitchell', 'Sales Representative',3,274)
,(285, 'Syed', 'Abbas', 'Pacific Sales Manager',3,273)
,(286, 'Lynn', 'Tsoflias', 'Sales Representative',3,285)
,(16, 'David','Bradley', 'Marketing Manager', 4, 273)
,(23, 'Mary', 'Gibson', 'Marketing Specialist', 4, 16);
select * from
WITH RECURSIVE DirectReports (ManagerID, EmployeeID, Title, Level)
AS
(
-- Anchor member definition
SELECT e.ManagerID, e.EmployeeID, e.Title,0 AS Level
FROM Employees AS e
WHERE ManagerID IS NULL
UNION ALL
-- Recursive member definition
SELECT e.ManagerID, e.EmployeeID, e.Title,Level + 1
FROM Employees AS e
INNER JOIN DirectReports AS d
ON e.ManagerID = d.EmployeeID
)
-- Statement that executes the CTE
SELECT ManagerID, EmployeeID, Title, Level
FROM DirectReports;
SET search_path ="$user", public
Explanation:
- The recursive CTE, DirectReports, defines one anchor member and one recursive member. The anchor member returns the base result set T0. This is the highest ranking employee in the company; that is, an employee who does not report to a manager. Here is the result set returned by the anchor member:
ManagerID EmployeeID Title Level
--------- ---------- ----------------------------- ------
NULL 1 Chief Executive Officer 0
- The recursive member returns the direct subordinate(s) of the employee in the anchor member result set. This is achieved by a join operation between the Employee table and the DirectReports CTE. It is this reference to the CTE itself that establishes the recursive invocation. Based on the employee in the CTE DirectReports as input (Ti), the join (MyEmployees.ManagerID = DirectReports.EmployeeID) returns as output (Ti+1), the employees who have (Ti) as their manager. Therefore, the first iteration of the recursive member returns this result set:
ManagerID EmployeeID Title Level
--------- ---------- ----------------------------- ------
1 273 Vice President of Sales 1
- The recursive member is activated repeatedly. The second iteration of the recursive member uses the single-row result set in step 3 (containing EmployeeID 273) as the input value, and returns this result set:
ManagerID EmployeeID Title Level
--------- ---------- ----------------------------- ------
273 16 Marketing Manager 2
273 274 North American Sales Manager 2
273 285 Pacific Sales Manager 2
- The third iteration of the recursive member uses the result set above as the input value, and returns this result set:
ManagerID EmployeeID Title Level
--------- ---------- ----------------------------- ------
16 23 Marketing Specialist 3
274 275 Sales Representative 3
274 276 Sales Representative 3
285 286 Sales Representative 3
The final result set returned by the running query is the union of all result sets generated by the anchor and recursive members.
Example 2 employee and manager
CREATE TABLE t_manager
(
id serial,
person text,
manager text,
UNIQUE (person, manager)
);
INSERT INTO t_manager (person, manager)
VALUES ('eliza', NULL),
('ronald', 'eliza'),
('carlos', 'eliza'),
('manuel', 'ronald'),
('mike', 'ronald'),
('joe', 'carlos'),
('augustin', 'carlos'),
('jane', 'carlos')
;
WITH RECURSIVE x AS (
SELECT person, manager, person AS hierarchy
FROM t_manager
WHERE manager IS NULL
UNION ALL
SELECT t_manager.person, t_manager.manager,
hierarchy || ' --> ' || t_manager.person
FROM t_manager, x
WHERE t_manager.manager = x.person
)
SELECT * FROM x;
person | manager | hierarchy
----------+---------+-------------------------------
eliza | | eliza
ronald | eliza | eliza --> ronald
carlos | eliza | eliza --> carlos
manuel | ronald | eliza --> ronald --> Emanuel
mike | ronald | eliza --> ronald --> mike
joe | carlos | eliza --> carlos --> joe
augustin | carlos | eliza --> carlos --> augustin
jane | carlos | eliza --> carlos --> jane
(8 rows)