postgres sql distinct on - ghdrako/doc_snipets GitHub Wiki
Distinct on
DISTINCT ON
is a PostgreSQL extension of the standard, where only DISTINCT
on the whole SELECT list is defined.
DISTINCT ON
clause can be used to return a distinct subset of rows based on the specified columns. This is
particularly useful when you need to retrieve the first row of each unique group of rows.
DISTINCT ON ( expression [, ...] )
keeps only the first row of each set of rows where the given expressions evaluate to equal. [...] Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. [...] The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).
The first point is that whatever you put in the ON (), must come first in the the ORDER By.
Jeśli nie zostanie podana klauzula order by pierwszy wiersz zostanie wybrany losowo. Wiec w wiekszosci przypadkow dinstinct on nie ma sensu bez klauzuli order by.
Wyswietli informację o najmniejszej fakturze dla każdego typu faktury
SELECT DISTINCT ON (typ) *
FROM faktury
ORDER BY typ, kwota;
a jak najwieksza
SELECT DISTINCT ON (typ) *
FROM faktury
ORDER BY typ, kwota DESC;
DISTINCT ON
can be combined with ORDER BY
. Leading expressions in ORDER BY must be in the set of expressions in DISTINCT ON
, but you can rearrange order among those freely.
SELECT DISTINCT ON (a) a, b, c FROM a_table ORDER BY a, b;
-- get last row group by multiple columns
SELECT DISTINCT ON (1,2,3,4)
sp.game_id, sportsbook_id, spread_type, spread_duration, game_update_count
FROM spreads sp
LEFT JOIN schedule sch USING (game_id)
WHERE date >= '2012-01-01'
AND date <= '2012-01-02'
ORDER BY 4,3,2,1, game_update_count DESC;
The results are then filtered, so that for each of the distinct entities, only the first row is actually returned.
CREATE TABLE example (
id INT,
person_id INT,
address_id INT,
effective_date DATE
);
INSERT INTO
example (id, person_id, address_id, effective_date)
VALUES
(1, 2, 1, '2000-01-01'), -- Moved to first house
(5, 2, 2, '2004-08-19'), -- Went to uni
(9, 2, 1, '2007-06-12'), -- Moved back home
(2, 4, 3, '2007-05-18'), -- Moved to first house
(3, 4, 4, '2016-02-09') -- Moved to new house
;
SELECT DISTINCT ON (person_id)
*
FROM
example
ORDER BY
person_id,
effective_date DESC
;
This will order the results so that all the records for each person are contiguous, ordered from the most recent record to the oldest. Then, for each person, on the first record is returned. Thus, giving the most recent address for each person.
Step 1 : Apply the ORDER BY...
id | person_id | address_id | effective_date
----+-----------+------------+----------------
9 | 2 | 1 | '2007-06-12'
5 | 2 | 2 | '2004-08-19'
1 | 2 | 1 | '2000-01-01'
3 | 4 | 4 | '2016-02-09'
2 | 4 | 3 | '2007-05-18'
Step 2 : filter to just the first row per person_id
id | person_id | address_id | effective_date
----+-----------+------------+----------------
9 | 2 | 1 | '2007-06-12'
3 | 4 | 4 | '2016-02-09'
DISTINCT ON is typically simplest and fastest for this in PostgreSQL. (For performance optimization for certain workloads see below.)
SELECT DISTINCT ON (customer)
id, customer, total
FROM purchases
ORDER BY customer, total DESC, id;
Or shorter (if not as clear) with ordinal numbers of output columns:
SELECT DISTINCT ON (2)
id, customer, total
FROM purchases
ORDER BY 2, 3 DESC, 1;
If total can be null, add NULLS LAST:
ORDER BY customer, total DESC NULLS LAST, id;
Works either way, but you'll want to match existing indexes
Examples
db<>fiddle here
Retrieves the highest-paid employee in each department by first ordering the rows within each department by salary in descending order.
SELECT DISTINCT ON (department_id) department_id, employee_id, name, salary
FROM employees
ORDER BY department_id, salary DESC;