postgres sql distinct on - ghdrako/doc_snipets GitHub Wiki

Distinct on

DISTINCT ON is a PostgreSQL extension of the standard, where only DISTINCT on the whole SELECT list is defined.

DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. [...] Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. [...] The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).

The first point is that whatever you put in the ON (), must come first in the the ORDER By.

Jeśli nie zostanie podana klauzula order by pierwszy wiersz zostanie wybrany losowo.

DISTINCT ON can be combined with ORDER BY. Leading expressions in ORDER BY must be in the set of expressions in DISTINCT ON, but you can rearrange order among those freely.

SELECT DISTINCT ON (a) a, b, c FROM a_table ORDER BY a, b;

--  get last row group by multiple columns
SELECT DISTINCT ON (1,2,3,4)
       sp.game_id, sportsbook_id, spread_type, spread_duration, game_update_count
FROM   spreads sp
LEFT   JOIN schedule sch USING (game_id)
WHERE  date >= '2012-01-01'
AND    date <= '2012-01-02'
ORDER  BY 4,3,2,1, game_update_count DESC;

The results are then filtered, so that for each of the distinct entities, only the first row is actually returned.

CREATE TABLE example (
    id               INT,
    person_id        INT,
    address_id       INT,
    effective_date   DATE
);

INSERT INTO
    example (id, person_id, address_id, effective_date)
VALUES
    (1, 2, 1, '2000-01-01'),  -- Moved to first house
    (5, 2, 2, '2004-08-19'),  -- Went to uni
    (9, 2, 1, '2007-06-12'),  -- Moved back home

    (2, 4, 3, '2007-05-18'),  -- Moved to first house
    (3, 4, 4, '2016-02-09')   -- Moved to new house
;

SELECT DISTINCT ON (person_id)
    *
FROM
    example
ORDER BY
    person_id,
    effective_date DESC
;


This will order the results so that all the records for each person are contiguous, ordered from the most recent record to the oldest. Then, for each person, on the first record is returned. Thus, giving the most recent address for each person.

Step 1 : Apply the ORDER BY...

 id | person_id | address_id | effective_date
----+-----------+------------+----------------
  9 |      2    |      1     |  '2007-06-12'
  5 |      2    |      2     |  '2004-08-19'
  1 |      2    |      1     |  '2000-01-01'
  3 |      4    |      4     |  '2016-02-09'
  2 |      4    |      3     |  '2007-05-18'

Step 2 : filter to just the first row per person_id

 id | person_id | address_id | effective_date
----+-----------+------------+----------------
  9 |      2    |      1     |  '2007-06-12'
  3 |      4    |      4     |  '2016-02-09'

DISTINCT ON is typically simplest and fastest for this in PostgreSQL. (For performance optimization for certain workloads see below.)

SELECT DISTINCT ON (customer)
       id, customer, total
FROM   purchases
ORDER  BY customer, total DESC, id;

Or shorter (if not as clear) with ordinal numbers of output columns:

SELECT DISTINCT ON (2)
       id, customer, total
FROM   purchases
ORDER  BY 2, 3 DESC, 1;

If total can be null, add NULLS LAST:

ORDER  BY customer, total DESC NULLS LAST, id;

Works either way, but you'll want to match existing indexes

db<>fiddle here