Postgres SQL - ghdrako/doc_snipets GitHub Wiki

--Location of specified substring [ position(substring in string) ]
select position('pan' in 'japan') As pos;
3

--Extract substring [ substring(string from [int] for [int] ]
select substring('postgres' from 3 for 3) As sub_string;
stg

--Number of characters in string [ char_length(string) ]
select char_length('Arizona') as num_chars;
7

Insert newline in SQL output
select 'line 1'||E'\n'||'line 2' As newline;
newline 
---------
 line 1 +
 line 2
(1 row)

--Dollar-quoted String constant

select $$Maria’s dogs$$ As col; --If a string contains many single quotes or backslashes then Postgres has an alternative called “dollar quoting”.

Filter - Multiple Aggregates In One Query

SELECT
  COUNT(*) FILTER (WHERE released_at = 2001) AS released_2001,
  COUNT(*) FILTER (WHERE released_at = 2002) AS released_2002,
  COUNT(*) FILTER (WHERE director = 'Steven Spielberg') AS
director_stevenspielberg,
  COUNT(*) FILTER (WHERE director = 'James Cameron') AS
director_jamescameron
FROM movies
WHERE streamingservice = 'Netflix';

The full support for aggregations is retained, which means constructs like JSONB_AGG(DISTINCT genre) FILTER(WHERE director = 'Quentin Tarantino') to get all genres Quentin Tarantino was a director for as a JSON array are still possible.

Group by

When you group on a primary key, all columns of the same table can be omitted because the database will add them for you automatically.

SELECT actors.firstname, actors.lastname, COUNT(*) as count
FROM actors
JOIN actors_movies USING(actor_id)
GROUP BY actors.id -- pk

Inequality Checks With Nullable Columns

-- classic
SELECT * FROM example WHERE (col IS NULL OR col != 'value');
-- short syntax
SELECT * FROM example WHERE column IS DISTINCT FROM 'value';

search every colum in table

select * from table where a ~ '...' or b ~ '...' or c ~ '...
select u from pg_user u limit 1; -- Pg has ROW datatype(s) – basically single value which contains many fields 
(akepka,16384,f,t,f,f,********,,)
select u.* from pg_user u where u::text ~ '22';

sort for every column in table

SELECT * FROM t_test ORDER BY t_test

sorting by the “table”. What that means is that we're sorting by every column (so in our case, ORDER BY a, b).

move data from table to other table

with x as (
    DELETE FROM users_old WHERE username < 'b' returning *
)
INSERT INTO users_a
    SELECT * FROM x;

-- move yet-to-start todo items from 2020 to 2021
WITH ah_well AS (
    DELETE FROM todos_2020
          WHERE NOT started
      RETURNING *
)
INSERT INTO todos_2021
            SELECT * FROM ah_well;

CTE as a table

WITH a(x) AS (
  VALUES ('è'),('é')
) SELECT *
FROM a

bool_and

The bool_and aggregate starts true and remains true only if every row it sees evaluates to true.

With that aggregate, it’s then possible to search for all drivers who failed to ??nish any single race they participated in over their whole career:

with counts as 
 ( 
 select driverid, forename, surname, 
        count(*) as races, 
        bool_and(position is null) as never_finished 
   from drivers 
   join results using(driverid) 
   join races using(raceid) 
  group by driverid 
 ) 
 select driverid, forename, surname, races 
   from counts 
  where never_finished 
  order by races desc;

filter

Aggregate filter(where …) syntax that allows us to update our computation only for those rows that pass the filter.

select date_trunc('year', date) as year, 
      count(*) filter(where position is null) as outs, 
      bool_and(position is null) as never_finished 
 from drivers 
 join results using(driverid) 
 join races using(raceid) 
 group by date_trunc('year', date), driverid 
select count(*) as all, 
       count(*) filter(where country_code is null) as no_country, 
       count(*) filter(where admin1_code is null) as no_region, 
       count(*) filter(where admin2_code is null) as no_district, 
       count(*) filter(where feature_class is null) as no_class, 
       count(*) filter(where feature_code is null) as no_feat 
 from raw.geonames ;

grouping set

group by rollup(drivers.surname, constructors.name);

group by cube(drivers.surname, constructors.name);

the grouping set allow to run aggregates over more than one group at a time within a single query scan.

group by grouping sets((year, driverid), 
                        (year, constructorid)) 

bar

select season, 
       round(100.0 * accidents / participants, 2) as pct, 
       repeat(text '■',  ceil(100*accidents/participants)::int  )  as bar 
 from accidents 
 where season between 1974 and 1990

distinct on

select distinct on (driverid) 2 forename, surname 3 from results 4 join drivers using(driverid) 5 where position = 1;

The classic way to have a single result per driver in SQL would be to aggregate over them, creating a group per driver:

1 select forename, surname 2 from results join drivers using(driverid) 3 where position = 1 4 group by drivers.driverid;

except operator

Theexceptoperator is very useful for writing test cases, as it allows us to compute a di?ference in between two result sets. One way to use it is to store the result of running a query against a knownf i xtureor database content in an expected ??le. Then when you change a query, it’s easy to load your expected data set into the database and compare it with the result of running the new query.

is distinct from and not distinct from

Replace = and <> the difference is that null is equal null

Select null is distinct from null. Null=null;
True false

Windows function

window functions only happens after the where clause, so you only get to see rows from the available result set of the query.

You need to remember that the windowing clauses are always considered last in the query, meaning af?er thewhereclause. In any frame you can only see rows that have been selected for output: e.g. it’s not directly possible to compute a percentage of values that you don’t want to display. You would need to use a subquery in that case.

array_agg

The window definition over (order by x) actually means over (order by x rows between unbounded preceding and current row):

-- both query are the same 
select x, array_agg(x) over (order by x) 2 from generate_series(1, 3) as t(x);
select x, 
 array_agg(x) over (order by x 
                      rows between unbounded preceding 
                      and current row) 
 from generate_series(1, 3) as t(x);
x | array_agg 
---+-----------
1 | {1} 
2 | {1,2} 
3 | {1,2,3} 
(3 rows)
select x, 
       array_agg(x) over (rows between current row 
       and unbounded following) 
 from generate_series(1, 3) as t(x);
x | array_agg 
---+-----------
1 | {1,2,3} 
2 | {2,3} 
3 | {3} 
(3 rows) 

Ordered Set Aggregates

  • mode()
  • PERCENTILE_CONT()
  • PERCENTILE_DISC()
SELECT {ordered_set_function} WITHIN GROUP (ORDER BY {order_column})
FROM {table};
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY base_msrp) AS median
from products;

The reason we use 0.5 is because the median is the 50th percentile, which is 0.5 as a fraction.