Postgres SQL - ghdrako/doc_snipets GitHub Wiki
--Location of specified substring [ position(substring in string) ]
select position('pan' in 'japan') As pos;
3
--Extract substring [ substring(string from [int] for [int] ]
select substring('postgres' from 3 for 3) As sub_string;
stg
--Number of characters in string [ char_length(string) ]
select char_length('Arizona') as num_chars;
7
Insert newline in SQL output
select 'line 1'||E'\n'||'line 2' As newline;
newline
---------
line 1 +
line 2
(1 row)
--Dollar-quoted String constant
select $$Maria’s dogs$$ As col; --If a string contains many single quotes or backslashes then Postgres has an alternative called “dollar quoting”.
Filter - Multiple Aggregates In One Query
- https://sqlfordevs.com/multiple-aggregates-in-one-query
- https://www.postgresql.org/docs/current/functions-aggregate.html
SELECT
COUNT(*) FILTER (WHERE released_at = 2001) AS released_2001,
COUNT(*) FILTER (WHERE released_at = 2002) AS released_2002,
COUNT(*) FILTER (WHERE director = 'Steven Spielberg') AS
director_stevenspielberg,
COUNT(*) FILTER (WHERE director = 'James Cameron') AS
director_jamescameron
FROM movies
WHERE streamingservice = 'Netflix';
The full support for aggregations is retained, which means constructs like JSONB_AGG(DISTINCT genre) FILTER(WHERE director = 'Quentin Tarantino')
to get all genres Quentin Tarantino was a director for as a JSON array are still possible.
Group by
When you group on a primary key, all columns of the same table can be omitted because the database will add them for you automatically.
SELECT actors.firstname, actors.lastname, COUNT(*) as count
FROM actors
JOIN actors_movies USING(actor_id)
GROUP BY actors.id -- pk
Inequality Checks With Nullable Columns
-- classic
SELECT * FROM example WHERE (col IS NULL OR col != 'value');
-- short syntax
SELECT * FROM example WHERE column IS DISTINCT FROM 'value';
search every colum in table
select * from table where a ~ '...' or b ~ '...' or c ~ '...
select u from pg_user u limit 1; -- Pg has ROW datatype(s) – basically single value which contains many fields
(akepka,16384,f,t,f,f,********,,)
select u.* from pg_user u where u::text ~ '22';
sort for every column in table
SELECT * FROM t_test ORDER BY t_test
sorting by the “table”. What that means is that we're sorting by every column (so in our case, ORDER BY a, b).
move data from table to other table
with x as (
DELETE FROM users_old WHERE username < 'b' returning *
)
INSERT INTO users_a
SELECT * FROM x;
-- move yet-to-start todo items from 2020 to 2021
WITH ah_well AS (
DELETE FROM todos_2020
WHERE NOT started
RETURNING *
)
INSERT INTO todos_2021
SELECT * FROM ah_well;
CTE as a table
WITH a(x) AS (
VALUES ('è'),('é')
) SELECT *
FROM a
bool_and
The bool_and
aggregate starts true and remains true only if every row it sees evaluates to true.
With that aggregate, it’s then possible to search for all drivers who failed to ??nish any single race they participated in over their whole career:
with counts as
(
select driverid, forename, surname,
count(*) as races,
bool_and(position is null) as never_finished
from drivers
join results using(driverid)
join races using(raceid)
group by driverid
)
select driverid, forename, surname, races
from counts
where never_finished
order by races desc;
filter
Aggregate filter(where …)
syntax that allows us to update our computation only for those rows that pass the filter.
select date_trunc('year', date) as year,
count(*) filter(where position is null) as outs,
bool_and(position is null) as never_finished
from drivers
join results using(driverid)
join races using(raceid)
group by date_trunc('year', date), driverid
select count(*) as all,
count(*) filter(where country_code is null) as no_country,
count(*) filter(where admin1_code is null) as no_region,
count(*) filter(where admin2_code is null) as no_district,
count(*) filter(where feature_class is null) as no_class,
count(*) filter(where feature_code is null) as no_feat
from raw.geonames ;
grouping set
group by rollup(drivers.surname, constructors.name);
group by cube(drivers.surname, constructors.name);
the grouping set allow to run aggregates over more than one group at a time within a single query scan.
group by grouping sets((year, driverid),
(year, constructorid))
bar
select season,
round(100.0 * accidents / participants, 2) as pct,
repeat(text '■', ceil(100*accidents/participants)::int ) as bar
from accidents
where season between 1974 and 1990
distinct on
select distinct on (driverid) 2 forename, surname 3 from results 4 join drivers using(driverid) 5 where position = 1;
The classic way to have a single result per driver in SQL would be to aggregate over them, creating a group per driver:
1 select forename, surname 2 from results join drivers using(driverid) 3 where position = 1 4 group by drivers.driverid;
except operator
Theexceptoperator is very useful for writing test cases, as it allows us to compute a di?ference in between two result sets. One way to use it is to store the result of running a query against a knownf i xtureor database content in an expected ??le. Then when you change a query, it’s easy to load your expected data set into the database and compare it with the result of running the new query.
is distinct from and not distinct from
Replace = and <> the difference is that null is equal null
Select null is distinct from null. Null=null;
True false
Windows function
window functions only happens after the where clause, so you only get to see rows from the available result set of the query.
You need to remember that the windowing clauses are always considered last in the query, meaning af?er thewhereclause. In any frame you can only see rows that have been selected for output: e.g. it’s not directly possible to compute a percentage of values that you don’t want to display. You would need to use a subquery in that case.
array_agg
The window definition over (order by x)
actually means over (order by x rows between unbounded preceding and current row)
:
-- both query are the same
select x, array_agg(x) over (order by x) 2 from generate_series(1, 3) as t(x);
select x,
array_agg(x) over (order by x
rows between unbounded preceding
and current row)
from generate_series(1, 3) as t(x);
x | array_agg
---+-----------
1 | {1}
2 | {1,2}
3 | {1,2,3}
(3 rows)
select x,
array_agg(x) over (rows between current row
and unbounded following)
from generate_series(1, 3) as t(x);
x | array_agg
---+-----------
1 | {1,2,3}
2 | {2,3}
3 | {3}
(3 rows)
Ordered Set Aggregates
- mode()
- PERCENTILE_CONT()
- PERCENTILE_DISC()
SELECT {ordered_set_function} WITHIN GROUP (ORDER BY {order_column})
FROM {table};
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY base_msrp) AS median
from products;
The reason we use 0.5 is because the median is the 50th percentile, which is 0.5 as a fraction.