PostgreSQL - ttulka/technologies GitHub Wiki
PostgreSQL is a relational database management system (RDBMS).
- Basic Commands
- Data Types
- Queries
- Privileges
- Schemas
- Table Partitioning
- Full Text Search
- Concurrency Control
- Performance
- References
Installation path usually: /usr/local/pgsql
# create database
createdb mydb
# drop database
dropdb mydb
# access database
psql mydb
\h # help
\q # quit
\d <table> # describe table
\dt+ # list tables
\dn+ # list schemas
\c # info about connection
-- show version
SELECT version();
-- show current user
SELECT session_user, current_user;
-- random between 0 and 1
SELECT random();
Name | Storage Size | Description | Range |
---|---|---|---|
smallint | 2 bytes | small-range integer | -32768 to +32767 |
integer | 4 bytes | typical choice for integer | -2147483648 to +2147483647 |
bigint | 8 bytes | large-range integer | -9223372036854775808 to +9223372036854775807 |
decimal | variable | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point |
numeric | variable | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point |
real | 4 bytes | variable-precision, inexact | 6 decimal digits precision |
double precision | 8 bytes | variable-precision, inexact | 15 decimal digits precision |
smallserial | 2 bytes | small autoincrementing integer | 1 to 32767 |
serial | 4 bytes | autoincrementing integer | 1 to 2147483647 |
bigserial | 8 bytes | large autoincrementing integer | 1 to 9223372036854775807 |
The data types smallserial
, serial
and bigserial
are not true types, but merely a notational convenience for creating unique identifier columns:
CREATE TABLE tablename (
colname SERIAL
);
-- is equivalent to specifying:
CREATE SEQUENCE tablename_colname_seq AS integer;
CREATE TABLE tablename (
colname integer NOT NULL DEFAULT nextval('tablename_colname_seq')
);
ALTER SEQUENCE tablename_colname_seq OWNED BY tablename.colname;
The money
type stores a currency amount with a fixed fractional precision.
There is no performance difference among these three types in PostgreSQL.
Name | Description |
---|---|
character varying(n), varchar(n) | variable-length with limit |
character(n), char(n) | fixed-length, blank padded |
text | variable unlimited length |
The bytea
data type allows storage of binary strings.
SELECT '\xDEADBEEF';
Decimal Octet Value | Description | Escaped Input | Example | Hex |
---|---|---|---|---|
0 | zero octet | '\000' | '\000'::bytea | \x00 |
39 | single quote | '''' or '\047' | ''''::bytea | \x27 |
92 | backslash | '\' or '\134' | '\'::bytea | \x5c |
0 to 31 and 127 to 255 | “non-printable” octets | '\xxx' (octal value) | '\001'::bytea | \x01 |
Enumerated (enum) types are data types that comprise a static, ordered set of values.
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
CREATE TABLE person (
name text,
current_mood mood
);
INSERT INTO person VALUES ('Moe', 'happy');
SELECT * FROM person WHERE current_mood = 'happy';
Enum labels are case sensitive, so 'happy'
is not the same as 'HAPPY'
. White space in the labels is significant too.
Name | Storage Size | Description | Representation |
---|---|---|---|
point | 16 bytes | Point on a plane | (x,y) |
line | 32 bytes | Infinite line | {A,B,C} |
lseg | 32 bytes | Finite line segment | ((x1,y1),(x2,y2)) |
box | 32 bytes | Rectangular box | ((x1,y1),(x2,y2)) |
path | 16+16n bytes | Closed path (similar to polygon) | ((x1,y1),...) |
path | 16+16n bytes | Open path | [(x1,y1),...] |
polygon | 40+16n bytes | Polygon (similar to closed path) | ((x1,y1),...) |
circle | 24 bytes | Circle | <(x,y),r> (center point and radius) |
PostgreSQL provides two data types that are designed to support full text search.
tsvector
tsquery
A tsvector
value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word.
SELECT 'The Fat Rats'::tsvector;
tsvector
--------------------
'Fat' 'Rats' 'The'
A tsquery
value stores lexemes that are to be searched for, and can combine them using the Boolean operators &
(AND), |
(OR), and !
(NOT), as well as the phrase search operator <->
(FOLLOWED BY).
SELECT 'fat & (rat | cat)'::tsquery;
tsquery
---------------------------
'fat' & ( 'rat' | 'cat' )
The data type uuid
stores Universally Unique Identifiers (UUID) as defined by RFC 4122, ISO/IEC 9834-8:2005, and related standards.
JSON data types are for storing JSON (JavaScript Object Notation) data, as specified in RFC 7159.
-
json
- stores an exact copy of the input text -
jsonb
- stores data in decomposed binary format that makes it slightly slower to input due to added conversion overhead
To implement efficient query mechanisms for these data types, PostgreSQL also provides the jsonpath
data type.
-- Simple scalar/primitive value
-- Primitive values can be numbers, quoted strings, true, false, or null
SELECT '5'::json;
-- Array of zero or more elements (elements need not be of same type)
SELECT '[1, 2, "foo", null]'::json;
-- Object containing pairs of keys and values
-- Note that object keys must always be quoted strings
SELECT '{"bar": "baz", "balance": 7.77, "active": false}'::json;
-- Arrays and objects can be nested arbitrarily
SELECT '{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}'::json;
SELECT '{"reading": 1.230e-5}'::json, '{"reading": 1.230e-5}'::jsonb;
json | jsonb
-----------------------+-------------------------
{"reading": 1.230e-5} | {"reading": 0.00001230}
Testing containment is an important capability of jsonb
.
SELECT '[1, 2, 3]'::jsonb @> '[3, 1]'::jsonb; -- true
SELECT '["foo", "bar"]'::jsonb @> '"bar"'::jsonb; -- true
SELECT '{"foo": {"bar": "baz"}}'::jsonb @> '{"foo": {}}'::jsonb; -- true
SELECT '{"foo": {"bar": "baz"}}'::jsonb @> '{"bar": "baz"}'::jsonb; -- false
jsonb
has an existence operator: it tests whether a string appears as an object key or array element at the top level:
SELECT '["foo", "bar", "baz"]'::jsonb ? 'bar'; -- true
SELECT '{"foo": "bar"}'::jsonb ? 'foo'; -- true
SELECT '{"foo": "bar"}'::jsonb ? 'bar'; -- false
SELECT '"foo"'::jsonb ? 'foo'; -- true
GIN indexes can be used to efficiently search for keys or key/value pairs occurring within a large number of documents.
CREATE INDEX idxgin ON api USING GIN (jdoc);
-- Find documents in which the key "company" has value "Magnafone"
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"company": "Magnafone"}';
-- Find documents in which the key "tags" contains key or array element "qui"
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc -> 'tags' ? 'qui';
GIN index supports @@
and @?
operators, which perform jsonpath matching:
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] == "qui"';
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] ? (@ == "qui")';
PostgreSQL allows columns of a table to be defined as variable-length multidimensional arrays.
CREATE TABLE sal_emp (
name text,
pay_by_quarter integer[],
schedule text[][]
);
CREATE TABLE tictactoe (
squares integer[3][3]
);
INSERT INTO sal_emp
VALUES ('Bill',
'{10000, 10000, 10000, 10000}',
'{{"meeting", "lunch"}, {"training"}}');
SELECT name FROM sal_emp WHERE pay_by_quarter[1] <> pay_by_quarter[2];
A composite type represents the structure of a row or record; it is essentially just a list of field names and their data types.
CREATE TYPE inventory_item AS (
name text,
supplier_id integer,
price numeric
);
CREATE TABLE on_hand (
item inventory_item,
count integer
);
INSERT INTO on_hand VALUES (ROW('fuzzy dice', 42, 1.99), 1000);
A domain is a user-defined data type that is based on another underlying type.
- can have constraints that restrict its valid values
CREATE DOMAIN posint AS integer CHECK (VALUE > 0);
CREATE TABLE mytable (id posint);
INSERT INTO mytable VALUES(1); -- works
INSERT INTO mytable VALUES(-1); -- fails
You can create a view over the query, which gives a name to the query that you can refer to like an ordinary table:
CREATE VIEW myview AS
SELECT city, temp_lo, temp_hi, prcp, date, location
FROM weather, cities
WHERE city = name;
SELECT * FROM myview;
BEGIN;
UPDATE accounts SET balance = balance - 100.00
WHERE name = 'Alice';
-- etc etc
COMMIT;
It's possible to control the statements in a transaction in a more granular fashion through the use of savepoints:
BEGIN;
UPDATE accounts SET balance = balance - 100.00
WHERE name = 'Alice';
SAVEPOINT my_savepoint;
UPDATE accounts SET balance = balance + 100.00
WHERE name = 'Bob';
-- oops ... forget that and use Wally's account
ROLLBACK TO my_savepoint;
UPDATE accounts SET balance = balance + 100.00
WHERE name = 'Wally';
COMMIT;
A window function performs a calculation across a set of table rows that are somehow related to the current row.
Window functions do not cause rows to become grouped into a single output row like non-window aggregate calls would. Instead, the rows retain their separate identities.
SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;
depname | empno | salary | avg
-----------+-------+--------+-----------------------
develop | 8 | 6000 | 5020.0000000000000000
develop | 10 | 5200 | 5020.0000000000000000
personnel | 5 | 3500 | 3700.0000000000000000
sales | 4 | 4800 | 4866.6666666666666667
(4 rows)
You can also control the order in which rows are processed by window functions using ORDER BY
within OVER
:
SELECT depname, empno, salary,
rank() OVER (PARTITION BY depname ORDER BY salary DESC)
FROM empsalary;
depname | empno | salary | rank
-----------+-------+--------+------
develop | 8 | 6000 | 1
develop | 10 | 5200 | 2
develop | 11 | 5200 | 2
personnel | 5 | 3500 | 1
personnel | 2 | 3900 | 2
personnel | 9 | 4000 | 3
(6 rows)
For each row, there is a set of rows within its partition called its window frame:
- since there is no
ORDER BY
in theOVER
clause, the window frame is the same as the partition, which for lack ofPARTITION
BY is the whole table.
SELECT salary, sum(salary) OVER () FROM empsalary;
salary | sum
--------+-------
5200 | 47100
5000 | 47100
4800 | 47100
6000 | 47100
5200 | 47100
(5 rows)
If we add an ORDER BY clause, we get very different results:
SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;
salary | sum
--------+-------
4800 | 25700
5000 | 30700
5200 | 41100
5200 | 41100
6000 | 47100
(10 rows)
CREATE TABLE cities (
name text,
population real,
elevation int -- (in ft)
);
CREATE TABLE capitals (
state char(2) UNIQUE NOT NULL
) INHERITS (cities);
CREATE FUNCTION concat_lower_or_upper(a text, b text, uppercase boolean DEFAULT false)
RETURNS text
AS
$$
SELECT CASE
WHEN $3 THEN UPPER($1 || ' ' || $2)
ELSE LOWER($1 || ' ' || $2)
END;
$$
LANGUAGE SQL IMMUTABLE STRICT;
SELECT concat_lower_or_upper('Hello', 'World');
concat_lower_or_upper
-----------------------
hello world
(1 row)
A generated column is a special column that is always computed from other columns.
CREATE TABLE people (
...,
height_cm numeric,
height_in numeric GENERATED ALWAYS AS (height_cm / 2.54) STORED
);
For every possible combination of rows from T1 and T2 (i.e., a Cartesian product), the joined table will contain a row consisting of all columns in T1 followed by all columns in T2.
T1 CROSS JOIN T2
FROM T1 CROSS JOIN T2
is equivalent to FROM T1 INNER JOIN T2 ON TRUE
. It is also equivalent to FROM T1, T2
.
query1 UNION [ALL] query2
query1 INTERSECT [ALL] query2
query1 EXCEPT [ALL] query2
Operations can also be nested and chained:
query1 UNION query2 UNION query3
WITH
provides a way to write auxiliary statements for use in a larger query.
WITH regional_sales AS (
SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region
), top_regions AS (
SELECT region
FROM regional_sales
WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
)
SELECT region,
product,
SUM(quantity) AS product_units,
SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;
Using optional RECURSIVE
, a query can refer to its own output:
WITH RECURSIVE t(n) AS (
VALUES (1)
UNION ALL
SELECT n+1 FROM t WHERE n < 100
)
SELECT sum(n) FROM t;
CREATE TABLE products (
product_no integer,
name text,
price numeric CHECK (price > 0)
);
You can also give the constraint a separate name:
CREATE TABLE products (
...,
price numeric CONSTRAINT positive_price CHECK (price > 0)
);
A check constraint can also refer to several columns:
CREATE TABLE products ( ..., price numeric CHECK (price > 0), discounted_price numeric CHECK (discounted_price > 0), CHECK (price > discounted_price) );
CREATE TABLE orders (
order_id integer PRIMARY KEY,
product_no integer REFERENCES products (product_no),
quantity integer
);
CREATE TABLE t1 (
a integer PRIMARY KEY,
b integer,
c integer,
FOREIGN KEY (b, c) REFERENCES other_table (c1, c2)
);
CREATE INDEX col1_idx ON table1 (col1);
An index specifically designed to include the columns needed by a particular type of query that you run frequently.
SELECT y FROM tab WHERE x = 'key';
Queries can be handeled as index-only scans, because y
can be obtained from the index without visiting the heap:
CREATE INDEX tab_x_y ON tab(x) INCLUDE (y);
Because column y
is not part of the index's search key, it does not have to be of a data type that the index can handle; it's merely stored in the index and is not interpreted by the index machinery.
ALTER TABLE products ADD COLUMN description text;
You can also define constraints on the column at the same time:
ALTER TABLE products ADD COLUMN description text CHECK (description <> '');
ALTER TABLE products DROP COLUMN description;
If the column is referenced by a foreign key constraint of another table, PostgreSQL will not silently drop that constraint. You can authorize dropping everything that depends on the column by adding CASCADE
:
ALTER TABLE products DROP COLUMN description CASCADE;
ALTER TABLE products ADD CHECK (name <> '');
ALTER TABLE products ADD CONSTRAINT some_name UNIQUE (product_no);
ALTER TABLE products ADD FOREIGN KEY (product_group_id) REFERENCES product_groups;
Adding a not-null constraint, which cannot be written as a table constraint:
ALTER TABLE products ALTER COLUMN product_no SET NOT NULL;
ALTER TABLE products DROP CONSTRAINT some_name;
This works the same for all constraint types except not-null constraints:
ALTER TABLE products ALTER COLUMN product_no DROP NOT NULL;
ALTER TABLE products RENAME COLUMN product_no TO product_number;
ALTER TABLE products RENAME TO items;
- When an object is created, it is assigned an owner.
- The owner is normally the role that executed the creation statement.
- For most kinds of objects, the initial state is that only the owner (or a superuser) can do anything with the object.
- To allow other roles to use it, privileges must be granted.
Privilege | Abbreviation | Applicable Object Types |
---|---|---|
SELECT |
r (“read”) | LARGE OBJECT, SEQUENCE, TABLE (and table-like objects), table column |
INSERT |
a (“append”) | TABLE, table column |
UPDATE |
w (“write”) | LARGE OBJECT, SEQUENCE, TABLE, table column |
DELETE |
d | TABLE |
TRUNCATE |
D | TABLE |
REFERENCES |
x | TABLE, table column |
TRIGGER |
t | TABLE |
CREATE |
C | DATABASE, SCHEMA, TABLESPACE |
CONNECT |
c | DATABASE |
TEMPORARY |
T | DATABASE |
EXECUTE |
X | FUNCTION, PROCEDURE |
USAGE |
U | DOMAIN, FOREIGN DATA WRAPPER, FOREIGN SERVER, LANGUAGE, SCHEMA, SEQUENCE, TYPE |
An object can be assigned to a new owner with an ALTER
command:
ALTER TABLE table_name OWNER TO new_owner;
Superusers can always do this; ordinary roles can only do it if they are both the current owner of the object (or a member of the owning role) and a member of the new owning role.
To assign privileges, the GRANT
command is used:
GRANT UPDATE ON accounts TO joe;
Writing ALL
in place of a specific privilege grants all privileges that are relevant for the object type.
The special “role” name PUBLIC
can be used to grant a privilege to every role on the system.
To revoke a previously-granted privilege, use the REVOKE
command:
REVOKE ALL ON accounts FROM PUBLIC;
Tables can have row security policies that restrict, on a per-user basis, which rows can be returned inserted, or modified.
- Operations that apply to the whole table, such as
TRUNCATE
andREFERENCES
, are not subject to row security.
CREATE TABLE accounts (manager text, company text, contact_email text);
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
CREATE POLICY account_managers_policy ON accounts TO managers
USING (manager = current_user);
CREATE POLICY account_managers_select_policy ON managers
FOR SELECT
USING (true);
CREATE ROLE name;
createuser name
DROP ROLE name;
dropuser name
GRANT group_role TO role1, ... ;
REVOKE group_role FROM role1, ... ;
-- GRANT admin TO joe;
Only roles that have the LOGIN attribute can be used as the initial role name for a database connection:
CREATE ROLE name LOGIN;
A role with the LOGIN
attribute can be considered the same as a “database user”:
CREATE USER name;
(CREATE USER
is equivalent to CREATE ROLE
except that CREATE USER
includes LOGIN
by default, while CREATE ROLE
does not.)
Schemas are analogous to directories at the operating system level, except that schemas cannot be nested.
CREATE SCHEMA myschema;
CREATE TABLE myschema.mytable (
...
);
DROP SCHEMA myschema; -- if it's empty
DROP SCHEMA myschema CASCADE; -- including all contained objects
Create a schema owned by someone else:
CREATE SCHEMA schema_name AUTHORIZATION user_name;
By default tables (and other objects) are automatically put into a schema named “public”:
CREATE TABLE products ( ... );
-- is equivalent to CREATE TABLE public.products ( ... );
The system determines which table is meant by following a search path, which is a list of schemas to look in.
- Default:
"$user", public
Put our new schema in the path:
SET search_path TO myschema,public;
In addition to public and user-created schemas, each database contains a pg_catalog
schema, which contains the system tables and all the built-in data types, functions, and operators.
-
pg_catalog
is always effectively part of the search path.
Partitioning refers to splitting what is logically one large table into smaller physical pieces.
- Query performance can be improved dramatically in certain situations
- when most of the heavily accessed rows of the table are in a single partition or a small number of partitions
- When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of sequential scan of that partition instead of using an index and random access reads scattered across the whole table.
- Bulk loads and deletes can be accomplished by adding or removing partitions.
- Seldom-used data can be migrated to cheaper and slower storage media.
Built-in forms of partitioning:
-
Range Partitioning
- The table is partitioned into “ranges” defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions.
-
List Partitioning
- The table is partitioned by explicitly listing which key values appear in each partition.
-
Hash Partitioning
- The table is partitioned by specifying a modulus and a remainder for each partition.
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2006m02 PARTITION OF measurement
FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
CREATE TABLE measurement_y2006m03 PARTITION OF measurement
FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');
...
Implement sub-partitioning, specify the PARTITION BY
clause:
CREATE TABLE measurement_y2006m02 PARTITION OF measurement
FOR VALUES FROM ('2006-02-01') TO ('2006-03-01')
PARTITION BY RANGE (peaktemp);
TBD
- GIN (Generalized Inverted Index)-based index.
- GiST (Generalized Search Tree)-based index.
CREATE INDEX name ON table USING GIN (column);
CREATE INDEX name ON table USING GIST (column [ { DEFAULT | tsvector_ops } (siglen = number) ] );
GIN indexes are the preferred text search index type. As inverted indexes, they contain an index entry for each word (lexeme), with a compressed list of matching locations.
- GIN indexes store only the words (lexemes) of tsvector values, and not their weight labels.
A GiST index is lossy, meaning that the index might produce false matches, and it is necessary to check the actual table row to eliminate such false matches.
- Each document is represented in the index by a fixed-length signature.
Internally, data consistency is maintained by using a multiversion model (Multiversion Concurrency Control, MVCC).
- Each SQL statement sees a snapshot of data (a database version) as it was some time ago, regardless of the current state of the underlying data.
- Prevents statements from viewing inconsistent data produced by concurrent transactions performing updates on the same data rows, providing transaction isolation for each database session.
MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading.
Some DDL commands, currently only TRUNCATE
and the table-rewriting forms of ALTER TABLE
, are not MVCC-safe.
The phenomena:
- dirty read - A transaction reads data written by a concurrent uncommitted transaction.
- nonrepeatable read - A transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the initial read).
- phantom read - A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction.
- serialization anomaly - The result of successfully committing a group of transactions is inconsistent with all possible orderings of running those transactions one at a time.
Isolation Level | Dirty Read | Nonrepeatable Read | Phantom Read | Serialization Anomaly |
---|---|---|---|---|
Read uncommitted | Allowed, but not in PG | Possible | Possible | Possible |
Read committed (def) | Not possible | Possible | Possible | Possible |
Repeatable read | Not possible | Not possible | Allowed, but not in PG | Possible |
Serializable | Not possible | Not possible | Not possible | Not possible |
In PostgreSQL, you can request any of the four standard transaction isolation levels, but internally only three distinct isolation levels are implemented, i.e., PostgreSQL's Read Uncommitted mode behaves like Read Committed.
Serializable transactions are just Repeatable Read transactions which add nonblocking monitoring for dangerous patterns of read/write conflicts.
To set the transaction isolation level of a transaction, use the command SET TRANSACTION
.
PostgreSQL devises a query plan for each query it receives.
EXPLAIN
command can be used to see what query plan the planner creates for any query.
The structure of a query plan is a tree of plan nodes.
- Nodes at the bottom level of the tree are scan nodes: they return raw rows from a table.
- Different types of scan nodes for different table access methods:
- sequential scans,
- index scans, and
- bitmap index scans.
- If the query requires joining, aggregation, sorting, or other operations on the raw rows, then there will be additional nodes above the scan nodes to perform these operations.
- Different types of scan nodes for different table access methods:
- The output of
EXPLAIN
has one line for each node in the plan tree, showing the basic node type plus the cost estimates that the planner made for the execution of that plan node.
EXPLAIN SELECT * FROM tenk1;
QUERY PLAN
-------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244)
- estimated start-up cost
- estimated total cost
- estimated number of rows output
- estimated average width of rows output
- https://www.postgresql.org/docs/13/transaction-iso.html
- https://www.postgresql.org/docs/13/using-explain.html
- https://www.postgresql.org/docs/13/reference-client.html
- https://www.postgresql.org/docs/13/reference-server.html
- The computer science behind Postgres indexes
- Index and Table Access & Join Operations
- Access and Filter-Predicates
- Eliminating PostgreSQL subtransactions at GitLab