postgres declarative partitioning - ghdrako/doc_snipets GitHub Wiki

PostgreSQL does not support join between partition-leafs and non-partition tables before merging.

PostgreSQL supports three types of declarative partitioning:

  • Range partitioning
  • List partitioning
  • Hash partitioning

We can create global primary key constraint for partitioned table that is passed down to all partitions. The pk must include all partitioned columns.

Index created on partitioned table will be created automatically for all partitions. such indexes are still physically separated indexes but cat't be removed for specyfic partition.

Unique index support global uniquess but must include partition key

Declarative partitioning

Create partition table

List partitioning

  1. Sanity check
DROP TABLE IF EXISTS part_tags cascade;
  1. create our parent table
CREATE TABLE part_tags (
  pk INTEGER NOT NULL DEFAULT nextval('part_tags_pk_seq') ,
  level INTEGER NOT NULL DEFAULT 0,
  tag VARCHAR (255) NOT NULL,
  primary key (pk,level)
)
PARTITION BY LIST (level);

Note is that the field used to partition the data must be part of the primary key.

  1. Define the child tables
CREATE TABLE part_tags_level_0 PARTITION OF part_tags FOR VALUES IN
(0);
CREATE TABLE part_tags_level_1 PARTITION OF part_tags FOR VALUES IN
(1);
CREATE TABLE part_tags_level_2 PARTITION OF part_tags FOR VALUES IN
(2);
CREATE TABLE part_tags_level_3 PARTITION OF part_tags FOR VALUES IN
(3);
 CREATE INDEX part_tags_tag on part_tags using GIN (tag gin_trgm_ops);
\d part_tags;
\d part_tags_level_0;
select * from part_tags;
select * from part_tags_level_0;
select * from part_tags_level_1;

Range partitioning

DROP TABLE IF EXISTS part_tags cascade;

CREATE TABLE part_tags (
pk INTEGER NOT NULL DEFAULT nextval('part_tags_pk_seq'),
ins_date date not null default now()::date,
tag VARCHAR (255) NOT NULL,
level INTEGER NOT NULL DEFAULT 0,
primary key (pk,ins_date)
)
PARTITION BY RANGE (ins_date);

CREATE TABLE part_tags_date_01_2020 PARTITION OF part_tags FOR
VALUES FROM ('2020-01-01') TO ('2020-01-31');
CREATE TABLE part_tags_date_02_2020 PARTITION OF part_tags FOR
VALUES FROM ('2020-02-01') TO ('2020-02-28');
CREATE TABLE part_tags_date_03_2020 PARTITION OF part_tags FOR
VALUES FROM ('2020-03-01') TO ('2020-03-31');
CREATE TABLE part_tags_date_04_2020 PARTITION OF part_tags FOR
VALUES FROM ('2020-04-01') TO ('2020-04-30')
CREATE INDEX part_tags_tag on part_tags using GIN (tag gin_trgm_ops);

 \d part_tags;
 \d part_tags_date_01_2020;
 select * from part_tags;
 select * from part_tags_date_01_2020;
 select * from part_tags_date_02_2020;
 select * from part_tags_date_03_2020;

Example 2

  1. Create parent payments table:
CREATE TABLE partitioned_payments(
     id serial not null,
     created_at timestamp NOT NULL,
     updated_at timestamp NOT NULL,
     amount double precision NOT NULL,
     total double precision NOT NULL,
     description json DEFAULT NULL,
     tax double precision NOT NULL,
     customer_id integer NOT NULL,
     shipping_address text DEFAULT NULL,
     status varchar(255) DEFAULT NULL
) PARTITION BY RANGE (created_at);

Example 3

CREATE TABLE data (
   payload  integer
 )  PARTITION BY RANGE (payload);
CREATE TABLE negatives PARTITION
  OF data FOR VALUES FROM (MINVALUE) TO (0);
CREATE TABLE positives PARTITION
  OF data FOR VALUES FROM (0) TO (MAXVALUE);
INSERT INTO data VALUES (5);
 SELECT * FROM positives;
--move between partition
UPDATE data
              SET     payload = -10
              WHERE   payload = 5
              RETURNING *;
 SELECT * FROM negatives;
 DROP TABLE negatives
 CREATE TABLE p_def PARTITION OF data DEFAULT; 
  1. Create partitions. It makes sense to create a few more partitions in advance::
CREATE TABLE payments_2020_1_1_7 PARTITION OF partitioned_payments
    FOR VALUES FROM ('2020-01-01') TO ('2020-01-07 23:59:59');

CREATE TABLE payments_2020_1_8_15 PARTITION OF partitioned_payments
    FOR VALUES FROM ('2020-01-08') TO ('2020-01-15 23:59:59');

CREATE TABLE payments_2020_1_16_23 PARTITION OF partitioned_payments
    FOR VALUES FROM ('2020-01-16') TO ('2020-01-23 23:59:59');
  1. Create indexes:
// for partition key columns
CREATE INDEX payments_2020_1_1_7_created_at_idx ON payments_2020_1_1_7(created_at);
CREATE INDEX payments_2020_1_8_15_created_at_idx ON payments_2020_1_8_15(created_at);
CREATE INDEX payments_2020_1_16_23_created_at_idx ON payments_2020_1_16_23(created_at);

// for id column
CREATE INDEX payments_2020_1_1_7_id_idx ON payments_2020_1_1_7(id);
CREATE INDEX payments_2020_1_8_15_id_idx ON payments_2020_1_8_15(id);
CREATE INDEX payments_2020_1_16_23_id_idx ON payments_2020_1_16_23(id);
  1. Schedule for partition creation. As our last partition will have rows till 23 of Jan, 2020 it means that on 23 of Jan we should create a new partitions for new rows having created_at greater then 23 of Jan, 2020
0 0 23 * * /project_dir/create_payments_partitions.sh

where create_payments_partitions.sh:

#!/bin/sh

year=$(date +%Y)
month=$(date +%m)
next_year=$(date -d "+1 months" +%Y)
next_month=$(date -d "+1 months" +%m)
month_last_day=$(date -d "`date +%Y%m01` +1 month -1 day" +%d)

sql=$(sed "s/|year|/$year/g" ./declarative_new_partitions_template.sql |
      sed "s/|month|/$month/g" | sed "s/|next_month|/$next_month/g" |
      sed "s/|next_year|/$next_year/g" |
      sed "s/|month_last_day|/$month_last_day/g"
)
docker exec -it postgres-13.10 psql -U postgres partitioning_demo -c "$sql"

and declarative_new_partitions_template.sql :

CREATE TABLE payments_|year|_|month|_24_|month_last_day| PARTITION OF partitioned_payments
    FOR VALUES FROM ('|year|-|month|-24') TO ('|year|-|month|-|month_last_day| 23:59:59');

CREATE TABLE payments_|next_year|_|next_month|_1_7 PARTITION OF partitioned_payments
    FOR VALUES FROM ('|next_year|-|next_month|-01') TO ('|next_year|-|next_month|-07 23:59:59');

CREATE TABLE payments_|next_year|_|next_month|_8_15 PARTITION OF partitioned_payments
    FOR VALUES FROM ('|next_year|-|next_month|-08') TO ('|next_year|-|next_month|-15 23:59:59');

CREATE TABLE payments_|next_year|_|next_month|_16_23 PARTITION OF partitioned_payments
    FOR VALUES FROM ('|next_year|-|next_month|-16') TO ('|next_year|-|next_month|-23 23:59:59');

And you can test

INSERT INTO partitioned_payments (created_at, updated_at, amount, total, description, tax, customer_id, shipping_address, status)
SELECT
    date('2020-01-' || random_between(1,7)),
    NOW() - '1 year'::INTERVAL * ROUND(RANDOM() * 100),
    ROUND(random() * 1000)::double precision ,
    ROUND(random() * 1000)::double precision ,
    '{}',
    ROUND(random() * 1000)::double precision,
    ROUND(RANDOM() * 1000)::INTEGER,
    SUBSTRING(md5(random()::text), 0, 25),
    SUBSTRING(md5(random()::text), 0, 100)
FROM generate_series(1, 20);

and check the explain for the select:

EXPLAIN (analyze,timing off) SELECT * FROM partitioned_payments WHERE created_at BETWEEN '2020-01-01' AND '2020-01-07' ORDER BY id asc LIMIT 10 OFFSET 100;

In the execution plan we can see:

Scan on partitioned_payments...
 ...
Index Scan using payments_2020_1_1_7_created_at_idx on payments_2020_1_1_7 partitioned_payments  (cost=0.00..8.27 rows=1 width=628)

So only partitioned_payments and payments_2020_1_1_7 tables are being accessed to find the rows.

Hash partitioning

  1. Create parent table:
CREATE TABLE partitioned_payments_hash(
      id serial not null,
      created_at timestamp NOT NULL,
      updated_at timestamp NOT NULL,
      amount double precision NOT NULL,
      total double precision NOT NULL,
      description json DEFAULT NULL,
      tax double precision NOT NULL,
      customer_id integer NOT NULL,
      shipping_address text DEFAULT NULL,
      status varchar(255) DEFAULT NULL
) PARTITION BY HASH (customer_id);
  1. Create partitions: The goal is to have 12 partitions, so we use the hash value and do a modulo 12. The remainder will identify the partition
CREATE TABLE payments_1 PARTITION OF partitioned_payments_hash
    FOR VALUES WITH (modulus 12, remainder 0);

CREATE TABLE payments_2 PARTITION OF partitioned_payments_hash
    FOR VALUES WITH (modulus 12, remainder 1);

CREATE TABLE payments_3 PARTITION OF partitioned_payments_hash
    FOR VALUES WITH (modulus 12, remainder 2);
    
CREATE TABLE payments_4 PARTITION OF partitioned_payments_hash
    FOR VALUES WITH (modulus 12, remainder 3);

CREATE TABLE payments_5 PARTITION OF partitioned_payments_hash
    FOR VALUES WITH (modulus 12, remainder 4);

CREATE TABLE payments_6 PARTITION OF partitioned_payments_hash
    FOR VALUES WITH (modulus 12, remainder 5);

CREATE TABLE payments_7 PARTITION OF partitioned_payments_hash
    FOR VALUES WITH (modulus 12, remainder 6);

CREATE TABLE payments_8 PARTITION OF partitioned_payments_hash
    FOR VALUES WITH (modulus 12, remainder 7);

CREATE TABLE payments_9 PARTITION OF partitioned_payments_hash
    FOR VALUES WITH (modulus 12, remainder 8);

CREATE TABLE payments_10 PARTITION OF partitioned_payments_hash
    FOR VALUES WITH (modulus 12, remainder 9);

CREATE TABLE payments_11 PARTITION OF partitioned_payments_hash
    FOR VALUES WITH (modulus 12, remainder 10);

CREATE TABLE payments_12 PARTITION OF partitioned_payments_hash
    FOR VALUES WITH (modulus 12, remainder 11);
  1. Create indexes for partition key column in each partition:
CREATE INDEX payments_1_idx ON payments_1 (customer_id);
CREATE INDEX payments_2_idx ON payments_2 (customer_id);
CREATE INDEX payments_3_idx ON payments_3 (customer_id);
CREATE INDEX payments_4_idx ON payments_4 (customer_id);
CREATE INDEX payments_5_idx ON payments_5 (customer_id);
CREATE INDEX payments_6_idx ON payments_6 (customer_id);
CREATE INDEX payments_7_idx ON payments_7 (customer_id);
CREATE INDEX payments_8_idx ON payments_8 (customer_id);
CREATE INDEX payments_9_idx ON payments_9 (customer_id);
CREATE INDEX payments_10_idx ON payments_10 (customer_id);
CREATE INDEX payments_11_idx ON payments_11 (customer_id);
CREATE INDEX payments_12_idx ON payments_12 (customer_id);

Now you can insert rows:

INSERT INTO partitioned_payments_hash (created_at, updated_at, amount, total, description, tax, customer_id, shipping_address, status)
SELECT
    date('2023-02-' || random_between(1,28)),
    NOW() - '1 year'::INTERVAL * ROUND(RANDOM() * 100),
    ROUND(random() * 1000)::double precision ,
    ROUND(random() * 1000)::double precision ,
    '{}',
    ROUND(random() * 1000)::double precision,
    ROUND(RANDOM() * 100000)::INTEGER,
    SUBSTRING(md5(random()::text), 0, 25),
    SUBSTRING(md5(random()::text), 0, 255)
FROM generate_series(1, 10000);

test the query performance:

EXPLAIN SELECT * FROM partitioned_payments_hash WHERE customer_id = 47209;
QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Bitmap Heap Scan on payments_5 partitioned_payments_hash  (cost=4.29..10.12 rows=2 width=109)
   Recheck Cond: (customer_id = 47209)
   ->  Bitmap Index Scan on payments_5_idx  (cost=0.00..4.29 rows=2 width=0)
         Index Cond: (customer_id = 47209)
(4 rows)

Maintenance partition

List partition

use the pg_partition_tree function to get the list of partition names in the table:

# Verify the partitioned tab
SELECT p.relname AS partition_name,  
       pg_get_expr(p.relpartbound, p.oid) AS partition_bound, 
       pg_get_partkeydef(p.oid) AS partition_key  
  FROM pg_partition_tree('sales') AS t  
  JOIN pg_class AS p ON p.oid = t.relid;

Attach a new partition

create the new regular table with the desired schema and partition constraints, and then use the ALTER TABLE command with the ATTACH PARTITION option to attach it to the partitioned table.

CREATE TABLE part_tags_date_05_2020 PARTITION OF part_tags FOR VALUES FROM ('2020-05-01') TO ('2020-05-30');

Detach an existing partition

 ALTER TABLE part_tags DETACH PARTITION part_tags_date_05_2020 ;
ALTER TABLE t_data
    DETACH PARTITION payments_3;
ALTER TABLE t_data
    ATTACH PARTITION payments_3
    FOR VALUES WITH (MODULUS 12, REMAINDER 3);

Attach an already existing table to the parent table

ALTER TABLE part_tags ATTACH PARTITION part_tags_already_exists FOR
VALUES FROM ('1970-01-01') TO ('2019-12-31');

Constraint exclusion

Set parameter constraint exclusion to on or partition. If WHERE clause contain partition key optimizer using check constraint to remove partition from scaning

select * from pg_setting where name = 'constraint_exclusion';

Partition pruning

It's use only in declarative partitioning. Set parameter partition pruning to enable (default). If WHERE clause contain partition key optimizer using table partition bounds to remove partition from scaning. This bounds exist only in declarative partition.

select * from pg_setting where name = 'enable_partition_pruning';

enable_partition_pruning = off/on

Partition pruning can be performed not only during the planning of a given query, but also during its execution. This is useful as it can allow more partitions to be pruned when clauses contain expressions whose values are not known at query planning time; for example, parameters defined in a PREPARE statement, using a value obtained from a subquery or using a parameterized value on the inner side of a nested loop join.

Default partition

use a default partition where all the values that are not reflected in the mapping of the child tables will be inserted.

CREATE TABLE part_tags_default PARTITION OF part_tags default;

Indexing

Postgres 11+ If define the indexes for the parent table. These indexes will automatically be propagated to child tablesa.

Moving rów between partitios

PostgreSQL 11+ is able to move a row from one partition to the other. However, keep in mind that moving data between partitions may not be the best idea in general.

Default partition

CREATE TABLE p_def PARTITION OF data DEFAULT;

All the data that doesn’t fit anywhere will end up in this default partition, which ensures that creating the right partition can never be forgotten.