postgres batch processing

Bulk inserts

INSERT INTO "table" (col1, col2, col3)
  VALUES (11, 12, 13) , (21, 22, 23) , (31, 32, 33);
NSERT INTO "table" (col1, col2, col3)
  VALUES (unnest(array[11,21,31]), 
          unnest(array[12,22,32]), 
          unnest(array[13,23,33]))
INSERT INTO "table" (col1, col2, col3)
  VALUES (unnest(?), unnest(?), unnest(?))

You pass arrays or lists as arguments to this query, which lets you do huge bulk inserts without string concatenation (and all its hassles and dangers: SQL injection and quoting hell).
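As a minimal sketch of the parameter passing, the same pattern can be exercised from psql with a server-side prepared statement (the statement name bulk_insert is arbitrary):

-- prepare once; $1..$3 are the per-column arrays
PREPARE bulk_insert (int[], int[], int[]) AS
  INSERT INTO "table" (col1, col2, col3)
  SELECT unnest($1), unnest($2), unnest($3);

-- one EXECUTE inserts three rows
EXECUTE bulk_insert(array[11,21,31], array[12,22,32], array[13,23,33]);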

Bulk update

update "table" 
  set value = data_table.new_value
  from 
    (select unnest(?) as key, unnest(?) as new_value) as data_table
  where "table".key = data_table.key;
create table tmp
(
  id serial not null primary key,
  name text,
  age integer
);

insert into tmp (name,age) 
values ('keith', 43),('leslie', 40),('bexley', 19),('casey', 6);

update tmp set age = data_table.age
from
(select unnest(array['keith', 'leslie', 'bexley', 'casey']) as name, 
        unnest(array[44, 50, 10, 12]) as age) as data_table
where tmp.name = data_table.name;
UPDATE employees SET salary = salary * v.increase
  FROM (VALUES(1, 200000, 1.2), (2, 400000, 1.4)) AS v (depno, target, increase)
  WHERE employees.depno = v.depno AND employees.sales >= v.target;

Three strategies for batch transactional work:

  • Generate SQL statements on the fly, concatenate them with semicolons, and submit the statements in one shot (see the sketch after this list). I've done up to 100 inserts this way, and it was quite efficient (done against Postgres).
  • JDBC has batching capabilities built in, if configured. If you generate transactions, you can flush your JDBC statements so that they execute in one shot. This tactic requires fewer database round trips, as the statements are all executed in one batch.
  • Hibernate also supports JDBC batching along the same lines, but there you call the flush() method on the Hibernate Session, not on the underlying JDBC connection. It accomplishes the same thing as plain JDBC batching.
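A minimal sketch of the first strategy, reusing the three-column table from above: the statements are concatenated into one string and submitted to the server in a single round trip.

BEGIN;
INSERT INTO "table" (col1, col2, col3) VALUES (11, 12, 13);
INSERT INTO "table" (col1, col2, col3) VALUES (21, 22, 23);
INSERT INTO "table" (col1, col2, col3) VALUES (31, 32, 33);
COMMIT;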

Incidentally, Hibernate also supports a batching strategy for collection fetching. If you annotate a collection with @BatchSize, Hibernate fetches associations with IN instead of =, leading to fewer SELECT statements to load the collections.
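Roughly, the generated SQL changes shape as sketched below (order_items and order_id are hypothetical names for illustration):

-- without @BatchSize: one SELECT per collection
SELECT * FROM order_items WHERE order_id = 1;
SELECT * FROM order_items WHERE order_id = 2;

-- with @BatchSize: several collections loaded in one SELECT
SELECT * FROM order_items WHERE order_id IN (1, 2, 3, 4);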

Un-logged Mode

Change Target Table to Un-logged Mode

For PostgreSQL 9.5 and above, the target table can first be altered to UNLOGGED, then altered back to LOGGED once the data is loaded:

ALTER TABLE <target table> SET UNLOGGED;
<bulk data insert operations…>
ALTER TABLE <target table> SET LOGGED;

Note that an unlogged table is not crash-safe: it is automatically truncated during crash recovery, so this is only suitable for data that can be reloaded. Switching back to LOGGED writes the whole table to the WAL.
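A concrete sketch, assuming a staging table sales_staging loaded from a CSV file (both names hypothetical):

ALTER TABLE sales_staging SET UNLOGGED;
COPY sales_staging FROM '/tmp/sales.csv' WITH (FORMAT csv);
ALTER TABLE sales_staging SET LOGGED;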

Good practice

Disable trigger

ALTER TABLE <target table> DISABLE TRIGGER ALL;
<bulk data insert operations…>
ALTER TABLE <target table> ENABLE TRIGGER ALL;

Note that DISABLE TRIGGER ALL also disables the internally generated triggers that enforce foreign key constraints (disabling those requires superuser privileges), so the loaded data must already be consistent.
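For example, again with hypothetical table names:

ALTER TABLE sales_staging DISABLE TRIGGER ALL;
INSERT INTO sales_staging SELECT * FROM sales_import;
ALTER TABLE sales_staging ENABLE TRIGGER ALL;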

Drop and recreate index and foreign key

DROP INDEX <index_name1>, <index_name2>, …, <index_name_n>;
<bulk data insert operations…>
CREATE INDEX <index_name> ON <target_table> (column1, …, column_n);

ALTER TABLE <target_table>
    DROP CONSTRAINT <foreign_key_constraint>;

BEGIN TRANSACTION;
    <bulk data insert operations…>
COMMIT;

ALTER TABLE <target_table>
    ADD CONSTRAINT <foreign_key_constraint>
    FOREIGN KEY (<foreign_key_field>)
    REFERENCES <parent_table>(<primary key field>)...
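An illustrative sketch with hypothetical names (orders, customers, idx_orders_customer, fk_orders_customer):

-- drop the index and the foreign key before the load
DROP INDEX idx_orders_customer;
ALTER TABLE orders DROP CONSTRAINT fk_orders_customer;

BEGIN TRANSACTION;
    COPY orders FROM '/tmp/orders.csv' WITH (FORMAT csv);
COMMIT;

-- recreate them once the data is in place
CREATE INDEX idx_orders_customer ON orders (customer_id);
ALTER TABLE orders
    ADD CONSTRAINT fk_orders_customer
    FOREIGN KEY (customer_id)
    REFERENCES customers (id);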

Increasing the maintenance_work_mem configuration parameter can improve the performance of recreating indexes and foreign key constraints.
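For example, raised just for the current session (the 1GB figure is an assumption; size it to the available RAM):

SET maintenance_work_mem = '1GB';  -- assumption: adjust to your machine
-- recreate indexes and foreign keys here
RESET maintenance_work_mem;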
