postgres batch processing
- https://stackoverflow.com/questions/7019831/bulk-batch-update-upsert-in-postgresql
- https://www.postgresql.org/docs/current/sql-values.html
- https://www.postgresql.org/docs/current/sql-update.html
- https://www.postgresql.org/message-id/[email protected]
- https://2ndquadrant.github.io/postgres/libpq-batch-mode.html
INSERT INTO "table" (col1, col2, col3)
VALUES (11, 12, 13) , (21, 22, 23) , (31, 32, 33);
INSERT INTO "table" (col1, col2, col3)
VALUES (unnest(array[11,21,31]),
        unnest(array[12,22,32]),
        unnest(array[13,23,33]));
INSERT INTO "table" (col1, col2, col3)
VALUES (unnest(?), unnest(?), unnest(?));
You have to pass arrays or lists as arguments to this query. This means you can do huge bulk inserts without string concatenation (and all its hassles and dangers: SQL injection and quoting hell).
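A minimal JDBC sketch of passing the arrays from application code, assuming a hypothetical items(col1, col2, col3) table with integer columns; the table name and connection details are placeholders:

```java
import java.sql.*;

public class UnnestBulkInsert {
    public static void main(String[] args) throws SQLException {
        // Connection details are placeholders
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/testdb", "user", "secret")) {

            Integer[] col1 = {11, 21, 31};
            Integer[] col2 = {12, 22, 32};
            Integer[] col3 = {13, 23, 33};

            String sql = "INSERT INTO items (col1, col2, col3) "
                       + "VALUES (unnest(?), unnest(?), unnest(?))";

            try (PreparedStatement ps = con.prepareStatement(sql)) {
                // One array parameter per column; the server expands them into rows
                ps.setArray(1, con.createArrayOf("integer", col1));
                ps.setArray(2, con.createArrayOf("integer", col2));
                ps.setArray(3, con.createArrayOf("integer", col3));
                ps.executeUpdate();
            }
        }
    }
}
```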
update "table"
set value = data_table.new_value
from
(select unnest(?) as key, unnest(?) as new_value) as data_table
where "table".key = data_table.key;
create table tmp
(
id serial not null primary key,
name text,
age integer
);
insert into tmp (name,age)
values ('keith', 43),('leslie', 40),('bexley', 19),('casey', 6);
update tmp set age = data_table.age
from
(select unnest(array['keith', 'leslie', 'bexley', 'casey']) as name,
unnest(array[44, 50, 10, 12]) as age) as data_table
where tmp.name = data_table.name;
UPDATE employees SET salary = salary * v.increase
FROM (VALUES(1, 200000, 1.2), (2, 400000, 1.4)) AS v (depno, target, increase)
WHERE employees.depno = v.depno AND employees.sales >= v.target;
- Generate SQL statements on the fly, concatenate them with semicolons, and then submit the statements in one shot. I've done up to 100 inserts in this way, and it was quite efficient (done against Postgres).
- JDBC has batching capabilities built in. If you manage your own transactions, you can add statements to a batch and flush them so that they are executed in one shot. This tactic requires fewer database round trips, as the statements are all sent in one batch (see the sketch after this list).
- Hibernate also supports JDBC batching along the lines of the previous example, but in this case you execute a flush() method against the Hibernate Session, not the underlying JDBC connection. It accomplishes the same thing as JDBC batching.
Incidentally, Hibernate also supports a batching strategy in collection fetching. If you annotate a collection with @BatchSize, when fetching associations, Hibernate will use IN instead of =, leading to fewer SELECT statements to load up the collections.
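A minimal sketch of the JDBC batching approach from the list above, reusing the tmp(name, age) table from the earlier example; the connection details are placeholders:

```java
import java.sql.*;

public class JdbcBatchInsert {
    public static void main(String[] args) throws SQLException {
        String[] names = {"keith", "leslie", "bexley", "casey"};
        int[] ages = {43, 40, 19, 6};

        // Connection details are placeholders
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/testdb", "user", "secret")) {
            con.setAutoCommit(false); // one transaction for the whole batch

            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO tmp (name, age) VALUES (?, ?)")) {
                for (int i = 0; i < names.length; i++) {
                    ps.setString(1, names[i]);
                    ps.setInt(2, ages[i]);
                    ps.addBatch();   // queue the statement
                }
                ps.executeBatch();   // send the queued statements in one go
            }
            con.commit();
        }
    }
}
```

With the PostgreSQL JDBC driver, adding reWriteBatchedInserts=true to the connection URL can additionally rewrite such batches into multi-row INSERT statements, reducing round trips further.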
Change Target Table to Un-logged Mode
For PostgreSQL 9.5 and above, the target table can be first altered to UNLOGGED, then altered back to LOGGED once the data is loaded:
ALTER TABLE <target table> SET UNLOGGED
<bulk data insert operations…>
ALTER TABLE <target table> SET LOGGED
Disable triggers
ALTER TABLE <target table> DISABLE TRIGGER ALL
<bulk data insert operations…>
ALTER TABLE <target table> ENABLE TRIGGER ALL
Drop and recreate indexes and foreign keys
DROP INDEX <index_name1>, <index_name2> … <index_name_n>
<bulk data insert operations…>
CREATE INDEX <index_name> ON <target_table>(column1, …, column_n)
ALTER TABLE <target_table>
DROP CONSTRAINT <foreign_key_constraint>
BEGIN TRANSACTION
<bulk data insert operations…>
COMMIT
ALTER TABLE <target_table>
ADD CONSTRAINT <foreign_key_constraint>
FOREIGN KEY (<foreign_key_field>)
REFERENCES <parent_table>(<primary_key_field>)...
Increasing the maintenance_work_mem configuration parameter can improve the performance of recreating indexes and foreign key constraints.
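A sketch tying these steps together from JDBC, using a hypothetical orders/customers schema; all table, index, and constraint names below are placeholders:

```java
import java.math.BigDecimal;
import java.sql.*;

public class BulkLoadWithRebuild {
    public static void main(String[] args) throws SQLException {
        // Connection details are placeholders
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/testdb", "user", "secret");
             Statement st = con.createStatement()) {

            // Drop the foreign key and index before loading
            st.execute("ALTER TABLE orders DROP CONSTRAINT orders_customer_id_fkey");
            st.execute("DROP INDEX orders_customer_id_idx");

            // Load the data in a single transaction
            con.setAutoCommit(false);
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)")) {
                for (int i = 0; i < 10_000; i++) {
                    ps.setInt(1, i % 100);
                    ps.setBigDecimal(2, BigDecimal.valueOf(i));
                    ps.addBatch();
                }
                ps.executeBatch();
            }
            con.commit();
            con.setAutoCommit(true);

            // Give the rebuild more memory for this session only
            st.execute("SET maintenance_work_mem = '1GB'");
            st.execute("CREATE INDEX orders_customer_id_idx ON orders (customer_id)");
            st.execute("ALTER TABLE orders ADD CONSTRAINT orders_customer_id_fkey "
                     + "FOREIGN KEY (customer_id) REFERENCES customers (id)");
        }
    }
}
```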