database performance - ORGTYTESPOO/kiradigiasm GitHub Wiki

Why?

We need to ensure that the data structures enable efficient queries. Otherwise people using apps and reports wait longer for the data to load.
There is a trade-off between efficient reporting and operative use. To find out the real-world need we held workshops with different user groups. First we created profiles for the groups, e.g. everyday users, everyday management, higher-level management etc. Then we interviewed chosen people from each group, together and separately, and wrote user stories. From the user stories we were able to deduce the technical use cases.
By listing the most common use cases we can deduce how to design the database and what to emphasize.

1. Most common use cases (or user stories)

| User story | Use case | Used for | Data has to be updated | How many assets involved |
| --- | --- | --- | --- | --- |
| I order area maintenance based on square metres | Square metres of areas | reporting | once a night | 2000 - 60 000 assets |
| I need to change information in the field if I change a traffic sign | Updating equipment | operative | instantly | max. 500 assets/day |
| I add new trash bins | Inserting equipment | operative | instantly | ca. 10 assets/day/year |
| When a new park or a road is built we plant tens of trees, bushes and flowers | Inserting vegetation | operative | instantly | ca. 10 assets/day/year |
| The sign was no longer needed so it was removed | Deleting equipment | operative | instantly | max. 10 assets/day |
| The trees were planted too close to each other so one was cut down | Deleting vegetation | operative | instantly | max. 10 assets/day |
| I need yearly reports for the budget management | Calculating length of streets and pavements grouped by the surface material | reporting | once a night | 2000 - 60 000 assets |
| I need to make playground safety checks | Playground equipment lists grouped by location, address, model | reporting | once a night | max. 50 assets |
| I need to be able to provide equipment lists for the contractor | Equipment lists grouped by location, address, model | reporting | once a night | ca. 10 000 - 30 000 assets |
| I want to keep track of all the pavings that we do so that we know what material was used beneath the top surface layer and how old the layers are | Inserting surface layers | operative | instantly | ? |
| We order traffic sign pattern recognition data yearly | Inserting collected mass data of traffic signs | DW | once | ? |

2. Example of a performance test in PostgreSQL - Reporting areas

2.1 Separate area tables for streets and green areas

```sql
EXPLAIN ANALYZE
SELECT SUM(ST_Area(geom)) AS area,
       maintenanceclasstext
FROM street
GROUP BY maintenanceclasstext
UNION
SELECT SUM(ST_Area(geom)) AS area,
       maintenanceclasstext
FROM green
GROUP BY maintenanceclasstext;
```

```
HashAggregate  (cost=20264.24..20264.45 rows=21 width=546) (actual time=1092.138..1092.143 rows=23 loops=1)
  Group Key: (sum(st_area(street.geom))), street.maintenanceclasstext
  ->  Append  (cost=4642.20..20264.13 rows=21 width=546) (actual time=69.287..1092.118 rows=23 loops=1)
        ->  HashAggregate  (cost=4642.20..4642.27 rows=7 width=862) (actual time=69.286..69.287 rows=8 loops=1)
              Group Key: street.maintenanceclasstext
              ->  Seq Scan on street  (cost=0.00..4429.54 rows=28354 width=862) (actual time=0.006..9.192 rows=28354 loops=1)
        ->  HashAggregate  (cost=15621.52..15621.66 rows=14 width=388) (actual time=1022.816..1022.820 rows=15 loops=1)
              Group Key: green.maintenanceclasstext
              ->  Seq Scan on green  (cost=0.00..14279.01 rows=179001 width=388) (actual time=0.026..649.096 rows=179001 loops=1)
Planning time: 0.081 ms
Execution time: 1092.224 ms
```

2.2 One table for all the areas

```sql
EXPLAIN ANALYZE
SELECT SUM(ST_Area(geom)) AS area,
       maintenanceclasstext
FROM assetarea
GROUP BY maintenanceclasstext;
```

```
HashAggregate  (cost=17852.80..17853.00 rows=20 width=463) (actual time=439.457..439.464 rows=21 loops=1)
  Group Key: maintenanceclasstext
  ->  Seq Scan on assetarea  (cost=0.00..16483.60 rows=182560 width=463) (actual time=0.025..78.140 rows=182560 loops=1)
Planning time: 0.045 ms
Execution time: 439.510 ms
```

2.3 Statistics

| Table count | Planning time (ms) | Execution time (ms) |
| --- | --- | --- |
| 2 | 0.081 | 1092.224 |
| 1 | 0.045 | 439.510 |
  • We can see that reporting is faster if we keep all the areas in the same table.

3. Summary

Things considered:

  • How many apps and other software are using the database at the same time?
  • Data integrity, but still fast queries.
    Conclusion: the number of different connections updating, inserting and deleting rows in the database is so large that the operative design has to be emphasized at the cost of reporting.

3.1 Inserting efficiency

The more indexes a table has, the slower the insert becomes.
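A rough way to see the effect, in the same spirit as the EXPLAIN ANALYZE tests above, would be to run the same bulk insert with and without a secondary index. The sketch below is illustrative only: the table, column and index names are assumptions, not part of the actual schema.

```sql
-- Hypothetical test table
CREATE TABLE insert_test (id serial PRIMARY KEY, maintenanceclasstext text);

-- Baseline: insert 100 000 rows with only the primary key index present
EXPLAIN ANALYZE
INSERT INTO insert_test (maintenanceclasstext)
SELECT 'class ' || (g % 10) FROM generate_series(1, 100000) AS g;

-- Add a secondary index and repeat: the execution time grows, and every
-- additional index adds the same kind of overhead
CREATE INDEX insert_test_class_idx ON insert_test (maintenanceclasstext);

EXPLAIN ANALYZE
INSERT INTO insert_test (maintenanceclasstext)
SELECT 'class ' || (g % 10) FROM generate_series(1, 100000) AS g;
```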

3.1.1 Inserting one to 30 rows at a time

Done regularly, so it has to be taken into account. Indexes have little or no effect on efficiency at this scale.

3.1.2 Inserting large amounts of data

Done once a day/week/month/year. These ETL transfers are scheduled, automated and run at night. The load is more efficient if indexes are temporarily dropped during the data transfer: a bulk load cannot benefit from the index entries it adds, so maintaining the index row by row is pure overhead and it is cheaper to rebuild it afterwards.
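A minimal sketch of such a load is shown below. The table, index and file names (assetarea, assetarea_maintenanceclass_idx, the CSV path) are assumptions for illustration only.

```sql
-- Hypothetical nightly ETL load: drop the secondary index, bulk load, recreate it
BEGIN;

DROP INDEX IF EXISTS assetarea_maintenanceclass_idx;

-- Bulk load of the new data (source file and format are illustrative)
COPY assetarea (geom, maintenanceclasstext)
FROM '/data/etl/assetarea_load.csv' WITH (FORMAT csv, HEADER true);

-- Rebuilding the index once is cheaper than updating it for every inserted row
CREATE INDEX assetarea_maintenanceclass_idx ON assetarea (maintenanceclasstext);

COMMIT;
```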

3.2 Deleting efficiency

3.2.1 Deleting one to 30 rows at a time

Done regularly. Indexes have little or no effect on efficiency at this scale.

3.2.2 Deleting large amounts of data

Done once a day/week/month/year as part of the ETL. Very slow if DELETE is used, fast if TRUNCATE is used. This is because DELETE processes and logs each row individually, whereas TRUNCATE does not touch individual rows at all: the rows still physically exist in the data pages, but the extents are simply marked for reuse. Using TRUNCATE means that constraints referencing the table have to be removed before truncating and reapplied afterwards.
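Below is a minimal sketch of the difference, again with a hypothetical table name.

```sql
-- Slow for large tables: every row is processed and logged individually
DELETE FROM assetarea;

-- Fast: the table's storage is released in one operation instead of row by row.
-- Foreign keys referencing the table must be dropped first, or the referencing
-- tables truncated in the same statement.
TRUNCATE TABLE assetarea;
```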

3.3 Updating efficiency

3.3.1 Updating one to 30 rows at a time

Done regularly. Indexes have little or no effect on efficiency at this scale.

3.3.2 Updating large amounts of data

3.4 Query efficiency

A query from an application typically targets a subset of the data, such as a certain equipment group, which is then viewed, modified etc.
Large data sets are more costly to query than small ones. Adding an index and keeping the data in as few tables as possible make queries faster.
A big emphasis must be put on how the queries are built so that they are efficient and fast.
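For example, an operative query that fetches one equipment group can be supported by an index on the filter column. A minimal sketch follows; the table, column and index names (equipment, equipmentgroup, equipment_group_idx) are assumptions for illustration, not the actual schema.

```sql
-- Hypothetical index on the column the application filters by
CREATE INDEX IF NOT EXISTS equipment_group_idx ON equipment (equipmentgroup);

-- Application query that targets a small subset of the data;
-- EXPLAIN ANALYZE shows whether the planner actually uses the index
EXPLAIN ANALYZE
SELECT id, model, address, geom
FROM equipment
WHERE equipmentgroup = 'trash bin';
```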

3.5 Statistics

| Type | Tested | Slow | Fast |
| --- | --- | --- | --- |
| Query | Data | large set | small set |
| Query | Index | no index | has index |
| Query | Joins | joins used | no joins |
| Insert | Data | large set | small set |
| Insert | Index | index in use | index dropped |
| Delete | Data | large set | small set |
| Delete | Index | no effect | no effect |
| Update | Data | large set | small set |
| Update | Index | no effect | no effect |