database performance - ORGTYTESPOO/kiradigiasm GitHub Wiki

Why?

We need to ensure that the data structures enable efficient queries. Otherwise people using apps and reports wait longer for the data to load.
There is a trade-off between efficient reporting and operative use. To find out the real-world need we held workshops with different user groups. First we created profiles for the groups, e.g. everyday users, everyday management, higher-level management etc. Then we interviewed chosen people from each group, together and separately, and wrote user stories. From the user stories we were able to deduce the technical use cases.
By listing the most common use cases we can deduce how to design the database and what to emphasize.

1. Most common use cases (or user stories)

| User story | Use case | Used for | Data has to be updated | How many assets involved |
| --- | --- | --- | --- | --- |
| I order area maintenance based on square metres | Square metres of areas | reporting | once a night | 2000 - 60 000 assets |
| I need to change information in the field if I change a traffic sign | Updating equipment | operative | instantly | max. 500 assets/day |
| I add new trash bins | Inserting equipment | operative | instantly | ca. 10 assets/day/year |
| When a new park or a road is built we plant tens of trees, bushes and flowers | Inserting vegetation | operative | instantly | ca. 10 assets/day/year |
| The sign was no longer needed so it was removed | Deleting equipment | operative | instantly | max. 10 assets/day |
| The trees were planted too close to each other so one was cut down | Deleting vegetation | operative | instantly | max. 10 assets/day |
| I need yearly reports for the budget management | Calculating length of streets and pavements grouped by the surface material | reporting | once a night | 2000 - 60 000 assets |
| I need to make playground safety checks | Playground equipment lists grouped by location, address, model | reporting | once a night | max. 50 assets |
| I need to be able to provide equipment lists for the contractor | Equipment lists grouped by location, address, model | reporting | once a night | ca. 10 000 - 30 000 assets |
| I want to keep track of all the pavings that we do so that we know what material was used beneath the top surface layer and how old the layers are | Inserting surface layers | operative | instantly | ? |
| We order traffic sign pattern recognition data yearly | Inserting collected mass data of traffic signs | DW | once | ? |

2. Example of a performance test in PostgreSQL - Reporting areas

2.1 Separate area tables for streets and green areas

```sql
EXPLAIN ANALYZE
SELECT SUM(ST_Area(geom)) AS area,
       maintenanceclasstext
FROM street
GROUP BY maintenanceclasstext
UNION
SELECT SUM(ST_Area(geom)) AS area,
       maintenanceclasstext
FROM green
GROUP BY maintenanceclasstext;
```

```
HashAggregate  (cost=20264.24..20264.45 rows=21 width=546) (actual time=1092.138..1092.143 rows=23 loops=1)
  Group Key: (sum(st_area(street.geom))), street.maintenanceclasstext
  ->  Append  (cost=4642.20..20264.13 rows=21 width=546) (actual time=69.287..1092.118 rows=23 loops=1)
        ->  HashAggregate  (cost=4642.20..4642.27 rows=7 width=862) (actual time=69.286..69.287 rows=8 loops=1)
              Group Key: street.maintenanceclasstext
              ->  Seq Scan on street  (cost=0.00..4429.54 rows=28354 width=862) (actual time=0.006..9.192 rows=28354 loops=1)
        ->  HashAggregate  (cost=15621.52..15621.66 rows=14 width=388) (actual time=1022.816..1022.820 rows=15 loops=1)
              Group Key: green.maintenanceclasstext
              ->  Seq Scan on green  (cost=0.00..14279.01 rows=179001 width=388) (actual time=0.026..649.096 rows=179001 loops=1)
Planning time: 0.081 ms
Execution time: 1092.224 ms
```

2.2 One table for all the areas

```sql
EXPLAIN ANALYZE
SELECT SUM(ST_Area(geom)) AS area,
       maintenanceclasstext
FROM assetarea
GROUP BY maintenanceclasstext;
```

```
HashAggregate  (cost=17852.80..17853.00 rows=20 width=463) (actual time=439.457..439.464 rows=21 loops=1)
  Group Key: maintenanceclasstext
  ->  Seq Scan on assetarea  (cost=0.00..16483.60 rows=182560 width=463) (actual time=0.025..78.140 rows=182560 loops=1)
Planning time: 0.045 ms
Execution time: 439.510 ms
```

2.3 Statistics

| Table count | Planning time (ms) | Execution time (ms) |
| --- | --- | --- |
| 2 | 0.081 | 1092.224 |
| 1 | 0.045 | 439.510 |
  • We can see that reporting is faster if we keep all the areas in the same table.

3. Summary

Things considered:

  • How many apps and other software are using the database at the same time?
  • Data integrity, but still fast queries.
    Conclusion: the number of different connections updating, inserting and deleting rows in the database is so large that the operative design has to be emphasized at the cost of reporting.

3.1 Inserting efficiency

The more indexes a table has, the slower the insert becomes.
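A rough way to see the effect, in the same spirit as the EXPLAIN ANALYZE tests above, would be to run the same bulk insert with and without a secondary index. The sketch below is illustrative only: the table, column and index names are assumptions, not part of the actual schema.

```sql
-- Hypothetical test table
CREATE TABLE insert_test (id serial PRIMARY KEY, maintenanceclasstext text);

-- Baseline: insert 100 000 rows with only the primary key index present
EXPLAIN ANALYZE
INSERT INTO insert_test (maintenanceclasstext)
SELECT 'class ' || (g % 10) FROM generate_series(1, 100000) AS g;

-- Add a secondary index and repeat: the execution time grows, and every
-- additional index adds the same kind of overhead
CREATE INDEX insert_test_class_idx ON insert_test (maintenanceclasstext);

EXPLAIN ANALYZE
INSERT INTO insert_test (maintenanceclasstext)
SELECT 'class ' || (g % 10) FROM generate_series(1, 100000) AS g;
```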

3.1.1 Inserting one to 30 rows at a time

Done regularly, so it has to be taken into account. Indexes have little or no effect on efficiency at this scale.

3.1.2 Inserting large amounts of data

Done once a day/week/month/year. These ETL transfers are scheduled, automated and run at night. The load is more efficient if indexes are temporarily dropped during the data transfer: a bulk load cannot benefit from the index entries it adds, so maintaining the index row by row is pure overhead and it is cheaper to rebuild it afterwards.
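A minimal sketch of such a load is shown below. The table, index and file names (assetarea, assetarea_maintenanceclass_idx, the CSV path) are assumptions for illustration only.

```sql
-- Hypothetical nightly ETL load: drop the secondary index, bulk load, recreate it
BEGIN;

DROP INDEX IF EXISTS assetarea_maintenanceclass_idx;

-- Bulk load of the new data (source file and format are illustrative)
COPY assetarea (geom, maintenanceclasstext)
FROM '/data/etl/assetarea_load.csv' WITH (FORMAT csv, HEADER true);

-- Rebuilding the index once is cheaper than updating it for every inserted row
CREATE INDEX assetarea_maintenanceclass_idx ON assetarea (maintenanceclasstext);

COMMIT;
```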

3.2 Deleting efficiency

3.2.1 Deleting one to 30 rows at a time

Done regularly. Indexes have little or no effect on efficiency at this scale.

3.2.2 Deleting large amounts of data

Done once a day/week/month/year as part of the ETL. Very slow if DELETE is used, fast if TRUNCATE is used. This is because DELETE processes and logs each row individually, whereas TRUNCATE does not touch individual rows at all: the rows still physically exist in the data pages, but the extents are simply marked for reuse. Using TRUNCATE means that constraints referencing the table have to be removed before truncating and reapplied afterwards.
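Below is a minimal sketch of the difference, again with a hypothetical table name.

```sql
-- Slow for large tables: every row is processed and logged individually
DELETE FROM assetarea;

-- Fast: the table's storage is released in one operation instead of row by row.
-- Foreign keys referencing the table must be dropped first, or the referencing
-- tables truncated in the same statement.
TRUNCATE TABLE assetarea;
```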

3.3 Updating efficiency

3.3.1 Updating one to 30 rows at a time

Done regularly. Indexes have little or no effect on efficiency at this scale.

3.3.2 Updating large amounts of data

3.4 Query efficiency

A query from an application typically targets a subset of the data, such as a certain equipment group, which is then viewed, modified etc.
Large data sets are more costly to query than small ones. Adding an index and keeping the data in as few tables as possible make queries faster.
A big emphasis must be put on how the queries are built so that they are efficient and fast.
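For example, an operative query that fetches one equipment group can be supported by an index on the filter column. A minimal sketch follows; the table, column and index names (equipment, equipmentgroup, equipment_group_idx) are assumptions for illustration, not the actual schema.

```sql
-- Hypothetical index on the column the application filters by
CREATE INDEX IF NOT EXISTS equipment_group_idx ON equipment (equipmentgroup);

-- Application query that targets a small subset of the data;
-- EXPLAIN ANALYZE shows whether the planner actually uses the index
EXPLAIN ANALYZE
SELECT id, model, address, geom
FROM equipment
WHERE equipmentgroup = 'trash bin';
```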

3.5 Statistics

| Type | Tested | Slow | Fast |
| --- | --- | --- | --- |
| Query | Data | large set | small set |
| Query | Index | no index | has index |
| Query | Joins | joins used | no joins |
| Insert | Data | large set | small set |
| Insert | Index | index in use | index dropped |
| Delete | Data | large set | small set |
| Delete | Index | no effect | no effect |
| Update | Data | large set | small set |
| Update | Index | no effect | no effect |