Database performance
Why?
We need to ensure that the data structures enable efficient queries. Otherwise people using apps and reports wait longer for the data to load.
There is a trade-off between efficient reporting and operative use. To find out the real-world needs, we held user workshops with different user groups. First we created profiles for the groups, e.g. everyday users, everyday management and higher-level management. Then we interviewed chosen people from each group, together and separately, and wrote user stories. From the user stories we were able to deduce the technical use cases.
By listing the most common use cases we can deduce how to design the database and what to emphasize.
1. Most common use cases (or user stories)
User story | Use case | Used for | Update frequency | Assets involved |
---|---|---|---|---|
I order area maintenance based on square metres | Square metres of areas | reporting | once a night | 2 000 - 60 000 assets |
I need to change information in the field if I change a traffic sign | Updating equipment | operative | instantly | max. 500 assets/day |
I add new trash bins | Inserting equipment | operative | instantly | ca. 10 assets/day/year |
When a new park or a road is built we plant tens of trees, bushes and flowers | Inserting vegetation | operative | instantly | ca. 10 assets/day/year |
The sign was no longer needed so it was removed | Deleting equipment | operative | instantly | max. 10 assets/day |
The trees were planted too close to each other so one was cut down | Deleting vegetation | operative | instantly | max. 10 assets/day |
I need yearly reports for the budget management | Calculating the length of streets and pavements grouped by surface material | reporting | once a night | 2 000 - 60 000 assets |
I need to make playground safety checks | Playground equipment lists grouped by location, address and model | reporting | once a night | max. 50 assets |
I need to be able to provide equipment lists for the contractor | Equipment lists grouped by location, address and model | reporting | once a night | ca. 10 000 - 30 000 assets |
I want to keep track of all the pavings we do, so that we know what material was used beneath the top surface layer and how old the layers are | Inserting surface layers | operative | instantly | ? |
We order traffic sign pattern recognition data yearly | Inserting the collected mass of traffic sign data | DW | once a year | ? |
2. Example of a performance test in PostgreSQL - Reporting areas
2.1 Separate area tables for streets and green areas
```sql
EXPLAIN ANALYZE
SELECT SUM(ST_Area(geom)) AS area,
       maintenanceclasstext
FROM street
GROUP BY maintenanceclasstext
UNION
SELECT SUM(ST_Area(geom)) AS area,
       maintenanceclasstext
FROM green
GROUP BY maintenanceclasstext;
```
```
HashAggregate  (cost=20264.24..20264.45 rows=21 width=546) (actual time=1092.138..1092.143 rows=23 loops=1)
  Group Key: (sum(st_area(street.geom))), street.maintenanceclasstext
  ->  Append  (cost=4642.20..20264.13 rows=21 width=546) (actual time=69.287..1092.118 rows=23 loops=1)
        ->  HashAggregate  (cost=4642.20..4642.27 rows=7 width=862) (actual time=69.286..69.287 rows=8 loops=1)
              Group Key: street.maintenanceclasstext
              ->  Seq Scan on street  (cost=0.00..4429.54 rows=28354 width=862) (actual time=0.006..9.192 rows=28354 loops=1)
        ->  HashAggregate  (cost=15621.52..15621.66 rows=14 width=388) (actual time=1022.816..1022.820 rows=15 loops=1)
              Group Key: green.maintenanceclasstext
              ->  Seq Scan on green  (cost=0.00..14279.01 rows=179001 width=388) (actual time=0.026..649.096 rows=179001 loops=1)
Planning time: 0.081 ms
Execution time: 1092.224 ms
```
2.2 One table for all the areas
```sql
EXPLAIN ANALYZE
SELECT SUM(ST_Area(geom)) AS area,
       maintenanceclasstext
FROM assetarea
GROUP BY maintenanceclasstext;
```
```
HashAggregate  (cost=17852.80..17853.00 rows=20 width=463) (actual time=439.457..439.464 rows=21 loops=1)
  Group Key: maintenanceclasstext
  ->  Seq Scan on assetarea  (cost=0.00..16483.60 rows=182560 width=463) (actual time=0.025..78.140 rows=182560 loops=1)
Planning time: 0.045 ms
Execution time: 439.510 ms
```
2.3 Statistics
Table count | Planning time (ms) | Execution time (ms) |
---|---|---|
2 | 0.081 | 1092.224 |
1 | 0.045 | 439.510 |
- We can see that reporting is about 2.5 times faster if we keep all the areas in one table instead of two; a sketch of such a combined table follows.
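As an illustration, a minimal sketch of such a combined table. The column names follow the queries above, but the exact schema (the assettype discriminator, the SRID) is an assumption, not the project's actual design:

```sql
-- One table for all maintained areas; assettype tells streets and
-- green areas apart (hypothetical schema, names assumed).
CREATE TABLE assetarea (
    id                   bigserial PRIMARY KEY,
    assettype            text NOT NULL,   -- e.g. 'street' or 'green'
    maintenanceclasstext text,
    geom                 geometry(MultiPolygon, 3879)  -- SRID assumed
);

-- The nightly report then needs only one scan and one aggregation:
SELECT SUM(ST_Area(geom)) AS area,
       maintenanceclasstext
FROM assetarea
GROUP BY maintenanceclasstext;
```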
3. Summary
Things considered:
- How many apps and other software are using the database at the same time?
- Maintaining data integrity while still keeping queries fast.
Conclusion: the number of different connections updating, inserting and deleting rows in the database is so large that the operative design has to be emphasized at the cost of reporting.
3.1 Inserting efficiency
The more indexes a table has, the slower the inserts become, because every index has to be updated along with the row.
3.1.1 Inserting one to 30 rows at a time
Done regularly, so it has to be considered. Indexes have little or no effect on efficiency at this scale.
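For example, an operative insert of a single new trash bin could look like the following; the table and column names are assumptions for illustration, not the project's actual schema:

```sql
-- Insert one piece of equipment; with a handful of rows the index
-- maintenance overhead is negligible and the row is usable instantly.
INSERT INTO equipment (equipmenttype, model, address, geom)
VALUES ('trash bin', 'Model X', 'Example Street 1',
        ST_SetSRID(ST_MakePoint(25496000.0, 6672000.0), 3879));  -- coordinates and SRID assumed
```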
3.1.2 Inserting large amounts of data
Done once a day/week/month/year. These ETL transfers are scheduled and automated to run at night. They are more efficient if the indexes are temporarily dropped during the data transfer: a bulk load cannot benefit from adding entries to an index row by row, it only pays the maintenance cost. A minimal sketch of such a transfer follows.
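The sketch below assumes a staging table traffic_sign_dw with a GiST index on its geometry column; all names and the file path are hypothetical:

```sql
-- Drop the index for the duration of the bulk load ...
DROP INDEX IF EXISTS traffic_sign_dw_geom_idx;

-- ... load the yearly pattern recognition data in one pass ...
COPY traffic_sign_dw (signtype, model, geom)
FROM '/data/traffic_signs.csv' WITH (FORMAT csv, HEADER true);

-- ... and rebuild the index once instead of updating it per row.
CREATE INDEX traffic_sign_dw_geom_idx
    ON traffic_sign_dw USING gist (geom);
```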
3.2 Deleting efficiency
3.2.1 Deleting one to 30 rows at a time
Done regularly. Indexes have little or no effect on efficiency.
3.2.2 Deleting large amounts of data
Done once a day/week/month/year as part of the ETL. Very slow if DELETE is used, fast if TRUNCATE is used. With DELETE each row is logged and removed individually, whereas TRUNCATE leaves the row data in the data pages and simply marks the extents for reuse. Using TRUNCATE means that all constraints referencing the table have to be dropped before truncating and reapplied afterwards, as in the sketch below.
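A sketch of the TRUNCATE pattern; the table and constraint names are assumptions:

```sql
-- Drop the foreign key that references the table before truncating ...
ALTER TABLE inspection DROP CONSTRAINT inspection_equipment_fkey;

-- ... empty the table by marking its extents for reuse (fast) ...
TRUNCATE TABLE equipment_staging;

-- ... reload the table, then reapply the constraint.
ALTER TABLE inspection
    ADD CONSTRAINT inspection_equipment_fkey
    FOREIGN KEY (equipment_id) REFERENCES equipment_staging (id);
```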
3.3 Updating efficiency
3.3.1 Updating one to 30 rows at a time
Done regularly. Indexes have little or no effect on efficiency. The traffic sign use case above is a typical example, as sketched below.
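A single-row field update might look like the following; the table and column names are again assumptions:

```sql
-- Replace one sign's information; the change is visible instantly.
UPDATE equipment
SET model = 'speed limit 30',
    modified = now()
WHERE id = 12345;  -- example id of the sign being replaced
```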
3.3.2 Updating large amounts of data
3.4 Query efficiency
A query from an application typically targets a subset of the data, such as a certain equipment group, which is then viewed, modified and so on.
Large data sets are more costly to query than small ones. Adding an index and keeping the data in as few tables as possible make the queries faster.
A big emphasis must be put on how the queries are built, to make sure they are efficient and fast; see the sketch below.
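For instance, a query over one equipment group benefits directly from an index on the column the application filters on (the index, table and column names are assumptions):

```sql
-- Index the filtering column ...
CREATE INDEX equipment_type_idx ON equipment (equipmenttype);

-- ... so a typical operative query reads only the subset it needs.
EXPLAIN ANALYZE
SELECT id, model, address
FROM equipment
WHERE equipmenttype = 'playground equipment';
```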
3.5 Statistics
Operation | Variable tested | Slow | Fast |
---|---|---|---|
Query | Data volume | large set | small set |
Query | Index | no index | has index |
Query | Joins | joins used | no joins |
Insert | Data volume | large set | small set |
Insert | Index | index kept | index dropped |
Delete | Data volume | large set | small set |
Delete | Index | no effect | no effect |
Update | Data volume | large set | small set |
Update | Index | no effect | no effect |