Database monitoring - bcgov/SIMS GitHub Wiki

Database monitoring and troubleshooting

Note: Stats based on pg_stat_statements must be executed on all HA nodes. For other stats(e.g. pgstattuple, pg_stat_user_tables) it must be executed from the master node/pod.

Monitoring queries

Identifying most expensive queries

-- This sql must be executed in all the HA nodes to get true stats.
SELECT
  round(total_exec_time::numeric, 2)  AS total_exec_ms,
  round(total_plan_time::numeric, 2)  AS total_plan_ms,
  round(mean_exec_time::numeric, 2)  AS mean_plan_ms,
  round(mean_plan_time::numeric, 2)  AS mean_exec_ms,
  queryid,
  query
FROM pg_stat_statements
WHERE stats_since >= now() - interval '30 days' 
ORDER BY total_exec_time DESC, total_plan_time DESC, calls DESC
LIMIT 10;

Monitoring tables

-- This sql must be executed in master pod to get true stats.
SELECT
  t.schemaname AS schema_name,
  t.relname AS table_name,
  pg_size_pretty(pg_relation_size(t.relid)) AS table_size,
  t.n_dead_tup AS stat_dead_estimate,
  pst.dead_tuple_count AS actual_dead_count,
  pst.dead_tuple_percent,
  pst.free_percent,
  t.last_autovacuum,
  t.last_autoanalyze
FROM pg_stat_user_tables t
CROSS JOIN LATERAL pgstattuple(t.relid) AS pst
ORDER BY pst.dead_tuple_count DESC, pst.dead_tuple_percent DESC, pst.free_percent DESC
LIMIT 20;

Fixing bloated tables

Remediation for bloated tables can be done in 3 possible ways

  • Tuning the auto-vacuum settings so that the tables get auto-vacuumed regularly.(Recommended regular process) The column last_autovacuum from above sql indicates when the table was auto-vacuumed recently.
  • Running the vacuum manually(lightweight and non-disruptive process) when there is an immediate need to remediate a bloated table.(Only when needed).
VACUUM table_name;
-- Analyze alongside vacuum(recommended)
VACUUM ANALYZE table_name;
  • Running a full vacuum(It is a disruptive process which will lock the entire table during the process) if a table-space needs to be reduced after a vacuum.(This situation is not applicable to SIMS as we do not purge tables and expecting the table not to grow again).

Tuning the auto-vacuum settings

Listed below are the global auto-vacuum settings for all tables. This can be retrieved by

SELECT 
    name, 
    setting, 
    unit, 
    short_desc 
FROM pg_settings 
WHERE name LIKE '%autovacuum%';

If a bloated table has never been auto-vacuumed or hasn't been auto-vacuumed for a long time, the auto vacuum setting can be tuned by executing the following sql.

Example

-- This sql must be executed in master pod with superuser login.
ALTER TABLE table_name  SET ($autovacuum_setting_name = $value)
ALTER TABLE sims.education_programs_offerings_history  SET (autovacuum_vacuum_insert_scale_factor = 0.05); -- Updates autovacuum_vacuum_insert_scale_factor from 0.2 to 0.05.

Here is the current auto-vacuum tunning for selected tables Current auto vacuum tuning values

Monitoring indexes

All the indexes in SIMS database are btree indexes except for sims.user_first_name_last_name which is a gin index. The strategy to find bloated indexes is different for btree and gin indexes.

Identifying bloated Btree indexes(Requires extension pgstattuple)

-- This sql must be executed in master pod to get true stats.
select
	i.schemaname,
	i.relname as table_name,
	i.indexrelname as index_name,
	pg_size_pretty(pg_relation_size(i.indexrelid)) as index_size_pretty,
	stat.avg_leaf_density,
	stat.leaf_fragmentation,
	stat.empty_pages,
	stat.deleted_pages,
	case
		when stat.leaf_fragmentation > 50 then 'REINDEX NOW'
		when stat.leaf_fragmentation > 30 then 'REINDEX SOON'
		when stat.avg_leaf_density < 50 then 'BLOATED'
		else 'HEALTHY' end 
	as index_bloat_status
	from
		pg_stat_user_indexes i
	join 
    pg_class c on
		i.indexrelid = c.oid
	join 
    pg_am am on
		c.relam = am.oid
	cross join lateral
    pgstatindex(i.indexrelid) stat
	where
		am.amname = 'btree'
		and pg_relation_size(i.indexrelid) > 1024 * 1024
		-- Filter only indexes > 1MB
	order by
		stat.leaf_fragmentation desc;

Fixing bloated Btree indexes

REINDEX INDEX CONCURRENTLY index_name; 

--Optional: If needed to re-index all the indexes in a table
REINDEX TABLE CONCURRENTLY table_name;

Identifying bloated Gin indexes(Requires extension pgstattuple)

As we have only one GIN index sims.user_first_name_last_name, the steps to identify are very brief here.

The pgstattuple extension provides pgstatginindex(), but it only measures unflushed entries, not the actual empty space left behind by dead tuples. Because of this, GIN indexes suffer from two distinct types of bloat, and you have to check for them differently: Pending List Bloat and Structural Bloat.

Identifying Gin indexes with structural bloat

-- This sql must be executed in master pod to get true stats.
-- If the index_to_table_size_percentage is more than 100% then it can be considered as structural bloat.
-- Even after re-index if the index size still remains over 100% then it is the true size of the index without bloating.
SELECT
    i.schemaname,
    i.relname AS table_name,
    i.indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(i.relid)) AS table_size,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
    ROUND(
        CAST(pg_relation_size(i.indexrelid) AS NUMERIC) /
        NULLIF(pg_relation_size(i.relid), 0) * 100,
        2
    ) AS index_to_table_size_percentage
FROM
    pg_stat_user_indexes i
JOIN 
    pg_class c ON i.indexrelid = c.oid
JOIN 
    pg_am am ON c.relam = am.oid
WHERE
    am.amname = 'gin'
    AND pg_relation_size(i.indexrelid) > 1024 * 1024 -- Only check indexes > 10MB
ORDER BY
    index_to_table_size_percentage DESC;

Fixing Gin indexes with structural bloat

REINDEX INDEX CONCURRENTLY index_name; 

--Optional: If needed to re-index all the indexes in a table
REINDEX TABLE CONCURRENTLY table_name;

Identifying Gin indexes with pending list bloat

--Check for pending pages or pending tuples
select * from pgstatginindex('sims.user_first_name_last_name');

Fixing Gin indexes with pending pages/tuples

Note: Fix for the pending list must be executed only when there is no structural bloat. If structural bloat is present, then the reindex process will also clean up the pending list.

-- Option 1: VACUUM to flush pending list
VACUUM your_table;

-- Option 2: manual flush of pending list only
SELECT gin_clean_pending_list('sims.user_first_name_last_name');

Current auto vacuum tuning values

History tables

ALTER TABLE sims.users_history  SET (autovacuum_analyze_scale_factor = 0.01);

ALTER TABLE sims.education_programs_offerings_history   SET (autovacuum_analyze_scale_factor = 0.01);

ALTER TABLE sims.institution_locations_history   SET (autovacuum_analyze_scale_factor = 0.01);

ALTER TABLE sims.institutions_history   SET (autovacuum_analyze_scale_factor = 0.01);

ALTER TABLE sims.students_history   SET (autovacuum_analyze_scale_factor = 0.01);

ALTER TABLE sims.supporting_users_history   SET (autovacuum_analyze_scale_factor = 0.01);

ALTER TABLE sims.education_programs_history   SET (autovacuum_analyze_scale_factor = 0.01);


---

ALTER TABLE sims.users_history  SET (autovacuum_vacuum_insert_scale_factor = 0.01);

ALTER TABLE sims.education_programs_offerings_history   SET (autovacuum_vacuum_insert_scale_factor = 0.01);

ALTER TABLE sims.institution_locations_history   SET (autovacuum_vacuum_insert_scale_factor = 0.01);

ALTER TABLE sims.institutions_history   SET (autovacuum_vacuum_insert_scale_factor = 0.01);

ALTER TABLE sims.students_history   SET (autovacuum_vacuum_insert_scale_factor = 0.01);

ALTER TABLE sims.supporting_users_history   SET (autovacuum_vacuum_insert_scale_factor = 0.01);

ALTER TABLE sims.education_programs_history   SET (autovacuum_vacuum_insert_scale_factor = 0.01);

Application submission related tables

ALTER TABLE sims.applications  SET (autovacuum_analyze_scale_factor = 0.0);

ALTER TABLE sims.student_assessments   SET (autovacuum_analyze_scale_factor = 0.0);

ALTER TABLE sims.disbursement_schedules   SET (autovacuum_analyze_scale_factor = 0.0);

ALTER TABLE sims.disbursement_values   SET (autovacuum_analyze_scale_factor = 0.0);

ALTER TABLE sims.cra_income_verifications   SET (autovacuum_analyze_scale_factor = 0.0);


---

ALTER TABLE sims.applications  SET (autovacuum_vacuum_scale_factor = 0.01);

ALTER TABLE sims.student_assessments   SET (autovacuum_vacuum_scale_factor = 0.01);

ALTER TABLE sims.disbursement_schedules   SET (autovacuum_vacuum_scale_factor = 0.01);

ALTER TABLE sims.disbursement_values   SET (autovacuum_vacuum_scale_factor = 0.01);

ALTER TABLE sims.cra_income_verifications   SET (autovacuum_vacuum_scale_factor = 0.01);

Legacy tables

ALTER TABLE sims.sfas_applications  SET (autovacuum_analyze_scale_factor = 0.01);

ALTER TABLE sims.sfas_individuals   SET (autovacuum_analyze_scale_factor = 0.01);

ALTER TABLE sims.sfas_application_disbursements   SET (autovacuum_analyze_scale_factor = 0.01);

ALTER TABLE sims.sfas_restrictions   SET (autovacuum_analyze_scale_factor = 0.01);

ALTER TABLE sims.sfas_application_dependants   SET (autovacuum_analyze_scale_factor = 0.01);


---

ALTER TABLE sims.sfas_applications SET (autovacuum_vacuum_insert_scale_factor = 0.01);

ALTER TABLE sims.sfas_individuals   SET (autovacuum_vacuum_insert_scale_factor = 0.01);

ALTER TABLE sims.sfas_application_disbursements   SET (autovacuum_vacuum_insert_scale_factor = 0.01);

ALTER TABLE sims.sfas_restrictions   SET (autovacuum_vacuum_insert_scale_factor = 0.01);

ALTER TABLE sims.sfas_application_dependants   SET (autovacuum_vacuum_insert_scale_factor = 0.01);