postgres logging - ghdrako/doc_snipets GitHub Wiki
PostgreSQL allows you to send logs to an external component named syslog. The idea is that there could be, in your own infrastructure, a component or a machine that is responsible for collecting logs from all the available services, including databases, web servers, application servers, and so on. However, syslog can lose log entries. PostgreSQL provides its own solution, the logging collector, to store logs, and it is the preferred approach.
show logging_collector;
# Enable logging collector
ALTER SYSTEM SET logging_collector = 'on';
# or in postgresql.conf file
SHOW log_rotation_size;
ALTER SYSTEM SET log_rotation_size = '100 MB';
SHOW log_rotation_age;
ALTER SYSTEM SET log_rotation_age = '1d';
SELECT pg_reload_conf(); -- reload if you changed postgresql.conf directly (note: enabling logging_collector itself requires a full server restart)
log_destination can assume one or more of the following values:
- stderr means the cluster logs will be sent to the standard error of the postmaster process, which commonly means they will appear on the console from which the cluster has been started.
- syslog means that the logs will be sent to an external syslog component.
- csvlog means that the logs will be produced as comma-separated values, useful for the automatic analysis of logs (more on this later).
- jsonlog means that the logs will be produced as JSON tuples, another format very useful for the automatic analysis of logs (more on this later).
- eventlog is a particular component available only on Microsoft Windows platforms that collects the logs of a whole bunch of services.
It is possible to set up the logging destination with multiple values, so that different destinations and types of logs will be produced.
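For example, a minimal sketch that sends logs both to the standard error and to CSV files (csvlog also requires the logging collector, as explained below):
ALTER SYSTEM SET log_destination = 'stderr,csvlog';
SELECT pg_reload_conf();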
The other parameter is logging_collector, a boolean value that fires up a process (named the logging collector) that captures all the logs sent to the standard error and stores them where you want. In short, setting log_destination = stderr will force PostgreSQL to send all log messages to the standard error, which is the console from which the service has been launched. Usually, there is no attached console since the daemon is launched in the background, and moreover, not many people want to keep a console open just to see the log messages scrolling on the screen. For this reason, logging_collector = on enables the PostgreSQL logging capture process, which aims to read all the messages produced on the standard error and send them to an appropriate destination. Usually, the destination would be a text file, a Comma-Separated Values (CSV) file, or something else. Therefore, log_destination decides where PostgreSQL will emit the log messages, while logging_collector fires a dedicated process to capture those emitted logs and send them elsewhere. It is important to note that a few logging destinations also require the logging collector to be turned on: csvlog and jsonlog require logging_collector to be enabled.
To summarize, the two preceding parameters are somewhat inter-dependent: you need to choose where to send the logs that PostgreSQL will always produce (log_destination), and in the case that you send them only (or also) to the standard error or to a custom format (like csvlog), you need to turn on a dedicated process (the logging_collector value) to catch any log entries and store them on disk. This means that your logging configuration will always be something like the following:
log_destination = 'stderr'
logging_collector = on
Here, the first line tells the cluster to send the produced logs to the standard error, but from there they are to be managed and stored by a dedicated process named the logging collector.
The logging collector can be configured to place logs in the directory you desire, to name the log files as you wish, and to automatically rotate them. Log rotation is a quite common feature in every logging system and means that once a single log file has grown to a specified size, or when enough time has passed, the log file is closed and a new one (with a different name) is created. For example, you can decide to automatically rotate your log files once a single file becomes 100 MB or every 2 days: the first condition that happens triggers the rotation, so that PostgreSQL produces a different log file at least every 2 days or every 100 MB of textual information.
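For example, the rotation policy just described could be expressed in postgresql.conf as:
log_rotation_size = '100MB'  # rotate once a single file reaches 100 MB
log_rotation_age = '2d'      # ...or after two days, whichever happens first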
- Which privileges can be executed? (Auditing access)
- Which SQL statements were executed? (Auditing SQL)
- Which tables were accessed? (Auditing table access)
- Which data rows were changed? (Auditing data changes)
- Which data rows were viewed? (usually too much data)
Reviewing which users have access to which information is important. There are a few ways of doing this:
- Write scripts that access the database catalog tables. Access control list (ACL) information is not held in one place, so you have lots of places to look at:
select relname, attname
from pg_attribute join pg_class c on attrelid = c.oid
where attname like '%acl%' and relkind = 'r';
- Write scripts that test access conforms to a specific definition. This can be achieved by writing tests using the database information functions provided by PostgreSQL (for example, has_table_privilege(), has_column_privilege(), and so on; see the sketch after this list)
- Using the PostgreSQL log_statement parameter—a fairly crude approach
- Using the pgaudit extension's pgaudit.log parameter
- Using EDB Postgres' audit facility
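For example, a minimal sketch of the information-function approach from the list above; the role, table, and column names here are hypothetical:
SELECT has_table_privilege('appuser', 'public.customers', 'SELECT');
SELECT has_column_privilege('appuser', 'public.customers', 'ssn', 'SELECT');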
The log_statement parameter can be set to one of the following options:
- all: logs all SQL statements executed at the top level
- mod: logs all SQL statements for INSERT, UPDATE, DELETE, and TRUNCATE
- ddl: logs all SQL statements for DDL commands
- none: no statements are logged
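For example, a minimal sketch of enabling it cluster-wide for data-modifying statements (log_statement can be changed with a reload):
ALTER SYSTEM SET log_statement = 'mod';
SELECT pg_reload_conf();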
It is still possible to perform SQL without it being logged by this setting if you use any of the procedural languages (PLs), either through DO statements or by calling a function that includes SQL statements.
The pgaudit extension provides two levels of audit logging: session and object levels. The session level has been designed to solve some of the problems of log_statement. pgaudit will log all access, even if it is not executed as a top-level statement, and it will log all dynamic SQL. pgaudit.log can be set to include zero or more of the following settings:
- READ: SELECT and COPY
- WRITE: INSERT, UPDATE, DELETE, TRUNCATE, and COPY
- FUNCTION: Function calls and DO blocks
- ROLE: GRANT, REVOKE, CREATE/ALTER/DROP ROLE
- DDL: All DDL not already included in the ROLE category
- MISC: Miscellaneous—DISCARD, FETCH, CHECKPOINT, VACUUM, and so on
For example, to log all DDL commands, edit your postgresql.conf file to set the following:
pgaudit.log = 'role, ddl'
You should set these parameters to reduce the overhead of logging:
pgaudit.log_catalog = off
pgaudit.log_relation = off
pgaudit.log_statement_once = on
Auditing table access - Auditing by role
The idea is that you define a database role, and grant permissions related to the action you want to audit to the role. Once the role and its permissions are set, you inform PgAudit to audit by that role, which means PgAudit will report in the logs any action that matches the one granted to the auditing role, without any regard to the role that has performed it.
The first step is therefore the creation of a role that is used only to specify which actions to audit, and therefore will not be used as an ordinary role for interactive sessions:
CREATE ROLE auditor WITH NOLOGIN;
In order to specify which actions the role must audit, we simply have to GRANT those to the role. For example, assuming we want to audit all DELETE on every table and INSERT only on posts and categories, we have to grant the role with the following set of permissions:
GRANT DELETE ON ALL TABLES IN SCHEMA public TO auditor;
GRANT INSERT ON posts TO auditor;
GRANT INSERT ON categories TO auditor;
Everything is now prepared for PgAudit to do its job, but it is fundamental that the auditing system knows that the auditor role has to be used, therefore we need either to configure pgaudit.role in the cluster configuration or in the current session. The former method is, of course, the right one to use with a production environment, while setting the configuration parameter in a single session is useful for testing purposes. Let's set the parameter in the session, as a database administrator, to test it in action:
SET pgaudit.role TO auditor;
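For a production cluster the same parameter can be set persistently instead of per session, for example:
ALTER SYSTEM SET pgaudit.role = 'auditor';
SELECT pg_reload_conf();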
pgaudit can log access to each table.
To make it easier to access the audit log per table, adjust these settings:
pgaudit.log_relation = on
pgaudit.log_statement_once = off
If you want even finer-grained auditing, pgaudit allows you to control which tables are audited. The user cannot tell which tables are logged and which are not, so it is possible for investigators to quietly enhance the level of logging once they are alerted to a suspect or a potential attack. First, set the role that will be used by the auditor:
pgaudit.role = 'investigator'
Then, you can define logging through the privilege system, as in the following command:
GRANT INSERT, UPDATE, DELETE on <vulnerable_table> TO investigator;
Remove it again when no longer required. Privileges may be set at the individual column level to protect personally identifiable information (PII).
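For example, a sketch of such a column-level grant; the table and column names here are hypothetical:
GRANT SELECT (credit_card_number), UPDATE (credit_card_number) ON payment_details TO investigator;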
Auditing by session
SET pgaudit.log TO 'write, ddl';
If you’re running your own Postgres installation, configure the logging settings in the postgresql.conf file or by using ALTER SYSTEM.
You can change the log level of PostgreSQL using the log_min_error_statement parameter in the PostgreSQL configuration file, selecting any of the following levels:
- DEBUG1, DEBUG2, DEBUG3… DEBUG5: Gives developers more detailed information
- INFO: Retrieves specific data requested by a user like verbose output
- NOTICE: Offers useful information to users like identifier truncation
- WARNING: Delivers warnings of likely problems
- ERROR: Registers errors, including those that cause any command to abort
- LOG: Logs data like checkpoint activity, which can be useful for the administrator
- FATAL: Occurs for errors that caused the current running session to abort
- PANIC: Occurs for errors that cause all database sessions to abort
log_min_messages = WARNING
The log_min_messages setting is the volume control for Postgres’s system messages. The default means you’ll see logs with severity WARNING or higher (ERROR, LOG, FATAL, PANIC).
ERROR: out of memory
LOG: parameter "log_lock_waits" changed to "ON"
FATAL: terminating connection due to conflict with recovery
log_min_error_statement = ERROR
log_min_error_statement is the SQL statement version of log_min_messages. That means it controls the logging level for statements associated with errors. In the example below, the first log line is emitted depending on log_min_messages, and the paired second log line is emitted because of log_min_error_statement.
ERROR: current transaction is aborted, commands ignored until end of transaction block
STATEMENT: SELECT * FROM mytable WHERE id = 1 FOR UPDATE;
Because the logs produced by these two parameters (log_min_messages and log_min_error_statement) are paired in this way, you may prefer to match their settings. For example, setting log_min_error_statement to WARNING as well. That way every system log that has a corresponding SQL statement will be logged with the statement.
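For example, a minimal sketch of matching the two settings as suggested:
log_min_messages = warning
log_min_error_statement = warning  # every logged message of WARNING or above now carries its SQL statement, if there is one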
log_error_verbosity = DEFAULT
In Postgres, log_error_verbosity
has three possible settings. DEFAULT
is a good middle ground between TERSE
and VERBOSE
. DEFAULT
adds detail, hint, query, and context information to the log if they are available for that error. For example,
ERROR: cannot drop table customers because other objects depend on it
DETAIL: constraint orders_fkey on table orders depends on table customers
HINT: Use DROP ... CASCADE to drop the dependent objects too.
For general use, stick with DEFAULT. TERSE doesn’t provide guidance on what to do about an error. Turn to VERBOSE only if you plan to actually look up the Postgres source code.
log_connections = ON
An important feature of log_connections
is that it will log a connection attempt as one line and make a separate log line for successful connections. This is handy for identifying bad connection attempts and where they originate: log_connections
will show you the IP address of the connection attempt.
LOG: connection received: host=122.220.2.133 port=5432 pid=655912
LOG: connection authorized: user=bob database=postgres SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES256-GCM-SHA384, compression=off)
LOG: connection received: host=122.220.2.133 port=5432 pid=655920
FATAL: password authentication failed for user "bob"
DETAIL: Password does not match for user "bob".
Connection matched pg_hba.conf line 7: "host all all 122.220.2.133/32 password"
LOG: connection received: host=24.2.2.134 port=5432 pid=655908
FATAL: no pg_hba.conf entry for host "24.2.2.134", user "alice", database "postgres", SSL on
log_checkpoints = ON
Checkpoints are an important part of a Postgres server’s lifecycle. They represent the moment when the Write Ahead Log (WAL) is synchronized to the Postgres data directory. Checkpoints are I/O intensive events, but recent checkpoints also help with server restart times. log_checkpoints gives you insight into when checkpoints start, how long they take, and some statistics like buffers written.
LOG: checkpoint starting: time
LOG: checkpoint complete (108): wrote 4122 buffers (1.6%); 1 WAL file(s) added, 0 removed, 0 recycled; write=269.657 s, sync=0.000 s, total=269.688 s; sync files=48, longest=0.002 s, average=0.000 s; distance=126856 kB, estimate=126856 kB
log_disconnections = ON
(Default is OFF). As you might guess, log_disconnections is the counterpart to log_connections. If you need to know when connections end and how long each connection lasted, consider turning this ON.
LOG: disconnection: session time: 0:05:42.621 user=alice database=postgres host=122.220.2.133 port=5432
log_lock_waits = ON
(Default is OFF). If you suspect that lock waits are impacting the performance of your application on Postgres, or you’ve had trouble with locks before and want to keep an eye on them, log_lock_waits is the parameter for you.
A log is generated if a session waited longer than deadlock_timeout for a lock. The two parameters (deadlock_timeout and log_lock_waits) work in tandem. To have effective lock logging, don’t set deadlock_timeout too low—your logs will start getting verbose and the important waits won’t stand out. The default for deadlock_timeout is 1s.
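A minimal configuration sketch of the two parameters working together:
log_lock_waits = on
deadlock_timeout = '1s'   # a lock wait is logged only after a session has waited this long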
LOG: process 667740 still waiting for ShareLock on transaction 599 after 1000.013 ms
DETAIL: Process holding the lock: 658912. Wait queue: 667740.
CONTEXT: while locking tuple (0,2) in relation "mytable"
STATEMENT: SELECT * FROM mytable WHERE id = 2 FOR UPDATE;
LOG: process 667740 acquired ShareLock on transaction 599 after 30671.632 ms
Note that actual deadlocks are logged independently of the log_lock_waits parameter, because they are classified as a database error. You’ll see them if log_min_messages is set to at least ERROR.
ERROR: deadlock detected
DETAIL: Process 658912 waits for ShareLock on transaction 597; blocked by process 660260.
Process 660260 waits for ShareLock on transaction 596; blocked by process 658912.
Process 658912: SELECT * FROM mytable WHERE id = 2 FOR UPDATE;
Process 660260: SELECT * FROM mytable WHERE id = 1 FOR UPDATE;
HINT: See server log for query details.
CONTEXT: while locking tuple (0,2) in relation "mytable"
STATEMENT: SELECT * FROM mytable WHERE id = 2 FOR UPDATE;
The quality of the information in the log is also established by the log_line_prefix parameter. log_line_prefix is a pattern string that defines what to insert at the beginning of every log line, and therefore can be used to detail the event that is logged.
Useful and common placeholders are as follows:
- %a represents the application name (for example, psql).
- %u represents the username connected to the cluster (role name).
- %d is the database where the event happened.
- %p is the operating system process identifier (PID).
- %h represents the remote host from which the connection to the cluster has been established.
- %l is the session line number, an autoincrement counter that helps us to understand the ordering of every statement executed in an interactive session.
- %t is the timestamp at which the event happened.
log_line_prefix = '%m user=%u db=%d pid=%p: '
log_line_prefix = '%t [%p]: [%l] user=%u,db=%d,app=%a,client=%h '
(Postgres default is %m [%p] , time and process ID; Azure Postgres default is %t-%c-, time and session ID)
This is one of my favorite logging parameters because it has a positive impact on all your Postgres logs. log_line_prefix tells Postgres to add some standard information to the beginning of each log line. The suggested value of %m user=%u db=%d pid=%p: would provide the timestamp, user name, database name, and process id for each log. That way when you see a log you can quickly narrow down on who (user) and where (database). Logging the process or session ID allows you to correlate associated events.
2020-03-05 23:43:07.793 UTC user=bob db=postgres pid=670232: LOG: parameter "work_mem" changed to "8192"
2020-04-05 02:05:07.423 UTC user=alice db=postgres pid=879234: ERROR: permission denied for table customers
In general, be very careful of statement log settings. Most of them are Booleans (everything or nothing). They are all off by default.
log_statement controls which SQL statements are logged. Valid values are none (off), ddl, mod, and all (all statements).
- ddl logs all data definition statements, such as CREATE, ALTER, and DROP statements.
- mod logs all ddl statements, plus data-modifying statements such as INSERT, UPDATE, DELETE, TRUNCATE, and COPY FROM. PREPARE, EXECUTE, and EXPLAIN ANALYZE statements are also logged if their contained command is of an appropriate type. For clients using extended query protocol, logging occurs when an Execute message is received, and values of the Bind parameters are included (with any embedded single-quote marks doubled).
The default is none.
log_duration causes the duration of every completed statement to be logged. The default is off.
Note
The difference between enabling log_duration and setting log_min_duration_statement to zero is that exceeding log_min_duration_statement forces the text of the query to be logged, but this option doesn't. Thus, if log_duration is on and log_min_duration_statement has a positive value, all durations are logged but the query text is included only for statements exceeding the threshold. This behavior can be useful for gathering statistics in high-load installations.
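For example, a sketch of the combination described in the note (the threshold value is an arbitrary choice):
log_duration = on                     # every completed statement gets a duration log line
log_min_duration_statement = '500ms'  # the query text is included only for statements slower than this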
If you were interested in log_statement = all for auditing purposes, consider using pgAudit instead. pgAudit provides rich controls for configuring audit logging.
PostgreSQL Auditing Extension, more commonly known by its shorter name: pgAudit.
CREATE EXTENSION pgaudit; -- execute in every database that needs pgaudit logging
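Note that pgaudit must also be loaded at server start via shared_preload_libraries, which requires a restart; in postgresql.conf:
shared_preload_libraries = 'pgaudit'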
The CLASS categorizes the kind of statement (READ, WRITE, DDL etc) and COMMAND tells you what sub-class it is. The OBJECT_TYPE and OBJECT_NAME identify whether a table, view etc was accessed and which one it was. The STATEMENT field is a key factor that sets pgAudit apart from using Postgres statement logging only. You may have been asking, why can't I just use log_statement=all as my audit logging? It's because pgAudit shows you not only the query that was sent by the client, but the actual statement executed on the back end.
Format
AUDIT: SESSION,1,1,<CLASS>,<COMMAND>,,<STATEMENT>
AUDIT: OBJECT,1,1,<CLASS>,<COMMAND>,<OBJECT TYPE>,<OBJECT NAME>
- AUDIT_TYPE - SESSION or OBJECT.
- STATEMENT_ID - Unique statement ID for this session. Each statement ID represents a backend call. Statement IDs are sequential even if some statements are not logged. There may be multiple entries for a statement ID when more than one relation is logged.
- SUBSTATEMENT_ID - Sequential ID for each sub-statement within the main statement. For example, calling a function from a query. Sub-statement IDs are continuous even if some sub-statements are not logged. There may be multiple entries for a sub-statement ID when more than one relation is logged.
- CLASS - e.g. READ, ROLE (see pgaudit.log).
- COMMAND - e.g. ALTER TABLE, SELECT.
- OBJECT_TYPE - TABLE, INDEX, VIEW, etc. Available for SELECT, DML and most DDL statements.
- OBJECT_NAME - The fully-qualified object name (e.g. public.account). Available for SELECT, DML and most DDL statements.
- STATEMENT - Statement executed on the backend.
- PARAMETER - If pgaudit.log_parameter is set then this field will contain the statement parameters as quoted CSV, or <none> if there are no parameters. Otherwise, the field is <not logged>.
Example
Log Output:
AUDIT: OBJECT,1,1,READ,SELECT,TABLE,public.account,select password
from account,<not logged>
AUDIT: OBJECT,2,1,WRITE,UPDATE,TABLE,public.account,update account
set password = 'HASH2',<not logged>
AUDIT: OBJECT,3,1,READ,SELECT,TABLE,public.account,select account.password,
account_role_map.role_id
from account
inner join account_role_map
on account.id = account_role_map.account_id,<not logged>
AUDIT: OBJECT,3,1,READ,SELECT,TABLE,public.account_role_map,select account.password,
account_role_map.role_id
from account
inner join account_role_map
on account.id = account_role_map.account_id,<not logged>
pgAudit alone does not give us the who or the when. pgAudit relies on Postgres's log_line_prefix parameter for this information. log_line_prefix sets the initial text of each log line. Postgres's default prefix is '%m [%p] ', which logs the time stamp and process ID. For auditing, it is helpful to enhance this prefix. For example, the setting t=%m u=%u db=%d pid=[%p]:
gives you a log like:
t=2019-10-01 01:21:41.377 UTC u=alice db=postgres pid=[96]: LOG: AUDIT: SESSION,1,1,READ,SELECT,,,SELECT 1,<not logged>
pgAudit offers some settings that can help control the volume of logs that are generated. There are two types of logging and you can enable one or both: session-level and object-level auditing.
- Session auditing logs all statements that are executed by users. You can use the pgaudit.log parameter to scope this down to certain classes of statements: READ, WRITE, FUNCTION, ROLE, DDL, MISC. Select all the classes or the subset that is most pertinent for you.
- Object auditing records statements for specific relations. For example, you can have an audit trail for a table that stores credit card information, but not have auditing for a table that stores non-personally identifiable data. (Of course, account for your compliance requirements when deciding.) Using this finer-grained audit option requires some additional setup. You need to register a Postgres role that has permissions on the relation in question. As new relations are created, you need to be sure you are including the relevant ones in the auditor role's permissions. Also, object auditing logs only selects, inserts, updates, and deletes.
Logs can be divided into two types: admin-specific logs and application-user-specific logs.
Admin-specific logs help manage the PostgreSQL server. If the server is not working properly, these can provide the reason for this and aid in troubleshooting.
There are two types of admin-specific logs:
- Startup logs: These show all the important events and any issues (for example, due to any misconfigurations) during the startup process of your PostgreSQL server.
- Server logs: These can help you identify anything going wrong with the PostgreSQL server at runtime from an admin perspective. They are located in the default location of your installation or as prescribed by you in the PostgreSQL configuration file.
When it comes to application-user-specific logs, there are several important PostgreSQL logs to keep an eye on:
- Query logs show you all the queries that have occurred in the server; you can see the logged queries if you have log_statement enabled.
- Transaction logs are the record of all events performed on the database; they follow the WAL (write ahead log) standard, which is not meant to be human readable. WAL is a way to keep a record of all actions performed on the database and can be used to recover from a catastrophic failure. The pg_receivexlog plugin can show the transaction logs streamed by your PostgreSQL server.
- Connection logs are useful to find any unwanted connections to the server. You can enable log_connections in the postgresql.conf file to log each attempt to connect to your server; log_disconnections lets you see all the clients that disconnected from the server.
- Error logs help you identify if any of your queries create unwanted issues in the server; log_min_error_statement controls the error statement logging severity level.
- Audit logs and access logs are critical from the admin's point of view. The former show changes made to the database, while the latter identify who made what queries; these can be enabled via the log_statement configuration or a PostgreSQL plugin like pgAudit.
The log_min_duration_statement configuration parameter allows Postgres to do some of the work in finding slow queries. You decide the threshold, and the server logs the SQL statements that take at least that much time to run.
If you set log_min_duration_statement = 0, then all queries would be logged, which will typically swamp the log file, causing more performance problems itself, and thus this is not recommended.
Usually, it is much more useful to configure the log_min_duration_statement setting to log only "slow" statements, instead of logging them all with log_statement = all. You can also set log_min_duration_sample to set a limit for sampling queries. The two settings are designed to work together:
- Any query elapsed time less than log_min_duration_sample is not logged at all.
- Any query elapsed time higher than log_min_duration_statement is always logged.
- For any query elapsed time that falls between the two settings, we sample the queries and log them at a rate set by log_statement_sample_rate (default 1.0 = all). Note that the sampling is blind; it is not stratified/weighted, so rare queries may not show up at all in the log.

Query logging will show the parameters that are being used for the slow query, even when pg_stat_statements does not.
ALTER DATABASE us SET log_min_duration_statement = '100ms';
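A configuration sketch combining the sampling parameters (this assumes PostgreSQL 13 or later, where log_min_duration_sample is available):
log_min_duration_statement = '1s'    # statements slower than this are always logged
log_min_duration_sample = '100ms'    # statements faster than this are never logged
log_statement_sample_rate = 0.1      # log roughly 10% of the statements that fall in between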
There are two parameters that can be used to tune the log threshold: log_min_messages and client_min_messages.
log_min_messages specifies what the cluster has to insert into the logs, without any regard for incoming user connections, nor their settings. client_min_messages decides which log events the client has to report to the user during the connection. Both these settings can assume a value from the preceding list of thresholds.
A typical use case of a development or test environment could be the following one:
log_min_messages = 'info'
client_min_messages = 'debug1'
With the preceding configuration, the cluster will log only info messages in the textual logs, which is something related to the normal execution of the processes, while incoming user connections will report more detailed messages such as development ones back to the user.
Setting thresholds is not the only way you can decide when to trigger log insertion: there
are another couple of settings that can be used to take care of the duration of statements
and utilities. For example, log_min_duration_statement
inserts into the logs the textual
representation of an SQL statement that has run for more than the specified amount of time
(expressed in milliseconds). log_autovacuum_min_duration
logs the autovacuum
actions that took more than the specified amount of milliseconds.
Another very important parameter to decide when to log activity is log_transaction_sample_rate. This setting was introduced in PostgreSQL 12 and logs every statement within an explicit transaction with a sample rate. The sample rate is a value between 0 (no sampling) and 1 (full sampling); therefore, with a value of 0.5:
log_transaction_sample_rate = 0.5
this will log every statement of roughly one explicit transaction out of two. As another example, consider setting it to a very low value as follows:
log_transaction_sample_rate = 0.01
Having a low value as in the preceding example will log roughly one transaction out of every 100. The aim of this parameter is to get a sample of a full transaction on a heavily loaded system, so as to help get an idea of what transactions are executing without the need to log every single statement of every single transaction. As you can imagine, the sampling is not an absolute value, but rather an approximation of the number of transactions to wait before activating the logging.
A complete example of logging configuration
sudo cat /postgres/12/logging.conf
log_destination = 'stderr'
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d.log'
log_rotation_age = '1d'
log_rotation_size = '100MB'
log_line_prefix = '%t [%p]: [%l] user=%u,db=%d,app=%a,client=%h'
log_checkpoints = on
log_connections = on
log_disconnections = on
log_min_duration_statement = '100ms'