Automated log rotation - ISISComputingGroup/ibex_developers_manual GitHub Wiki
Automated log rotation
IBEX logs are rotated by a script called logrotate.py
in C:\instrument\apps\epics\utils\logrotate.py
. This script is called nightly by a windows scheduled task.
Currently, for all files not modified within the last 10 days, it:
- Deletes them if they are console logs
- Moves them to
stage-deleted
otherwise
The scheduled task is added/recreated as an install step in ibex_utils
under server_tasks
.
A nagios check will use the same script, but in dry-run mode and with a cutoff of 14 days, to notify us if the log rotation stops working.
Automated periodic database message log truncation
Console logs are written to a msg_log
database. After a while, the number of messages can become huge, especially if the console logging frequency is high. To mitigate this, an automated message log truncation procedure has been created. A scheduled SQL event calls a procedure to truncate the database message
table to retain just those records spanning a predefined retention period.
The table's primary key is just the ID number, so searching on the createTime column for the row at the retention period threshold would be inherently slow. Instead a binary search algorithm has been developed which, knowing the earliest and last record timestamps, will check whether the target time lies above or below the current binary division and iterate until the createTime
field of the target record is within three rows of the low and high row search limits. The binary_search_time()
procedure is in the binary-search.sql
file.
The binary_search_time()
procedure takes one input parameter and returns two output parameters:
Binary Search:
IN p_retention_period time -- The retention period as a time type (e.g. '336:00:00' == 2 wks) Note that this is limited in SQL to '838:59:59', about 35 days
OUT p_first_row_id -- The id value of the earliest row in the table (zeroth row).
OUT p_row_number INT -- The returned record number in the table which is closest to the target time.
This will be a reference to a variable provided by the calling scope.
The procedure truncate_message_table()
calls the binary search procedure and uses the returned first row ID and target row ID to determine the rows to be deleted. Potentially the number of rows to be deleted could be very large, in some cases millions, so it is important that the database server is not locked for significant periods. Instead, rows are deleted in blocks of, say 1000 rows at a time, with a sleep period in between. This makes the deletion process cooperative and prevents data loss and timeouts from other database clients.
The truncate_message_table()
procedure takes one input parameter and returns two output parameters, in the same way as the binary search procedure:
Truncation procedure:
IN p_retention_period time -- The retention period as a time type (e.g. '336:00:00' == 2 wks) Note that this is limited in SQL to '838:59:59', about 35 days
OUT p_first_row_id -- The id value of the earliest row in the table (zeroth row).
OUT p_row_number INT -- The returned record number in the table which is closest to the target time.
This will be a reference to a variable provided by the calling scope.
The log_truncation_event()
is triggered periodically, typically once per day at 01:00 and calls the truncate_message_table()
, supplying the retention period in days. Within the CREATE EVENT
block, there are two STARTS lines, one of which is commented out and can be instated for testing with a short interval.
Files specific to automatic truncation are:
File | Description |
---|---|
binary_search.sql | Defines the binary_search_time() procedure |
truncate_message_table.sql | Defines the truncate_message_table() procedure |
truncate_event.sql | Defines the log_truncation_event() event |
create_event_logger.sql | Defines the EventsLog table for auto truncation process logging |
debug_log.sql | Defines the debug_log procedure appending info to the process log |
test\test_truncate_proc.sql | A SQL script to quickly test the truncate_message_table() procedure |
test\test_fill_message.sql | A SQL script to populate the message table with records assigned a createDate field value at one hour intervals over the given period (period_days) |
Testing
There is a fairly comprehensive README.md file in C:\Instrument\Apps\EPICS\ISIS\IocLogServer\master\tests
For new systems with no pre-exiting database, the described test procedure can be invoked once the msg_log database has been built by running: C:\Instrument\Apps\EPICS\SystemSetup\config_mysql.bat
.
For systems with an existing msg_log database, there is a SQL script which simply creates the SQL event and procedures necessary for the automatic truncation, without affecting any existing database content: cd C:\Instrument\Apps\EPICS\ISIS\IocLogServer\master C:\Instrument\Apps\MySQL\bin\mysql.exe -u root --password= < log-truncation-schema.sql
Dummy data:
There is a script : SQL file tests\test_fill_message.sql which will insert dummy log records into the message table. Message records will be assigned a createDate
field value at one hour intervals over the given period (period_days). These parameters are configurable within the script.
C:\Instrument\Apps\MySQL\bin\mysql.exe -u root --password=<db root password> < tests\test_fill_message.sql
Run the test:
After filling the message table, it is then possible to test the truncation procedure via:
C:\Instrument\Apps\MySQL\bin\mysql.exe -u root --password=<db root password> < tests\test_truncate_proc.sql
Metrics are output to the screen. It can also be helpful to manually examine the message table to view the remaining data and check the time span.
Note that the checks that the script is running correctly is by examining a new database table debug_messages, which is emptied before each truncation process. Details of the process are recorded in the table as simple text, which includes information on the progress of the binary search procedure.
If you want to update a procedure, simply edit the SQL file in the C:\Instrument\Apps\EPICS\ISIS\IocLogServer\master
directory or tests\
subdirectory, then use the mysql executable to do the change to the database accordingly, e.g.:
C:\Instrument\Apps\MySQL\bin\mysql.exe -u root --password=<db root password> < truncate_event.sql