Backup Routine - intelligent002/kafka-backup-offline GitHub Wiki

Step-by-step routine used by the tool to perform full cluster backups.


Key Features

  • Automatic nightly/weekly backups via cronjob
    • You can schedule automated backups using Linux crontab on the management node (typically node-00).
    • Example: to run the backup every day at 01:00, add the following line with crontab -e:
      0 1 * * * /data/KBO/kafka-backup-offline.sh cluster_backup
      
    • This command will invoke the full backup process according to the default configuration.
    • Ensure that the script is executable and that the user whose crontab runs it has sufficient permissions.
  • Manual backup triggers via the GUI
  • Retention policy with automatic rotation of old backups
  • Structured "zip-of-zips" archive format for multi-node environments
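
The cron entry above only works if the script is executable by the user whose crontab runs it. A minimal pre-flight check could look like the sketch below. The function name check_backup_script is hypothetical and is not part of the tool; the path is the one from the crontab example.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of kafka-backup-offline itself.
# Returns 0 if the file exists and is executable by the current user,
# 1 if it is missing, 2 if it exists but is not executable.
check_backup_script() {
    if [ ! -f "$1" ]; then
        echo "missing: $1" >&2
        return 1
    fi
    if [ ! -x "$1" ]; then
        echo "not executable: $1 (try: chmod +x $1)" >&2
        return 2
    fi
    return 0
}
```

Run it as the same user that owns the crontab, for example: check_backup_script /data/KBO/kafka-backup-offline.sh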

Storage Details

By default, backups are stored on node-00, on a drive mounted at /backup/.

Within this drive:

  • All cold backups are stored under /backup/cold/
  • Subfolders include:
    • /backup/cold/data/
    • /backup/cold/certificates/
    • /backup/cold/credentials/
    • /backup/cold/configs/

For certificates:

  • Rotated backups are stored in /backup/cold/certificates/rotated/YYYY/MM/DD/
  • Pinned (non-rotated) backups are stored in /backup/cold/certificates/pinned/
  • Example file: 2025-04-08---01-14-02---credentials.xz. It contains a zip-of-zips: each node's certificates are stored in a separate archive inside the main one.
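
The directory and file naming described above can be sketched as two small helpers. Both function names are hypothetical. The page only documents the rotated/YYYY/MM/DD/ layout for certificates, so applying the same pattern to other components is an assumption.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the documented layout; not the tool's code.
BACKUP_ROOT="/backup/cold"

# rotated_dir <component> <YYYY-MM-DD>
# Prints <root>/<component>/rotated/YYYY/MM/DD/
rotated_dir() {
    rd_year=${2%%-*}        # text before the first "-"
    rd_rest=${2#*-}         # text after the first "-"
    rd_month=${rd_rest%%-*}
    rd_day=${rd_rest#*-}
    printf '%s/%s/rotated/%s/%s/%s/\n' \
        "$BACKUP_ROOT" "$1" "$rd_year" "$rd_month" "$rd_day"
}

# archive_name <component> <YYYY-MM-DD> <HH-MM-SS>
# Prints a name in the form shown in the example file above.
archive_name() {
    printf '%s---%s---%s.xz\n' "$2" "$3" "$1"
}
```

For instance, rotated_dir certificates 2025-04-08 prints /backup/cold/certificates/rotated/2025/04/08/.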

Rotation & Retention

  • Rotation refers to automatic deletion of older backups beyond the retention window
  • The tool ensures only the most recent backups are retained to conserve disk space
  • Default retention policy is:
    # Number of days to retain certificate backups.
    retention_policy_certificates: 365
    
    # Number of days to retain credentials backups.
    retention_policy_credentials: 365
    
    # Number of days to retain config backups.
    retention_policy_configs: 365
    
    # Number of days to retain data backups.
    retention_policy_data: 7
    
  • To protect a specific backup from rotation, pin it manually by moving it into the pinned/ directory
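
Rotation and pinning as described above can be approximated with standard tools. This is a sketch only: the page does not show the tool's actual rotation logic, so judging age by file modification time is an assumption, and the names prune_rotated and pin_backup are hypothetical. It relies on the -delete action of GNU find.

```shell
#!/bin/sh
# Hypothetical rotation sketch; the tool's real implementation may differ.

# prune_rotated <rotated_dir> <retention_days>
# Deletes regular files under <rotated_dir> whose modification time is more
# than <retention_days> whole days in the past. find rounds the age down to
# whole days, so "+7" matches files that are at least 8 days old.
prune_rotated() {
    find "$1" -type f -mtime +"$2" -print -delete
}

# pin_backup <archive_path> <pinned_dir>
# Moves an archive out of the rotated tree so pruning never sees it.
pin_backup() {
    mkdir -p "$2" && mv -- "$1" "$2"/
}
```

With the default retention_policy_data value of 7, a call such as prune_rotated /backup/cold/data/rotated 7 would remove data archives older than the window, assuming data backups use the same rotated/ layout as certificates.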

Workflow

  1. Shut down the Kafka cluster in a controlled sequence (from the last node to the first, one by one)
  2. Zip the component data on every node
  3. Collect the zip files from all nodes onto the local drive of node-00
  4. Consolidate the zip files into a "zip of zips" on node-00
  5. Restart the Kafka cluster using the controlled startup sequence (from the first node to the last, one by one)
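
The ordering in the workflow above (stop in reverse, start in forward order) can be made concrete with a dry-run planner that only prints the steps. It does not stop, copy, or start anything. The function names and the step labels are hypothetical, and node names are assumed to contain no whitespace.

```shell
#!/bin/sh
# Hypothetical dry-run planner: prints the order of operations only.

# reverse_list a b c  ->  prints "c b a"
reverse_list() {
    out=""
    for item in "$@"; do
        out="$item${out:+ $out}"   # prepend each item to the result
    done
    printf '%s\n' "$out"
}

# backup_plan <node>...  ->  prints one planned step per line
backup_plan() {
    for n in $(reverse_list "$@"); do echo "stop $n"; done      # step 1
    for n in "$@"; do echo "archive $n"; done                   # step 2
    for n in "$@"; do echo "collect $n -> node-00"; done        # step 3
    echo "consolidate on node-00"                               # step 4
    for n in "$@"; do echo "start $n"; done                     # step 5
}
```

Calling backup_plan node-01 node-02 node-03 lists node-03 first in the stop phase and node-01 first in the start phase.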