Operation examples for shared wal archives environment - t-matsuo/resource-agents GitHub Wiki

(work in progress)

I do not recommend sharing WAL archives, because it is difficult to keep them consistent between pm01 and pm02. Inconsistent WAL archives may corrupt data.

I hope someone will write a good script to set as archive_command.
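
As a starting point for such a script, here is a minimal sketch, not a tested production tool. The function name archive_wal and the paths in the comments are assumptions made for illustration. It only addresses one source of inconsistency: two nodes writing different content under the same WAL file name. It refuses to overwrite an existing file unless the content is identical.

```shell
#!/bin/sh
# Hypothetical helper (name and paths are assumptions, not part of the RA).
# Possible use from a wrapper script called by archive_command, e.g.
#   archive_command = '/path/to/wrapper.sh %p /shared/archive/%f'
# where wrapper.sh defines this function and runs: archive_wal "$1" "$2"
archive_wal() {
    src=$1
    dst=$2

    if [ -e "$dst" ]; then
        # Identical content: treat the segment as already archived.
        cmp -s "$src" "$dst" && return 0
        # Different content under the same name: refuse to overwrite.
        echo "archive_wal: $dst already exists with different content" >&2
        return 1
    fi

    # Copy to a temporary name first so a partial file is never visible as $dst.
    tmp="$dst.tmp.$$"
    if ! cp "$src" "$tmp"; then
        rm -f "$tmp"
        return 1
    fi

    # ln fails if $dst was created by the other node in the meantime,
    # so an existing archive file is never replaced.
    if ln "$tmp" "$dst"; then
        rc=0
    else
        rc=1
    fi
    rm -f "$tmp"
    return $rc
}
```

A non-zero return makes PostgreSQL keep the segment and retry later, so a conflict shows up in the PostgreSQL log instead of silently replacing an archived file.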

Shared WAL archives environment

Start pacemaker

  • start pacemaker on pm01 and pm02
    • pm01 and pm02 become Slave
      • If /var/lib/pgsql/tmp/PGSQL.lock exists, the data may be inconsistent, so the RA does not start PostgreSQL.
    • the RA begins to compare the xlog location between pm01 and pm02
      • assume pm01's data is newer than pm02's
    • wait for pm01 to become Master
    • on pm01, pgsql-status becomes "PRI" and pgsql-data-status becomes "LATEST"
    • on pm02, pgsql-status becomes "HS:sync" and pgsql-data-status becomes "STREAMING|SYNC"
      • If pgsql-status stays "HS:alone" and pgsql-data-status stays "DISCONNECTED" on pm02, PostgreSQL could not enter replication mode. Check the PostgreSQL logs.
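
Before starting pacemaker, it can help to check for the lock file by hand. The following is a minimal sketch. The function name check_pgsql_lock is an assumption, and the default path is the one used on this page.

```shell
# Hypothetical helper: report whether the RA's lock file is present.
# $1 is the lock file path; it defaults to the path used on this page.
check_pgsql_lock() {
    lock=${1:-/var/lib/pgsql/tmp/PGSQL.lock}
    if [ -e "$lock" ]; then
        echo "lock file present: $lock (data may be inconsistent)"
        return 1
    fi
    echo "ok: no lock file"
    return 0
}
```

Run it on each node, for example `check_pgsql_lock` on pm01 and on pm02. A non-zero status means the RA is expected to refuse to start PostgreSQL on that node.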

(display example)
# crm_mon -Af
============
Last updated: Wed Feb 15 15:15:15 2012
Stack: Heartbeat
Current DC: pm02 (22222222-2222-2222-2222-222222222222) - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, unknown expected votes
3 Resources configured.
============

Online: [ pm01 pm02 ]

 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2):       Started pm01
     vip-rep    (ocf::heartbeat:IPaddr2):       Started pm01
 Master/Slave Set: msPostgresql
     Masters: [ pm01 ]
     Slaves: [ pm02 ]
 Clone Set: clnPingCheck
     Started: [ pm01 pm02 ]

Node Attributes:
* Node pm01:
    + default_ping_set                  : 100
    + master-postgresql:0               : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 0000000059000078
    + pgsql-status                      : PRI
* Node pm02:
    + default_ping_set                  : 100
    + master-postgresql:1               : 100
    + pgsql-data-status                 : STREAMING|SYNC
    + pgsql-status                      : HS:sync

Migration summary:
* Node pm02:
* Node pm01:
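
To check the attributes from a script instead of reading the screen, the output can be parsed. This is a minimal sketch that assumes the "Node Attributes" layout shown in the display example above. The function name node_attr is hypothetical.

```shell
# Hypothetical helper: print the value of one node attribute.
# Reads `crm_mon -Af` style text on stdin.
# $1 = node name (e.g. pm02), $2 = attribute name (e.g. pgsql-status)
node_attr() {
    awk -v node="$1" -v attr="$2" '
        # "* Node pm01:" starts a new per-node section
        /^\* Node / { cur = $3; sub(/:$/, "", cur); next }
        # "+ name : value" lines belong to the current node
        cur == node && $1 == "+" && $2 == attr { print $4; exit }
    '
}
```

For example, `crm_mon -Af -1 | node_attr pm02 pgsql-status` should print HS:sync once replication is established, given output like the example above.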

Recover pm01 after failover

  • sync data from the new Master and restore pm01's postgresql.conf
(on pm01)
pm01# psql -h 192.168.3.2 -U postgres -c "SELECT pg_start_backup('label',true)"
pm01# rsync -avr --delete 192.168.3.2:/var/lib/pgsql/9.1/data/ /var/lib/pgsql/9.1/data/
pm01# psql -h 192.168.3.2 -U postgres -c "SELECT pg_stop_backup()"
pm01# su - postgres
pm01$ cp /path_to_backup/postgresql.conf /var/lib/pgsql/9.1/data/postgresql.conf
  • delete PGSQL.lock on pm01 if it exists
    pm01# rm /var/lib/pgsql/tmp/PGSQL.lock
  • clean up the failcount
pm01# crm resource cleanup msPostgresql pm01
  • wait for pm01 to become Slave
    • pgsql-status becomes "HS:sync" and pgsql-data-status becomes "STREAMING|SYNC"
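
The recovery steps above can be collected into one function. This is a minimal sketch under stated assumptions: the names run and recover_standby and the DRY_RUN variable are hypothetical, the paths are the ones used on this page, and restoring postgresql.conf is left as a manual step because the backup path is site-specific. By default it only prints the commands.

```shell
# run: print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# recover_standby MASTER_ADDR NODE_NAME
#   MASTER_ADDR: address of the current Master (192.168.3.2 on this page)
#   NODE_NAME:   cluster name of the node being recovered (run on that node)
recover_standby() {
    master=$1
    node=$2
    pgdata=/var/lib/pgsql/9.1/data
    lock=/var/lib/pgsql/tmp/PGSQL.lock

    # Take a base backup from the Master into the local data directory.
    # If rsync fails, the Master stays in backup mode until
    # pg_stop_backup() is run by hand.
    run psql -h "$master" -U postgres -c "SELECT pg_start_backup('label',true)" || return 1
    run rsync -avr --delete "$master:$pgdata/" "$pgdata/" || return 1
    run psql -h "$master" -U postgres -c "SELECT pg_stop_backup()" || return 1

    # Remove the lock file (rm -f succeeds even if it is absent).
    run rm -f "$lock" || return 1

    # Clear the failcount so pacemaker starts PostgreSQL again on this node.
    run crm resource cleanup msPostgresql "$node"
}
```

Running `recover_standby 192.168.3.2 pm01` prints the five commands for review. Setting `DRY_RUN=0` executes them. Restore postgresql.conf after the rsync step and before the cleanup step.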

(display example)
# crm_mon -Af
============
Last updated: Wed Feb 15 15:15:15 2012
Stack: Heartbeat
Current DC: pm02 (22222222-2222-2222-2222-222222222222) - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, unknown expected votes
3 Resources configured.
============

Online: [ pm01 pm02 ]

 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2):       Started pm02
     vip-rep    (ocf::heartbeat:IPaddr2):       Started pm02
 Master/Slave Set: msPostgresql
     Masters: [ pm02 ]
     Slaves: [ pm01 ]
 Clone Set: clnPingCheck
     Started: [ pm01 pm02 ]

Node Attributes:
* Node pm01:
    + default_ping_set                  : 100
    + master-postgresql:0               : 100
    + pgsql-data-status                 : STREAMING|SYNC
    + pgsql-status                      : HS:sync
* Node pm02:
    + default_ping_set                  : 100
    + master-postgresql:1               : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 0000000058000230
    + pgsql-status                      : PRI

Migration summary:
* Node pm02:
* Node pm01: