Automating Db2 HADR with Pacemaker
This is a test case I did to implement Pacemaker with Db2 HADR, so this (like all the content in my Git) is not IBM official.
Pacemaker can now be used to automate Db2 HADR takeover instead of using TSA (Tivoli System Automation).
The documentation says:
"
Important: Starting from version 11.5.6, the Pacemaker cluster manager for automated fail-over to HADR standby databases is packaged and installed with Db2. In version 11.5.5, Pacemaker is included and available for production environments. In version 11.5.4, Pacemaker is included as a technical preview, and should be restricted to development, test, and proof-of-concept environments.
"
https://www.ibm.com/docs/en/db2/11.5?topic=pacemaker-configuring-clustered-environment-using-db2cm-utility
In this article we will also see the tests for the Two-Node Quorum.
To see the tests for the qDevice Quorum (the recommended quorum mechanism), check my other article below AFTER reading this one:
Testing Pacemaker qDevice Quorum with Db2 HADR
https://github.com/fsmegale/Db2-luw/wiki/Testing-Pacemaker-qDevice-Quorum-with-Db2-HADR
Prerequisites for an integrated solution using Pacemaker
https://www.ibm.com/docs/en/db2/11.5?topic=pacemaker-prerequisites-integrated-solution-using
Configuring a clustered environment using the Db2 cluster manager (db2cm) utility
https://www.ibm.com/docs/en/db2/11.5?topic=pacemaker-configuring-clustered-environment-using-db2cm-utility
Integrated solution using Pacemaker
https://www.ibm.com/docs/en/db2/11.5?topic=feature-integrated-solution-using-pacemaker
Quorum devices support on Pacemaker
https://www.ibm.com/docs/en/db2/11.5?topic=component-quorum-devices-support-pacemaker
Install and configure a QDevice quorum
https://www.ibm.com/docs/hr/db2/11.5?topic=utility-install-configure-qdevice-quorum
Testing Pacemaker qDevice Quorum with Db2 HADR
https://github.com/fsmegale/Db2-luw/wiki/Testing-Pacemaker-qDevice-Quorum-with-Db2-HADR
- To avoid the use of passwordless SSH for the root user:
How to run db2cm without a root passwordless ssh setup
https://www.ibm.com/support/pages/node/6841049
- To use a VIP (see the example below):
"
6. Optional: Create the VIP resources for the newly created database.
./sqllib/bin/db2cm -create -primaryVIP <IP_address> -db <database_name> -instance <instance_name>
"
Creating an HADR Db2 instance on a Pacemaker-managed Linux cluster
https://www.ibm.com/docs/en/db2/11.5?topic=option-creating-pacemaker-managed-hadr-db2-instance
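As a hypothetical example using the names from this test case (I did not configure a VIP in this test, and the 192.168.145.200 address is made up), the command would look like:
./sqllib/bin/db2cm -create -primaryVIP 192.168.145.200 -db hadrpace -instance db2inst1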
- 3 Virtual Machines
- All Machines with SUSE Linux v15 SP3
- 2 of those Machines with Db2 v11.5.7.0 and HADR already configured and working
Hostnames and IPs:
hostHADR1 - 192.168.145.134
hostHADR2 - 192.168.145.136
qDevicehost - 192.168.145.137
Database Name: hadrpace
Instance name: db2inst1
Database HADR ports (40001 and 40002)
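Since HADR was already configured and working on the two HADR machines, it is worth confirming its state before creating the cluster. A sketch, using the same command shown later in this article (run as db2inst1 on both nodes and check that HADR_STATE = PEER and HADR_CONNECT_STATUS = CONNECTED):
db2pd -d hadrpace -hadr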
First I will configure Pacemaker with Two-Node Quorum.
Next, I will add a third node to configure the qDevice Quorum.
To know more about Quorum in Pacemaker, check the following documentation:
Quorum devices support on Pacemaker
https://www.ibm.com/docs/en/db2/11.5?topic=component-quorum-devices-support-pacemaker
But I can highlight what the documentation above says:
"
Based on the advantages and disadvantages shown, the QDevice quorum is the recommended quorum mechanism for Db2.
"
PS.:
As I see it, the way to configure the qDevice Quorum is to first configure the Two-Node Quorum and then add the third node to be the qDevice Quorum.
According to the System Requirements, the Db2 Fault Monitor must be turned off.
From both HADR machines, check:
localhost:/home/db2inst1/sqllib/bin # ps -ef | grep db2fmcd
root 50365 1 0 06:11 ? 00:01:04 /opt/ibm/db2/V11.5/bin/db2fmcd
root 128467 126831 0 14:43 pts/1 00:00:00 grep --color=auto db2fmcd
From both HADR machines, stop Fault Monitor if it is running:
localhost:/home/db2inst1/sqllib/bin # ./db2fmcu -d
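After stopping it, the same ps check can be repeated to confirm that the db2fmcd process is no longer running (a sketch):
ps -ef | grep db2fmcd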
Checking the routes on both HADR machines to identify the network interface (eth0):
localhost:/home/db2inst1/sqllib/bin # ip r
default via 192.168.145.2 dev eth0 proto dhcp
192.168.145.0/24 dev eth0 proto kernel scope link src 192.168.145.134
localhost:/home/db2inst1/sqllib/bin # ip r
default via 192.168.145.2 dev eth0 proto dhcp
192.168.145.0/24 dev eth0 proto kernel scope link src 192.168.145.136
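The interface shown here (eth0) is the one that will later be passed to db2cm as -publicEthernet. If in doubt, the interface name and its address can also be checked with (a sketch):
ip addr show eth0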
The /etc/hosts file on both HADR machines was edited to contain:
## HADR machines
192.168.145.134 hostHADR1.com.br hostHADR1
192.168.145.136 hostHADR2.com.br hostHADR2
The db2nodes.cfg was changed to reflect the new hostname.
Output from one of the nodes after the change:
db2inst1@hostHADR2:~/sqllib> cat db2nodes.cfg
0 hostHADR2 0
According to the System Requirements, both HADR nodes must have passwordless SSH configured for the root and db2inst1 (Db2 instance owner) users.
So, run the following command on both HADR machines as the root and db2inst1 users:
ssh-keygen -t rsa
Example:
hostHADR1:~/.ssh # ssh-keygen -t rsa //just hit ENTER for all options
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa
Your public key has been saved in /root/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:EWFudSQJf+oLD3PD0X11fC8F9sZJVpy2JJ76RzRXpS4 root@hostHADR1
The key's randomart image is:
+---[RSA 3072]----+
| =oooo o+*|
| o +.o ooX+|
| + . o *.%|
| . . + = **|
| S o E = =|
| o o . + |
| + = . . |
| * o . . |
| o . |
+----[SHA256]-----+
hostHADR1:~/.ssh # ls -ltr
total 8
-rw-r--r-- 1 root root 568 Jan 7 17:03 id_rsa.pub
-rw------- 1 root root 2602 Jan 7 17:03 id_rsa
The command above creates the <home>/.ssh directory and the <home>/.ssh/id_rsa* files, which are the authentication keys.
Now, copy the keys from one node to the other with the following command (this needs to be done from both HADR machines and for both the root and db2inst1 users):
ssh-copy-id <user>@<target_machine_IP>
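For example, from hostHADR1 this would be something like the following (a sketch using the IPs of this test case; the same has to be done from hostHADR2 towards hostHADR1, for both users):
As root on hostHADR1:
ssh-copy-id root@192.168.145.136
As db2inst1 on hostHADR1:
ssh-copy-id db2inst1@192.168.145.136
A quick check that it worked (it should not prompt for a password):
ssh root@192.168.145.136 hostname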
It is required to use the Pacemaker and Corosync provided by IBM to configure the Db2 HADR automation.
From version 11.5.7.0, Pacemaker is supposed to be installed when Db2 is installed.
So I tried to run the first step, which was creating the cluster.
Command:
db2cm -create -cluster -domain HadrPaceDomain -host hostHADR1 -publicEthernet eth0 -host hostHADR2 -publicEthernet eth0
Result:
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -create -cluster -domain HadrPaceDomain -host hostHADR1 -publicEthernet eth0 -host hostHADR2 -publicEthernet eth0
Line: 1198 Error running command:
crm configure property stonith-enabled=false
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -list
Cluster Status
Line: 501 Error running command:
crm config show
So, I manually uninstalled and reinstalled Pacemaker. You can first run db2prereqPCMK to check whether the System Requirements are met before running the installer.
The following was run from the Db2 media on BOTH HADR machines. Example from one of the machines:
- uninstalling Pacemaker:
hostHADR2:/tmp/Db2-Installers/server_dec/db2/linuxamd64/pcmk # ./db2uninstallPCMK
Removing Db2 agents
Success.
Uninstalling Pacemaker
Success.
The db2uninstallPCMK program completed successfully.
See /tmp/db2uninstallPCMK.log.30826 for more information.
- Installing Pacemaker:
hostHADR2:/tmp/Db2-Installers/server_dec/db2/linuxamd64/pcmk # ./db2installPCMK -i
Installing "Pacemaker"
Success
DBI1070I Program db2installPCMK completed successfully.
- Copying required files:
hostHADR2:/opt/ibm/db2/V11.5/ha/pcmk # cp * /usr/lib/ocf/resource.d/heartbeat/
hostHADR2:/usr/lib/ocf/resource.d/heartbeat # ls -la |grep -i db2
-rwxr-xr-x 1 root root 25331 Jul 20 2020 db2
-r-xr-xr-x 1 root root 26109 Jan 10 17:57 db2ethmon
-r-xr-xr-x 1 root root 41082 Jan 10 17:57 db2hadr
-r-xr-xr-x 1 root root 27370 Jan 10 17:57 db2inst
total 100
-r-xr-xr-x 1 root root 26109 Nov 22 15:28 db2ethmon
-r-xr-xr-x 1 root root 27370 Nov 22 15:28 db2inst
-r-xr-xr-x 1 root root 41082 Nov 22 15:28 db2hadr
Now the cluster creation worked fine.
The db2cm utility must be run as the ROOT user.
The command needs to be run from only one of the HADR nodes, no matter which one.
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -create -cluster -domain HadrPaceDomain -host hostHADR1 -publicEthernet eth0 -host hostHADR2 -publicEthernet eth0
Created db2_hostHADR1_eth0 resource.
Created db2_hostHADR2_eth0 resource.
Cluster created successfully.
- Checking that the cluster was created:
hostHADR1:/home/db2inst1/sqllib/bin # crm status
Cluster Summary:
* Stack: corosync
* Current DC: hostHADR1 (version 2.0.5+20201202.ba59be712-2.30.db2pcmk-2.0.5+20201202.ba59be712) - partition with quorum
* Last updated: Mon Jan 10 18:01:20 2022
* Last change: Mon Jan 10 17:59:12 2022 by root via cibadmin on hostHADR1
* 2 nodes configured
* 2 resource instances configured
Node List:
* Online: [ hostHADR1 hostHADR2 ]
Full List of Resources:
* db2_hostHADR1_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR1
* db2_hostHADR2_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR2
Now, creating the instance resources. For the hostHADR1 host:
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -create -instance db2inst1 -host hostHADR1
Created db2_hostHADR1_db2inst1_0 resource.
Instance resource for db2inst1 on hostHADR1 created successfully.
For the hostHADR2 host (remember that the commands are executed from the same node as the previous command):
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -create -instance db2inst1 -host hostHADR2
Created db2_hostHADR2_db2inst1_0 resource.
Instance resource for db2inst1 on hostHADR2 created successfully.
Checking status again:
hostHADR1:/home/db2inst1/sqllib/bin # crm status
Cluster Summary:
* Stack: corosync
* Current DC: hostHADR1 (version 2.0.5+20201202.ba59be712-2.30.db2pcmk-2.0.5+20201202.ba59be712) - partition with quorum
* Last updated: Mon Jan 10 18:05:07 2022
* Last change: Mon Jan 10 18:04:15 2022 by root via cibadmin on hostHADR1
* 2 nodes configured
* 4 resource instances configured
Node List:
* Online: [ hostHADR1 hostHADR2 ]
Full List of Resources:
* db2_hostHADR1_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR1
* db2_hostHADR2_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR2
* db2_hostHADR1_db2inst1_0 (ocf::heartbeat:db2inst): Started hostHADR1
* db2_hostHADR2_db2inst1_0 (ocf::heartbeat:db2inst): Stopped
The db2_hostHADR2_db2inst1_0 resource is shown as "Stopped".
After a few moments it changed to the "Started" status.
hostHADR1:/home/db2inst1/sqllib/bin # crm status
Cluster Summary:
* Stack: corosync
* Current DC: hostHADR1 (version 2.0.5+20201202.ba59be712-2.30.db2pcmk-2.0.5+20201202.ba59be712) - partition with quorum
* Last updated: Mon Jan 10 18:10:20 2022
* Last change: Mon Jan 10 18:04:15 2022 by root via cibadmin on hostHADR1
* 2 nodes configured
* 4 resource instances configured
Node List:
* Online: [ hostHADR1 hostHADR2 ]
Full List of Resources:
* db2_hostHADR1_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR1
* db2_hostHADR2_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR2
* db2_hostHADR1_db2inst1_0 (ocf::heartbeat:db2inst): Started hostHADR1
* db2_hostHADR2_db2inst1_0 (ocf::heartbeat:db2inst): Started hostHADR2
Checking the "Cluster manager" parameter in the dbm cfg:
hostHADR1:/home/db2inst1/sqllib/bin # su - db2inst1
db2inst1@hostHADR1:~> db2 get dbm cfg | grep -i cluster
Cluster manager = PACEMAKER
Now, creating the database resource for the HADRPACE database:
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -create -db hadrpace -instance db2inst1
Database resource for HADRPACE created successfully.
Checking the status again:
hostHADR1:/home/db2inst1/sqllib/bin # crm status
Cluster Summary:
* Stack: corosync
* Current DC: hostHADR1 (version 2.0.5+20201202.ba59be712-2.30.db2pcmk-2.0.5+20201202.ba59be712) - partition with quorum
* Last updated: Mon Jan 10 18:16:03 2022
* Last change: Mon Jan 10 18:15:43 2022 by root via cibadmin on hostHADR1
* 2 nodes configured
* 6 resource instances configured
Node List:
* Online: [ hostHADR1 hostHADR2 ]
Full List of Resources:
* db2_hostHADR1_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR1
* db2_hostHADR2_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR2
* db2_hostHADR1_db2inst1_0 (ocf::heartbeat:db2inst): Started hostHADR1
* db2_hostHADR2_db2inst1_0 (ocf::heartbeat:db2inst): Started hostHADR2
* Clone Set: db2_db2inst1_db2inst1_HADRPACE-clone [db2_db2inst1_db2inst1_HADRPACE] (promotable):
* Masters: [ hostHADR1 ]
* Slaves: [ hostHADR2 ]
Checking with db2cm -list:
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -list
Cluster Status
Domain information:
Error: Could not get Pacemaker version. - 679
Domain name = HadrPaceDomain
Pacemaker version =
Corosync version = 3.0.4
Current domain leader = hostHADR1
Number of nodes = 2
Number of resources = 6
Node information:
Name name State
---------------- --------
hostHADR1 Online
hostHADR2 Online
Resource Information:
Resource Name = db2_db2inst1_db2inst1_HADRPACE
Resource Type = HADR
DB Name = HADRPACE
Managed = true
HADR Primary Instance = db2inst1
HADR Primary Node = hostHADR1
HADR Primary State = Online
HADR Standby Instance = db2inst1
HADR Standby Node = hostHADR2
HADR Standby State = Online
Resource Name = db2_hostHADR1_db2inst1_0
State = Online
Managed = true
Resource Type = Instance
Node = hostHADR1
Instance Name = db2inst1
Resource Name = db2_hostHADR1_eth0
State = Online
Managed = true
Resource Type = Network Interface
Node = hostHADR1
Interface Name = eth0
Resource Name = db2_hostHADR2_db2inst1_0
State = Online
Managed = true
Resource Type = Instance
Node = hostHADR2
Instance Name = db2inst1
Resource Name = db2_hostHADR2_eth0
State = Online
Managed = true
Resource Type = Network Interface
Node = hostHADR2
Interface Name = eth0
Fencing Information:
Not configured
Quorum Information:
Two-node quorum
The Two-Node Quorum is configured. =)
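Although not shown in my outputs, the quorum and vote information can also be inspected directly with the standard corosync tool (a sketch; run as root on either HADR node):
corosync-quorumtool -s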
In this test I will turn off the virtual machine that hosts the Primary HADR database to see Pacemaker take over the cluster.
Turned off the hostHADR1 host (which was the Primary database at that time):
hostHADR1:~ # shutdown now
The old standby (hostHADR2) became the Primary:
hostHADR2:~ # su - db2inst1
db2inst1@hostHADR2:~> db2pd -d hadrpace -hadr
Database Member 0 -- Database HADRPACE -- Active -- Up 0 days 02:45:28 -- Date 2022-01-10-18.45.08.919199
HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = NEARSYNC
STANDBY_ID = 1
LOG_STREAM_ID = 0
HADR_STATE = DISCONNECTED
HADR_FLAGS =
PRIMARY_MEMBER_HOST = 192.168.145.136
PRIMARY_INSTANCE = db2inst1
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = 192.168.145.134
STANDBY_INSTANCE = db2inst1
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = DISCONNECTED
HADR_CONNECT_STATUS_TIME = 01/10/2022 18:43:49.748353 (1641851029)
Checking "crm status":
hostHADR2:/usr/lib/ocf/resource.d/heartbeat # crm status
Cluster Summary:
* Stack: corosync
* Current DC: hostHADR2 (version 2.0.5+20201202.ba59be712-2.30.db2pcmk-2.0.5+20201202.ba59be712) - partition with quorum
* Last updated: Mon Jan 10 18:44:02 2022
* Last change: Mon Jan 10 18:43:58 2022 by root via crm_attribute on hostHADR2
* 2 nodes configured
* 6 resource instances configured
Node List:
* Online: [ hostHADR2 ]
* OFFLINE: [ hostHADR1 ]
Full List of Resources:
* db2_hostHADR1_eth0 (ocf::heartbeat:db2ethmon): Stopped
* db2_hostHADR2_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR2
* db2_hostHADR1_db2inst1_0 (ocf::heartbeat:db2inst): Stopped
* db2_hostHADR2_db2inst1_0 (ocf::heartbeat:db2inst): Started hostHADR2
* Clone Set: db2_db2inst1_db2inst1_HADRPACE-clone [db2_db2inst1_db2inst1_HADRPACE] (promotable):
* db2_db2inst1_db2inst1_HADRPACE (ocf::heartbeat:db2hadr): Master hostHADR2 (Monitoring)
* Stopped: [ hostHADR1 ]
- Turning the virtual machine on again.
After the virtual machine was turned back on, the "crm status" output changed a few times until all resources were Online/Started:
hostHADR2:~ # crm status
Cluster Summary:
* Stack: corosync
* Current DC: hostHADR2 (version 2.0.5+20201202.ba59be712-2.30.db2pcmk-2.0.5+20201202.ba59be712) - partition with quorum
* Last updated: Mon Jan 10 18:46:09 2022
* Last change: Mon Jan 10 18:43:58 2022 by root via crm_attribute on hostHADR2
* 2 nodes configured
* 6 resource instances configured
Node List:
* Online: [ hostHADR2 ]
* OFFLINE: [ hostHADR1 ]
Full List of Resources:
* db2_hostHADR1_eth0 (ocf::heartbeat:db2ethmon): Stopped
* db2_hostHADR2_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR2
* db2_hostHADR1_db2inst1_0 (ocf::heartbeat:db2inst): Stopped
* db2_hostHADR2_db2inst1_0 (ocf::heartbeat:db2inst): Started hostHADR2
* Clone Set: db2_db2inst1_db2inst1_HADRPACE-clone [db2_db2inst1_db2inst1_HADRPACE] (promotable):
* Masters: [ hostHADR2 ]
* Stopped: [ hostHADR1 ]
db2pd showed that HADR was Connected and in Peer state:
db2inst1@hostHADR2:~> db2pd -d hadrpace -hadr
Database Member 0 -- Database HADRPACE -- Active -- Up 0 days 02:58:15 -- Date 2022-01-10-18.57.55.733919
HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = NEARSYNC
STANDBY_ID = 1
LOG_STREAM_ID = 0
HADR_STATE = PEER
HADR_FLAGS = TCP_PROTOCOL
PRIMARY_MEMBER_HOST = 192.168.145.136
PRIMARY_INSTANCE = db2inst1
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = 192.168.145.134
STANDBY_INSTANCE = db2inst1
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
Great. It worked :)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The qDevice host will be added to the cluster configured above.
This new host is an arbitrator node, so it is not actually part of the cluster, and it can be used as the arbitrator for N Pacemaker clusters.
On all 3 hosts, the /etc/hosts file was edited to contain:
## HADR Machines
192.168.145.134 hostHADR1.com.br hostHADR1
192.168.145.136 hostHADR2.com.br hostHADR2
#qDevice Machine
192.168.145.137 qDeviceHost.com.br qDevicehost
Passwordless SSH is required between the qDevice node and the HADR node that will be used to configure the qDevice.
The documentation says:
" Note: The db2cm command requires a passwordless SSH to be configured between the node that it is going to run on and the node that will host the QDevice. "
So, the key was generated on the qDevicehost host and copied to the hostHADR1 host with the following commands:
ssh-keygen -t rsa
ssh-copy-id root@<IP>
The key from hostHADR1 was copied to the qDevicehost host with:
ssh-copy-id root@<IP>
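With the hostnames of this test case, that amounts to something like the following (a sketch, run as root):
On qDevicehost:
ssh-keygen -t rsa
ssh-copy-id root@hostHADR1
On hostHADR1:
ssh-copy-id root@qDevicehost
A quick check from hostHADR1 (it should not prompt for a password):
ssh root@qDevicehost hostname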
The documentation followed was:
Install and configure a QDevice quorum
https://www.ibm.com/docs/hr/db2/11.5?topic=utility-install-configure-qdevice-quorum
It is required to check that the corosync-qdevice package is installed on the HADR nodes.
Example from one of the nodes:
hostHADR1:~ # rpm -qa | grep corosync-qdevice
corosync-qdevice-3.0.0-2.20.c0bf.db2pcmk.x86_64
The Corosync qNetd component, which is needed to create the qDevice, must be installed from the Db2 media.
So the Db2 media was copied to the qDevicehost host.
Installing the corosync-qnetd:
Navigate to /<Db2_image_media>/db2/<platform>/pcmk/Linux/<OS_distribution>/<architecture>/ and run:
zypper install --allow-unsigned-rpm corosync-qnetd*
Note:
If any dependent lib/package is missing, check if it is in the path above (it should be there)
Output:
qDeviceHost:/tmp/Db2_Installer/server_dec/db2/linuxamd64/pcmk/Linux/suse/x86_64 # zypper install --allow-unsigned-rpm ./corosync-qnetd*
Loading repository data...
Reading installed packages...
Resolving package dependencies...
The following 2 NEW packages are going to be installed:
corosync-qnetd mozilla-nss-tools
The following package has no support information from its vendor:
corosync-qnetd
2 new packages to install.
Overall download size: 771.4 KiB. Already cached: 0 B. After the operation, additional 3.2 MiB will be used.
Continue? [y/n/v/...? shows all options] (y): y
Retrieving package mozilla-nss-tools-3.68-3.56.1.x86_64 (1/2), 461.6 KiB ( 2.1 MiB unpacked)
Retrieving package corosync-qnetd-3.0.0-2.20.c0bf.db2pcmk.x86_64 (2/2), 309.8 KiB ( 1.2 MiB unpacked)
Package is not signed!
Checking for file conflicts: ..............................................................................................................[done]
(1/2) Installing: mozilla-nss-tools-3.68-3.56.1.x86_64 ....................................................................................[done]
(2/2) Installing: corosync-qnetd-3.0.0-2.20.c0bf.db2pcmk.x86_64 ...........................................................................[done]
qDeviceHost:/ # rpm -qa | grep corosync-qnetd
corosync-qnetd-3.0.0-2.20.c0bf.db2pcmk.x86_64
According to the documentation, the required ports were opened.
Prerequisites for an integrated solution using Pacemaker
https://www.ibm.com/docs/en/db2/11.5?topic=pacemaker-prerequisites-integrated-solution-using
On the qDevice host (the QNetd service listens on port 5403):
firewall-cmd --permanent --add-port=5403/tcp
firewall-cmd --reload
On all hosts:
firewall-cmd --add-port=5404-5405/udp
firewall-cmd --add-port=3121/tcp
firewall-cmd --reload
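A simple way to confirm that the ports are actually open on each host (a sketch):
firewall-cmd --list-ports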
From the hostHADR1 host (the one that has passwordless SSH configured with the qDeviceHost host) the following was run, but it returned an error:
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -create -qdevice qDeviceHost
Error: Cluster named HadrPaceDomain is already using this quorum device.
The log /tmp/db2cm.run.log* showed:
2022-01-19-14.13.32.891937 [execCmd][665] Start crm corosync show
totem {
version: 2
cluster_name: HadrPaceDomain
transport: knet
token: 10000
crypto_cipher: aes256
crypto_hash: sha256
}
nodelist {
node {
ring0_addr: hostHADR1
name: hostHADR1
nodeid: 1
}
node {
ring0_addr: hostHADR2
name: hostHADR2
nodeid: 2
}
}
quorum {
provider: corosync_votequorum
two_node: 1
}
logging {
to_logfile: yes
logfile: /var/log/cluster/corosync.log
to_syslog: yes
timestamp: hires
function_name: on
fileline: on
}
2022-01-19-14.13.33.228334 [execCmd][665] End
2022-01-19-14.13.33.232134 [execCmd][3305] Start ssh qDeviceHost "test -f /etc/corosync/qnetd/nssdb/cluster-HadrPaceDomain.crt"
2022-01-19-14.13.33.375846 [execCmd][3305] End
2022-01-19-14.13.33.377092 [db2cm] End execution with exit code 1 on line 3307
So, I did the following:
I moved everything inside the /etc/corosync/qdevice and /etc/corosync/qnetd directories to *.old on all the nodes, including the qDevice host (see the sketch below).
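Roughly like this (a sketch, assuming the default corosync configuration paths; run it only on the hosts where those directories exist):
cd /etc/corosync/qdevice && for f in *; do mv "$f" "$f.old"; done
cd /etc/corosync/qnetd && for f in *; do mv "$f" "$f.old"; done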
Restarted corosync and pacemaker in the HADR nodes:
systemctl stop corosync
systemctl stop pacemaker
systemctl start corosync
systemctl start pacemaker
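To confirm that both services came back up, something like this can be used (a sketch):
systemctl status corosync
systemctl status pacemaker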
Then I tried to add the qDevice again:
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -create -qdevice qDeviceHost
Password:
Password:
Password:
Password:
Error: Could not create qdevice via corosync-qdevice-net-certutil
For some reason it prompted for the password 4 times before returning that new error.
The log reported:
2022-01-19-14.54.34.725268 [execCmd][3311] End
2022-01-19-14.54.34.735178 [execCmd][3316] Start /usr/sbin/corosync-qdevice-net-certutil -Q -n HadrPaceDomain qDeviceHost hostHADR1 hostHADR2
Connection closed by 192.168.145.134 port 22
Node hostHADR1 doesn't have /usr/sbin/corosync-qdevice-net-certutil installed
2022-01-19-14.57.31.464232 [execCmd][3316] End - Failed
2022-01-19-14.57.31.466790 [db2cm] End execution with exit code 1 on line 3318
I checked that the certutil executable was required.
On SUSE, this executable is part of the mozilla-nss-tools package.
On Red Hat, it is part of nss-tools.
But I checked that mozilla-nss-tools was already installed:
hostHADR1:/home/db2inst1/sqllib/bin # rpm -qa mozilla-nss-tools
mozilla-nss-tools-3.68-3.56.1.x86_64
Anyway, I tried to install it again. See the following outputs:
hostHADR1:/home/db2inst1/sqllib/bin # zypper install mozilla-nss-tools
Loading repository data...
Reading installed packages...
'mozilla-nss-tools' is already installed.
There is an update candidate for 'mozilla-nss-tools' from vendor 'openSUSE', while the current vendor is 'SUSE LLC <https://www.suse.com/>'. Use 'zypper install mozilla-nss-tools-3.68.1-lp152.2.13.1.x86_64' to install this candidate.
Resolving package dependencies...
Nothing to do.
So I did:
hostHADR1:/home/db2inst1/sqllib/bin # zypper install mozilla-nss-tools-3.68.1-lp152.2.13.1.x86_64
Loading repository data...
Reading installed packages...
Resolving package dependencies...
Problem: the to be installed mozilla-nss-tools-3.68.1-lp152.2.13.1.x86_64 requires 'mozilla-nss >= 3.68.1', but this requirement cannot be provided
not installable providers: mozilla-nss-3.68.1-lp152.2.13.1.i586[openSUSE_Leap_15.2_Update]
mozilla-nss-3.68.1-lp152.2.13.1.x86_64[openSUSE_Leap_15.2_Update]
Solution 1: Following actions will be done:
install mozilla-nss-3.68.1-lp152.2.13.1.x86_64 (with vendor change)
SUSE LLC <https://www.suse.com/> --> openSUSE
install libfreebl3-hmac-3.68.1-lp152.2.13.1.x86_64 (with vendor change)
SUSE LLC <https://www.suse.com/> --> openSUSE
install libsoftokn3-hmac-3.68.1-lp152.2.13.1.x86_64 (with vendor change)
SUSE LLC <https://www.suse.com/> --> openSUSE
Solution 2: do not install mozilla-nss-tools-3.68.1-lp152.2.13.1.x86_64
Solution 3: break mozilla-nss-tools-3.68.1-lp152.2.13.1.x86_64 by ignoring some of its dependencies
Choose from above solutions by number or cancel [1/2/3/c/d/?] (c): 1
Resolving dependencies...
Resolving package dependencies...
2 Problems:
Problem: the to be installed mozilla-nss-3.68.1-lp152.2.13.1.x86_64 requires 'libfreebl3 >= 3.68.1', but this requirement cannot be provided
Problem: the to be installed libsoftokn3-hmac-3.68.1-lp152.2.13.1.x86_64 requires 'libsoftokn3 = 3.68.1-lp152.2.13.1', but this requirement cannot be provided
Problem: the to be installed mozilla-nss-3.68.1-lp152.2.13.1.x86_64 requires 'libfreebl3 >= 3.68.1', but this requirement cannot be provided
not installable providers: libfreebl3-3.68.1-lp152.2.13.1.i586[openSUSE_Leap_15.2_Update]
libfreebl3-3.68.1-lp152.2.13.1.x86_64[openSUSE_Leap_15.2_Update]
Solution 1: Following actions will be done:
do not install mozilla-nss-3.68.1-lp152.2.13.1.x86_64
do not install libfreebl3-hmac-3.68.1-lp152.2.13.1.x86_64
Solution 2: install libfreebl3-3.68.1-lp152.2.13.1.x86_64 (with vendor change)
SUSE LLC <https://www.suse.com/> --> openSUSE
Solution 3: do not install mozilla-nss-3.68.1-lp152.2.13.1.x86_64
Solution 4: break mozilla-nss-3.68.1-lp152.2.13.1.x86_64 by ignoring some of its dependencies
Choose from above solutions by number or skip, retry or cancel [1/2/3/4/s/r/c/d/?] (c): 2
Problem: the to be installed libsoftokn3-hmac-3.68.1-lp152.2.13.1.x86_64 requires 'libsoftokn3 = 3.68.1-lp152.2.13.1', but this requirement cannot be provided
not installable providers: libsoftokn3-3.68.1-lp152.2.13.1.i586[openSUSE_Leap_15.2_Update]
libsoftokn3-3.68.1-lp152.2.13.1.x86_64[openSUSE_Leap_15.2_Update]
Solution 1: Following actions will be done:
do not install libsoftokn3-hmac-3.68.1-lp152.2.13.1.x86_64
do not install libfreebl3-hmac-3.68.1-lp152.2.13.1.x86_64
Solution 2: Following actions will be done:
do not install libfreebl3-hmac-3.68.1-lp152.2.13.1.x86_64
do not install libsoftokn3-hmac-3.68.1-lp152.2.13.1.x86_64
Solution 3: install libsoftokn3-3.68.1-lp152.2.13.1.x86_64 (with vendor change)
SUSE LLC <https://www.suse.com/> --> openSUSE
Solution 4: break libsoftokn3-hmac-3.68.1-lp152.2.13.1.x86_64 by ignoring some of its dependencies
Choose from above solutions by number or skip, retry or cancel [1/2/3/4/s/r/c/d/?] (c): 3
Resolving dependencies...
Resolving package dependencies...
The following 6 packages are going to be upgraded:
libfreebl3 libfreebl3-hmac libsoftokn3 libsoftokn3-hmac mozilla-nss mozilla-nss-tools
The following 6 packages are going to change vendor:
libfreebl3 SUSE LLC <https://www.suse.com/> -> openSUSE
libfreebl3-hmac SUSE LLC <https://www.suse.com/> -> openSUSE
libsoftokn3 SUSE LLC <https://www.suse.com/> -> openSUSE
libsoftokn3-hmac SUSE LLC <https://www.suse.com/> -> openSUSE
mozilla-nss SUSE LLC <https://www.suse.com/> -> openSUSE
mozilla-nss-tools SUSE LLC <https://www.suse.com/> -> openSUSE
The following 6 packages have no support information from their vendor:
libfreebl3 libfreebl3-hmac libsoftokn3 libsoftokn3-hmac mozilla-nss mozilla-nss-tools
6 packages to upgrade, 6 to change vendor.
Overall download size: 2.0 MiB. Already cached: 0 B. After the operation, additional 232.0 B will be used.
Continue? [y/n/v/...? shows all options] (y): y
After that, I restarted Corosync and Pacemaker on the HADR nodes again:
systemctl stop corosync
systemctl stop pacemaker
systemctl start corosync
systemctl start pacemaker
Then, another try to add the qDevice.
For some reason it prompted for the remote password several times. I kept waiting after entering the password (even while the prompt kept asking for it again) and eventually received the success message:
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -create -qdevice qDeviceHost
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Successfully configured qdevice on nodes hostHADR1 and hostHADR2
Attempting to start qdevice on qDeviceHost
Quorum device qDeviceHost added successfully.
Checking the qDevice status from hostHADR1:
hostHADR1:/ # corosync-qdevice-tool -s
Qdevice information
-------------------
Model: Net
Node ID: 1
Configured node list:
0 Node ID = 1
1 Node ID = 2
Membership node list: 1, 2
Qdevice-net information
----------------------
Cluster name: HadrPaceDomain
QNetd host: qDeviceHost:5403
Algorithm: LMS
Tie-breaker: Node with lowest node ID
State: Connected
And from hostHADR2:
hostHADR2:/etc/corosync/qdevice # corosync-qdevice-tool -s
Qdevice information
-------------------
Model: Net
Node ID: 2
Configured node list:
0 Node ID = 1
1 Node ID = 2
Membership node list: 1, 2
Qdevice-net information
----------------------
Cluster name: HadrPaceDomain
QNetd host: qDeviceHost:5403
Algorithm: LMS
Tie-breaker: Node with lowest node ID
State: Connected
Run the following corosync command on the QDevice host to verify that the quorum device is running correctly.
corosync-qnetd-tool -l
Output:
qDeviceHost:/ # corosync-qnetd-tool -l
Cluster "HadrPaceDomain":
Algorithm: LMS
Tie-breaker: Node with lowest node ID
Node ID 1:
Client address: ::ffff:192.168.145.134:55232
Configured node list: 1, 2
Membership node list: 1, 2
Vote: ACK (ACK)
Node ID 2:
Client address: ::ffff:192.168.145.136:39618
Configured node list: 1, 2
Membership node list: 1, 2
Vote: ACK (ACK)
The "crm status" doesn't change after adding the qDevice node because that node isn't part of the Cluster actualy.
hostHADR1:/ # crm status
Cluster Summary:
* Stack: corosync
* Current DC: hostHADR2 (version 2.0.5+20201202.ba59be712-2.30.db2pcmk-2.0.5+20201202.ba59be712) - partition with quorum
* Last updated: Wed Jan 19 15:49:32 2022
* Last change: Wed Jan 19 15:38:21 2022 by root via cibadmin on hostHADR1
* 2 nodes configured
* 6 resource instances configured
Node List:
* Online: [ hostHADR1 hostHADR2 ]
Full List of Resources:
* db2_hostHADR1_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR1
* db2_hostHADR2_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR2
* db2_hostHADR1_db2inst1_0 (ocf::heartbeat:db2inst): Started hostHADR1
* db2_hostHADR2_db2inst1_0 (ocf::heartbeat:db2inst): Started hostHADR2
* Clone Set: db2_db2inst1_db2inst1_HADRPACE-clone [db2_db2inst1_db2inst1_HADRPACE] (promotable):
* Masters: [ hostHADR2 ]
* Slaves: [ hostHADR1 ]
The "db2cm -list" does shows the qDevice information:
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -list
Cluster Status
Domain information:
Error: Could not get Pacemaker version. - 679
Domain name = HadrPaceDomain
Pacemaker version =
Corosync version = 3.0.4
Current domain leader = hostHADR2
Number of nodes = 2
Number of resources = 6
Node information:
Name name State
---------------- --------
hostHADR1 Online
hostHADR2 Online
Resource Information:
Resource Name = db2_db2inst1_db2inst1_HADRPACE
Resource Type = HADR
DB Name = HADRPACE
Managed = true
HADR Primary Instance = db2inst1
HADR Primary Node = hostHADR2
HADR Primary State = Online
HADR Standby Instance = db2inst1
HADR Standby Node = hostHADR1
HADR Standby State = Online
Resource Name = db2_hostHADR1_db2inst1_0
State = Online
Managed = true
Resource Type = Instance
Node = hostHADR1
Instance Name = db2inst1
Resource Name = db2_hostHADR1_eth0
State = Online
Managed = true
Resource Type = Network Interface
Node = hostHADR1
Interface Name = eth0
Resource Name = db2_hostHADR2_db2inst1_0
State = Online
Managed = true
Resource Type = Instance
Node = hostHADR2
Instance Name = db2inst1
Resource Name = db2_hostHADR2_eth0
State = Online
Managed = true
Resource Type = Network Interface
Node = hostHADR2
Interface Name = eth0
Fencing Information:
Not configured
Quorum Information:
Qdevice
Qdevice information
-------------------
Model: Net
Node ID: 1
Configured node list:
0 Node ID = 1
1 Node ID = 2
Membership node list: 1, 2
Qdevice-net information
----------------------
Cluster name: HadrPaceDomain
QNetd host: qDeviceHost:5403
Algorithm: LMS
Tie-breaker: Node with lowest node ID
State: Connected
The full cluster configuration can be seen with "crm config show":
hostHADR1:/home/db2inst1/sqllib/bin # crm config show
node 1: hostHADR1 \
attributes db2hadr-db2inst1_db2inst1_HADRPACE_reint=0 db2inst-hostHADR1_db2inst1_0_start=0
node 2: hostHADR2 \
attributes db2hadr-db2inst1_db2inst1_HADRPACE_reint=-1 db2hadr-db2inst1_db2inst1_HADRPACE_takeover=0 db2inst-hostHADR2_db2inst1_0_start=0
primitive db2_db2inst1_db2inst1_HADRPACE db2hadr \
params instance="db2inst1,db2inst1" dbname=HADRPACE \
op demote interval=0s timeout=900s \
op monitor interval=9s role=Master timeout=60s \
op monitor interval=10s role=Slave timeout=60s \
op promote interval=0s timeout=900s \
op start interval=0s timeout=900s \
op stop interval=0s timeout=900s
primitive db2_hostHADR1_db2inst1_0 db2inst \
params instance=db2inst1 hostname=hostHADR1 \
op monitor timeout=120s interval=10s on-fail=restart \
op start interval=0s timeout=900s \
op stop interval=0s timeout=900s \
meta migration-threshold=0 is-managed=true
primitive db2_hostHADR1_eth0 db2ethmon \
params interface=eth0 hostname=hostHADR1 repeat_count=4 repeat_interval=4 \
op monitor timeout=30s interval=4 \
op start timeout=60s interval=0s \
op stop interval=0s timeout=20s \
meta is-managed=true
primitive db2_hostHADR2_db2inst1_0 db2inst \
params instance=db2inst1 hostname=hostHADR2 \
op monitor timeout=120s interval=10s on-fail=restart \
op start interval=0s timeout=900s \
op stop interval=0s timeout=900s \
meta migration-threshold=0 is-managed=true
primitive db2_hostHADR2_eth0 db2ethmon \
params interface=eth0 hostname=hostHADR2 repeat_count=4 repeat_interval=4 \
op monitor timeout=30s interval=4 \
op start timeout=60s interval=0s \
op stop interval=0s timeout=20s \
meta is-managed=true
ms db2_db2inst1_db2inst1_HADRPACE-clone db2_db2inst1_db2inst1_HADRPACE \
meta resource-stickiness=5000 migration-threshold=1 ordered=true promotable=true is-managed=true
location loc-rule-db2_db2inst1_db2inst1_HADRPACE-eth0-hostHADR1 db2_db2inst1_db2inst1_HADRPACE-clone \
rule -inf: db2ethmon-eth0 eq 0
location loc-rule-db2_db2inst1_db2inst1_HADRPACE-eth0-hostHADR2 db2_db2inst1_db2inst1_HADRPACE-clone \
rule -inf: db2ethmon-eth0 eq 0
location loc-rule-db2_db2inst1_db2inst1_HADRPACE-node-hostHADR1 db2_db2inst1_db2inst1_HADRPACE-clone \
rule -inf: db2inst-hostHADR1_db2inst1_0 eq 0
location loc-rule-db2_db2inst1_db2inst1_HADRPACE-node-hostHADR2 db2_db2inst1_db2inst1_HADRPACE-clone \
rule -inf: db2inst-hostHADR2_db2inst1_0 eq 0
location no-probe-db2_hostHADR1_db2inst1_0 db2_hostHADR1_db2inst1_0 resource-discovery=never -inf: hostHADR2
location no-probe-db2_hostHADR1_eth0 db2_hostHADR1_eth0 resource-discovery=never -inf: hostHADR2
location no-probe-db2_hostHADR2_db2inst1_0 db2_hostHADR2_db2inst1_0 resource-discovery=never -inf: hostHADR1
location no-probe-db2_hostHADR2_eth0 db2_hostHADR2_eth0 resource-discovery=never -inf: hostHADR1
location prefer-db2_hostHADR1_db2inst1_0 db2_hostHADR1_db2inst1_0 100: hostHADR1
location prefer-db2_hostHADR1_eth0 db2_hostHADR1_eth0 100: hostHADR1
location prefer-db2_hostHADR2_db2inst1_0 db2_hostHADR2_db2inst1_0 100: hostHADR2
location prefer-db2_hostHADR2_eth0 db2_hostHADR2_eth0 100: hostHADR2
location prefer-hostHADR1-db2inst1-db2_db2inst1_db2inst1_HADRPACE-clone db2_db2inst1_db2inst1_HADRPACE-clone 100: hostHADR1
And the cluster membership with "crm_node -l":
hostHADR1:~ # crm_node -l
1 hostHADR1 member
2 hostHADR2 member
Once again, to see the tests for the qDevice Quorum (the recommended quorum mechanism), check my other article below:
Testing Pacemaker qDevice Quorum with Db2 HADR
https://github.com/fsmegale/Db2-luw/wiki/Testing-Pacemaker-qDevice-Quorum-with-Db2-HADR