Automating Db2 HADR with Pacemaker
This is a test case I did to implement Pacemaker with Db2 HADR, so this (like all the content in my Git) is not IBM official.
Pacemaker can now be used to automate Db2 HADR takeover instead of using TSA (Tivoli System Automation).
The documentation says:
"
Important: Starting from version 11.5.6, the Pacemaker cluster manager for automated fail-over to HADR standby databases is packaged and installed with Db2. In version 11.5.5, Pacemaker is included and available for production environments. In version 11.5.4, Pacemaker is included as a technical preview, and should be restricted to development, test, and proof-of-concept environments.
"
https://www.ibm.com/docs/en/db2/11.5?topic=pacemaker-configuring-clustered-environment-using-db2cm-utility
In this article we will also see the tests for the Two-Node Quorum.
To see the tests for the qDevice Quorum (the recommended quorum mechanism), check my other article below AFTER reading this one:
Testing Pacemaker qDevice Quorum with Db2 HADR
https://github.com/fsmegale/Db2-luw/wiki/Testing-Pacemaker-qDevice-Quorum-with-Db2-HADR
Prerequisites for an integrated solution using Pacemaker
https://www.ibm.com/docs/en/db2/11.5?topic=pacemaker-prerequisites-integrated-solution-using
Configuring a clustered environment using the Db2 cluster manager (db2cm) utility
https://www.ibm.com/docs/en/db2/11.5?topic=pacemaker-configuring-clustered-environment-using-db2cm-utility
Integrated solution using Pacemaker
https://www.ibm.com/docs/en/db2/11.5?topic=feature-integrated-solution-using-pacemaker
Quorum devices support on Pacemaker
https://www.ibm.com/docs/en/db2/11.5?topic=component-quorum-devices-support-pacemaker
Install and configure a QDevice quorum
https://www.ibm.com/docs/hr/db2/11.5?topic=utility-install-configure-qdevice-quorum
Testing Pacemaker qDevice Quorum with Db2 HADR
https://github.com/fsmegale/Db2-luw/wiki/Testing-Pacemaker-qDevice-Quorum-with-Db2-HADR
- To avoid the use of passwordless SSH for the root user:
How to run db2cm without a root passwordless ssh setup
https://www.ibm.com/support/pages/node/6841049
- To use a VIP (see the example below):
"
6. Optional: Create the VIP resources for the newly created database.
./sqllib/bin/db2cm -create -primaryVIP <IP_address> -db <database_name> -instance <instance_name>
"
Creating an HADR Db2 instance on a Pacemaker-managed Linux cluster
https://www.ibm.com/docs/en/db2/11.5?topic=option-creating-pacemaker-managed-hadr-db2-instance
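As a hypothetical example using the names from this test case (I did not configure a VIP in this test, and the 192.168.145.200 address is made up), the command would look like:
./sqllib/bin/db2cm -create -primaryVIP 192.168.145.200 -db hadrpace -instance db2inst1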
- 3 Virtual Machines
- All Machines with SUSE Linux v15 SP3
- 2 of those Machines with Db2 v11.5.7.0 and HADR already configured and working
Hostnames and IPs:
hostHADR1 - 192.168.145.134
hostHADR2 - 192.168.145.136
qDevicehost - 192.168.145.137
Database Name: hadrpace
Instance name: db2inst1
Database HADR ports (40001 and 40002)
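Since HADR was already configured and working on the two HADR machines, it is worth confirming its state before creating the cluster. A sketch, using the same command shown later in this article (run as db2inst1 on both nodes and check that HADR_STATE = PEER and HADR_CONNECT_STATUS = CONNECTED):
db2pd -d hadrpace -hadr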
First I will configure Pacemaker with Two-Node Quorum.
Next, I will add a third node to configure the qDevice Quorum.
To know more about Quorum in Pacemaker, check the following documentation:
Quorum devices support on Pacemaker
https://www.ibm.com/docs/en/db2/11.5?topic=component-quorum-devices-support-pacemaker
But I can highlight what the documentation above says:
"
Based on the advantages and disadvantages shown, the QDevice quorum is the recommended quorum mechanism for Db2.
"
PS.:
As I see it, the way to configure the qDevice Quorum is to first configure the Two-Node Quorum and then add the third node to be the qDevice Quorum.
According to the System Requirements, the Db2 Fault Monitor must be turned off.
From both HADR machines, check:
localhost:/home/db2inst1/sqllib/bin # ps -ef | grep db2fmcd
root 50365 1 0 06:11 ? 00:01:04 /opt/ibm/db2/V11.5/bin/db2fmcd
root 128467 126831 0 14:43 pts/1 00:00:00 grep --color=auto db2fmcd
From both HADR machines, stop Fault Monitor if it is running:
localhost:/home/db2inst1/sqllib/bin # ./db2fmcu -d
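After stopping it, the same ps check can be repeated to confirm that the db2fmcd process is no longer running (a sketch):
ps -ef | grep db2fmcd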
Checking the routes on both HADR machines to identify the network interface (eth0):
localhost:/home/db2inst1/sqllib/bin # ip r
default via 192.168.145.2 dev eth0 proto dhcp
192.168.145.0/24 dev eth0 proto kernel scope link src 192.168.145.134
localhost:/home/db2inst1/sqllib/bin # ip r
default via 192.168.145.2 dev eth0 proto dhcp
192.168.145.0/24 dev eth0 proto kernel scope link src 192.168.145.136
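The interface shown here (eth0) is the one that will later be passed to db2cm as -publicEthernet. If in doubt, the interface name and its address can also be checked with (a sketch):
ip addr show eth0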
The /etc/hosts file on both HADR machines was edited to contain:
## HADR machines
192.168.145.134 hostHADR1.com.br hostHADR1
192.168.145.136 hostHADR2.com.br hostHADR2
The db2nodes.cfg was changed to reflect the new hostname.
Output from one of the nodes after the change:
db2inst1@hostHADR2:~/sqllib> cat db2nodes.cfg
0 hostHADR2 0
According to the System Requirements, both HADR nodes must have passwordless SSH configured for the root and db2inst1 (Db2 instance owner) users.
So, run the following command on both HADR machines as the root and db2inst1 users:
ssh-keygen -t rsa
Example:
hostHADR1:~/.ssh # ssh-keygen -t rsa //just hit ENTER for all options
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa
Your public key has been saved in /root/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:EWFudSQJf+oLD3PD0X11fC8F9sZJVpy2JJ76RzRXpS4 root@hostHADR1
The key's randomart image is:
+---[RSA 3072]----+
| =oooo o+*|
| o +.o ooX+|
| + . o *.%|
| . . + = **|
| S o E = =|
| o o . + |
| + = . . |
| * o . . |
| o . |
+----[SHA256]-----+
hostHADR1:~/.ssh # ls -ltr
total 8
-rw-r--r-- 1 root root 568 Jan 7 17:03 id_rsa.pub
-rw------- 1 root root 2602 Jan 7 17:03 id_rsa
The command above creates the <home>/.ssh directory and the <home>/.ssh/id_rsa* files, which are the authentication keys.
Now, copy the keys from one node to the other with the following command (this needs to be done from both HADR machines and for both the root and db2inst1 users):
ssh-copy-id <user>@<target_machine_IP>
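For example, from hostHADR1 this would be something like the following (a sketch using the IPs of this test case; the same has to be done from hostHADR2 towards hostHADR1, for both users):
As root on hostHADR1:
ssh-copy-id root@192.168.145.136
As db2inst1 on hostHADR1:
ssh-copy-id db2inst1@192.168.145.136
A quick check that it worked (it should not prompt for a password):
ssh root@192.168.145.136 hostname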
It is required to use the Pacemaker and Corosync provided by IBM to configure the Db2 HADR automation.
From version 11.5.7.0, Pacemaker is supposed to be installed when Db2 is installed.
So I tried to run the first step, which was creating the cluster.
Command:
db2cm -create -cluster -domain HadrPaceDomain -host hostHADR1 -publicEthernet eth0 -host hostHADR2 -publicEthernet eth0
Result:
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -create -cluster -domain HadrPaceDomain -host hostHADR1 -publicEthernet eth0 -host hostHADR2 -publicEthernet eth0
Line: 1198 Error running command:
crm configure property stonith-enabled=false
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -list
Cluster Status
Line: 501 Error running command:
crm config show
So, I manually uninstalled and reinstalled Pacemaker. You can first run db2prereqPCMK to check whether the System Requirements are met before running the installer.
The following was run from the Db2 media on BOTH HADR machines. Example from one of the machines:
- uninstalling Pacemaker:
hostHADR2:/tmp/Db2-Installers/server_dec/db2/linuxamd64/pcmk # ./db2uninstallPCMK
Removing Db2 agents
Success.
Uninstalling Pacemaker
Success.
The db2uninstallPCMK program completed successfully.
See /tmp/db2uninstallPCMK.log.30826 for more information.
- Installing Pacemaker:
hostHADR2:/tmp/Db2-Installers/server_dec/db2/linuxamd64/pcmk # ./db2installPCMK -i
Installing "Pacemaker"
Success
DBI1070I Program db2installPCMK completed successfully.
- Copying required files:
hostHADR2:/opt/ibm/db2/V11.5/ha/pcmk # cp * /usr/lib/ocf/resource.d/heartbeat/
hostHADR2:/usr/lib/ocf/resource.d/heartbeat # ls -la |grep -i db2
-rwxr-xr-x 1 root root 25331 Jul 20 2020 db2
-r-xr-xr-x 1 root root 26109 Jan 10 17:57 db2ethmon
-r-xr-xr-x 1 root root 41082 Jan 10 17:57 db2hadr
-r-xr-xr-x 1 root root 27370 Jan 10 17:57 db2inst
total 100
-r-xr-xr-x 1 root root 26109 Nov 22 15:28 db2ethmon
-r-xr-xr-x 1 root root 27370 Nov 22 15:28 db2inst
-r-xr-xr-x 1 root root 41082 Nov 22 15:28 db2hadr
Now the cluster creation worked fine.
The db2cm utility must be run as the ROOT user.
The command needs to be run from only one of the HADR nodes, no matter which one.
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -create -cluster -domain HadrPaceDomain -host hostHADR1 -publicEthernet eth0 -host hostHADR2 -publicEthernet eth0
Created db2_hostHADR1_eth0 resource.
Created db2_hostHADR2_eth0 resource.
Cluster created successfully.
- Checking that the cluster was created:
hostHADR1:/home/db2inst1/sqllib/bin # crm status
Cluster Summary:
* Stack: corosync
* Current DC: hostHADR1 (version 2.0.5+20201202.ba59be712-2.30.db2pcmk-2.0.5+20201202.ba59be712) - partition with quorum
* Last updated: Mon Jan 10 18:01:20 2022
* Last change: Mon Jan 10 17:59:12 2022 by root via cibadmin on hostHADR1
* 2 nodes configured
* 2 resource instances configured
Node List:
* Online: [ hostHADR1 hostHADR2 ]
Full List of Resources:
* db2_hostHADR1_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR1
* db2_hostHADR2_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR2
Now, creating the instance resources. For the hostHADR1 host:
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -create -instance db2inst1 -host hostHADR1
Created db2_hostHADR1_db2inst1_0 resource.
Instance resource for db2inst1 on hostHADR1 created successfully.
For the hostHADR2 host (remember that the commands are executed from the same node as the previous command):
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -create -instance db2inst1 -host hostHADR2
Created db2_hostHADR2_db2inst1_0 resource.
Instance resource for db2inst1 on hostHADR2 created successfully.
Checking status again:
hostHADR1:/home/db2inst1/sqllib/bin # crm status
Cluster Summary:
* Stack: corosync
* Current DC: hostHADR1 (version 2.0.5+20201202.ba59be712-2.30.db2pcmk-2.0.5+20201202.ba59be712) - partition with quorum
* Last updated: Mon Jan 10 18:05:07 2022
* Last change: Mon Jan 10 18:04:15 2022 by root via cibadmin on hostHADR1
* 2 nodes configured
* 4 resource instances configured
Node List:
* Online: [ hostHADR1 hostHADR2 ]
Full List of Resources:
* db2_hostHADR1_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR1
* db2_hostHADR2_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR2
* db2_hostHADR1_db2inst1_0 (ocf::heartbeat:db2inst): Started hostHADR1
* db2_hostHADR2_db2inst1_0 (ocf::heartbeat:db2inst): Stopped
The db2_hostHADR2_db2inst1_0 resource is shown as "Stopped".
After a few moments it changed to the "Started" status.
hostHADR1:/home/db2inst1/sqllib/bin # crm status
Cluster Summary:
* Stack: corosync
* Current DC: hostHADR1 (version 2.0.5+20201202.ba59be712-2.30.db2pcmk-2.0.5+20201202.ba59be712) - partition with quorum
* Last updated: Mon Jan 10 18:10:20 2022
* Last change: Mon Jan 10 18:04:15 2022 by root via cibadmin on hostHADR1
* 2 nodes configured
* 4 resource instances configured
Node List:
* Online: [ hostHADR1 hostHADR2 ]
Full List of Resources:
* db2_hostHADR1_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR1
* db2_hostHADR2_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR2
* db2_hostHADR1_db2inst1_0 (ocf::heartbeat:db2inst): Started hostHADR1
* db2_hostHADR2_db2inst1_0 (ocf::heartbeat:db2inst): Started hostHADR2
Checking the "Cluster manager" parameter in the dbm cfg:
hostHADR1:/home/db2inst1/sqllib/bin # su - db2inst1
db2inst1@hostHADR1:~> db2 get dbm cfg | grep -i cluster
Cluster manager = PACEMAKER
Now, creating the database resource for the HADRPACE database:
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -create -db hadrpace -instance db2inst1
Database resource for HADRPACE created successfully.
Checking the status again:
hostHADR1:/home/db2inst1/sqllib/bin # crm status
Cluster Summary:
* Stack: corosync
* Current DC: hostHADR1 (version 2.0.5+20201202.ba59be712-2.30.db2pcmk-2.0.5+20201202.ba59be712) - partition with quorum
* Last updated: Mon Jan 10 18:16:03 2022
* Last change: Mon Jan 10 18:15:43 2022 by root via cibadmin on hostHADR1
* 2 nodes configured
* 6 resource instances configured
Node List:
* Online: [ hostHADR1 hostHADR2 ]
Full List of Resources:
* db2_hostHADR1_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR1
* db2_hostHADR2_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR2
* db2_hostHADR1_db2inst1_0 (ocf::heartbeat:db2inst): Started hostHADR1
* db2_hostHADR2_db2inst1_0 (ocf::heartbeat:db2inst): Started hostHADR2
* Clone Set: db2_db2inst1_db2inst1_HADRPACE-clone [db2_db2inst1_db2inst1_HADRPACE] (promotable):
* Masters: [ hostHADR1 ]
* Slaves: [ hostHADR2 ]
Checking with db2cm -list:
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -list
Cluster Status
Domain information:
Error: Could not get Pacemaker version. - 679
Domain name = HadrPaceDomain
Pacemaker version =
Corosync version = 3.0.4
Current domain leader = hostHADR1
Number of nodes = 2
Number of resources = 6
Node information:
Name name State
---------------- --------
hostHADR1 Online
hostHADR2 Online
Resource Information:
Resource Name = db2_db2inst1_db2inst1_HADRPACE
Resource Type = HADR
DB Name = HADRPACE
Managed = true
HADR Primary Instance = db2inst1
HADR Primary Node = hostHADR1
HADR Primary State = Online
HADR Standby Instance = db2inst1
HADR Standby Node = hostHADR2
HADR Standby State = Online
Resource Name = db2_hostHADR1_db2inst1_0
State = Online
Managed = true
Resource Type = Instance
Node = hostHADR1
Instance Name = db2inst1
Resource Name = db2_hostHADR1_eth0
State = Online
Managed = true
Resource Type = Network Interface
Node = hostHADR1
Interface Name = eth0
Resource Name = db2_hostHADR2_db2inst1_0
State = Online
Managed = true
Resource Type = Instance
Node = hostHADR2
Instance Name = db2inst1
Resource Name = db2_hostHADR2_eth0
State = Online
Managed = true
Resource Type = Network Interface
Node = hostHADR2
Interface Name = eth0
Fencing Information:
Not configured
Quorum Information:
Two-node quorum
The Two-Node Quorum is configured. =)
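Although not shown in my outputs, the quorum and vote information can also be inspected directly with the standard corosync tool (a sketch; run as root on either HADR node):
corosync-quorumtool -s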
In this test I will turn off the virtual machine that hosts the Primary HADR database to see Pacemaker take over the cluster.
Turned off the hostHADR1 host (which was the Primary database at that time):
hostHADR1:~ # shutdown now
The old standby (hostHADR2) became the Primary:
hostHADR2:~ # su - db2inst1
db2inst1@hostHADR2:~> db2pd -d hadrpace -hadr
Database Member 0 -- Database HADRPACE -- Active -- Up 0 days 02:45:28 -- Date 2022-01-10-18.45.08.919199
HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = NEARSYNC
STANDBY_ID = 1
LOG_STREAM_ID = 0
HADR_STATE = DISCONNECTED
HADR_FLAGS =
PRIMARY_MEMBER_HOST = 192.168.145.136
PRIMARY_INSTANCE = db2inst1
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = 192.168.145.134
STANDBY_INSTANCE = db2inst1
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = DISCONNECTED
HADR_CONNECT_STATUS_TIME = 01/10/2022 18:43:49.748353 (1641851029)
Checking "crm status":
hostHADR2:/usr/lib/ocf/resource.d/heartbeat # crm status
Cluster Summary:
* Stack: corosync
* Current DC: hostHADR2 (version 2.0.5+20201202.ba59be712-2.30.db2pcmk-2.0.5+20201202.ba59be712) - partition with quorum
* Last updated: Mon Jan 10 18:44:02 2022
* Last change: Mon Jan 10 18:43:58 2022 by root via crm_attribute on hostHADR2
* 2 nodes configured
* 6 resource instances configured
Node List:
* Online: [ hostHADR2 ]
* OFFLINE: [ hostHADR1 ]
Full List of Resources:
* db2_hostHADR1_eth0 (ocf::heartbeat:db2ethmon): Stopped
* db2_hostHADR2_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR2
* db2_hostHADR1_db2inst1_0 (ocf::heartbeat:db2inst): Stopped
* db2_hostHADR2_db2inst1_0 (ocf::heartbeat:db2inst): Started hostHADR2
* Clone Set: db2_db2inst1_db2inst1_HADRPACE-clone [db2_db2inst1_db2inst1_HADRPACE] (promotable):
* db2_db2inst1_db2inst1_HADRPACE (ocf::heartbeat:db2hadr): Master hostHADR2 (Monitoring)
* Stopped: [ hostHADR1 ]
- Turning the virtual machine on again.
After the virtual machine was turned back on, the "crm status" output changed a few times until all resources were Online/Started:
hostHADR2:~ # crm status
Cluster Summary:
* Stack: corosync
* Current DC: hostHADR2 (version 2.0.5+20201202.ba59be712-2.30.db2pcmk-2.0.5+20201202.ba59be712) - partition with quorum
* Last updated: Mon Jan 10 18:46:09 2022
* Last change: Mon Jan 10 18:43:58 2022 by root via crm_attribute on hostHADR2
* 2 nodes configured
* 6 resource instances configured
Node List:
* Online: [ hostHADR2 ]
* OFFLINE: [ hostHADR1 ]
Full List of Resources:
* db2_hostHADR1_eth0 (ocf::heartbeat:db2ethmon): Stopped
* db2_hostHADR2_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR2
* db2_hostHADR1_db2inst1_0 (ocf::heartbeat:db2inst): Stopped
* db2_hostHADR2_db2inst1_0 (ocf::heartbeat:db2inst): Started hostHADR2
* Clone Set: db2_db2inst1_db2inst1_HADRPACE-clone [db2_db2inst1_db2inst1_HADRPACE] (promotable):
* Masters: [ hostHADR2 ]
* Stopped: [ hostHADR1 ]
db2pd showed that HADR was Connected and in Peer state:
db2inst1@hostHADR2:~> db2pd -d hadrpace -hadr
Database Member 0 -- Database HADRPACE -- Active -- Up 0 days 02:58:15 -- Date 2022-01-10-18.57.55.733919
HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = NEARSYNC
STANDBY_ID = 1
LOG_STREAM_ID = 0
HADR_STATE = PEER
HADR_FLAGS = TCP_PROTOCOL
PRIMARY_MEMBER_HOST = 192.168.145.136
PRIMARY_INSTANCE = db2inst1
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = 192.168.145.134
STANDBY_INSTANCE = db2inst1
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
Great. It worked :)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The qDevice host will be added to the cluster configured above.
This new host is an arbitrator node, so it is not actually part of the cluster, and it can be used as the arbitrator for N Pacemaker clusters.
On all 3 hosts, the /etc/hosts file was edited to contain:
## HADR Machines
192.168.145.134 hostHADR1.com.br hostHADR1
192.168.145.136 hostHADR2.com.br hostHADR2
#qDevice Machine
192.168.145.137 qDeviceHost.com.br qDevicehost
Passwordless SSH is required between the qDevice node and the HADR node that will be used to configure the qDevice.
The documentation says:
" Note: The db2cm command requires a passwordless SSH to be configured between the node that it is going to run on and the node that will host the QDevice. "
So, the key was generated on the qDevicehost host and copied to the hostHADR1 host with the following commands:
ssh-keygen -t rsa
ssh-copy-id root@<IP>
The key from hostHADR1 was copied to the qDevicehost host with:
ssh-copy-id root@<IP>
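With the hostnames of this test case, that amounts to something like the following (a sketch, run as root):
On qDevicehost:
ssh-keygen -t rsa
ssh-copy-id root@hostHADR1
On hostHADR1:
ssh-copy-id root@qDevicehost
A quick check from hostHADR1 (it should not prompt for a password):
ssh root@qDevicehost hostname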
The documentation followed was:
Install and configure a QDevice quorum
https://www.ibm.com/docs/hr/db2/11.5?topic=utility-install-configure-qdevice-quorum
It is required to check that the corosync-qdevice package is installed on the HADR nodes.
Example from one of the nodes:
hostHADR1:~ # rpm -qa | grep corosync-qdevice
corosync-qdevice-3.0.0-2.20.c0bf.db2pcmk.x86_64
The Corosync qNetd component, which is needed to create the qDevice, must be installed from the Db2 media.
So the Db2 media was copied to the qDevicehost host.
Installing the corosync-qnetd:
Navigate to /<Db2_image_media>/db2/<platform>/pcmk/Linux/<OS_distribution>/<architecture>/ and run:
zypper install --allow-unsigned-rpm corosync-qnetd*
Note:
If any dependent lib/package is missing, check if it is in the path above (it should be there)
Output:
qDeviceHost:/tmp/Db2_Installer/server_dec/db2/linuxamd64/pcmk/Linux/suse/x86_64 # zypper install --allow-unsigned-rpm ./corosync-qnetd*
Loading repository data...
Reading installed packages...
Resolving package dependencies...
The following 2 NEW packages are going to be installed:
corosync-qnetd mozilla-nss-tools
The following package has no support information from its vendor:
corosync-qnetd
2 new packages to install.
Overall download size: 771.4 KiB. Already cached: 0 B. After the operation, additional 3.2 MiB will be used.
Continue? [y/n/v/...? shows all options] (y): y
Retrieving package mozilla-nss-tools-3.68-3.56.1.x86_64 (1/2), 461.6 KiB ( 2.1 MiB unpacked)
Retrieving package corosync-qnetd-3.0.0-2.20.c0bf.db2pcmk.x86_64 (2/2), 309.8 KiB ( 1.2 MiB unpacked)
Package is not signed!
Checking for file conflicts: ..............................................................................................................[done]
(1/2) Installing: mozilla-nss-tools-3.68-3.56.1.x86_64 ....................................................................................[done]
(2/2) Installing: corosync-qnetd-3.0.0-2.20.c0bf.db2pcmk.x86_64 ...........................................................................[done]
qDeviceHost:/ # rpm -qa | grep corosync-qnetd
corosync-qnetd-3.0.0-2.20.c0bf.db2pcmk.x86_64
According to the documentation, the required ports were opened.
Prerequisites for an integrated solution using Pacemaker
https://www.ibm.com/docs/en/db2/11.5?topic=pacemaker-prerequisites-integrated-solution-using
On the qDevice host (the QNetd service listens on port 5403):
firewall-cmd --permanent --add-port=5403/tcp
firewall-cmd --reload
On all hosts:
firewall-cmd --add-port=5404-5405/udp
firewall-cmd --add-port=3121/tcp
firewall-cmd --reload
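A simple way to confirm that the ports are actually open on each host (a sketch):
firewall-cmd --list-ports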
From the hostHADR1 host (the one that has passwordless SSH configured with the qDeviceHost host) the following was run, but it returned an error:
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -create -qdevice qDeviceHost
Error: Cluster named HadrPaceDomain is already using this quorum device.
The log /tmp/db2cm.run.log* showed:
2022-01-19-14.13.32.891937 [execCmd][665] Start crm corosync show
totem {
version: 2
cluster_name: HadrPaceDomain
transport: knet
token: 10000
crypto_cipher: aes256
crypto_hash: sha256
}
nodelist {
node {
ring0_addr: hostHADR1
name: hostHADR1
nodeid: 1
}
node {
ring0_addr: hostHADR2
name: hostHADR2
nodeid: 2
}
}
quorum {
provider: corosync_votequorum
two_node: 1
}
logging {
to_logfile: yes
logfile: /var/log/cluster/corosync.log
to_syslog: yes
timestamp: hires
function_name: on
fileline: on
}
2022-01-19-14.13.33.228334 [execCmd][665] End
2022-01-19-14.13.33.232134 [execCmd][3305] Start ssh qDeviceHost "test -f /etc/corosync/qnetd/nssdb/cluster-HadrPaceDomain.crt"
2022-01-19-14.13.33.375846 [execCmd][3305] End
2022-01-19-14.13.33.377092 [db2cm] End execution with exit code 1 on line 3307
So, I did the following:
I moved everything inside the /etc/corosync/qdevice and /etc/corosync/qnetd directories to *.old on all the nodes, including the qDevice host (see the sketch below).
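Roughly like this (a sketch, assuming the default corosync configuration paths; run it only on the hosts where those directories exist):
cd /etc/corosync/qdevice && for f in *; do mv "$f" "$f.old"; done
cd /etc/corosync/qnetd && for f in *; do mv "$f" "$f.old"; done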
Restarted corosync and pacemaker in the HADR nodes:
systemctl stop corosync
systemctl stop pacemaker
systemctl start corosync
systemctl start pacemaker
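To confirm that both services came back up, something like this can be used (a sketch):
systemctl status corosync
systemctl status pacemaker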
Then I tried to add the qDevice again:
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -create -qdevice qDeviceHost
Password:
Password:
Password:
Password:
Error: Could not create qdevice via corosync-qdevice-net-certutil
For some reason it prompted for the password 4 times before returning that new error.
The log reported:
2022-01-19-14.54.34.725268 [execCmd][3311] End
2022-01-19-14.54.34.735178 [execCmd][3316] Start /usr/sbin/corosync-qdevice-net-certutil -Q -n HadrPaceDomain qDeviceHost hostHADR1 hostHADR2
Connection closed by 192.168.145.134 port 22
Node hostHADR1 doesn't have /usr/sbin/corosync-qdevice-net-certutil installed
2022-01-19-14.57.31.464232 [execCmd][3316] End - Failed
2022-01-19-14.57.31.466790 [db2cm] End execution with exit code 1 on line 3318
I checked that the certutil executable was required.
On SUSE, this executable is part of the mozilla-nss-tools package.
On Red Hat, it is part of nss-tools.
But I checked that mozilla-nss-tools was already installed:
hostHADR1:/home/db2inst1/sqllib/bin # rpm -qa mozilla-nss-tools
mozilla-nss-tools-3.68-3.56.1.x86_64
Anyway, I tried to install it again. See the following outputs:
hostHADR1:/home/db2inst1/sqllib/bin # zypper install mozilla-nss-tools
Loading repository data...
Reading installed packages...
'mozilla-nss-tools' is already installed.
There is an update candidate for 'mozilla-nss-tools' from vendor 'openSUSE', while the current vendor is 'SUSE LLC <https://www.suse.com/>'. Use 'zypper install mozilla-nss-tools-3.68.1-lp152.2.13.1.x86_64' to install this candidate.
Resolving package dependencies...
Nothing to do.
So I did:
hostHADR1:/home/db2inst1/sqllib/bin # zypper install mozilla-nss-tools-3.68.1-lp152.2.13.1.x86_64
Loading repository data...
Reading installed packages...
Resolving package dependencies...
Problem: the to be installed mozilla-nss-tools-3.68.1-lp152.2.13.1.x86_64 requires 'mozilla-nss >= 3.68.1', but this requirement cannot be provided
not installable providers: mozilla-nss-3.68.1-lp152.2.13.1.i586[openSUSE_Leap_15.2_Update]
mozilla-nss-3.68.1-lp152.2.13.1.x86_64[openSUSE_Leap_15.2_Update]
Solution 1: Following actions will be done:
install mozilla-nss-3.68.1-lp152.2.13.1.x86_64 (with vendor change)
SUSE LLC <https://www.suse.com/> --> openSUSE
install libfreebl3-hmac-3.68.1-lp152.2.13.1.x86_64 (with vendor change)
SUSE LLC <https://www.suse.com/> --> openSUSE
install libsoftokn3-hmac-3.68.1-lp152.2.13.1.x86_64 (with vendor change)
SUSE LLC <https://www.suse.com/> --> openSUSE
Solution 2: do not install mozilla-nss-tools-3.68.1-lp152.2.13.1.x86_64
Solution 3: break mozilla-nss-tools-3.68.1-lp152.2.13.1.x86_64 by ignoring some of its dependencies
Choose from above solutions by number or cancel [1/2/3/c/d/?] (c): 1
Resolving dependencies...
Resolving package dependencies...
2 Problems:
Problem: the to be installed mozilla-nss-3.68.1-lp152.2.13.1.x86_64 requires 'libfreebl3 >= 3.68.1', but this requirement cannot be provided
Problem: the to be installed libsoftokn3-hmac-3.68.1-lp152.2.13.1.x86_64 requires 'libsoftokn3 = 3.68.1-lp152.2.13.1', but this requirement cannot be provided
Problem: the to be installed mozilla-nss-3.68.1-lp152.2.13.1.x86_64 requires 'libfreebl3 >= 3.68.1', but this requirement cannot be provided
not installable providers: libfreebl3-3.68.1-lp152.2.13.1.i586[openSUSE_Leap_15.2_Update]
libfreebl3-3.68.1-lp152.2.13.1.x86_64[openSUSE_Leap_15.2_Update]
Solution 1: Following actions will be done:
do not install mozilla-nss-3.68.1-lp152.2.13.1.x86_64
do not install libfreebl3-hmac-3.68.1-lp152.2.13.1.x86_64
Solution 2: install libfreebl3-3.68.1-lp152.2.13.1.x86_64 (with vendor change)
SUSE LLC <https://www.suse.com/> --> openSUSE
Solution 3: do not install mozilla-nss-3.68.1-lp152.2.13.1.x86_64
Solution 4: break mozilla-nss-3.68.1-lp152.2.13.1.x86_64 by ignoring some of its dependencies
Choose from above solutions by number or skip, retry or cancel [1/2/3/4/s/r/c/d/?] (c): 2
Problem: the to be installed libsoftokn3-hmac-3.68.1-lp152.2.13.1.x86_64 requires 'libsoftokn3 = 3.68.1-lp152.2.13.1', but this requirement cannot be provided
not installable providers: libsoftokn3-3.68.1-lp152.2.13.1.i586[openSUSE_Leap_15.2_Update]
libsoftokn3-3.68.1-lp152.2.13.1.x86_64[openSUSE_Leap_15.2_Update]
Solution 1: Following actions will be done:
do not install libsoftokn3-hmac-3.68.1-lp152.2.13.1.x86_64
do not install libfreebl3-hmac-3.68.1-lp152.2.13.1.x86_64
Solution 2: Following actions will be done:
do not install libfreebl3-hmac-3.68.1-lp152.2.13.1.x86_64
do not install libsoftokn3-hmac-3.68.1-lp152.2.13.1.x86_64
Solution 3: install libsoftokn3-3.68.1-lp152.2.13.1.x86_64 (with vendor change)
SUSE LLC <https://www.suse.com/> --> openSUSE
Solution 4: break libsoftokn3-hmac-3.68.1-lp152.2.13.1.x86_64 by ignoring some of its dependencies
Choose from above solutions by number or skip, retry or cancel [1/2/3/4/s/r/c/d/?] (c): 3
Resolving dependencies...
Resolving package dependencies...
The following 6 packages are going to be upgraded:
libfreebl3 libfreebl3-hmac libsoftokn3 libsoftokn3-hmac mozilla-nss mozilla-nss-tools
The following 6 packages are going to change vendor:
libfreebl3 SUSE LLC <https://www.suse.com/> -> openSUSE
libfreebl3-hmac SUSE LLC <https://www.suse.com/> -> openSUSE
libsoftokn3 SUSE LLC <https://www.suse.com/> -> openSUSE
libsoftokn3-hmac SUSE LLC <https://www.suse.com/> -> openSUSE
mozilla-nss SUSE LLC <https://www.suse.com/> -> openSUSE
mozilla-nss-tools SUSE LLC <https://www.suse.com/> -> openSUSE
The following 6 packages have no support information from their vendor:
libfreebl3 libfreebl3-hmac libsoftokn3 libsoftokn3-hmac mozilla-nss mozilla-nss-tools
6 packages to upgrade, 6 to change vendor.
Overall download size: 2.0 MiB. Already cached: 0 B. After the operation, additional 232.0 B will be used.
Continue? [y/n/v/...? shows all options] (y): y
After that, I restarted Corosync and Pacemaker on the HADR nodes again:
systemctl stop corosync
systemctl stop pacemaker
systemctl start corosync
systemctl start pacemaker
Then, another try to add the qDevice.
For some reason it prompted for the remote password several times. I kept waiting after entering the password (even while the prompt kept asking for it again) and eventually received the success message:
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -create -qdevice qDeviceHost
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Successfully configured qdevice on nodes hostHADR1 and hostHADR2
Attempting to start qdevice on qDeviceHost
Quorum device qDeviceHost added successfully.
Checking the qDevice status from hostHADR1:
hostHADR1:/ # corosync-qdevice-tool -s
Qdevice information
-------------------
Model: Net
Node ID: 1
Configured node list:
0 Node ID = 1
1 Node ID = 2
Membership node list: 1, 2
Qdevice-net information
----------------------
Cluster name: HadrPaceDomain
QNetd host: qDeviceHost:5403
Algorithm: LMS
Tie-breaker: Node with lowest node ID
State: Connected
And from hostHADR2:
hostHADR2:/etc/corosync/qdevice # corosync-qdevice-tool -s
Qdevice information
-------------------
Model: Net
Node ID: 2
Configured node list:
0 Node ID = 1
1 Node ID = 2
Membership node list: 1, 2
Qdevice-net information
----------------------
Cluster name: HadrPaceDomain
QNetd host: qDeviceHost:5403
Algorithm: LMS
Tie-breaker: Node with lowest node ID
State: Connected
Run the following corosync command on the QDevice host to verify that the quorum device is running correctly.
corosync-qnetd-tool -l
Output:
qDeviceHost:/ # corosync-qnetd-tool -l
Cluster "HadrPaceDomain":
Algorithm: LMS
Tie-breaker: Node with lowest node ID
Node ID 1:
Client address: ::ffff:192.168.145.134:55232
Configured node list: 1, 2
Membership node list: 1, 2
Vote: ACK (ACK)
Node ID 2:
Client address: ::ffff:192.168.145.136:39618
Configured node list: 1, 2
Membership node list: 1, 2
Vote: ACK (ACK)
The "crm status" doesn't change after adding the qDevice node because that node isn't part of the Cluster actualy.
hostHADR1:/ # crm status
Cluster Summary:
* Stack: corosync
* Current DC: hostHADR2 (version 2.0.5+20201202.ba59be712-2.30.db2pcmk-2.0.5+20201202.ba59be712) - partition with quorum
* Last updated: Wed Jan 19 15:49:32 2022
* Last change: Wed Jan 19 15:38:21 2022 by root via cibadmin on hostHADR1
* 2 nodes configured
* 6 resource instances configured
Node List:
* Online: [ hostHADR1 hostHADR2 ]
Full List of Resources:
* db2_hostHADR1_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR1
* db2_hostHADR2_eth0 (ocf::heartbeat:db2ethmon): Started hostHADR2
* db2_hostHADR1_db2inst1_0 (ocf::heartbeat:db2inst): Started hostHADR1
* db2_hostHADR2_db2inst1_0 (ocf::heartbeat:db2inst): Started hostHADR2
* Clone Set: db2_db2inst1_db2inst1_HADRPACE-clone [db2_db2inst1_db2inst1_HADRPACE] (promotable):
* Masters: [ hostHADR2 ]
* Slaves: [ hostHADR1 ]
The "db2cm -list" does shows the qDevice information:
hostHADR1:/home/db2inst1/sqllib/bin # ./db2cm -list
Cluster Status
Domain information:
Error: Could not get Pacemaker version. - 679
Domain name = HadrPaceDomain
Pacemaker version =
Corosync version = 3.0.4
Current domain leader = hostHADR2
Number of nodes = 2
Number of resources = 6
Node information:
Name name State
---------------- --------
hostHADR1 Online
hostHADR2 Online
Resource Information:
Resource Name = db2_db2inst1_db2inst1_HADRPACE
Resource Type = HADR
DB Name = HADRPACE
Managed = true
HADR Primary Instance = db2inst1
HADR Primary Node = hostHADR2
HADR Primary State = Online
HADR Standby Instance = db2inst1
HADR Standby Node = hostHADR1
HADR Standby State = Online
Resource Name = db2_hostHADR1_db2inst1_0
State = Online
Managed = true
Resource Type = Instance
Node = hostHADR1
Instance Name = db2inst1
Resource Name = db2_hostHADR1_eth0
State = Online
Managed = true
Resource Type = Network Interface
Node = hostHADR1
Interface Name = eth0
Resource Name = db2_hostHADR2_db2inst1_0
State = Online
Managed = true
Resource Type = Instance
Node = hostHADR2
Instance Name = db2inst1
Resource Name = db2_hostHADR2_eth0
State = Online
Managed = true
Resource Type = Network Interface
Node = hostHADR2
Interface Name = eth0
Fencing Information:
Not configured
Quorum Information:
Qdevice
Qdevice information
-------------------
Model: Net
Node ID: 1
Configured node list:
0 Node ID = 1
1 Node ID = 2
Membership node list: 1, 2
Qdevice-net information
----------------------
Cluster name: HadrPaceDomain
QNetd host: qDeviceHost:5403
Algorithm: LMS
Tie-breaker: Node with lowest node ID
State: Connected
The full cluster configuration can be seen with "crm config show":
hostHADR1:/home/db2inst1/sqllib/bin # crm config show
node 1: hostHADR1 \
attributes db2hadr-db2inst1_db2inst1_HADRPACE_reint=0 db2inst-hostHADR1_db2inst1_0_start=0
node 2: hostHADR2 \
attributes db2hadr-db2inst1_db2inst1_HADRPACE_reint=-1 db2hadr-db2inst1_db2inst1_HADRPACE_takeover=0 db2inst-hostHADR2_db2inst1_0_start=0
primitive db2_db2inst1_db2inst1_HADRPACE db2hadr \
params instance="db2inst1,db2inst1" dbname=HADRPACE \
op demote interval=0s timeout=900s \
op monitor interval=9s role=Master timeout=60s \
op monitor interval=10s role=Slave timeout=60s \
op promote interval=0s timeout=900s \
op start interval=0s timeout=900s \
op stop interval=0s timeout=900s
primitive db2_hostHADR1_db2inst1_0 db2inst \
params instance=db2inst1 hostname=hostHADR1 \
op monitor timeout=120s interval=10s on-fail=restart \
op start interval=0s timeout=900s \
op stop interval=0s timeout=900s \
meta migration-threshold=0 is-managed=true
primitive db2_hostHADR1_eth0 db2ethmon \
params interface=eth0 hostname=hostHADR1 repeat_count=4 repeat_interval=4 \
op monitor timeout=30s interval=4 \
op start timeout=60s interval=0s \
op stop interval=0s timeout=20s \
meta is-managed=true
primitive db2_hostHADR2_db2inst1_0 db2inst \
params instance=db2inst1 hostname=hostHADR2 \
op monitor timeout=120s interval=10s on-fail=restart \
op start interval=0s timeout=900s \
op stop interval=0s timeout=900s \
meta migration-threshold=0 is-managed=true
primitive db2_hostHADR2_eth0 db2ethmon \
params interface=eth0 hostname=hostHADR2 repeat_count=4 repeat_interval=4 \
op monitor timeout=30s interval=4 \
op start timeout=60s interval=0s \
op stop interval=0s timeout=20s \
meta is-managed=true
ms db2_db2inst1_db2inst1_HADRPACE-clone db2_db2inst1_db2inst1_HADRPACE \
meta resource-stickiness=5000 migration-threshold=1 ordered=true promotable=true is-managed=true
location loc-rule-db2_db2inst1_db2inst1_HADRPACE-eth0-hostHADR1 db2_db2inst1_db2inst1_HADRPACE-clone \
rule -inf: db2ethmon-eth0 eq 0
location loc-rule-db2_db2inst1_db2inst1_HADRPACE-eth0-hostHADR2 db2_db2inst1_db2inst1_HADRPACE-clone \
rule -inf: db2ethmon-eth0 eq 0
location loc-rule-db2_db2inst1_db2inst1_HADRPACE-node-hostHADR1 db2_db2inst1_db2inst1_HADRPACE-clone \
rule -inf: db2inst-hostHADR1_db2inst1_0 eq 0
location loc-rule-db2_db2inst1_db2inst1_HADRPACE-node-hostHADR2 db2_db2inst1_db2inst1_HADRPACE-clone \
rule -inf: db2inst-hostHADR2_db2inst1_0 eq 0
location no-probe-db2_hostHADR1_db2inst1_0 db2_hostHADR1_db2inst1_0 resource-discovery=never -inf: hostHADR2
location no-probe-db2_hostHADR1_eth0 db2_hostHADR1_eth0 resource-discovery=never -inf: hostHADR2
location no-probe-db2_hostHADR2_db2inst1_0 db2_hostHADR2_db2inst1_0 resource-discovery=never -inf: hostHADR1
location no-probe-db2_hostHADR2_eth0 db2_hostHADR2_eth0 resource-discovery=never -inf: hostHADR1
location prefer-db2_hostHADR1_db2inst1_0 db2_hostHADR1_db2inst1_0 100: hostHADR1
location prefer-db2_hostHADR1_eth0 db2_hostHADR1_eth0 100: hostHADR1
location prefer-db2_hostHADR2_db2inst1_0 db2_hostHADR2_db2inst1_0 100: hostHADR2
location prefer-db2_hostHADR2_eth0 db2_hostHADR2_eth0 100: hostHADR2
location prefer-hostHADR1-db2inst1-db2_db2inst1_db2inst1_HADRPACE-clone db2_db2inst1_db2inst1_HADRPACE-clone 100: hostHADR1
And the cluster membership with "crm_node -l":
hostHADR1:~ # crm_node -l
1 hostHADR1 member
2 hostHADR2 member
Once again, to see the tests for the qDevice Quorum (the recommended quorum mechanism), check my other article below:
Testing Pacemaker qDevice Quorum with Db2 HADR
https://github.com/fsmegale/Db2-luw/wiki/Testing-Pacemaker-qDevice-Quorum-with-Db2-HADR