Controller HA DR in a Juju deployment

Failure

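In a cluster with 2n + 1 controllers, quorum requires at least n + 1 healthy nodes. If a majority of the nodes (n + 1 or more) fail, the cluster loses quorum. For example, a 3-controller cluster loses quorum when 2 controllers fail.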
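The arithmetic can be checked with a small shell sketch. The helper names `quorum_size` and `max_failures` are hypothetical and only illustrate the majority rule; they are not part of Juju or Contrail.

```shell
# Majority arithmetic for a cluster with an odd number of members (2n + 1).

# Smallest number of healthy members that still forms a majority.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Largest number of members that can fail while a majority survives.
max_failures() {
  echo $(( ($1 - 1) / 2 ))
}

quorum_size 3    # prints 2
max_failures 3   # prints 1
```

With 3 controllers, a second failure exceeds `max_failures` and the recovery steps below apply.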

Recovery

1]. Remove the units from the failed machines.

juju remove-unit contrail-analytics/0
juju remove-unit contrail-analyticsdb/0
juju remove-unit contrail-controller/0
juju remove-unit contrail-haproxy/0
juju remove-unit contrail-keystone-auth/0
juju remove-unit glance/0
juju remove-unit heat/0
juju remove-unit keystone/0
juju remove-unit memcached/0
juju remove-unit mysql/0
juju remove-unit neutron-api/0
juju remove-unit nova-cloud-controller/0
juju remove-unit openstack-dashboard/0
juju remove-unit placement/0
juju remove-unit rabbitmq-server/0
juju remove-unit contrail-analytics/1
juju remove-unit contrail-analyticsdb/1
juju remove-unit contrail-controller/1
juju remove-unit contrail-haproxy/1
juju remove-unit contrail-keystone-auth/1
juju remove-unit glance/1
juju remove-unit heat/1
juju remove-unit keystone/1
juju remove-unit memcached/1
juju remove-unit mysql/1
juju remove-unit neutron-api/1
juju remove-unit nova-cloud-controller/1
juju remove-unit openstack-dashboard/1
juju remove-unit placement/1
juju remove-unit rabbitmq-server/1
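The commands in step 1 follow one pattern per application and unit index, so they can be generated rather than typed. A minimal sketch, assuming an sh or bash shell; `remove_unit_cmds` is a hypothetical helper that only prints the commands so they can be reviewed before anything is run.

```shell
# Applications that had units on the failed machines (taken from the list above).
APPS="contrail-analytics contrail-analyticsdb contrail-controller contrail-haproxy
contrail-keystone-auth glance heat keystone memcached mysql neutron-api
nova-cloud-controller openstack-dashboard placement rabbitmq-server"

# Print one `juju remove-unit` command per application for each unit index given.
remove_unit_cmds() {
  for idx in "$@"; do
    for app in $APPS; do
      echo "juju remove-unit ${app}/${idx}"
    done
  done
}

# Dry run: list the commands for units 0 and 1.
remove_unit_cmds 0 1
```

After checking the output against `juju status`, it can be piped to `sh` to execute.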

2]. Remove the failed machines from juju.

juju remove-machine 0/kvm/0 --force
juju remove-machine 0/kvm/1 --force
juju remove-machine 0/kvm/2 --force
juju remove-machine 0/lxd/0 --force
juju remove-machine 0/lxd/1 --force
juju remove-machine 0/lxd/2 --force
juju remove-machine 0/lxd/3 --force
juju remove-machine 0/lxd/4 --force
juju remove-machine 0/lxd/5 --force
juju remove-machine 0/lxd/6 --force
juju remove-machine 0/lxd/7 --force
juju remove-machine 0/lxd/8 --force
juju remove-machine 0/lxd/9 --force
juju remove-machine 0/lxd/10 --force
juju remove-machine 0/lxd/11 --force
juju remove-machine 1/kvm/0 --force
juju remove-machine 1/kvm/1 --force
juju remove-machine 1/kvm/2 --force
juju remove-machine 1/lxd/0 --force
juju remove-machine 1/lxd/1 --force
juju remove-machine 1/lxd/2 --force
juju remove-machine 1/lxd/3 --force
juju remove-machine 1/lxd/4 --force
juju remove-machine 1/lxd/5 --force
juju remove-machine 1/lxd/6 --force
juju remove-machine 1/lxd/7 --force
juju remove-machine 1/lxd/8 --force
juju remove-machine 1/lxd/9 --force
juju remove-machine 1/lxd/10 --force
juju remove-machine 1/lxd/11 --force
juju remove-machine 0 --force
juju remove-machine 1 --force
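Step 2 removes every container first and the host machines last. The same order can be sketched as a generator, assuming each failed host carries kvm containers 0 to 2 and lxd containers 0 to 11 as in the list above; `remove_machine_cmds` is a hypothetical helper that prints the commands without running them.

```shell
# Print `juju remove-machine` commands for the given host machine numbers:
# all containers first, then the hosts themselves.
remove_machine_cmds() {
  for host in "$@"; do
    for i in 0 1 2; do
      echo "juju remove-machine ${host}/kvm/${i} --force"
    done
    for i in 0 1 2 3 4 5 6 7 8 9 10 11; do
      echo "juju remove-machine ${host}/lxd/${i} --force"
    done
  done
  # Hosts are removed only after every container command has been listed.
  for host in "$@"; do
    echo "juju remove-machine ${host} --force"
  done
}

# Dry run for failed machines 0 and 1.
remove_machine_cmds 0 1
```

The container ranges differ between deployments, so compare the output with `juju machines` before piping it to `sh`.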

3]. Add units to the newly added machine (machine 5 in this example).

juju add-unit mysql --to lxd:5
juju add-unit contrail-controller --to kvm:5
juju add-unit contrail-analytics --to kvm:5
juju add-unit contrail-analyticsdb --to kvm:5
juju add-unit heat --to kvm:5
juju add-unit neutron-api --to kvm:5
juju add-unit ubuntu --to 5
juju add-unit placement --to kvm:5
juju add-unit contrail-haproxy --to lxd:5
juju add-unit contrail-keystone-auth --to lxd:5
juju add-unit glance --to lxd:5
juju add-unit keystone --to lxd:5
juju add-unit memcached --to lxd:5
juju add-unit nova-cloud-controller --to lxd:5
juju add-unit openstack-dashboard --to lxd:5
juju add-unit rabbitmq-server --to lxd:5
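The placements in step 3 can be kept in one table and expanded for whichever machine number the replacement host receives. A minimal sketch, assuming an sh or bash shell; `PLACEMENTS`, the `host` keyword, and `add_unit_cmds` are hypothetical names used only for illustration.

```shell
# application=container-type pairs from the list above.
# "host" means the unit goes directly on the machine, not in a container.
PLACEMENTS="mysql=lxd contrail-controller=kvm contrail-analytics=kvm
contrail-analyticsdb=kvm heat=kvm neutron-api=kvm ubuntu=host placement=kvm
contrail-haproxy=lxd contrail-keystone-auth=lxd glance=lxd keystone=lxd
memcached=lxd nova-cloud-controller=lxd openstack-dashboard=lxd rabbitmq-server=lxd"

# Print one `juju add-unit` command per application for machine number $1.
add_unit_cmds() {
  machine="$1"
  for entry in $PLACEMENTS; do
    app="${entry%%=*}"    # text before the first "="
    kind="${entry#*=}"    # text after the first "="
    if [ "$kind" = "host" ]; then
      target="$machine"
    else
      target="${kind}:${machine}"
    fi
    echo "juju add-unit ${app} --to ${target}"
  done
}

# Dry run for the new machine 5.
add_unit_cmds 5
```

Running `add_unit_cmds 6` would give the equivalent commands for a second replacement machine.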

4]. Initialize the MySQL cluster by choosing the correct unit to become the bootstrap node. Run the bootstrap-pxc action on the unit with the highest sequence number (marked with *), then notify the other units in the cluster.

juju run-action --wait mysql/3 bootstrap-pxc

Wait until execution completes.

juju run-action --wait mysql/2 notify-bootstrapped
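Choosing the bootstrap unit in step 4 comes down to finding the largest sequence number. A minimal sketch, assuming the sequence number of each unit has already been noted by hand as `<unit> <seqno>` lines; `highest_seqno_unit` and the sample numbers are hypothetical and not part of Juju or the mysql charm.

```shell
# Read "<unit> <seqno>" lines on stdin and print the unit with the largest seqno.
# On a tie, the first unit listed wins.
highest_seqno_unit() {
  awk '{
         if (NR == 1 || $2 + 0 > max) { max = $2 + 0; unit = $1 }
       }
       END { if (NR > 0) print unit }'
}

# Hypothetical sequence numbers for three units.
printf '%s\n' 'mysql/2 1000' 'mysql/3 1003' 'mysql/4 998' | highest_seqno_unit
# prints mysql/3
```

The unit printed here is the one to pass to `bootstrap-pxc`; `notify-bootstrapped` is then run on one of the other units.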