RHoSP Overlay VMs stuck in 'BUILD' state - ganeshahv/Contrail_SRE GitHub Wiki

PROBLEM

In a RHoSP setup, overlay VMs are stuck in the 'BUILD' (initializing) state, and nova-compute.log on the compute nodes shows the following error:

2020-05-28 00:02:17.047 8 ERROR oslo.messaging._drivers.impl_rabbit [-] [54a1c68d-7c0f-4e6e-a749-cb8f2d2653e6] AMQP server on overcloud-controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.: error: [Errno 111] ECONNREFUSED
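To confirm from a DPDK compute node that the controller is refusing AMQP connections, a plain TCP probe of port 5672 is enough. This is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are available on the node; the function name amqp_port_open is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): test whether a host accepts TCP connections
# on the AMQP port. Relies on bash's /dev/tcp redirection and coreutils timeout.
# Exits 0 if the connection opens within 3 seconds; non-zero on
# connection refused, timeout, or an unresolvable host.
amqp_port_open() {
  local host=$1
  local port=${2:-5672}   # 5672 is the AMQP port shown in the nova-compute log
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

For example, on the compute node: `amqp_port_open overcloud-controller-0.internalapi.localdomain && echo "AMQP port open" || echo "AMQP port refused or unreachable"`. A refused result matches the ECONNREFUSED in the log and points at the RabbitMQ side rather than at nova-compute itself.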

From the undercloud node, openstack hypervisor list shows both overcloud DPDK hypervisors in the down state:

(overcloud) [stack@queensa log]$ openstack hypervisor list
+----+--------------------------------------+-----------------+-----------+-------+
| ID | Hypervisor Hostname                  | Hypervisor Type | Host IP   | State |
+----+--------------------------------------+-----------------+-----------+-------+
|  2 | overcloud-contraildpdk-0.localdomain | QEMU            | 10.1.0.17 | down  |
|  5 | overcloud-contraildpdk-1.localdomain | QEMU            | 10.1.0.21 | down  |
+----+--------------------------------------+-----------------+-----------+-------+
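When there are many hypervisors, the table is easier to filter in value format. A minimal sketch, assuming the client's `-f value` output with the two columns requested below (one "hostname state" pair per line); the function name down_hypervisors is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the hostnames of hypervisors whose
# state is "down". Expects, on stdin, the output of:
#   openstack hypervisor list -f value -c "Hypervisor Hostname" -c State
# i.e. "<hostname> <state>" on each line.
down_hypervisors() {
  awk '$NF == "down" { print $1 }'
}
```

Usage from the undercloud, with the overcloud credentials sourced: `openstack hypervisor list -f value -c "Hypervisor Hostname" -c State | down_hypervisors`.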

The nova_compute container on the DPDK compute nodes reports unhealthy:

[root@overcloud-contraildpdk-0 ~]# sudo docker ps -a | grep health
ffac030043c2        192.168.24.1:8787/rhosp13/openstack-nova-compute:13.0-136.1589310308           "dumb-init --singl..."   2 days ago          Up 55 minutes (unhealthy)                       nova_compute
e4a1efcf71cc        192.168.24.1:8787/rhosp13/openstack-nova-compute:13.0-136.1589310308           "dumb-init --singl..."   2 days ago          Up 2 days (healthy)                             nova_migration_target
ad537a34deb8        192.168.24.1:8787/rhosp13/openstack-iscsid:13.0-115                            "dumb-init --singl..."   2 days ago          Up 2 days (healthy)                             iscsid
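Grepping for "health" also matches healthy containers. A minimal sketch that prints only the failing ones, assuming docker ps is asked to put the status first (it contains spaces) and the name last; the function name unhealthy_containers is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): list containers whose health check is failing.
# Expects, on stdin, lines produced by:
#   sudo docker ps -a --format '{{.Status}} {{.Names}}'
# The status ("Up 55 minutes (unhealthy)") contains spaces, so the
# container name is always the last field.
unhealthy_containers() {
  awk '/\(unhealthy\)/ { print $NF }'
}
```

Usage: `sudo docker ps -a --format '{{.Status}} {{.Names}}' | unhealthy_containers`. Docker can also filter directly with `sudo docker ps --filter health=unhealthy`, and `sudo docker inspect --format '{{json .State.Health}}' nova_compute` shows the output of the most recent health checks, which may show why the check is failing.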

SOLUTION

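nova-compute talks to the controllers over AMQP (RabbitMQ on port 5672), and the log shows that connection being refused, so the compute service cannot report its state and Nova marks the hypervisors down. Check the rabbitmq-bundle container on the OpenStack controllers; if it is down, start it on all of the OpenStack controllers. A running container looks like this: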

[root@overcloud-controller-0 ~]# docker ps | grep rabbit
0f14a1c51b97 192.168.24.1:8787/rhosp13/openstack-rabbitmq:pcmklatest "dumb-init --singl..." 2 weeks ago Up 2 weeks rabbitmq-bundle-docker-0
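The per-controller check can be scripted from the undercloud. This is a minimal sketch, assuming (hypothetically) passwordless SSH as heat-admin to each controller, sudo access to docker there, and controller names that resolve from the undercloud (otherwise pass the ctlplane IPs); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers). Assumptions: SSH as heat-admin to each
# controller works without a password, heat-admin can run docker via sudo,
# and the controller names passed in resolve from the undercloud.

# Reads `docker ps -a --format '{{.Status}} {{.Names}}'` lines on stdin and
# exits 0 only if at least one rabbitmq-bundle container has status "Up".
rabbitmq_bundle_running() {
  awk '$1 == "Up" && $NF ~ /^rabbitmq-bundle/ { found = 1 } END { exit !found }'
}

# Reports the rabbitmq-bundle state on every controller given as an argument.
check_controllers() {
  local ctl
  for ctl in "$@"; do
    if ssh "heat-admin@${ctl}" "sudo docker ps -a --format '{{.Status}} {{.Names}}'" \
        | rabbitmq_bundle_running; then
      echo "${ctl}: rabbitmq-bundle is up"
    else
      echo "${ctl}: rabbitmq-bundle is DOWN"
    fi
  done
}
```

For example: `check_controllers overcloud-controller-0 overcloud-controller-1 overcloud-controller-2` (only overcloud-controller-0 appears above; the other names are placeholders). The container name (rabbitmq-bundle-docker-0) and the pcmklatest image tag indicate a Pacemaker-managed bundle, so recovering it through Pacemaker from one controller, for example with `pcs resource restart rabbitmq-bundle` or `pcs resource cleanup rabbitmq-bundle`, may be safer than a bare `docker start`; verify against your deployment's procedures.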