time warp bug - dianaclarke/openstack-notes GitHub Wiki

Time Warp Bug

$ date
Tue Oct 13 15:10:19 CDT 2015

$ nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   localhost.localdomain                internal         enabled    :-)   2015-10-13 20:12:21
nova-cert        localhost.localdomain                internal         enabled    :-)   2015-10-13 20:12:24
nova-network     localhost.localdomain                internal         enabled    :-)   2015-10-13 20:12:27
nova-compute     localhost.localdomain                nova             enabled    :-)   2015-10-13 20:12:28
nova-scheduler   localhost.localdomain                internal         enabled    :-)   2015-10-13 20:12:20
nova-consoleauth localhost.localdomain                internal         enabled    :-)   2015-10-13 20:12:20
  • Change the date to the future (Note the State column is :-) for all services)
$ date
Tue Oct 13 17:23:36 CDT 2015

$ nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   localhost.localdomain                internal         enabled    :-)   2015-10-13 22:25:01
nova-cert        localhost.localdomain                internal         enabled    :-)   2015-10-13 22:25:01
nova-network     localhost.localdomain                internal         enabled    :-)   2015-10-13 22:25:01
nova-compute     localhost.localdomain                nova             enabled    :-)   2015-10-13 22:25:01
nova-scheduler   localhost.localdomain                internal         enabled    :-)   2015-10-13 22:25:01
nova-consoleauth localhost.localdomain                internal         enabled    :-)   2015-10-13 22:25:01
  • Change the date back (Note the State column is XXX for all services)
$ date
Tue Oct 13 15:28:25 CDT 2015
$ nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   localhost.localdomain                internal         enabled    XXX   2015-10-13 22:28:01
nova-cert        localhost.localdomain                internal         enabled    XXX   2015-10-13 22:28:01
nova-network     localhost.localdomain                internal         enabled    XXX   2015-10-13 22:28:01
nova-compute     localhost.localdomain                nova             enabled    XXX   2015-10-13 22:28:01
nova-scheduler   localhost.localdomain                internal         enabled    XXX   2015-10-13 22:28:01
nova-consoleauth localhost.localdomain                internal         enabled    XXX   2015-10-13 22:28:01

2015-10-13 15:28:51.023 DEBUG nova.servicegroup.drivers.db [req-3d8f32c3-baf3-46fc-a572-bd286bd16e84 None None] Seems service is down. Last heartbeat was 2015-10-13 22:28:01. Elapsed time is -7149.976574 from (pid=29718) is_up /opt/stack/nova/nova/servicegroup/drivers/db.py:75
  • Does this bug happen on nova master? Yes, it looks like it.
----------> clean devstack (master), nova at commit e55130a8d6eb348b004d0c1ee0b39b94c2e0b4b5

$ date
Wed Oct 14 17:27:23 CDT 2015

$ nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   localhost.localdomain                internal         enabled    :-)   2015-10-14 22:27:36
nova-cert        localhost.localdomain                internal         enabled    :-)   2015-10-14 22:27:29
nova-network     localhost.localdomain                internal         enabled    :-)   2015-10-14 22:27:33
nova-scheduler   localhost.localdomain                internal         enabled    :-)   2015-10-14 22:27:35
nova-consoleauth localhost.localdomain                internal         enabled    :-)   2015-10-14 22:27:31
nova-compute     localhost.localdomain                nova             enabled    :-)   2015-10-14 22:27:36

----------> move clock ahead 2 hours

$ date
Wed Oct 14 19:28:08 CDT 2015

$ nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   localhost.localdomain                internal         enabled    :-)   2015-10-15 00:28:21
nova-cert        localhost.localdomain                internal         enabled    :-)   2015-10-15 00:28:21
nova-network     localhost.localdomain                internal         enabled    :-)   2015-10-15 00:28:21
nova-scheduler   localhost.localdomain                internal         enabled    :-)   2015-10-15 00:28:20
nova-consoleauth localhost.localdomain                internal         enabled    :-)   2015-10-15 00:28:21
nova-compute     localhost.localdomain                nova             enabled    :-)   2015-10-15 00:28:21

----------> move clock back to the actual time

$ date
Wed Oct 14 17:29:54 CDT 2015

$ nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   localhost.localdomain                internal         enabled    XXX   2015-10-15 00:28:51
nova-cert        localhost.localdomain                internal         enabled    XXX   2015-10-15 00:28:51
nova-network     localhost.localdomain                internal         enabled    XXX   2015-10-15 00:28:51
nova-scheduler   localhost.localdomain                internal         enabled    XXX   2015-10-15 00:28:50
nova-consoleauth localhost.localdomain                internal         enabled    XXX   2015-10-15 00:28:51
nova-compute     localhost.localdomain                nova             enabled    XXX   2015-10-15 00:28:51

----------> Note that `Status` is `XXX` for all services above

$ date
Wed Oct 14 17:39:25 CDT 2015

$ nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   localhost.localdomain                internal         enabled    XXX   2015-10-15 00:28:51
nova-cert        localhost.localdomain                internal         enabled    XXX   2015-10-15 00:28:51
nova-network     localhost.localdomain                internal         enabled    XXX   2015-10-15 00:28:51
nova-scheduler   localhost.localdomain                internal         enabled    XXX   2015-10-15 00:28:50
nova-consoleauth localhost.localdomain                internal         enabled    XXX   2015-10-15 00:28:51
nova-compute     localhost.localdomain                nova             enabled    XXX   2015-10-15 00:28:51

----------> Also note that the heartbeats have stopped altogether (see `Updated_At` above) 
  • The following patch appears to fix most of the services.
diff --git a/nova/openstack/common/loopingcall.py b/nova/openstack/common/loopingcall.py
index 82411af..9369cb2 100644
--- a/nova/openstack/common/loopingcall.py
+++ b/nova/openstack/common/loopingcall.py
@@ -16,21 +16,16 @@
 #    under the License.
 
 import sys
-import time
 
 from eventlet import event
 from eventlet import greenthread
+from oslo_utils import timeutils
 
 from nova.openstack.common.gettextutils import _LE, _LW
 from nova.openstack.common import log as logging
 
 LOG = logging.getLogger(__name__)
 
-# NOTE(zyluo): This lambda function was declared to avoid mocking collisions
-#              with time.time() called in the standard logging module
-#              during unittests.
-_ts = lambda: time.time()
-
 
 class LoopingCallDone(Exception):
     """Exception to break out and stop a LoopingCallBase.
@@ -76,13 +71,15 @@ class FixedIntervalLoopingCall(LoopingCallBase):
                 greenthread.sleep(initial_delay)
 
             try:
+                watch = timeutils.StopWatch()
                 while self._running:
-                    start = _ts()
+                    start = watch.restart()
                     self.f(*self.args, **self.kw)
-                    end = _ts()
+                    watch.stop()
                     if not self._running:
                         break
-                    delay = end - start - interval
+                    elapsed = watch.elapsed()
+                    delay = elapsed - interval
                     if delay > 0:
                         LOG.warn(_LW('task %(func_name)s run outlasted '
                                      'interval by %(delay).2f sec'),
diff --git a/nova/openstack/common/periodic_task.py b/nova/openstack/common/periodic_task.py
index ebbb98b..fff5c5c 100644
--- a/nova/openstack/common/periodic_task.py
+++ b/nova/openstack/common/periodic_task.py
@@ -14,6 +14,7 @@
 import random
 import time
 
+from monotonic import monotonic as now
 from oslo.config import cfg
 import six
 
@@ -77,7 +78,7 @@ def periodic_task(*args, **kwargs):
         if f._periodic_immediate:
             f._periodic_last_run = None
         else:
-            f._periodic_last_run = time.time()
+            f._periodic_last_run = now()
         return f
 
     # NOTE(sirp): The `if` is necessary to allow the decorator to be used with
@@ -154,7 +155,7 @@ def _nearest_boundary(last_run, spacing):
     do not synchronize. This jitter is rounded to the nearest second, this
     means that spacings smaller than 20 seconds will not have jitter.
     """
-    current_time = time.time()
+    current_time = now()
     if last_run is None:
         return current_time
     delta = current_time - last_run
@@ -184,7 +185,7 @@ class PeriodicTasks(object):
             # Check if due, if not skip
             idle_for = min(idle_for, spacing)
             if last_run is not None:
-                delta = last_run + spacing - time.time()
+                delta = last_run + spacing - now()
                 if delta > 0:
                     idle_for = min(idle_for, delta)
                     continue
diff --git a/nova/service.py b/nova/service.py
index 5f851fe..8826c0c 100644
--- a/nova/service.py
+++ b/nova/service.py
@@ -21,6 +21,8 @@ import os
 import random
 import sys
 
+import eventlet
+import monotonic
 from oslo.config import cfg
 from oslo import messaging
 
@@ -132,6 +134,12 @@ class Service(service.Service):
                  periodic_interval_max=None, db_allowed=True,
                  *args, **kwargs):
         super(Service, self).__init__()
+
+        # bz #1217389: use a monotonic clock for the eventlet event loop
+        # to not hang when the system clock is modified (ex: NTP update)
+        hub = eventlet.hubs.get_hub()
+        hub.clock = monotonic.monotonic
+
         self.host = host
         self.binary = binary
         self.topic = topic
diff --git a/requirements.txt b/requirements.txt
index d9f458d..c0749cc 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -45,3 +45,5 @@ lockfile<=0.8,>=0.8
 simplejson<=3.3.1,>=2.2.0
 rfc3986<=0.2.0,>=0.2.0 # Apache-2.0
 oslo.vmware<0.9.0,>=0.6.0 # Apache-2.0
+monotonic>=0.4
+oslo.utils>=1.6.0 # Apache-2.0
  • Here's me trying to reproduce it, post-patch:
[diana@localhost devstack]$ date
Fri Oct 16 14:57:57 CDT 2015

[diana@localhost devstack]$ nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   localhost.localdomain                internal         enabled    :-)   2015-10-16 19:58:00
nova-cert        localhost.localdomain                internal         enabled    :-)   2015-10-16 19:58:03
nova-network     localhost.localdomain                internal         enabled    :-)   2015-10-16 19:57:56
nova-compute     localhost.localdomain                nova             enabled    :-)   2015-10-16 19:57:57
nova-scheduler   localhost.localdomain                internal         enabled    :-)   2015-10-16 19:58:00
nova-consoleauth localhost.localdomain                internal         enabled    :-)   2015-10-16 19:57:59

[diana@localhost devstack]$ date
Fri Oct 16 16:58:49 CDT 2015

[diana@localhost devstack]$ nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   localhost.localdomain                internal         enabled    :-)   2015-10-16 21:58:59
nova-cert        localhost.localdomain                internal         enabled    :-)   2015-10-16 21:58:57
nova-network     localhost.localdomain                internal         enabled    :-)   2015-10-16 21:58:50
nova-compute     localhost.localdomain                nova             enabled    :-)   2015-10-16 21:58:51
nova-scheduler   localhost.localdomain                internal         enabled    :-)   2015-10-16 21:58:54
nova-consoleauth localhost.localdomain                internal         enabled    :-)   2015-10-16 21:58:53

[diana@localhost devstack]$ date
Fri Oct 16 15:00:08 CDT 2015

[diana@localhost devstack]$ nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   localhost.localdomain                internal         enabled    XXX   2015-10-16 21:59:39
nova-cert        localhost.localdomain                internal         enabled    :-)   2015-10-16 20:00:03
nova-network     localhost.localdomain                internal         enabled    :-)   2015-10-16 20:00:06
nova-compute     localhost.localdomain                nova             enabled    :-)   2015-10-16 20:00:07
nova-scheduler   localhost.localdomain                internal         enabled    :-)   2015-10-16 20:00:10
nova-consoleauth localhost.localdomain                internal         enabled    :-)   2015-10-16 20:00:09

[diana@localhost devstack]$ date
Fri Oct 16 15:01:08 CDT 2015

[diana@localhost devstack]$ nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   localhost.localdomain                internal         enabled    XXX   2015-10-16 21:59:39
nova-cert        localhost.localdomain                internal         enabled    :-)   2015-10-16 20:01:03
nova-network     localhost.localdomain                internal         enabled    :-)   2015-10-16 20:01:06
nova-compute     localhost.localdomain                nova             enabled    :-)   2015-10-16 20:01:07
nova-scheduler   localhost.localdomain                internal         enabled    :-)   2015-10-16 20:01:10
nova-consoleauth localhost.localdomain                internal         enabled    :-)   2015-10-16 20:01:09
[diana@localhost devstack]$ date
Sat Oct 17 02:55:45 IST 2015

[diana@localhost devstack]$ nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   localhost.localdomain                internal         enabled    :-)   2015-10-17 01:55:56
nova-cert        localhost.localdomain                internal         enabled    :-)   2015-10-17 01:55:59
nova-network     localhost.localdomain                internal         enabled    :-)   2015-10-17 01:56:02
nova-compute     localhost.localdomain                nova             enabled    :-)   2015-10-17 01:55:53
nova-scheduler   localhost.localdomain                internal         enabled    :-)   2015-10-17 01:55:55
nova-consoleauth localhost.localdomain                internal         enabled    :-)   2015-10-17 01:55:55

[diana@localhost devstack]$ date
Sat Oct 17 04:57:37 IST 2015

[diana@localhost devstack]$ nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   localhost.localdomain                internal         enabled    :-)   2015-10-17 03:57:40
nova-cert        localhost.localdomain                internal         enabled    :-)   2015-10-17 03:57:34
nova-network     localhost.localdomain                internal         enabled    :-)   2015-10-17 03:57:37
nova-compute     localhost.localdomain                nova             enabled    :-)   2015-10-17 03:57:38
nova-scheduler   localhost.localdomain                internal         enabled    :-)   2015-10-17 03:57:40
nova-consoleauth localhost.localdomain                internal         enabled    :-)   2015-10-17 03:57:40

[diana@localhost devstack]$ date
Sat Oct 17 04:58:02 IST 2015

[diana@localhost devstack]$ date
Sat Oct 17 02:58:09 IST 2015

[diana@localhost devstack]$ nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   localhost.localdomain                internal         enabled    XXX   2015-10-17 03:58:00
nova-cert        localhost.localdomain                internal         enabled    :-)   2015-10-17 01:58:09
nova-network     localhost.localdomain                internal         enabled    :-)   2015-10-17 01:58:12
nova-compute     localhost.localdomain                nova             enabled    XXX   2015-10-17 03:57:58
nova-scheduler   localhost.localdomain                internal         enabled    XXX   2015-10-17 03:58:00
nova-consoleauth localhost.localdomain                internal         enabled    XXX   2015-10-17 03:58:00

[diana@localhost devstack]$ nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   localhost.localdomain                internal         enabled    XXX   2015-10-17 03:58:00
nova-cert        localhost.localdomain                internal         enabled    :-)   2015-10-17 01:58:09
nova-network     localhost.localdomain                internal         enabled    :-)   2015-10-17 01:58:12
nova-compute     localhost.localdomain                nova             enabled    :-)   2015-10-17 01:58:13
nova-scheduler   localhost.localdomain                internal         enabled    :-)   2015-10-17 01:58:15
nova-consoleauth localhost.localdomain                internal         enabled    :-)   2015-10-17 01:58:15

[diana@localhost devstack]$ nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   localhost.localdomain                internal         enabled    XXX   2015-10-17 03:58:00
nova-cert        localhost.localdomain                internal         enabled    :-)   2015-10-17 01:58:19
nova-network     localhost.localdomain                internal         enabled    :-)   2015-10-17 01:58:12
nova-compute     localhost.localdomain                nova             enabled    :-)   2015-10-17 01:58:13
nova-scheduler   localhost.localdomain                internal         enabled    :-)   2015-10-17 01:58:15
nova-consoleauth localhost.localdomain                internal         enabled    :-)   2015-10-17 01:58:15
2015-10-17 03:03:15.111 DEBUG nova.servicegroup.drivers.db [req-99ffe846-0438-434c-84b2-e6263075495e None None] Seems service is down. Last heartbeat was 2015-10-17 03:58:00. Elapsed time is -6884.888095 from (pid=13288) is_up /opt/stack/nova/nova/servicegroup/drivers/db.py:75
diff --git a/nova/openstack/common/periodic_task.py b/nova/openstack/common/periodic_task.py
index ebbb98b..fff5c5c 100644
--- a/nova/openstack/common/periodic_task.py
+++ b/nova/openstack/common/periodic_task.py
@@ -14,6 +14,7 @@
 import random
 import time
 
+from monotonic import monotonic as now
 from oslo.config import cfg
 import six
 
@@ -77,7 +78,7 @@ def periodic_task(*args, **kwargs):
         if f._periodic_immediate:
             f._periodic_last_run = None
         else:
-            f._periodic_last_run = time.time()
+            f._periodic_last_run = now()
         return f
 
     # NOTE(sirp): The `if` is necessary to allow the decorator to be used with
@@ -154,7 +155,7 @@ def _nearest_boundary(last_run, spacing):
     do not synchronize. This jitter is rounded to the nearest second, this
     means that spacings smaller than 20 seconds will not have jitter.
     """
-    current_time = time.time()
+    current_time = now()
     if last_run is None:
         return current_time
     delta = current_time - last_run
@@ -184,7 +185,7 @@ class PeriodicTasks(object):
             # Check if due, if not skip
             idle_for = min(idle_for, spacing)
             if last_run is not None:
-                delta = last_run + spacing - time.time()
+                delta = last_run + spacing - now()
                 if delta > 0:
                     idle_for = min(idle_for, delta)
                     continue
diff --git a/nova/service.py b/nova/service.py
index 5f851fe..8826c0c 100644
--- a/nova/service.py
+++ b/nova/service.py
@@ -21,6 +21,8 @@ import os
 import random
 import sys
 
+import eventlet
+import monotonic
 from oslo.config import cfg
 from oslo import messaging
 
@@ -132,6 +134,12 @@ class Service(service.Service):
                  periodic_interval_max=None, db_allowed=True,
                  *args, **kwargs):
         super(Service, self).__init__()
+
+        # bz #1217389: use a monotonic clock for the eventlet event loop
+        # to not hang when the system clock is modified (ex: NTP update)
+        hub = eventlet.hubs.get_hub()
+        hub.clock = monotonic.monotonic
+
         self.host = host
         self.binary = binary
         self.topic = topic
diff --git a/requirements.txt b/requirements.txt
index d9f458d..866b7a0 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -45,3 +45,4 @@ lockfile<=0.8,>=0.8
 simplejson<=3.3.1,>=2.2.0
 rfc3986<=0.2.0,>=0.2.0 # Apache-2.0
 oslo.vmware<0.9.0,>=0.6.0 # Apache-2.0
+monotonic>=0.4
  • Query logs, database rows, service list
2015-10-20 13:34:14.910 INFO sqlalchemy.engine.base.Engine [req-4be85c99-4f8c-4dfb-8162-4932
77649116 None None] SELECT services.deleted_at AS services_deleted_at, services.deleted AS s
ervices_deleted, services.created_at AS services_created_at, services.updated_at AS services
_updated_at, services.id AS services_id, services.host AS services_host, services.`binary` A
S services_binary, services.topic AS services_topic, services.report_count AS services_repor
t_count, services.disabled AS services_disabled, services.disabled_reason AS services_disabl
ed_reason 
FROM services 
WHERE services.deleted = %s AND services.id = %s 
 LIMIT %s
2015-10-20 13:34:14.910 INFO sqlalchemy.engine.base.Engine [req-4be85c99-4f8c-4dfb-8162-4932
77649116 None None] (0, 4, 1)
2015-10-20 13:34:14.912 INFO sqlalchemy.engine.base.Engine [req-4be85c99-4f8c-4dfb-8162-4932
77649116 None None] UPDATE services SET updated_at=%s, report_count=%s WHERE services.id = %
s
2015-10-20 13:34:14.913 INFO sqlalchemy.engine.base.Engine [req-4be85c99-4f8c-4dfb-8162-4932
77649116 None None] (datetime.datetime(2015, 10, 20, 18, 34, 14, 912251), 6828, 4L)
2015-10-20 13:34:14.915 INFO sqlalchemy.engine.base.Engine [req-4be85c99-4f8c-4dfb-8162-4932
77649116 None None] COMMIT
$ nova-manage service list
nova-conductor   localhost.localdomain                internal         enabled    XXX   2015-10-20 18:40:12
nova-cert        localhost.localdomain                internal         enabled    :-)   2015-10-20 18:34:55
nova-network     localhost.localdomain                internal         enabled    :-)   2015-10-20 18:34:33
nova-compute     localhost.localdomain                nova             enabled    :-)   2015-10-20 18:34:24
nova-scheduler   localhost.localdomain                internal         enabled    :-)   2015-10-20 18:34:56
nova-consoleauth localhost.localdomain                internal         enabled    :-)   2015-10-20 18:35:05
MariaDB [nova]> select * from services order by id\G
*************************** 1. row ***************************
     created_at: 2015-10-19 23:19:40
     updated_at: 2015-10-20 18:40:12
     deleted_at: NULL
             id: 1
           host: localhost.localdomain
         binary: nova-conductor
          topic: conductor
   report_count: 6065
       disabled: 0
        deleted: 0
disabled_reason: NULL
*************************** 2. row ***************************
     created_at: 2015-10-19 23:19:43
     updated_at: 2015-10-20 18:37:15
     deleted_at: NULL
             id: 2
           host: localhost.localdomain
         binary: nova-cert
          topic: cert
   report_count: 6903
       disabled: 0
        deleted: 0
disabled_reason: NULL
*************************** 3. row ***************************
     created_at: 2015-10-19 23:19:46
     updated_at: 2015-10-20 18:34:33
     deleted_at: NULL
             id: 3
           host: localhost.localdomain
         binary: nova-network
          topic: network
   report_count: 6830
       disabled: 0
        deleted: 0
disabled_reason: NULL
*************************** 4. row ***************************
     created_at: 2015-10-19 23:19:47
     updated_at: 2015-10-20 18:34:24
     deleted_at: NULL
             id: 4
           host: localhost.localdomain
         binary: nova-compute
          topic: compute
   report_count: 6829
       disabled: 0
        deleted: 0
disabled_reason: NULL
*************************** 5. row ***************************
     created_at: 2015-10-19 23:19:49
     updated_at: 2015-10-20 18:37:16
     deleted_at: NULL
             id: 5
           host: localhost.localdomain
         binary: nova-scheduler
          topic: scheduler
   report_count: 6945
       disabled: 0
        deleted: 0
disabled_reason: NULL
*************************** 6. row ***************************
     created_at: 2015-10-19 23:19:58
     updated_at: 2015-10-20 18:37:15
     deleted_at: NULL
             id: 6
           host: localhost.localdomain
         binary: nova-consoleauth
          topic: consoleauth
   report_count: 6944
       disabled: 0
        deleted: 0
disabled_reason: NULL
6 rows in set (0.00 sec)
  • Restart so that all services are in a good state. Are the logs from service id 1 different at all?
2015-10-20 13:55:15.221 INFO sqlalchemy.engine.base.Engine [-] SELECT services.deleted_at AS
 services_deleted_at, services.deleted AS services_deleted, services.created_at AS services_
created_at, services.updated_at AS services_updated_at, services.id AS services_id, services
.host AS services_host, services.`binary` AS services_binary, services.topic AS services_top
ic, services.report_count AS services_report_count, services.disabled AS services_disabled, 
services.disabled_reason AS services_disabled_reason 
FROM services 
WHERE services.deleted = %s AND services.id = %s 
 LIMIT %s
2015-10-20 13:55:15.221 INFO sqlalchemy.engine.base.Engine [-] (0, 1L, 1)
2015-10-20 13:55:15.224 INFO sqlalchemy.engine.base.Engine [-] UPDATE services SET updated_a
t=%s, report_count=%s WHERE services.id = %s
2015-10-20 13:55:15.224 INFO sqlalchemy.engine.base.Engine [-] (datetime.datetime(2015, 10, 
20, 18, 55, 15, 224134), 6089L, 1L)
2015-10-20 13:55:15.226 INFO sqlalchemy.engine.base.Engine [-] COMMIT
2015-10-20 13:55:14.965 INFO sqlalchemy.engine.base.Engine [req-700cda20-4939-4ecd-a17a-1793
e53e88aa None None] SELECT services.deleted_at AS services_deleted_at, services.deleted AS s
ervices_deleted, services.created_at AS services_created_at, services.updated_at AS services
_updated_at, services.id AS services_id, services.host AS services_host, services.`binary` A
S services_binary, services.topic AS services_topic, services.report_count AS services_repor
t_count, services.disabled AS services_disabled, services.disabled_reason AS services_disabl
ed_reason 
FROM services 
WHERE services.deleted = %s AND services.id = %s 
 LIMIT %s
2015-10-20 13:55:14.966 INFO sqlalchemy.engine.base.Engine [req-700cda20-4939-4ecd-a17a-1793
e53e88aa None None] (0, 4, 1)
2015-10-20 13:55:14.971 INFO sqlalchemy.engine.base.Engine [req-700cda20-4939-4ecd-a17a-1793
e53e88aa None None] UPDATE services SET updated_at=%s, report_count=%s WHERE services.id = %
s
2015-10-20 13:55:14.971 INFO sqlalchemy.engine.base.Engine [req-700cda20-4939-4ecd-a17a-1793
e53e88aa None None] (datetime.datetime(2015, 10, 20, 18, 55, 14, 971237), 6895, 4L)
2015-10-20 13:55:14.972 INFO sqlalchemy.engine.base.Engine [req-700cda20-4939-4ecd-a17a-1793
e53e88aa None None] COMMIT
  • Perhaps the issues with nova-conductor are fixed in master. The logs above are from Juno. Same issue in Liberty with a similar patch? Answer: Yup, also in Liberty (even with eventlet patch).
[diana@reserved devstack]$ nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   reserved                             internal         enabled    :-)   2015-10-20 21:00:33
nova-cert        reserved                             internal         enabled    :-)   2015-10-20 21:00:30
nova-network     reserved                             internal         enabled    :-)   2015-10-20 21:00:34
nova-scheduler   reserved                             internal         enabled    :-)   2015-10-20 21:00:38
nova-consoleauth reserved                             internal         enabled    :-)   2015-10-20 21:00:34
nova-compute     reserved                             nova             enabled    :-)   2015-10-20 21:00:38

[diana@reserved devstack]$ date
Tue Oct 20 16:00:43 CDT 2015

[diana@reserved devstack]$ date
Tue Oct 20 16:10:29 CDT 2015

[diana@reserved devstack]$ nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   reserved                             internal         enabled    :-)   2015-10-20 21:10:31
nova-cert        reserved                             internal         enabled    :-)   2015-10-20 21:10:34
nova-network     reserved                             internal         enabled    :-)   2015-10-20 21:10:28
nova-scheduler   reserved                             internal         enabled    :-)   2015-10-20 21:10:32
nova-consoleauth reserved                             internal         enabled    :-)   2015-10-20 21:10:28
nova-compute     reserved                             nova             enabled    :-)   2015-10-20 21:10:32

[diana@reserved devstack]$ date
Tue Oct 20 16:01:58 CDT 2015

[diana@reserved devstack]$ nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   reserved                             internal         enabled    XXX   2015-10-20 21:10:41
nova-cert        reserved                             internal         enabled    :-)   2015-10-20 21:02:00
nova-network     reserved                             internal         enabled    :-)   2015-10-20 21:01:54
nova-scheduler   reserved                             internal         enabled    :-)   2015-10-20 21:01:58
nova-consoleauth reserved                             internal         enabled    :-)   2015-10-20 21:02:04
nova-compute     reserved                             nova             enabled    :-)   2015-10-20 21:01:58
  • Relevant snippets of code
class DbDriver(base.Driver):

    def join(self, member, group, service=None):
        LOG.debug('DB_Driver: join new ServiceGroup member %(member)s to '
                  'the %(group)s group, service = %(service)s',
                  {'member': member, 'group': group,
                   'service': service})
        if service is None:
            raise RuntimeError(_('service is a mandatory argument for DB based'
                                 ' ServiceGroup driver'))
        report_interval = service.report_interval
        if report_interval:
            service.tg.add_timer(report_interval, self._report_state,
                                 api.INITIAL_REPORTING_DELAY, service)

    def is_up(self, service_ref):
        """Moved from nova.utils
        Check whether a service is up based on last heartbeat.
        """
        # Keep checking 'updated_at' if 'last_seen_up' isn't set.
        # Should be able to use only 'last_seen_up' in the M release
        last_heartbeat = (service_ref.get('last_seen_up') or
            service_ref['updated_at'] or service_ref['created_at'])
        if isinstance(last_heartbeat, six.string_types):
            # NOTE(russellb) If this service_ref came in over rpc via
            # conductor, then the timestamp will be a string and needs to be
            # converted back to a datetime.
            last_heartbeat = timeutils.parse_strtime(last_heartbeat)
        else:
            # Objects have proper UTC timezones, but the timeutils comparison
            # below does not (and will fail)
            last_heartbeat = last_heartbeat.replace(tzinfo=None)
        # Timestamps in DB are UTC.
        elapsed = timeutils.delta_seconds(last_heartbeat, timeutils.utcnow())
        is_up = abs(elapsed) <= self.service_down_time
        if not is_up:
            LOG.debug('Seems service is down. Last heartbeat was %(lhb)s. '
                      'Elapsed time is %(el)s',
                      {'lhb': str(last_heartbeat), 'el': str(elapsed)})
        return is_up

    def _report_state(self, service):
        """Update the state of this service in the datastore."""
        try:
            service.service_ref.report_count += 1
            service.service_ref.save()

            if getattr(service, 'model_disconnected', False):
                service.model_disconnected = False
                LOG.info(
                    _LI('Recovered connection to nova-conductor '
                        'for reporting service status.'))

        # because we are communicating over conductor, a failure to
        # connect is going to be a messaging failure, not a db error.
        except messaging.MessagingTimeout:
            if not getattr(service, 'model_disconnected', False):
                service.model_disconnected = True
                LOG.warn(_LW('Lost connection to nova-conductor '
                             'for reporting service status.'))
@oslo_db_api.wrap_db_retry(max_retries=5, retry_on_deadlock=True)
def service_update(context, service_id, values):
    session = get_session()
    with session.begin():
        service_ref = _service_get(context, service_id, session=session)
        # Only servicegroup.drivers.db.DbDriver._report_state() updates
        # 'report_count', so if that value changes then store the timestamp
        # as the last time we got a state report.
        if 'report_count' in values:
            if values['report_count'] > service_ref.report_count:
                service_ref.last_seen_up = timeutils.utcnow()
        service_ref.update(values)

    return service_ref
    def add_timer(self, interval, callback, initial_delay=None,
                  *args, **kwargs):
        pulse = loopingcall.FixedIntervalLoopingCall(callback, *args, **kwargs)
        pulse.start(interval=interval,
                    initial_delay=initial_delay)
        self.timers.append(pulse)
        return pulse
  • Database during an XXX
MariaDB [nova]> select * from services order by id asc\G
*************************** 1. row ***************************
     created_at: 2015-10-21 18:11:54
     updated_at: 2015-10-21 22:12:30
     deleted_at: NULL
             id: 1
           host: localhost.localdomain
         binary: nova-conductor
          topic: conductor
   report_count: 1270
       disabled: 0
        deleted: 0
disabled_reason: NULL
   last_seen_up: 2015-10-21 22:12:30
    forced_down: 0
        version: 2
*************************** 2. row ***************************
     created_at: 2015-10-21 18:11:57
     updated_at: 2015-10-21 22:08:53
     deleted_at: NULL
             id: 3
           host: localhost.localdomain
         binary: nova-cert
          topic: cert
   report_count: 1401
       disabled: 0
        deleted: 0
disabled_reason: NULL
   last_seen_up: 2015-10-21 22:08:53
    forced_down: 0
        version: 2
*************************** 3. row ***************************
     created_at: 2015-10-21 18:12:00
     updated_at: 2015-10-21 22:08:53
     deleted_at: NULL
             id: 4
           host: localhost.localdomain
         binary: nova-network
          topic: network
   report_count: 1361
       disabled: 0
        deleted: 0
disabled_reason: NULL
   last_seen_up: 2015-10-21 22:08:53
    forced_down: 0
        version: 2
*************************** 4. row ***************************
     created_at: 2015-10-21 18:12:03
     updated_at: 2015-10-21 22:08:59
     deleted_at: NULL
             id: 5
           host: localhost.localdomain
         binary: nova-scheduler
          topic: scheduler
   report_count: 1395
       disabled: 0
        deleted: 0
disabled_reason: NULL
   last_seen_up: 2015-10-21 22:08:59
    forced_down: 0
        version: 2
*************************** 5. row ***************************
     created_at: 2015-10-21 18:12:09
     updated_at: 2015-10-21 22:08:54
     deleted_at: NULL
             id: 6
           host: localhost.localdomain
         binary: nova-consoleauth
          topic: consoleauth
   report_count: 1392
       disabled: 0
        deleted: 0
disabled_reason: NULL
   last_seen_up: 2015-10-21 22:08:54
    forced_down: 0
        version: 2
*************************** 6. row ***************************
     created_at: 2015-10-21 18:12:13
     updated_at: 2015-10-21 22:08:53
     deleted_at: NULL
             id: 7
           host: localhost.localdomain
         binary: nova-compute
          topic: compute
   report_count: 1355
       disabled: 0
        deleted: 0
disabled_reason: NULL
   last_seen_up: 2015-10-21 22:08:53
    forced_down: 0
        version: 2
6 rows in set (0.00 sec)

$ nova-manage service list
nova-conductor   localhost.localdomain                internal         enabled    XXX   2015-10-21 22:12:30
nova-cert        localhost.localdomain                internal         enabled    :-)   2015-10-21 22:09:33
nova-network     localhost.localdomain                internal         enabled    :-)   2015-10-21 22:09:33
nova-scheduler   localhost.localdomain                internal         enabled    :-)   2015-10-21 22:09:29
nova-consoleauth localhost.localdomain                internal         enabled    :-)   2015-10-21 22:09:34
nova-compute     localhost.localdomain                nova             enabled    :-)   2015-10-21 22:09:33
2015-10-22 16:38:15.564  service_update(7)
2015-10-22 16:38:20.124  BEFORE (11, '_report_state')
2015-10-22 16:38:20.128  service_update(1)
2015-10-22 16:38:20.138  BEFORE (11, '_report_state')
2015-10-22 16:38:20.141  service_update(1)
2015-10-22 16:38:20.145 11 '_report_state' sleeping for 9.98 seconds
2015-10-22 16:38:20.160 11 '_report_state' sleeping for 9.98 seconds
2015-10-22 16:38:25.551  service_update(4)
2015-10-22 16:38:25.567  service_update(7)
2015-10-22 16:38:30.126  AFTER (11, 9.979305732005741)
2015-10-22 16:38:30.126  BEFORE (12, '_report_state')
2015-10-22 16:38:30.133  service_update(1)
2015-10-22 16:38:30.144  AFTER (11, 9.979391832006513)
2015-10-22 16:38:30.145  BEFORE (12, '_report_state')
2015-10-22 16:38:30.150  service_update(1)
2015-10-22 16:38:30.156 12 '_report_state' sleeping for 9.99 seconds
2015-10-22 16:38:30.157 12 '_report_state' sleeping for 9.97 seconds
2015-10-22 16:35:25.619  service_update(4)
2015-10-22 16:35:25.633  service_update(7)
2015-10-22 16:35:35.620  service_update(4)
2015-10-22 16:35:35.632  service_update(7)

...

- logs truncated from 16:35:35 to 16:38:25
- there are no logs where service_id=1 until the actual time catches up to the future time
- the only logs are service_update calls for service ids 4 & 7


2015-10-22 16:38:25.616  service_update(4)
2015-10-22 16:38:25.629  service_update(7)
2015-10-22 16:38:35.617  service_update(4)
2015-10-22 16:38:35.629  service_update(7)
2015-10-22 16:38:40.127  AFTER (12, 199.90615309600253)     <---- BOOM!
2015-10-22 16:38:40.127  BEFORE (13, '_report_state')
2015-10-22 16:38:40.131  service_update(1)
2015-10-22 16:38:40.145  AFTER (12, 199.92564915500407)     <---- BOOM!
2015-10-22 16:38:40.145  BEFORE (13, '_report_state')
            while self._running:
                foo += 1
                LOG.error('-------------------> BEFORE (%d, %r)', foo, func_name)
               	watch.restart()
                result = func(*self.args, **self.kw)
                watch.stop()
                if not self._running:
                    break
                idle = idle_for_func(result, watch.elapsed())
                LOG.error('%(foo)s %(func_name)r sleeping '
                          'for %(idle).02f seconds',
                          {'func_name': func_name, 'idle': idle,
                           'foo': foo})
                watch.restart()
                greenthread.sleep(idle)
                watch.stop()
                LOG.error('-------------------> AFTER (%d, %r)', foo, watch.elapsed())
  • Yup, in all the other services hub.clock is monotonic. Only the conductor service is still pointing to time.
nova-scheduler:

2015-10-22 17:09:17.727 ERROR oslo.service.loopingcall
[-] -----> hub.clock: <function monotonic at 0x7fde77199500>

nova-conductor:

2015-10-22 17:12:10.528 ERROR oslo.service.loopingcall
[-] -----> hub.clock: <built-in function time>
⚠️ **GitHub.com Fallback** ⚠️