A Data issues - StatisticalReinforcementLearningLab/HeartstepsV1Code GitHub Wiki

Data issues arise for a variety of reasons. These are outlined below.

Phone turned off or out of coverage

No HeartSteps application data are generated if the phone is turned off or cannot communicate with the server for extended periods of time. This scenario can be characterized by consecutive time slots at which we have no records in any of the EMA- or suggestion-related HeartSteps tables.
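One way to flag this pattern is to look for runs of consecutive time slots with no records at all. The sketch below is illustrative only and is not part of the HeartSteps code: it assumes the slots that have at least one record in any EMA- or suggestion-related table have already been collected into a set, and the function name, slot identifiers and `min_run` threshold are all hypothetical.

```python
def missing_runs(expected_slots, observed_slots, min_run=2):
    """Return (start, end, length) for each run of consecutive missing slots.

    expected_slots: ordered list of slot identifiers for one user.
    observed_slots: set of slots with at least one record in any table.
    min_run: shortest run worth reporting (hypothetical threshold).
    """
    runs = []
    start = end = None
    length = 0
    for slot in expected_slots:
        if slot not in observed_slots:
            if start is None:
                start = slot
            end = slot
            length += 1
        else:
            if start is not None and length >= min_run:
                runs.append((start, end, length))
            start = end = None
            length = 0
    # Close a run that extends to the last expected slot.
    if start is not None and length >= min_run:
        runs.append((start, end, length))
    return runs


# Hypothetical example: slots are (study_day, slot_of_day) pairs.
expected = [(day, slot) for day in (1, 2) for slot in range(1, 6)]
observed = {(1, 1), (1, 2), (2, 4), (2, 5)}
runs = missing_runs(expected, observed)
```

Here the six slots from `(1, 3)` through `(2, 3)` form a single run, which is the kind of extended gap this scenario describes.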

Jawbone and Google Fit data are also lost in this situation. However, gaps in step count data might also occur if the phone's Bluetooth is disabled or if the user does not carry the activity tracker (i.e., the wristband for Jawbone or the phone for Google Fit) while active.

Lack of handshake protocols

HeartSteps did not have handshake protocols to ensure an active connection between the phone and the server. For instance, the phone would attempt to transmit data over a WiFi connection and then discard the data without verifying that the server had received it. Such a scenario would arise when the phone was connected to a WiFi network but was not authorized to send data. In general, we can characterize this situation by intermittent time slots at which we have no EMA- or suggestion-related records in the HeartSteps tables.
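For contrast, a handshake amounts to keeping each record until the server confirms receipt. The following sketch is not HeartSteps code and makes no claim about how the app was structured; `Outbox` is a hypothetical class, and `send` stands in for whatever transport is used and is assumed to return `True` only when the server acknowledges the record.

```python
from collections import deque


class Outbox:
    """Queue that drops a record only after the server acknowledges it."""

    def __init__(self, send):
        self._send = send  # hypothetical callable: record -> bool (True = acknowledged)
        self._pending = deque()

    def enqueue(self, record):
        self._pending.append(record)

    def flush(self):
        """Send pending records in order; stop at the first failure."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # keep the record for a later retry
            self._pending.popleft()
            sent += 1
        return sent

    def pending(self):
        return list(self._pending)


# Simulated WiFi network that is connected but not authorized to send data.
received = []
network = {"authorized": False}


def fake_send(record):
    if network["authorized"]:
        received.append(record)
    return network["authorized"]


outbox = Outbox(fake_send)
outbox.enqueue({"slot": 1, "response": "thumbs_up"})
first_attempt = outbox.flush()   # fails; the record stays queued
network["authorized"] = True
second_attempt = outbox.flush()  # succeeds; the record is delivered and removed
```

Without the acknowledgement check, the first `flush` would have discarded the record, producing exactly the intermittent gaps described above.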

HeartSteps app killed by the user

User interaction with a HeartSteps notification typically begins with the user swiping down to expand the notification shade. At this point the user can interact with the notification in two ways:

  1. If no other app notifications appear in the shade, the user can view the activity suggestion message and provide an answer (thumbs up/down or enable snooze).
  2. The user can touch the expanded notification. This will open the HeartSteps app, which appears as a window to enter thumbs up/down or enable snooze.

If the user provides an answer via method 2 and then views recent apps and swipes HeartSteps away before HeartSteps can send data to the server, all data related to this suggestion time slot are lost.

Depending on the user's preference for method 1 or 2, this scenario might amount to persistent or intermittent time slots at which we have no suggestion-related data. Note that this cannot be readily distinguished from the above scenario.

Race conditions

Access to HeartSteps data on the phone's storage was not synchronized (for example, with locks), so write and read operations by HeartSteps processes were not necessarily carried out in the intended order. Such race conditions were generally more common on older phones with lower processing power, and some data elements were more susceptible than others depending on the structure of the source code. From the information available in the data tables, it is apparent that race conditions affected the following:

  • Administered planning status and EMA questions in EMA_Context_Notified (planning_today and ema_set_today, respectively). These two variables generally reflect the initial or previous status, rather than that of the current EMA.
  • Identifier intended to link Momentary_Decision and Response (decisionID). This variable should be associated with a single, unique suggestion time slot and user. However, a given value could be reused in subsequent time slots for the same user.
  • Each suggestion time slot is associated with prefetch data, generated 30 minutes ahead in case the user is not connected to the server at the time slot. In the presence of non-prefetch decision data in Momentary_Decision, the prefetch data can generally be discarded. However, in a small number of cases, the decision result carried out (as reflected in Response) represents the prefetch.

These problems are mitigated in the analysis data frames, which are obtained through extensive processing of the source CSV files.
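As an illustration of the kind of check involved, the sketch below flags decisionID values that appear in more than one suggestion time slot for the same user. It is not the processing code that builds the analysis data frames; the function name and the row layout (`user`, `decisionID` and `slot` keys) are assumptions made for the example.

```python
from collections import defaultdict


def reused_decision_ids(rows):
    """Map (user, decisionID) to its sorted slots when it spans more than one slot."""
    slots = defaultdict(set)
    for row in rows:
        slots[(row["user"], row["decisionID"])].add(row["slot"])
    return {key: sorted(s) for key, s in slots.items() if len(s) > 1}


# Hypothetical rows, e.g. parsed from a Momentary_Decision CSV export.
rows = [
    {"user": 1, "decisionID": "a1", "slot": 10},
    {"user": 1, "decisionID": "a1", "slot": 11},  # same ID reused in the next slot
    {"user": 1, "decisionID": "b2", "slot": 12},
    {"user": 2, "decisionID": "a1", "slot": 10},  # different user, not a reuse
]
reused = reused_decision_ids(rows)
```

Keying on the user as well as the identifier reflects the statement above that a value should be unique to one time slot and one user.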

System locale

HeartSteps app issues were prevalent on phones whose Android locale was set to a language other than some variant of English (e.g., Traditional Chinese). This resulted in both data loss and data quality issues. Users associated with a non-English locale are readily identified by the invalid timezone variable value "????????"; these subjects are excluded from all but the user-level analysis data frames.
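A minimal sketch of this screening step follows. The invalid value comes from the description above, but the function name and the `user` and `timezone` keys are hypothetical and may not match the column names in the source tables.

```python
INVALID_TIMEZONE = "????????"  # value that marks a non-English locale


def split_by_locale(users):
    """Partition user rows into (valid, flagged) by the timezone value."""
    valid = [u for u in users if u["timezone"] != INVALID_TIMEZONE]
    flagged = [u for u in users if u["timezone"] == INVALID_TIMEZONE]
    return valid, flagged


# Hypothetical user-level rows.
users = [
    {"user": 1, "timezone": "America/Detroit"},
    {"user": 2, "timezone": "????????"},
]
valid, flagged = split_by_locale(users)
```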