Data Model - NWFWMD-IT/Hydrologic-Monitoring-Project GitHub Wiki

Field Workflows and Logical Model

Locations

Locations (also colloquially called "sites") are places where hydrologic monitoring activities take place.

Assets

Assets are the equipment that is, or can be, deployed to a Location. Asset information is managed by office staff for reference in the field; at present, there are no field workflows that modify asset data, and apps should disallow such modifications to protect the integrity of this content.

For the purposes of this model, staff's equipment (e.g. thermometers, steel tapes) are not considered assets. Some information about this equipment may be recorded when used during a Location Visit (e.g. serial number) but the equipment acquisition, maintenance, disposal, etc. is not formally managed by this system.

Intended and Actual Assets

Each Location is established for monitoring one or more hydrologic variables, such as groundwater or rainfall level. Some of this information is collected by a Location asset (e.g. Sensor attached to a Data Logger); other times, information is collected manually by staff with their own equipment (e.g. tape measure).

For Locations with assets, the intended and actual configuration for the Location may differ. For example, a Location may be designed to use a Data Logger, and a logger may have been installed at that Location. During a subsequent visit, staff may find that the logger has been stolen. Therefore, the model needs to accommodate two types of information about a Location's assets: it's intended configuration, and it's actual configuration.

Implementation

The Location object has properties indicating whether it is intended to have a certain asset present (or to support a certain activity). Physically, these properties are stored in the Location table, in columns named with the prefix Has (e.g. HasDataLogger).

Relationships between Location and asset objects indicate which assets are actually deployed to which Locations. Ideally, for each asset-related Has property, one or more relationships to actual asset records will exist. In cases like the stolen Data Logger example above, however, the Location would correctly indicate that it is supposed to have a Data Logger, and the absence of a related Data Logger record would indicate that the logger is not actually deployed at present.

Data Logger / ADVM Data Files

Time Ranges

The time ranges that each data file covers is recorded within the file itself, per the earliest and latest time series records it contains. Nonetheless, the Location workflow requires in several places that field staff explicitly record the start/end times for data files. The primary goal of this is to prompt the user to open the file and verify that it downloaded completely and correctly. If the user determines that the file is incomplete or corrupt, they can re-download the file immediately, thus saving a follow-up visit and preventing possible data loss.

Batteries

Data loggers (including ADVM) are powered by batteries. Some Locations have solar panels to charge the batteries in place. Other stations do not have charging on site, so staff must replace batteries periodically to maintain adequate charge for powering the equipment. Typically, batteries found to have low voltage are replaced with a charged one, then returned to the office to be recharged and reused in the field.

Eventually, all batteries will wear out, and need to be decommissioned and replaced with a new unit. Additionally, batteries may suffer other types of issues that will require them to be serviced or replaced (e.g. corrosion, theft). Therefore, assessing a battery's state and replacing it periodically is necessary maintenance for each Location that contains one. Likewise, capturing records of battery assessments and replacements is important for helping managers maintain the District's battery inventory.

Replacement

Batteries found during a Location Visit to have low voltage or other physical issues (e.g. corrosion) are usually replaced with a healthy, charged battery during that same visit. Battery replacement typically happens at the end of a Location Visit - specifically, after records are downloaded from the data logger(s) it is powering. Replacing a battery requires disconnecting/reconnecting power from its data logger which, in rare cases, can cause the loss of data that has accumulated since the last Location Visit. Therefore, standard procedure is to download data logger records first, then replace the battery.

In some cases, staff may decide not to replace a low battery. For example, a Location may have a solar panel but has had recent cloudy weather that prevented the battery from charging. In this case, staff may choose to leave the existing battery in place with the expectation that it is healthy and will recharge when the weather clears. Other times, staff do not have a charged replacement battery available.

Dead-on-Arrival (DOA)

A battery is DOA when it is found at the start of a Location Visit to have insufficient charge to power its equipment. A DOA battery must be replaced with a charged one early in the visit, instead of at the end, in order to access the data logger.

Once a battery has been fully discharged, some permanent damage has occurred. Depending on factors such as the battery chemistry and charge history, that battery may still have a useful service life; or, the battery may need to be decommissioned and replaced with a new one. Therefore, it is particularly important for managers to know when batteries are found to be DOA so they can include that information in decisions regarding the District's battery inventory.

Data Logger Time Adjustment

Each data logger has an internal clock, which it uses for timestamping measurements when building its time series. Two conditions can cause the local clock time to become out-of-sync with the actual real-world time:

  • Drift
  • Dead battery

The data logger clocks will naturally drift over time, causing them to lag or lead the real-world time. Periodically, staff must update the logger clock to the correct time during a Location Visit. When a clock is found to be more than one minute out-of-sync, an adjustment is necessary. When adjusting the clock, it is important to record the:

  • Adjustment amount (minutes)
  • Adjustment timestamp

so the time series timestamps can be aligned during back office processing.

The clock can also become out-of-sync when the data logger's battery dies. Loggers include small internal batteries that are intended to preserve the clock time when the primary external battery dies. In some cases, however, the internal battery is also dead, or the clock-preserving function of the data logger does not work as intended.

In these cases, the data logger may reset its clock to a default time in the distant past (years). Here, staff will need to update the clock to the correct time and record the fact that the time needed to be adjusted from its default. Recording this event serves to notify asset managers that the internal battery needs to be replaced, or that another problem may exist with the logger.

When making a large adjustment from a logger's default time, however, it is not necessary to record the amount of time that the clock was adjusted, only that an adjustment was necessary; this large adjustment amount does not represent drift and, therefore, is not useful for aligning time series data. Instead, any information about drift would have been lost when the battery died; there is no way to know how closely the logger clock aligned with real-world time prior to the battery failure, once the battery has died.

Manual and Real-Time Measurements

Manual Measurements

Staff collect manual measurements for four types of parameters:

  • Stage
  • Groundwater
  • Conductivity
  • Temperature

These manual measurements can serve two purposes:

  • As part of the measurement history for that Location
  • For calibrating a data logger

Locations without data loggers rely on manual measurements exclusively as the historical record. Locations with data loggers can supplement the logger's time series with manual measurements.

When a data logger is present, staff also use manual measurements to calibrate the logger. Comparing the real-time value reported by the logger to a manual measurement taken at the same time allows determining whether the logger is reporting values within the specified tolerance (which differs by parameter). If the logger value is out-of-spec, staff can adjust the logger to match the corresponding manual measurement. Alternatively, if the sensor type does not support calibration, it can be flagged for replacement during a future visit.

Real-Time Data Logger Measurements

In contrast to manual measurements, staff only take real-time data logger readings for one reason: calibrating the data logger. There is no use case in which a user would record just a real-time logger measurement, because that information is already part of the logger's time series.

(Note that there are limited cases where a user would record a real-time measurement without a corresponding manual measurement, though they are not within the scope of activities for which this survey application is used. Therefore, in this context, we can ignore real-time-only measurements.)

Measurement Records and Timestamps

Therefore, when collecting measurements for one of the parameters listed above, survey users will:

  • Always record a manual measurement
  • Optionally record a real-time logger measurement, if calibrating a logger

The manual measurement, plus its optional real-time measurement (and any derivative values, such as a calibration adjustment amount) compose a single measurement record.

Each measurement record has a single timestamp - even if both manual and real-time values are captured. Recall that the goal of collecting a real-time measurement is for calibration purposes only; so, this value must be gathered at the same time as its corresponding manual measurement. It would not make sense, for example, to capture a stage measurement in a tidal area 30 minutes before a real-time measurement and compare them for calibration purposes because the real-world water level would likely have changed during that interval.

Multiple Measurement Records

Staff may capture more than one measurement record for the same parameter during the same Location Visit.

Each of the multiple measurement records may be of manual-only type, or a pair of manual + real-time readings. From a data collection perspective, there is no dependency between the individual records.

Each measurement record will have its own timestamp. Per above, manual + real-time measurement pairs will share a timestamp.

Temperature Sensors

Staff will typically not have replacement temperature sensors on-hand if one is found during a location visit to have failed. Instead, users will flag in the survey that the sensor has failed, and a dashboard or similar reporting system will notify managers that a follow-up visit is required to replace the sensor.

Additionally, temperature sensors cannot be calibrated by NWFWMD (some types can be calibrated by a third-party lab). Therefore, unlike other sensor types, the data model contains no calibration information for temperature sensors.

Physical Model Design Standards

Normalization

Derived Values

In some cases, the system will compute a value automatically for the user. For example, when taking a manual stage measurement, the app will use that measurement and the height of the measuring point to compute the water level. Although the water level is a derived value, we store it explicitly, for convenience.

Storing derived values creates the potential for data inconsistency. For example, standing at a measuring point at 23 feet elevation and measuring down to the water surface a distance of -17 feet would yield an actual water level of 6 feet. Absent some integrity enforcement mechanism, however, an errant app or user could store a water level of 5 feet, -100 feet, or any other value the underlying data type will allow.

In online database applications, we typically mitigate this issue using a view, which computes the derived value from its reference data on-the-fly, or a trigger, which performs a calculation and stores the result at a low level in the technology stack that 1) applies to all users and applications, and 2) is not easy to circumvent, intentionally or inadvertently. Neither are good solutions for this use case, though.

Firstly, our Survey123 application is intended to be used primarily offline. Therefore, we cannot depend on a back-office resource like the database server to calculate derived values. Secondly, using views with geodatabase tables can be problematic, especially when those tables include attachments (which these do).

Therefore, the best option in this environment is to have the Survey123 application compute the derived values and store them in normal table columns. This will enable Survey123 users to work offline, and the resulting value will be available for normal display and analysis use, just like other attributes.

Although this approach balances well the technical advantages and disadvantages in the target environment, GIS administrators and developers should be aware of the following in order to ensure data integrity:

  1. All applications that allow editing a reference attribute upon which a derived value depends must take care to update the derived value when the reference data changes
  2. Editing these reference attributes should be restricted to only applications and workflows that are known to properly maintain their derived values

In practical ArcGIS terms, #1 means that any applications in addition to the initial Survey123 app that allow editing of this source data must include the same logic for updating derived values when their reference data changes. This includes other surveys, ArcGIS Enterprise / Online Web Apps, and future deployments such as ArcGIS Field Maps. It is up to staff, not software, to enforce that new apps respect the same data integrity rules.

Similarly, #2 implies that more rigorous control over write access to this source data is warranted than may be the case for other District data sets. Specifically, write access should not be provided for general ArcGIS Pro (or Desktop) use; edits should only occur though apps designed to maintain derived values. Additionally, write access to web services that publish this source data should restrict edit access to the aforementioned reference attributes, such as through secured services or column-level filters.

Cardinality

EXPLAIN WHEN/WHY WE CREATE SEPARATE TABLES VS WHEN WE STORE PROPERTIES DIRECTLY ON THE LOCATIONVISIT TABLE

Attachment Types

  • Data logger
    • Two formats:
      • CSV
      • Text (fixed width)
        • Not processed into Aquarius
        • Old devices, provided by City
        • Files collected > delivered to City under contract, as backup to real-time data they are already receiving
        • Only data not UTC (these are EST)
  • ADVM
    • All will be attached as zip files
    • FYI: We capture these in at least two separate workflows:
      • "Normal" mode
      • Discharge mode
  • Discharge
    • Attached as zip
      • Includes photos
    • FYI: Types
      • ADCP
      • FlowTracker

Columns

Dates and Times

Throughout the model, "date" implies a timestamp that includes both date and time components. There is no defined business requirement for temporal resolution, though the physical data type used for geodatabase date columns on the target platform (SQL Server 2019, datetime2 datatype) has sufficient precision (100 nanoseconds) for the activities in the hydrology field workflows.

Yes/No and Exception Pairs

Many steps in the hydrology field workflow require staff to confirm whether some action was taken and, if not, to choose the reason why from a picklist. We model this in the geodatabase as a pair of columns: one with the Yes/No domain, and a second with a domain name ending with the suffix Exception. For example:

Column Domain
BatteryReplaced Yes/No
BatteryReplacementException Battery Replacement Exception

The standard is for these columns to appear immediately adjacent to one another, with the Yes/No column coming first.

Tables

LocationVisit

Name Data type Domain Comments
BatteryReplacementTimeAdjustment Text Battery Replacement Time Adjustment See also: Data Logger Time Adjustment
DataLoggerTimeAdjustment Integer Units: Minutes
DESCRIBE SIGNED INTEGER SEMANTICS
Values:
  • NULL: Not checked
  • 0: Checked, no adjustment
  • <>0: Number of minutes adjusted
See also: Data Logger Time Adjustment

RainfallTips

  • Verified 1:M with LocationVisit

Example workflow with multiple tip instances:

  • Clean > tip > check tips
  • Discover issue
  • Change bucket / wires
  • Tip again to verify wires hooked up correctly

StageMeasurement

  • Verified 1:M with LocationVisit
  • Storing manual measurement, real-time recorder value, and sensor adjustment data as a single entry
    • Goal is to capture these together and compare / adjust as a group
    • When taking a single measurement/reading (e.g. extra manual measurement in variable weather, extra real-time reading to confirm sensor adjustment/replacement) just populate the relevant properties and leave the rest blank -MeasureDate is a single value that applies to both the manual measurement and real-time recorder reading
    • Goal is to capture both values as close in time as practical, so that the sensor adjustment can be computed accurately
    • If the measurement/reading are far enough apart in time such that a single MeasureDate is inappropriate, user should ignore those values and re-collect fresh ones
  • The user may adjust the sensor at some non-negligible time after collecting the manual/recorder values from which the adjustment amount is computed
    • Therefore, SensorAdjustmentDate is a separate timestamp, so back-office staff know when the time series starts being affected by the adjustment
Name Data type Domain Comments
WaterLevel Computed automatically (see Derived Values)
SensorAdjustmentAmount Computed automatically (see Derived Values)
Purpose is for app to tell user how much to adjust (not vice versa). It is not a valid use case for the app to recommend one adjustment amount and the user to apply a different adjustment amount, so we only store the one, system-computed value; a flag (SensorAdjusted) indicates whether the user applied that adjustment amount.

TemperatureMeasurement

  • Per the Temperature Sensors section above, temperature sensors cannot be calibrated and, in turn, this table stores no calibration-related data

VisitStaff

  • Verified 1:M with LocationVisit

  • Model staff names as domain

    • Few values
    • Changes infrequently
    • May have issue over time
      • Staff leave, but need to retain their name for historical records
      • But, don't want old names available in current picklist
      • Possibly remove old names from domain?
        • Would leave invalid records in VisitStaff table, but shouldn't cause a problem. Only validated upon request in Desktop, and even then only highlights invalid records. No real enforcement.

Domains

BatteryReplacementTimeAdjustment

Confirms that the user checked or adjusted the data logger system time after replacing its battery.

⚠️ **GitHub.com Fallback** ⚠️