Overview ‐ Audit Controls

Audit integrity can be maintained in Visual Data Flows by following a few key principles (see also the attached diagram):

  1. Never modify primary, raw data sources
    • This is achieved with one-way data ingestion from raw sources: publish to a different location so there is no risk of overwriting the originals (see the ingestion sketch after this list)
  2. Synchronise ingestion with the work-process frequency, e.g. ingest only the current month's data for a monthly review
    • This is easily accommodated in connection nodes when using Dataiku
    • In KNIME, using DB Looping or DB Query Reader with a time-period range avoids re-loading the full history of data, which is what happens if a plain DB Reader node is used (see the date-range query sketch below)
  3. Calculation nodes may need to be locked or versioned to prevent unexplained changes within a reporting year
    • Calculation nodes can be copied, pasted, edited, and run in parallel for consistency testing within the same workflow (see the comparison sketch below)
    • Nodes can be locked in the paid versions of KNIME and Dataiku
  4. Leverage provisional and published datasets both to monitor data as it's emerging and to protect final datasets
    • Publishing provisional datasets helps make the final datasets more accurate by supporting trending and error detection
    • Published datasets are typically locked at the end of the month and are only rerun with the considered oversight of the data owner. Reruns are easily accommodated in visual data flows without IT-department intervention (see the publish/lock sketch below)
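
Because KNIME and Dataiku flows are visual, the principles above are easiest to illustrate in plain code. For principle 1, here is a minimal one-way-ingestion sketch in Python with pandas; the `raw/raw_sales.csv` source and the `staging/` output folder are hypothetical names for illustration. The raw source is only ever opened for reading, and the copy is published to a separate location:

```python
from pathlib import Path
import pandas as pd

RAW_SOURCE = Path("raw/raw_sales.csv")   # hypothetical raw data source: read-only
STAGING_DIR = Path("staging")            # separate publish location, never the raw folder

def ingest() -> Path:
    """One-way ingestion: read the raw source, publish a copy elsewhere."""
    df = pd.read_csv(RAW_SOURCE)         # the only operation ever applied to the raw file
    STAGING_DIR.mkdir(exist_ok=True)
    out = STAGING_DIR / RAW_SOURCE.name
    df.to_csv(out, index=False)          # all downstream work reads from this copy
    return out
```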
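
For principle 2, this sketch shows the logic of a DB Query Reader restricted to a time-period range, using pandas and SQLAlchemy; the connection string, table, and column names are assumptions, not part of either tool's API:

```python
from datetime import date
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@host/db")  # hypothetical connection

def ingest_month(period_start: date, period_end: date) -> pd.DataFrame:
    """Pull only the current reporting period, not the full table history."""
    query = text(
        "SELECT * FROM readings "        # hypothetical table and column names
        "WHERE reading_date >= :start AND reading_date < :end"
    )
    with engine.connect() as conn:
        return pd.read_sql(query, conn, params={"start": period_start, "end": period_end})

# e.g. a monthly review ingests only May 2024:
# df = ingest_month(date(2024, 5, 1), date(2024, 6, 1))
```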
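
For principle 3, a sketch of consistency testing: an original calculation and an edited copy run in parallel on the same input, and any disagreement is surfaced. The column names and the calculations themselves are hypothetical:

```python
import pandas as pd

def calc_v1(df: pd.DataFrame) -> pd.DataFrame:
    """Original calculation node (hypothetical: net = gross - deductions)."""
    out = df.copy()
    out["net"] = out["gross"] - out["deductions"]
    return out

def calc_v2(df: pd.DataFrame) -> pd.DataFrame:
    """Edited copy of the node, run in parallel for consistency testing."""
    out = df.copy()
    out["net"] = (out["gross"] - out["deductions"]).round(2)
    return out

def consistency_check(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows where the two versions disagree; an empty result means consistent."""
    a, b = calc_v1(df), calc_v2(df)
    mismatch = a["net"].ne(b["net"])
    return pd.concat([a[mismatch], b[mismatch]], keys=["v1", "v2"])
```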
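
For principle 4, a sketch of the provisional/published split, assuming hypothetical folder names: provisional outputs can be refreshed freely, while a published period refuses to be overwritten without explicit data-owner approval:

```python
from pathlib import Path
import pandas as pd

PROVISIONAL = Path("provisional")   # refreshed freely during the month
PUBLISHED = Path("published")       # locked at the end of the month

def publish_provisional(df: pd.DataFrame, period: str) -> None:
    """Provisional data supports trending and error detection while the month is open."""
    PROVISIONAL.mkdir(exist_ok=True)
    df.to_csv(PROVISIONAL / f"{period}.csv", index=False)

def publish_final(df: pd.DataFrame, period: str, owner_approved: bool = False) -> None:
    """A published period can only be rerun with the data owner's sign-off."""
    PUBLISHED.mkdir(exist_ok=True)
    target = PUBLISHED / f"{period}.csv"
    if target.exists() and not owner_approved:
        raise PermissionError(f"{period} is locked; a rerun requires data-owner approval")
    df.to_csv(target, index=False)
```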

[Attached diagram: audit control principles in Visual Data Flows]