Overview - Data Ingestion Widgets
Data Ingestion Widgets Introduction
Visual data flow tools (e.g. KNIME, Dataiku, Alteryx, Apache Hop) come pre-loaded with hundreds of ways to connect to data sources across the digital ecosystem. Their use in ESG calculations can therefore help to reduce the customisation burden that is often felt in the initial implementation, and ongoing maintenance, of Excel files and Software as a Service offerings.
This overview page lists widget instructions, the location of examples, and links to Calc Collab sub-wikis for common data ingestion approaches.
General wiki guidance
- Please avoid using tables; use headers and sub-headers to nest information instead. This makes it quicker for general users to contribute to wiki pages using the generic Markdown formatting options shown in GitHub
Data Ingestion Source 1 - Snowflake
Snowflake general notes
Snowflake is a powerful, paid data warehouse platform that operates at global scale. From an ESG perspective it has some key features that are useful:
- Where a company doesn't yet have a data warehouse (i.e. an organised place to store intermediate and final ESG calculation information), it is a cost-competitive option alongside other data warehouse providers
- Data sharing. Snowflake allows for secure, controlled data sharing, which is free to use when data is shared within the same region. Owners never physically send their data or lose control of their data sets; instead, Snowflake lets them grant a supply chain collaborator read access, and the data appears in the destination system with a new timestamp and metadata showing that it was read rather than copied. A provider-side sketch is shown just below this list.
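For context, the provider side of a share can also be scripted. The snippet below is a minimal sketch, not part of the ingestion workflow, using the snowflake-connector-python package; the connection details, database, table, share, and partner account names (ESG_DB, EMISSIONS, ESG_SHARE, PARTNER_ACCOUNT) are hypothetical placeholders, and the statements must be run under a role that is allowed to create shares.

```python
# Minimal sketch: provider-side setup of a Snowflake share (no data is copied).
# All identifiers below are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-myaccount",   # placeholder account identifier
    user="ESG_ADMIN",            # placeholder user
    password="********",
    role="ACCOUNTADMIN",         # a role permitted to create and alter shares
    warehouse="ESG_WH",
)
cur = conn.cursor()
try:
    # Create the share and expose one database/schema/table through it
    cur.execute("CREATE SHARE IF NOT EXISTS ESG_SHARE")
    cur.execute("GRANT USAGE ON DATABASE ESG_DB TO SHARE ESG_SHARE")
    cur.execute("GRANT USAGE ON SCHEMA ESG_DB.PUBLIC TO SHARE ESG_SHARE")
    cur.execute("GRANT SELECT ON TABLE ESG_DB.PUBLIC.EMISSIONS TO SHARE ESG_SHARE")
    # Let the collaborator's account read the share; they query it in place
    cur.execute("ALTER SHARE ESG_SHARE ADD ACCOUNTS = PARTNER_ACCOUNT")
finally:
    cur.close()
    conn.close()
```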
The following guidance is for the next step of bringing that data into your visual data flow tool of choice.
KNIME guidance
- To successfully ingest Snowflake data, the following sequence of IO nodes can be deployed:
- 'Snowflake Connector' - 'DB Table Selector' - 'DB Reader'
- Note that in the default setup, the Snowflake Connector will require the user to have access to the relevant Multi-Factor Authentication (MFA) app used by Snowflake; as of 2025 this is Cisco Duo.
- General Snowflake Connector instructions: Snowflake Connector node instructions (a minimal connection sketch is also shown after this list)
- Don't use the basic account name or account identifier in the node's 'Full account name' field
- Use the account URL, but remove 'snowflakecomputing.com' as this is appended automatically
- Make sure you enter the warehouse, not the database name, in 'Virtual warehouse'
- 'Default access control role' considerations
- In Snowflake you can create a dedicated user with a default access control role and password, which can then be used for all KNIME workflows.
- Make sure this user is assigned a role with appropriate privileges on the data to be exposed, e.g. by default the PUBLIC role is granted at least USAGE privileges
- Note that the connection doesn't seem to work if the user doesn't have MFA enabled, and following the widely reported Snowflake account breaches, users should probably have MFA enabled in any case
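Outside KNIME, the same connection settings can be sanity checked with the snowflake-connector-python package. The sketch below simply mirrors the node fields described above under assumed placeholder values (account identifier, user, warehouse, database, schema and role are all hypothetical); it is not part of the KNIME workflow itself.

```python
# Minimal sketch mirroring the KNIME Snowflake Connector settings described above.
# All identifiers below are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    # 'Full account name' equivalent: the account URL without '.snowflakecomputing.com'
    account="myorg-myaccount",
    user="KNIME_SERVICE_USER",
    password="********",        # with MFA enabled, expect a Cisco Duo prompt at connect time
    warehouse="ESG_WH",         # the virtual warehouse, not the database
    database="ESG_DB",
    schema="PUBLIC",
    role="ESG_READER",          # must hold at least USAGE/SELECT on the objects being read
)
cur = conn.cursor()
try:
    # Quick check that the session landed on the expected warehouse, database and role
    cur.execute("SELECT CURRENT_WAREHOUSE(), CURRENT_DATABASE(), CURRENT_ROLE()")
    print(cur.fetchone())
finally:
    cur.close()
    conn.close()
```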
Work to be done
- Set up automated organisation MFA (via the JDBC driver or OAuth?)
Data Ingestion Source 2 - Microsoft Azure Synapse (SQL)
Synapse general notes
Synapse is the Microsoft Azure data warehouse; it is SQL based and also links to Azure lakehouses. This guidance likely needs to be split up to accommodate the nuances between different Azure data stores, but it serves as a useful starting point for ingesting organised Azure data sets, such as calculation data that would typically live in a data warehouse.
KNIME guidance
- To successfully import Synapse data, the following sequence of IO nodes can be deployed:
- 'Microsoft Authenticator' - 'Microsoft SQL Server Connector' - 'DB Table Selector' - 'DB Reader'
- To import Synapse data, the Microsoft JDBC driver needs to be installed in the KNIME instance being run
- Navigate to the extensions menu and filter on 'driver'
- KNIME documentation links
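Outside KNIME, the same connection can be sanity checked with pyodbc and the Microsoft ODBC Driver for SQL Server (both need to be installed locally). The sketch below uses hypothetical server and database names; the 'ActiveDirectoryInteractive' option opens the Microsoft sign-in/MFA prompt, which is roughly the role the 'Microsoft Authenticator' node plays inside KNIME.

```python
# Minimal sketch: connecting to an Azure Synapse SQL endpoint outside KNIME.
# Server and database names are hypothetical placeholders.
import pyodbc

conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace.sql.azuresynapse.net;"    # placeholder Synapse SQL endpoint
    "Database=esg_dw;"                            # placeholder database
    "Encrypt=yes;"
    "Authentication=ActiveDirectoryInteractive;"  # opens the Microsoft sign-in / MFA prompt
)

conn = pyodbc.connect(conn_str)
try:
    cur = conn.cursor()
    # Quick check that tables in the target database are visible to this login
    cur.execute("SELECT TOP 5 name FROM sys.tables")
    for row in cur.fetchall():
        print(row.name)
finally:
    conn.close()
```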
Work to be done
- Set up automated organisation MFA (via the JDBC driver or OAuth?)
Data Ingestion Source 3 - Manual Entry Table
Manual Entry general notes
Manual entry tables are still critical for ESG workflows for several reasons:
- They can be required for logging data overrides or audit checks
- They are useful for storing hand-calculated results that feed into calculations and are owned by ESG team(s)
- They can be a great staging area for testing calculations before later automation
- They can provide a way for users to control visual data flow automation
KNIME guidance
- Use the 'Table Creator' node from the IO library
- There are really no tricks to using this one, other than making sure to specify the right data type for each column, as that can affect whether downstream nodes work
- e.g. numeric columns won't work with string/text operations and vice versa (see the illustration below)
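To illustrate the data type point, the short sketch below uses pandas to stand in for the Table Creator's output, with a hypothetical emissions override table; a numeric column captured as text quietly breaks numeric operations until it is explicitly typed.

```python
# Minimal sketch of why column types matter in a manually entered table.
# The table contents and column names are hypothetical.
import pandas as pd

# Manually entered override table; 'scope1_tco2e' accidentally captured as text
manual = pd.DataFrame(
    {
        "site": ["Plant A", "Plant B"],
        "scope1_tco2e": ["120.5", "98.0"],   # strings, not numbers
    }
)

# A downstream numeric step misbehaves on string data:
# summing the text column concatenates the values instead of adding them.
print(manual["scope1_tco2e"].sum())          # '120.598.0'

# Declaring the intended type up front (as the Table Creator column settings do)
# lets numeric operations behave as expected.
manual["scope1_tco2e"] = manual["scope1_tco2e"].astype(float)
print(manual["scope1_tco2e"].sum())          # 218.5
```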