# Data Migration TFRS to LCFS
This document describes the process and tools used to migrate data from the TFRS (Transportation Fuel Reporting System) database to the LCFS (Low Carbon Fuel Standard) application database. The process is built around Apache NiFi for data flow orchestration, supported by shell scripts. It is based on the original "12. Data migration from TFRS to LCFS" wiki page.

## ETL Subsystem Components

The ETL (Extract, Transform, Load) subsystem, located in the `etl/` directory, is responsible for this data migration. Key components:

- Apache NiFi: The core data flow automation tool. NiFi flows extract data from the TFRS database, perform the necessary transformations, and load the results into the LCFS database.
- `etl/data-migration.sh`: A shell script that automates several aspects of the NiFi-based data migration process, including managing NiFi processor states and ensuring database connectivity.
- `etl/data-transfer.sh`: A utility script for transferring database dumps between OpenShift environments and local PostgreSQL containers (useful for setting up test data or diagnosing issues with production-like data).
- Source Database: TFRS PostgreSQL database.
- Target Database: LCFS application PostgreSQL database.
## `data-migration.sh`

This script is a key tool for managing the NiFi data migration flows.

### Key Features

- Automated NiFi Processor Execution: Triggers specific NiFi processors in a defined sequence to run the data migration tasks.
- Environment Targeting: Can be pointed at different environments (dev, test, prod) to manage connections and configurations accordingly.
- Database Connection Management: May update NiFi controller services for database connections (the old wiki mentions this; review the script itself for current implementation details).
- Port Forwarding: Can establish `oc port-forward` connections so that a locally running NiFi can reach the OpenShift-hosted TFRS and LCFS databases (see the sketch after this list).
- Logging & Error Handling: Provides verbose and debug logging options for troubleshooting.
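For context, the port-forwarding step amounts to something like the following. This is a minimal sketch only; the namespaces, service names, and local ports below are illustrative placeholders, not values taken from the actual script:

```bash
# Forward the OpenShift-hosted databases to local ports so a local NiFi
# instance can reach them. All names and ports are placeholders.
oc -n <tfrs-namespace> port-forward svc/<tfrs-db-service> 5435:5432 &
oc -n <lcfs-namespace> port-forward svc/<lcfs-db-service> 5432:5432 &
```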
### Prerequisites

- Apache NiFi Instance: A running and accessible NiFi instance (local via `etl/docker-compose.yml`, or a deployed instance).
- OpenShift CLI (`oc`): Must be installed and logged into the target OpenShift cluster if interacting with OpenShift-hosted databases.
- `jq`: Command-line JSON processor, used by the script.
- `lsof`: Utility to list open files, used by the script to check port usage.
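A quick way to confirm the command-line prerequisites are available before running the script (a convenience check, not part of the script itself):

```bash
# Verify that the required CLI tools are on PATH.
for cmd in oc jq lsof; do
  command -v "$cmd" >/dev/null 2>&1 || echo "Missing prerequisite: $cmd"
done
```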
### Usage

```bash
cd etl
./data-migration.sh [environment] [--debug | --verbose]
```

- `[environment]`: Specify `dev`, `test`, or `prod`. This determines the OpenShift namespace and potentially other environment-specific configuration used by the script.
- `--debug`: Enables the most detailed logging output.
- `--verbose`: Enables verbose logging for troubleshooting.
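For example, to run the migration against the dev environment with verbose logging:

```bash
cd etl
./data-migration.sh dev --verbose
```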
### What the Script Does

- Environment Setup: Sets variables based on the specified environment.
- Port Forwarding (if needed): Checks and establishes `oc port-forward` tunnels for the TFRS and LCFS databases so NiFi can reach them (e.g., when NiFi runs locally and the databases are hosted on OpenShift).
- Update NiFi Controller Services (if applicable): May update database connection details in NiFi controller services based on the target environment.
- Execute NiFi Processors: Iterates through a predefined list of NiFi processor IDs, starting each one and monitoring its status (see the REST sketch below).
  - The script contains a list of processor IDs (e.g., `ORGANIZATION_PROCESSOR`, `USER_PROCESSOR`, `TRANSFER_PROCESSOR`). These IDs must match the actual processor IDs in your NiFi flow template.
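For reference, starting a processor through NiFi's REST API looks roughly like this. This is a minimal sketch, assuming an unsecured local NiFi at `http://localhost:8091` (the URL used elsewhere on this page); the processor ID is a placeholder, and the actual script may differ:

```bash
# Minimal sketch: start one NiFi processor and poll until it is running.
NIFI_API="http://localhost:8091/nifi-api"
PROCESSOR_ID="<processor-id-from-the-script>"

# The run-status endpoint requires the processor's current revision.
REVISION=$(curl -s "$NIFI_API/processors/$PROCESSOR_ID" | jq -c '.revision')

# Ask NiFi to start the processor.
curl -s -X PUT -H "Content-Type: application/json" \
  -d "{\"revision\": $REVISION, \"state\": \"RUNNING\"}" \
  "$NIFI_API/processors/$PROCESSOR_ID/run-status" > /dev/null

# Poll until NiFi reports the processor as RUNNING.
until [ "$(curl -s "$NIFI_API/processors/$PROCESSOR_ID" | jq -r '.component.state')" = "RUNNING" ]; do
  sleep 2
done
```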
### Updating NiFi Processor IDs

If the NiFi flow template changes and processor IDs are updated, you must update them in `etl/data-migration.sh`:
- Get Processor IDs from the NiFi UI:
  - Access your NiFi UI (e.g., `http://localhost:8091/nifi/`).
  - Right-click on the relevant processor and select "Configure".
  - In the "Settings" tab of the configuration dialog, find and copy the "Processor ID". (The original wiki included a screenshot of this dialog here.)
- Update `data-migration.sh`: Edit the script and update the readonly variables at the top with the new IDs:

  ```bash
  # Example from original wiki
  readonly ORGANIZATION_PROCESSOR="new_org_processor_id_here"
  readonly USER_PROCESSOR="new_user_processor_id_here"
  readonly TRANSFER_PROCESSOR="new_transfer_processor_id_here"
  ```

  Also ensure the `execute_processor` calls at the end of the script use these variables correctly and are in the desired order.
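As an alternative to clicking through the UI, processor IDs can also be listed via NiFi's REST API. A minimal sketch, assuming an unsecured local NiFi at `http://localhost:8091` and that the processors sit in the root process group (if they live in a nested process group, use that group's ID instead of `root`):

```bash
# List processor names and IDs in the root process group.
curl -s "http://localhost:8091/nifi-api/flow/process-groups/root" \
  | jq -r '.processGroupFlow.flow.processors[] | "\(.component.name)\t\(.id)"'
```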
### Troubleshooting

- Port Already in Use: If port forwarding fails, use `lsof -ti :<port_number> | xargs kill -9` to kill the process holding the port.
- Processor ID Not Found: Verify the IDs in the NiFi UI and in the script.
- NiFi API Errors (HTTP 400/500): Ensure NiFi is running and accessible, and that any API credentials/URLs used by the script are correct.
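When diagnosing NiFi API errors, a quick reachability check against the NiFi UI URL (the local URL mentioned above; adjust for a deployed instance) can rule out basic connectivity problems:

```bash
# Expect "HTTP 200" if NiFi is up and reachable.
curl -s -o /dev/null -w "HTTP %{http_code}\n" http://localhost:8091/nifi/
```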
## `data-transfer.sh`

This script is a general utility for moving data between OpenShift PostgreSQL instances and local PostgreSQL containers using `pg_dump`.

### Usage

```bash
cd etl
./data-transfer.sh <tfrs|lcfs> <dev|test|prod> <import|export> <local_container_name_or_id>
```
- Argument 1: `tfrs` or `lcfs` (target/source application database type).
- Argument 2: `dev`, `test`, or `prod` (OpenShift environment).
- Argument 3: `import` (OpenShift to local) or `export` (local to OpenShift).
- Argument 4: Name or ID of the local PostgreSQL Docker container.
Example commands (from the original wiki):

```bash
./data-transfer.sh lcfs dev export your_local_lcfs_db_container
./data-transfer.sh tfrs prod import your_local_tfrs_db_container
```
### Prerequisites

- `oc` CLI installed and logged into the target OpenShift cluster.
- Local target/source PostgreSQL Docker container running.
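Conceptually, an `import` (OpenShift to local) boils down to a `pg_dump` in the remote pod followed by a restore into the local container. This is a rough sketch only; the pod, user, and database names are placeholders, and the actual script handles pod lookup, credentials, and cleanup:

```bash
# Dump the remote database inside its OpenShift pod (custom format).
oc exec <db-pod> -- pg_dump -U <db-user> -d <db-name> -F c -f /tmp/db.dump

# Copy the dump out of the pod, then into the local container.
oc cp <db-pod>:/tmp/db.dump ./db.dump
docker cp ./db.dump <local_container>:/tmp/db.dump

# Restore into the local database, dropping existing objects first.
docker exec <local_container> pg_restore -U <db-user> -d <db-name> --clean /tmp/db.dump
```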
## NiFi Flow Templates

- The actual ETL logic (transformations, specific source/target mappings) is defined in the Apache NiFi flow templates found in `etl/templates/`.
- Understanding these flows requires accessing the NiFi UI, uploading the template, and inspecting the processors and their configurations.
- Records that fail during NiFi processing are typically logged to `etl/nifi_output/` for inspection.
This guide provides an overview of the data migration scripts and processes. For detailed ETL logic, refer to the NiFi flow templates. Always test migration scripts thoroughly in non-production environments before running against production data.