# Data Flows
This document describes the key data flows within the LCFS system, covering user interactions, API communication, asynchronous processing, and ETL processes. It addresses the requirements of ticket #2409.
## User Authentication Flow (Keycloak)

- User Accesses Frontend: The user navigates to the LCFS React frontend application.
- Redirection to Keycloak: The frontend (using `@react-keycloak/web`) detects that the user is not authenticated and redirects the browser to the Keycloak login page.
- User Authenticates: The user enters credentials (e.g., IDIR, BCeID) on the Keycloak page.
- Token Issuance: Keycloak authenticates the user and issues OIDC tokens (ID Token, Access Token, Refresh Token) back to the frontend via browser redirects.
- Token Storage: The frontend stores these tokens securely (e.g., in memory; `keycloak-js` handles this).
- API Requests with Access Token: For subsequent requests to the LCFS Backend API, the frontend includes the Access Token (JWT) in the `Authorization: Bearer <token>` header.
- Backend Token Validation: The LCFS Backend (FastAPI) receives the request, extracts the JWT, and validates it (signature, expiry, issuer, audience) against Keycloak's public keys. A hedged sketch of this step follows the list.
- Authorization Check: Based on the validated token claims (e.g., roles, user ID), the backend performs authorization checks for the requested resource/action.
- Response: The backend processes the request and sends a response to the frontend.
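To make the validation step concrete, here is a minimal sketch of JWT validation against Keycloak's JWKS endpoint, assuming the PyJWT library. The Keycloak URLs, audience, and error handling below are illustrative placeholders, not the actual LCFS backend code.

```python
# Minimal sketch of backend token validation, assuming the PyJWT library.
# The Keycloak URLs and audience are illustrative placeholders.
import jwt
from fastapi import HTTPException, Request

KEYCLOAK_CERTS_URL = (
    "https://keycloak.example.com/realms/lcfs/protocol/openid-connect/certs"
)
EXPECTED_ISSUER = "https://keycloak.example.com/realms/lcfs"
EXPECTED_AUDIENCE = "lcfs-app"

# PyJWT fetches and caches Keycloak's public signing keys (JWKS).
jwks_client = jwt.PyJWKClient(KEYCLOAK_CERTS_URL)

def validate_token(request: Request) -> dict:
    """Extract the Bearer JWT from the request and return its verified claims."""
    auth_header = request.headers.get("Authorization", "")
    if not auth_header.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Missing bearer token")
    token = auth_header.removeprefix("Bearer ")
    try:
        # Select the signing key matching the token's "kid" header.
        signing_key = jwks_client.get_signing_key_from_jwt(token)
        # Verifies signature, expiry, issuer, and audience in one call.
        return jwt.decode(
            token,
            signing_key.key,
            algorithms=["RS256"],
            audience=EXPECTED_AUDIENCE,
            issuer=EXPECTED_ISSUER,
        )
    except jwt.PyJWTError as exc:
        raise HTTPException(status_code=401, detail=f"Invalid token: {exc}") from exc
```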
## Data Submission Flow (e.g., Creating a Compliance Report)

- User Fills Form: The user interacts with a form in the React frontend (e.g., creating a compliance report).
- Client-Side Validation: The frontend performs initial validation using libraries like React Hook Form and Yup.
- API Request: On submission, the frontend (using `axios`, possibly managed by React Query's `useMutation`) sends a POST or PUT request to the appropriate LCFS Backend API endpoint (e.g., `/api/reports`). The request payload contains the form data, and the `Authorization` header includes the JWT.
- Backend Processing: The backend (sketched after this list):
  - Receives the request.
  - Validates the JWT (as above).
  - Performs authorization.
  - Validates the incoming data using Pydantic models.
  - Executes business logic (e.g., calculating values, checking rules).
  - Interacts with the PostgreSQL database (via SQLAlchemy) to create or update records.
  - May interact with MinIO for file storage (if attachments are involved).
  - May publish a message to RabbitMQ if an asynchronous task needs to be triggered (e.g., generating a PDF version of the report).
- API Response: The backend sends a response (e.g., a success message with the new report ID, or error details).
- Frontend Update: The frontend updates the UI based on the API response (e.g., navigates to a success page, displays an error message, React Query updates its cache).
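As a rough illustration of the backend-processing steps above (Pydantic validation, a SQLAlchemy write, and a response), here is a hedged sketch of such an endpoint. The model, schema fields, and connection string are hypothetical stand-ins; the real implementations live in the LCFS backend.

```python
# Hedged sketch of a report-creation endpoint: Pydantic validates the payload,
# SQLAlchemy persists it, and the new ID is returned. All names (table, fields,
# DSN) are hypothetical stand-ins for the real LCFS backend code.
from fastapi import APIRouter, Depends
from pydantic import BaseModel
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, sessionmaker

Base = declarative_base()

class ComplianceReport(Base):
    """Hypothetical table; the real model lives in the backend's models package."""
    __tablename__ = "compliance_report"
    id = Column(Integer, primary_key=True)
    compliance_period = Column(String, nullable=False)
    organization_id = Column(Integer, nullable=False)

class ComplianceReportCreate(BaseModel):
    """Hypothetical request schema; FastAPI rejects invalid payloads with a 422."""
    compliance_period: str
    organization_id: int

engine = create_engine("postgresql://lcfs:lcfs@localhost/lcfs")  # assumed DSN
SessionLocal = sessionmaker(bind=engine)

def get_db():
    """Per-request database session dependency."""
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

router = APIRouter()

@router.post("/api/reports", status_code=201)
def create_report(payload: ComplianceReportCreate, db: Session = Depends(get_db)):
    # JWT validation and role checks would run before this point (see the
    # authentication sketch above); omitted here for brevity. Attachments would
    # go to MinIO, and a RabbitMQ message could be published for PDF generation
    # (see the next section).
    report = ComplianceReport(
        compliance_period=payload.compliance_period,
        organization_id=payload.organization_id,
    )
    db.add(report)
    db.commit()
    db.refresh(report)
    return {"report_id": report.id}
```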
## Asynchronous Task Processing (RabbitMQ)

- Task Trigger: An LCFS Backend API endpoint, upon a certain action (e.g., report submission, a scheduled job), publishes a message to a specific RabbitMQ queue. The message contains the data necessary for the task.
- Message Queued: RabbitMQ receives and queues the message.
- Worker Consumption: A dedicated worker process (part of the backend or a separate worker service, TBD by inspecting backend code) consumes messages from the queue.
- Task Execution: The worker performs the long-running or deferrable task (e.g., generating a complex report, sending an email notification, processing a large dataset).
  - This worker may interact with the database, MinIO, or other services.
- Status Update (Optional): The worker might update the status of the task in the database or send a notification (e.g., via WebSocket, email, or another RabbitMQ message) upon completion or failure.
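Both sides of this pattern can be sketched with the pika client. The queue name, message shape, and connection details below are assumptions for illustration; the actual publisher and worker live in the backend code.

```python
# Hedged sketch of the publish/consume pattern, assuming the pika client.
# Queue name, message shape, and connection details are illustrative.
import json
import pika

QUEUE = "report_tasks"  # hypothetical queue name

def publish_task(task: dict) -> None:
    """Producer side: the API publishes a task message (Task Trigger)."""
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE, durable=True)
    channel.basic_publish(
        exchange="",
        routing_key=QUEUE,
        body=json.dumps(task),
        properties=pika.BasicProperties(delivery_mode=2),  # persist to disk
    )
    connection.close()

def run_worker() -> None:
    """Consumer side: a worker processes queued tasks (Worker Consumption)."""
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE, durable=True)

    def handle(ch, method, properties, body):
        task = json.loads(body)
        # ... perform the long-running work (PDF generation, email, etc.),
        # then optionally record the task's status in the database ...
        ch.basic_ack(delivery_tag=method.delivery_tag)  # ack only after success

    channel.basic_consume(queue=QUEUE, on_message_callback=handle)
    channel.start_consuming()
```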
## ETL Data Migration (TFRS to LCFS)

Refer to Data Migration (TFRS to LCFS) and Subsystems and Responsibilities for a detailed overview.
- NiFi Flow Trigger: The migration process is typically initiated by starting a NiFi data flow (manually via the UI or programmatically, potentially using `etl/data-migration.sh`).
- Data Extraction: NiFi processors connect to the TFRS PostgreSQL database (source) via JDBC.
- Data Transformation: Data passes through a series of NiFi processors that perform transformations (see the illustrative sketch after this list):
  - Data type mapping.
  - Schema alignment.
  - Value lookups or conversions.
  - Data cleansing.
  - (Specific transformations are defined in the NiFi flow templates in `etl/templates/`.)
- Data Loading: Transformed data is loaded into the LCFS PostgreSQL database (target) by NiFi processors using JDBC.
- Error Handling: Records that fail during transformation or loading are routed to an error flow within NiFi and typically logged to files in `etl/nifi_output/`.
- Monitoring: The NiFi UI provides monitoring of data flow progress, queues, and processor status.
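The transformations themselves are defined declaratively in the NiFi flow templates rather than in application code. Purely as an illustration of what a single record might undergo, the sketch below mimics the type mapping, value lookup, cleansing, and schema-alignment steps; the field names and lookup table are invented.

```python
# Illustrative only: the real transformations live in the NiFi flow templates
# under etl/templates/. Field names and the status lookup are invented here.
from datetime import datetime

# Hypothetical lookup: TFRS status codes mapped to LCFS status names.
STATUS_LOOKUP = {1: "Draft", 2: "Submitted", 3: "Assessed"}

def transform_record(tfrs_row: dict) -> dict:
    """Map one TFRS source row onto the LCFS target schema."""
    return {
        "report_id": int(tfrs_row["id"]),                               # type mapping
        "status": STATUS_LOOKUP.get(tfrs_row["status_id"], "Unknown"),  # value lookup
        "compliance_period": str(tfrs_row["period"]).strip(),           # cleansing
        "created_at": datetime.fromisoformat(tfrs_row["created"]),      # schema alignment
    }
```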
## Caching Flow (Redis)

Refer to Caching Strategy.
- API Request: The frontend or another service requests data from a cacheable LCFS Backend API endpoint.
- Cache Check: The `fastapi-cache2` integration in the backend checks Redis for an existing cached response for this request (based on the URL and parameters).
- Cache Hit: If a valid (non-expired) cached response exists, it is returned directly to the client, bypassing further processing.
- Cache Miss: If no valid cached response exists:
  - The API endpoint logic is executed.
  - Data is fetched from the PostgreSQL database or computed.
  - The response is generated.
  - The response is stored in Redis with a defined Time-To-Live (TTL).
  - The response is sent to the client.
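This cache-aside behaviour is what `fastapi-cache2`'s decorator provides. Below is a minimal sketch assuming a Redis backend; the endpoint path, key prefix, and TTL are invented for illustration, and the real cached endpoints are defined in the LCFS backend.

```python
# Minimal sketch of fastapi-cache2 wired to Redis. The endpoint, prefix, and
# TTL are invented; the real cached endpoints live in the LCFS backend.
from fastapi import FastAPI
from fastapi_cache import FastAPICache
from fastapi_cache.backends.redis import RedisBackend
from fastapi_cache.decorator import cache
from redis import asyncio as aioredis

app = FastAPI()

@app.on_event("startup")
async def startup() -> None:
    # Connect to Redis and register it as the cache backend.
    redis = aioredis.from_url("redis://localhost:6379")
    FastAPICache.init(RedisBackend(redis), prefix="lcfs-cache")

@app.get("/api/fuel-types")
@cache(expire=300)  # TTL: responses are cached in Redis for five minutes
async def list_fuel_types() -> list[dict]:
    # Cache miss: this body runs (normally a PostgreSQL query) and the response
    # is stored in Redis. Cache hit: the stored response is returned without
    # executing this function at all.
    return [{"id": 1, "name": "Renewable diesel"}]
```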
This document provides a high-level overview of key data flows. More detailed sequence diagrams or flowcharts can be added to Component Interaction Diagrams or this page to illustrate specific complex scenarios.