Data Flows

This document describes the key data flows within the LCFS system, covering user interactions, API communication, asynchronous processing, and ETL processes. It addresses the requirements of ticket #2409.

1. User Authentication and Authorization Flow (Keycloak)

  1. User Accesses Frontend: User navigates to the LCFS React frontend application.
  2. Redirection to Keycloak: The frontend (using @react-keycloak/web) detects the user is not authenticated and redirects the browser to the Keycloak login page.
  3. User Authenticates: User enters credentials (e.g., IDIR, BCEID) on the Keycloak page.
  4. Token Issuance: Keycloak authenticates the user and issues OIDC tokens (ID Token, Access Token, Refresh Token) back to the frontend via browser redirects.
  5. Token Storage: The frontend stores these tokens securely (e.g., in memory, as handled by keycloak-js).
  6. API Requests with Access Token: For subsequent requests to the LCFS Backend API, the frontend includes the Access Token (JWT) in the Authorization: Bearer <token> header.
  7. Backend Token Validation: The LCFS Backend (FastAPI) receives the request, extracts the JWT, and validates it (signature, expiry, issuer, audience) against Keycloak's public keys (see the sketch after this list).
  8. Authorization Check: Based on valid token claims (e.g., roles, user ID), the backend performs authorization checks for the requested resource/action.
  9. Response: Backend processes the request and sends a response to the frontend.
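
The backend-side token check in steps 7–8 can be expressed as a FastAPI dependency. The following is a minimal sketch using PyJWT against Keycloak's JWKS endpoint; the issuer URL, audience, and dependency name are hypothetical, and the actual LCFS backend may use a different JWT library or helper.

```python
import jwt
from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

# Hypothetical realm URL and client ID -- the real values come from LCFS configuration.
KEYCLOAK_ISSUER = "https://keycloak.example.com/auth/realms/lcfs"
JWKS_CLIENT = jwt.PyJWKClient(f"{KEYCLOAK_ISSUER}/protocol/openid-connect/certs")
bearer_scheme = HTTPBearer()

async def get_current_user(
    credentials: HTTPAuthorizationCredentials = Depends(bearer_scheme),
) -> dict:
    """Validate the bearer token's signature, expiry, issuer, and audience."""
    token = credentials.credentials
    try:
        signing_key = JWKS_CLIENT.get_signing_key_from_jwt(token)
        claims = jwt.decode(
            token,
            signing_key.key,
            algorithms=["RS256"],
            issuer=KEYCLOAK_ISSUER,
            audience="lcfs",  # hypothetical client/audience value
        )
    except jwt.PyJWTError as exc:
        # Invalid or expired token -> 401 so the frontend can re-authenticate.
        raise HTTPException(status.HTTP_401_UNAUTHORIZED, detail=str(exc))
    return claims  # role and user ID claims feed the authorization check (step 8)
```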

2. Typical User Interaction Data Flow (e.g., Submitting a Report)

  1. User Fills Form: User interacts with a form in the React frontend (e.g., creating a compliance report).
  2. Client-Side Validation: Frontend performs initial validation using libraries like React Hook Form and Yup.
  3. API Request: On submission, the frontend (using axios, possibly wrapped in a React Query useMutation hook) sends a POST or PUT request to the appropriate LCFS Backend API endpoint (e.g., /api/reports). The request payload contains the form data, and the Authorization header includes the JWT.
  4. Backend Processing:
    • Receives the request.
    • Validates the JWT (as above).
    • Performs authorization.
    • Validates the incoming data using Pydantic models (see the sketch after this list).
    • Executes business logic (e.g., calculating values, checking rules).
    • Interacts with the PostgreSQL database (SQLAlchemy) to create or update records.
    • May interact with MinIO for file storage (if attachments are involved).
    • May publish a message to RabbitMQ if an asynchronous task needs to be triggered (e.g., generating a PDF version of the report).
  5. API Response: Backend sends a response (e.g., success message with the new report ID, or error details).
  6. Frontend Update: The frontend updates the UI based on the API response (e.g., navigates to a success page or displays an error message, while React Query updates its cache).
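
A minimal sketch of the backend side of step 4 follows, assuming a hypothetical compliance-report schema; the real Pydantic models, business rules, and SQLAlchemy persistence live in the LCFS backend code.

```python
from datetime import date
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field

app = FastAPI()

class ComplianceReportCreate(BaseModel):
    """Hypothetical request body; the real schemas are defined in the backend."""
    organization_id: int = Field(gt=0)
    compliance_period: str  # e.g. "2024"
    submitted_on: date

@app.post("/api/reports", status_code=201)
async def create_report(payload: ComplianceReportCreate) -> dict:
    # FastAPI has already validated the JSON body against the Pydantic model here.
    # In the real backend, business logic runs next, SQLAlchemy writes the record,
    # MinIO stores any attachments, and a RabbitMQ message may be published for
    # follow-up tasks (e.g. PDF generation).
    if not payload.compliance_period.isdigit():  # placeholder business rule
        raise HTTPException(status_code=422, detail="Invalid compliance period")
    return {"report_id": 1, "status": "Draft"}  # placeholder response
```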

3. Asynchronous Task Processing (RabbitMQ)

  1. Task Trigger: An LCFS Backend API endpoint, upon a certain action (e.g., report submission, scheduled job), publishes a message to a specific RabbitMQ queue. The message contains necessary data for the task.
  2. Message Queued: RabbitMQ receives and queues the message.
  3. Worker Consumption: A dedicated worker process (part of the backend or a separate worker service; confirm which by inspecting the backend code) consumes messages from the queue (see the sketch after this list).
  4. Task Execution: The worker performs the long-running or deferrable task (e.g., generating a complex report, sending an email notification, processing a large dataset).
    • This worker may interact with the database, MinIO, or other services.
  5. Status Update (Optional): The worker might update the status of the task in the database or send a notification (e.g., via WebSocket, email, or another RabbitMQ message) upon completion or failure.
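
The publish/consume pattern in steps 1–4 might look roughly like the following, sketched here with aio-pika; the queue name, connection URL, and task payload are illustrative, and the actual LCFS client library and worker wiring may differ.

```python
import asyncio
import json

import aio_pika

AMQP_URL = "amqp://guest:guest@localhost/"  # hypothetical connection string
QUEUE_NAME = "report-tasks"                 # hypothetical queue name

async def publish_task(report_id: int) -> None:
    """Publisher side (step 1): an API endpoint enqueues a task message."""
    connection = await aio_pika.connect_robust(AMQP_URL)
    async with connection:
        channel = await connection.channel()
        await channel.declare_queue(QUEUE_NAME, durable=True)
        await channel.default_exchange.publish(
            aio_pika.Message(body=json.dumps({"report_id": report_id}).encode()),
            routing_key=QUEUE_NAME,
        )

async def run_worker() -> None:
    """Consumer side (steps 3-4): a worker processes queued messages."""
    connection = await aio_pika.connect_robust(AMQP_URL)
    channel = await connection.channel()
    queue = await channel.declare_queue(QUEUE_NAME, durable=True)
    async with queue.iterator() as messages:
        async for message in messages:
            async with message.process():  # acknowledges the message on success
                task = json.loads(message.body)
                # Placeholder work; a real worker would hit the DB, MinIO, etc.
                print(f"Generating PDF for report {task['report_id']}")

if __name__ == "__main__":
    asyncio.run(publish_task(42))
```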

4. ETL Data Flow (TFRS to LCFS via Apache NiFi)

Refer to Data Migration (TFRS to LCFS) and Subsystems and Responsibilities for a detailed overview.

  1. NiFi Flow Trigger: The migration process is typically initiated by starting a NiFi data flow (manually via the NiFi UI or programmatically, potentially using etl/data-migration.sh).
  2. Data Extraction: NiFi processors connect to the TFRS PostgreSQL database (source) via JDBC.
  3. Data Transformation: Data passes through a series of NiFi processors that perform transformations:
    • Data type mapping.
    • Schema alignment.
    • Value lookups or conversions.
    • Data cleansing.
    • (Specific transformations are defined in the NiFi flow templates in etl/templates/.)
  4. Data Loading: Transformed data is loaded into the LCFS PostgreSQL database (target) by NiFi processors using JDBC.
  5. Error Handling: Records that fail during transformation or loading are routed to an error flow within NiFi and typically logged to files in etl/nifi_output/.
  6. Monitoring: NiFi UI provides monitoring of data flow progress, queues, and processor status.

5. Caching Data Flow

Refer to Caching Strategy.

  1. API Request: Frontend or another service requests data from a cacheable LCFS Backend API endpoint.
  2. Cache Check: The fastapi-cache2 integration in the backend checks Redis for an existing cached response for this request (based on the request URL and parameters); see the sketch after this list.
  3. Cache Hit: If a valid (non-expired) cached response exists, it is returned directly to the client, bypassing further processing.
  4. Cache Miss: If no valid cached response exists:
    • The API endpoint logic is executed.
    • Data is fetched from the PostgreSQL database or computed.
    • The response is generated.
    • The response is stored in Redis with a defined Time-To-Live (TTL).
    • The response is sent to the client.
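
A minimal sketch of this cache-aside flow using fastapi-cache2 with a Redis backend follows; the endpoint path, TTL, and Redis URL are illustrative rather than the actual LCFS configuration.

```python
from fastapi import FastAPI
from fastapi_cache import FastAPICache
from fastapi_cache.backends.redis import RedisBackend
from fastapi_cache.decorator import cache
from redis import asyncio as aioredis

app = FastAPI()

@app.on_event("startup")
async def init_cache() -> None:
    # Connect to Redis and register it as the cache backend.
    redis = aioredis.from_url("redis://localhost:6379")  # hypothetical URL
    FastAPICache.init(RedisBackend(redis), prefix="lcfs-cache")

@app.get("/api/fuel-types")          # hypothetical cacheable endpoint
@cache(expire=300)                   # TTL in seconds; a cache hit skips the body
async def list_fuel_types() -> list[dict]:
    # Executed only on a cache miss; the real endpoint would query PostgreSQL here.
    return [{"fuel_type_id": 1, "name": "Placeholder fuel type"}]
```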

This document provides a high-level overview of key data flows. More detailed sequence diagrams or flowcharts can be added to Component Interaction Diagrams or this page to illustrate specific complex scenarios.
