| Change Log | Release 2025.1.1 - GT-Analytics/fuam-basic GitHub Wiki

The following changes have been made in version 2025.1.1:

Service Principal Usage for Inventory data

The extraction of the inventory data through the Scanner API runs in a notebook so that the extraction can be parallelised. In the last version, the executing user's identity was used in the Sempy library to extract the data, which led to errors for users without Fabric admin permission. To enable extraction via a service principal, it is now possible to store the service principal credentials in a key vault. For this, the user needs to create a key vault with the following secrets:

  • fuam-sp-tenant
  • fuam-sp-client
  • fuam-sp-secret

The name of the key vault can be configured in the main pipeline. The executing user (the user who owns the pipeline) must have access to the secrets in the key vault. If no key vault is configured or one of the secrets is not found, the executing user's identity will still be used for extraction.
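The fallback behaviour described above can be sketched as follows. The secret names match the list above; the `get_secret` callable is a hypothetical stand-in for a Key Vault lookup (in a Fabric notebook this would typically be something like `notebookutils.credentials.getSecret`):

```python
# Sketch of the credential-fallback logic, assuming a generic secret lookup.
SECRET_NAMES = ("fuam-sp-tenant", "fuam-sp-client", "fuam-sp-secret")

def resolve_auth(get_secret):
    """Return ('service_principal', creds) if all three secrets resolve,
    otherwise ('user', None) to fall back to the executing user's identity."""
    creds = {}
    for name in SECRET_NAMES:
        try:
            value = get_secret(name)
        except Exception:
            return ("user", None)   # key vault or secret not found -> user identity
        if not value:
            return ("user", None)
        creds[name] = value
    return ("service_principal", creds)
```

The same three-secret check applies regardless of how the key vault is accessed; only the lookup callable changes.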

Active Items Throttling

There was an error in the active items pipeline on larger tenants, caused by the high frequency of requests produced by pagination. To resolve this, the request interval (ms) has been increased to 10,000.
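The pacing between paginated requests can be sketched like this. `fetch_page` is a hypothetical stand-in for one API call; it returns the page items plus a continuation token, with `None` at the end:

```python
import time

def fetch_all_pages(fetch_page, interval_ms=10_000, sleep=time.sleep):
    """Fetch paginated results, waiting interval_ms between requests
    so the API is not hit with a high request frequency."""
    items, continuation = [], None
    while True:
        page, continuation = fetch_page(continuation)
        items.extend(page)
        if continuation is None:
            return items
        sleep(interval_ms / 1000.0)   # throttle before requesting the next page
```

The `sleep` parameter is injectable only to make the sketch testable; the pipeline setting corresponds to `interval_ms=10_000`.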

Handling of extraction of more than 50,000 workspaces

Because the Scanner API has a limit of 500 requests per hour, each covering up to 100 workspaces, extracting more than 50,000 workspaces could result in error 429. This error is now handled: the script automatically waits until the next request can be executed. The mechanism has been adapted from a script by klinejordan (thanks to Frank Preusker for hinting at this solution).
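The retry-on-429 behaviour can be sketched as follows. `do_request` is a hypothetical callable returning a response object with `status_code` and `headers` (e.g. a `requests.Response`); the `Retry-After` header and the 60-second default are assumptions for illustration:

```python
import time

def call_with_429_retry(do_request, max_retries=5, sleep=time.sleep):
    """Retry a request when the API answers 429 (Too Many Requests),
    waiting before the next attempt until a request is allowed again."""
    for _ in range(max_retries):
        response = do_request()
        if response.status_code != 429:
            return response
        wait_s = int(response.headers.get("Retry-After", "60"))
        sleep(wait_s)   # wait out the throttling window, then retry
    raise RuntimeError("still throttled after %d retries" % max_retries)
```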

Capacity Refreshables Error Handling

The capacity refreshables API returns duplicates in certain cases, which need to be dropped. Additionally, end dates with the value 0001-01-01 need to be filtered out, because Delta Parquet is not able to store them. This value can be returned by the API when a refresh has not been run yet.
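The cleanup amounts to a deduplication plus a sentinel-date filter. In this sketch each row is a simplified `(refreshable_id, end_date)` tuple standing in for the API payload, not the actual FUAM schema:

```python
from datetime import date

SENTINEL = date(1, 1, 1)   # 0001-01-01, returned when no refresh has run yet

def clean_refreshables(rows):
    """Drop duplicate rows and rows whose end date is the 0001-01-01
    sentinel, which Delta Parquet cannot store."""
    seen, cleaned = set(), []
    for row in rows:
        _, end_date = row
        if end_date == SENTINEL or row in seen:
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned
```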

Historisation

In order to keep historic data, all raw files extracted from the different source APIs in FUAM are stored in the Files section of FUAM_Lakehouse. This makes it possible to enhance future solutions with data that has not yet been extracted into the delta tables. Before the current version this was not harmonised. All files are put into the following structure:

  • Files
    • History
      • Topic <- This can have multiple levels (e.g. Gateway Logs)
        • Year
          • Month
            • Day

Here is an example of the folder structure for Active Items:


For specific cases like the extraction of activities, there is an additional daily folder under the Day folder. In this case the Year/Month/Day structure reflects the day of the request, and each daily folder holds one historic day that has been extracted. If you still have historic files in the old folder structure, you can manually move them into the right folders if you want to keep them.
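The folder layout above can be sketched as a small path helper. The zero-padded month/day folders and the ISO-formatted daily folder are assumptions for illustration, not confirmed naming conventions:

```python
from datetime import date
import posixpath

def history_path(topic, request_date, daily=None):
    """Build the Files/History/<Topic>/<Year>/<Month>/<Day> path described
    above. topic may contain multiple levels (e.g. "Gateway Logs");
    daily adds the extra per-day folder used for activities."""
    path = posixpath.join(
        "Files", "History", topic,
        f"{request_date.year:04d}",
        f"{request_date.month:02d}",
        f"{request_date.day:02d}",
    )
    if daily is not None:
        path = posixpath.join(path, daily.isoformat())
    return path
```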

Move of Silver Tables

The silver tables have been moved from FUAM_Lakehouse to a new FUAM_Staging_Lakehouse to give better visibility of the productive tables. If you have existing silver tables (identified by the suffix _silver) in FUAM_Lakehouse, you can delete them, since they don't hold any historic data.

FUAM Gateway Monitoring

See: FUAM-Gateway-Monitoring Wiki Page

FUAM SQL Endpoint Monitoring

See: FUAM-SQL-Endpoint-Monitoring Wiki Page

TMMDA for FUAM

See: TMMDA for FUAM Wiki Page

Upper Case IDs in Activities

To be able to join the activities with the other tables, specific IDs have been converted to upper case. For existing data there is a migration notebook, which can be executed once and performs this transformation.
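The one-off migration amounts to upper-casing the ID columns in place. In this sketch rows are plain dicts and the column names are illustrative, not the actual FUAM activity schema:

```python
def uppercase_id_columns(rows, id_columns):
    """Upper-case the values of the given ID columns so activities
    can be joined with the other tables. Returns new rows; the
    input is left untouched."""
    migrated = []
    for row in rows:
        row = dict(row)   # copy so the original data is not mutated
        for col in id_columns:
            value = row.get(col)
            if isinstance(value, str):
                row[col] = value.upper()
        migrated.append(row)
    return migrated
```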

Updated Merge Logic on Tables Active_Items, Capacities & Workspaces

The merge logic towards the gold layer did not handle the deletion of objects correctly. In order to keep deleted items available, a field "fuam_deleted" has been added to indicate that an item has been deleted.
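The adjusted merge can be sketched with plain dicts: rows present in the new snapshot are upserted with `fuam_deleted=False`, and rows that exist only in gold are kept but flagged instead of being lost. The key and field names are illustrative:

```python
def merge_with_deletion_flag(gold_rows, snapshot_rows, key="id"):
    """Merge a new snapshot into the gold rows, marking items that
    disappeared from the snapshot with fuam_deleted=True."""
    snapshot = {row[key]: row for row in snapshot_rows}
    merged = []
    for row in gold_rows:
        if row[key] in snapshot:
            # item still exists -> take the snapshot version
            merged.append({**snapshot.pop(row[key]), "fuam_deleted": False})
        else:
            # item vanished from the source -> keep it, but flag it
            merged.append({**row, "fuam_deleted": True})
    # anything left in the snapshot is a newly created item
    merged.extend({**row, "fuam_deleted": False} for row in snapshot.values())
    return merged
```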

Added capacity_regions to fix capacity location issue on map

Run the following notebook once after deployment: 'Generate_Static_Tables'

Report content

Refactored and enhanced the Power BI report based on your feedback

Report colors

Enhanced Power BI report color palette for better readability

Persisting 'Capacity Refreshables' data

Fixed an issue in the '01_Transfer_Capacity_Refreshables_Unit' notebook. Now, the data will be populated over time

Added new pipeline to get 'Git connections' data

The main orchestration pipeline now contains the 'Load_Git_Connections_E2E' step

Table creation during Deployment

To make sure all tables needed for reporting exist even if some items are not available on the tenant, non-existent tables will be created during deployment
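Conceptually this is a "create if missing" pass over the expected table list. The table names and the single-column schema below are placeholders, not the FUAM schema:

```python
def missing_table_ddl(required, existing):
    """Emit one CREATE TABLE IF NOT EXISTS statement per expected
    table that does not yet exist in the lakehouse."""
    statements = []
    for table in required:
        if table not in existing:
            statements.append(
                f"CREATE TABLE IF NOT EXISTS {table} (id STRING) USING DELTA"
            )
    return statements
```

In a deployment notebook, each statement would then be executed via `spark.sql`.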

Error handling in case of empty data

In specific notebooks a mechanism was added to ensure that the notebook does not fail when the API returns empty data. This is especially important for item types which are not always available on tenants.
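The guard is essentially an early exit before the transformation step, sketched here with a hypothetical `transform` callable:

```python
def transform_or_skip(api_rows, transform):
    """Skip the transformation when the API returns no rows (e.g. the
    item type does not exist on the tenant) instead of letting the
    notebook fail on empty data."""
    if not api_rows:
        return []   # nothing to write; the notebook exits cleanly
    return transform(api_rows)
```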