Architecture - GT-Analytics/fuam-basic GitHub Wiki

General

The architecture of FUAM is built on Fabric items such as Pipelines, Notebooks, Lakehouses, Semantic models and Power BI reports. The components are built in a modular structure, which lets you extend FUAM with your own modules and also makes the solution easier to maintain.

The data ingestion logic is orchestrated and parameterizable, which allows the main orchestration pipeline to be used for both initial and incremental data loads. The FUAM Lakehouse is one of the core components of the architecture. All data is transformed and persisted in a way that makes it possible to analyze the collected data in a semantic model in DirectLake mode.
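
To illustrate how one parameterized pipeline can serve both load types, here is a minimal Python sketch. The parameter names (`load_mode`, `initial_days`, `incremental_days`) and the default window sizes are assumptions made for this example; they are not taken from the FUAM pipelines.

```python
from datetime import date, timedelta


def resolve_load_window(load_mode: str, today: date,
                        initial_days: int = 30, incremental_days: int = 2):
    """Return the (start, end) dates a run should ingest, both inclusive.

    All parameter names and defaults are hypothetical, not FUAM's own.
    """
    if load_mode == "initial":
        days = initial_days
    elif load_mode == "incremental":
        days = incremental_days
    else:
        raise ValueError(f"unknown load_mode: {load_mode!r}")
    # The window ends yesterday so that only complete days are loaded.
    end = today - timedelta(days=1)
    start = end - timedelta(days=days - 1)
    return start, end
```

The same function is called with `load_mode="initial"` on the first run and `load_mode="incremental"` on every scheduled run afterwards, so only the parameter value changes, not the logic.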

Lakehouses

FUAM is designed with a 'minimalistic' but modular approach. FUAM aims for a medallion architecture (Bronze, Silver, Gold lakehouses); however, the target tables (gold layer) and the raw files (bronze layer) are stored centrally.

| Lakehouse | Description |
| --- | --- |
| FUAM_Lakehouse | Main data storage. Stores all FUAM data in Delta Parquet tables. |
| FUAM_Staging_Lakehouse | Intermediate storage for processing. No long-term storage of data. |
| FUAM_Config_Lakehouse | Used for deployment of FUAM. |

Modules (Pipelines)

FUAM is built with a modular approach. Each module contains the end-to-end data ingestion logic (from source to lakehouse table). Every module is orchestrated by the main orchestration pipeline (described later), which makes scheduling much easier.

| Module | Description | Item name | Populated tables in FUAM_Lakehouse |
| --- | --- | --- | --- |
| Capacities | Collects capacities and their properties. | Load_Capacities_E2E | capacities, capacity_users |
| Workspaces | Collects existing workspaces. Personal workspaces are not in scope. | Load_Workspaces_E2E | workspaces |
| Capacity Refreshables | Collects scheduled semantic models and telemetry from their historical refreshes. | Load_Capacity_Refreshables_E2E | capacity_refreshables, capacity_refreshable_days, capacity_refreshable_details, capacity_refreshable_summaries, capacity_refreshable_times |
| Activities | Collects activity logs from the tenant. | Load_Activities_E2E | activities, aggregated_activities_last_30days |
| Active Items | Collects data about active items on the tenant. | Load_Active_Items_E2E | active_items |
| Inventory | Collects metadata about the tenant via the Scanner API. | Load_Inventory_E2E | dashboards, dataflows, datasource_instances, environments, eventhouses, eventstreams, kql_databases, lakehouses, notebooks, pipelines, reflexes, reports, semantic_models, warehouses, workspaces_scanned_users |
| Tenant Settings | Takes snapshots of the current tenant settings. | Load_Tenant_Settings_E2E | tenant_settings, tenant_settings_enabled_security_groups |
| Delegated Tenant Setting Overrides | Takes snapshots of the current delegated capacity tenant setting overrides. | Load_Delegated_Tenant_Settings_Overrides_E2E | delegated_tenant_settings_overrides |
| Git Connections | Collects the currently configured git connections to workspaces. | Load_Git_Connections_E2E | git_connections |
| Calendar | Generates one row per day in the Delta table. This pipeline must run every day, because the table contains time intelligence helper columns such as 'IsInLast14days', which are used later in the semantic model. | Generate_Calendar_Table (Notebook) | calendar |
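
The reason the Calendar module has to run daily can be shown with a small Python sketch. It is not the Generate_Calendar_Table notebook itself: the `Date` column name and the exact definition of 'IsInLast14days' (today plus the 13 days before it) are assumptions made for illustration.

```python
from datetime import date, timedelta


def generate_calendar(start: date, end: date, today: date):
    """Build one row per day from start to end (inclusive)."""
    rows = []
    current = start
    while current <= end:
        age_in_days = (today - current).days
        rows.append({
            "Date": current,  # hypothetical column name
            # Assumed definition: True for today and the 13 days before it.
            "IsInLast14days": 0 <= age_in_days < 14,
        })
        current += timedelta(days=1)
    return rows
```

Because the flag is computed relative to `today`, a table generated yesterday marks a different set of days than one generated today. If the pipeline is skipped, the helper columns silently go stale.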

Units (Notebooks)

FUAM uses Spark Notebooks to load, transform, write and merge data. Each module has its own Notebook, which typically takes inbound parameters.
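
The merge step can be pictured as an upsert by key. The sketch below uses plain Python dictionaries as a stand-in; in a Spark Notebook this would typically be done with a Delta Lake merge, and whether the FUAM notebooks implement it exactly this way is an assumption. The function name and the key column are hypothetical.

```python
def merge_rows(target, incoming, key):
    """Upsert: update rows whose key already exists, append the rest."""
    merged = {row[key]: row for row in target}
    for row in incoming:
        # Existing columns are kept unless the incoming row overrides them.
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return list(merged.values())


existing = [{"id": "a", "name": "Old"}, {"id": "b", "name": "B"}]
new_batch = [{"id": "a", "name": "New"}, {"id": "c", "name": "C"}]
result = merge_rows(existing, new_batch, key="id")
```

Here row `a` is updated, row `b` is left untouched and row `c` is appended, which is what allows an incremental load to run repeatedly without duplicating rows.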

Important to know: In FUAM, Notebooks are executed with the identity of the Notebook owner. For the Pipeline, the user who deployed the solution should be the same user who schedules the pipeline.

Semantic Model

General

The main semantic model of FUAM Basic is the FUAM_Basic_PBI_Overview_SM. This contains all the business logic on top of the gold layer (FUAM_Lakehouse delta tables).

Data model

The main point of view of the data model is time-based. Whenever possible, a table is connected via a relationship to the calendar table. This structure covers a lot of different analytical scenarios.

Reconstructing the data lineage between items or chained data structures is not in the scope of this semantic model. How to extend FUAM is described later.

Connectivity mode

By default, the FUAM_Basic_PBI_Overview_SM semantic model is connected to the Lakehouse in DirectLake-only mode.

Info: Another point of view of the data model is coming in upcoming releases.

Measure structure

The home table of every measure is the "Metrics" placeholder table. The measure groups map almost 1-to-1 to the FUAM modules described above.

On top of the time-based tables like 'activities' or 'tenant_settings', there are two kinds of measures:

  • Basic measures (like sum, avg, median)
  • Time intelligence measures

Most of the time intelligence measures take advantage of the pre-calculated time intelligence columns in the 'calendar' Lakehouse table.
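
The idea behind such a measure can be sketched in Python: instead of doing date arithmetic at query time, the measure only filters on a flag that already exists in the calendar table. The actual measures in the semantic model are not Python, and apart from 'IsInLast14days' every column name and value below is made up for the example.

```python
from datetime import date

# Hypothetical sample rows; only the 'IsInLast14days' name comes from FUAM.
calendar = [
    {"Date": date(2025, 1, 10), "IsInLast14days": False},
    {"Date": date(2025, 1, 20), "IsInLast14days": True},
    {"Date": date(2025, 1, 21), "IsInLast14days": True},
]
activities = [
    {"Date": date(2025, 1, 10), "ActivityCount": 5},
    {"Date": date(2025, 1, 20), "ActivityCount": 3},
    {"Date": date(2025, 1, 21), "ActivityCount": 4},
]


def activities_last_14_days(activity_rows, calendar_rows):
    """Sum activity counts for days the calendar flags as recent."""
    flagged = {row["Date"] for row in calendar_rows if row["IsInLast14days"]}
    return sum(a["ActivityCount"] for a in activity_rows if a["Date"] in flagged)


total = activities_last_14_days(activities, calendar)
```

The measure itself stays a simple filtered sum; all knowledge about what "last 14 days" means lives in the calendar table.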

Reports

FUAM Basic provides one central Power BI report FUAM_Basic_Overview_Report.

Filters

Following best practices, the report avoids slicers on its pages. Each report page has its own filter pane column definition to help users focus on the most important information.

Extensibility

Since the FUAM deployment notebook overwrites the items on the next run or in case of an update, we recommend creating your own custom workspace to build additional modules, semantic models, reports etc. on top of FUAM.

Important: We can't guarantee that items or data structures will not change in the future.

To implement your custom requirements, we recommend following these steps:

  1. Create a new workspace
  2. Create a new Lakehouse
  3. Shortcut the tables and files from FUAM_Lakehouse to your own lakehouse
  4. Build your own items and logic
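
Step 3 can also be scripted. The sketch below only builds the request for a OneLake shortcut and does not send it. The endpoint and body shape follow the Fabric REST API "Create Shortcut" operation as we understand it, so please verify them against the current API reference before use. The function name and all IDs are placeholders.

```python
FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_shortcut_request(custom_workspace_id, custom_lakehouse_id,
                           fuam_workspace_id, fuam_lakehouse_id, table_name):
    """Return (url, body) for a shortcut to one FUAM_Lakehouse table.

    Hypothetical helper; all IDs are supplied by the caller.
    """
    # The shortcut is created inside the custom lakehouse ...
    url = (f"{FABRIC_API}/workspaces/{custom_workspace_id}"
           f"/items/{custom_lakehouse_id}/shortcuts")
    body = {
        "path": "Tables",
        "name": table_name,
        # ... and points at the table in FUAM_Lakehouse.
        "target": {
            "oneLake": {
                "workspaceId": fuam_workspace_id,
                "itemId": fuam_lakehouse_id,
                "path": f"Tables/{table_name}",
            }
        },
    }
    return url, body


url, body = build_shortcut_request(
    "ws-custom", "lh-custom", "ws-fuam", "lh-fuam", "activities")
```

Sending the request additionally requires an authenticated POST with a valid bearer token, which is left out here. Because the custom lakehouse only holds shortcuts, a FUAM update that rewrites FUAM_Lakehouse does not overwrite your own items.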

Cloning the semantic model from FUAM:

  1. Use semantic-link-labs to clone and rebind the semantic model to your custom lakehouse

Cloning reports from FUAM:

  1. Use semantic-link-labs to clone and rebind the report to your custom semantic model