Shared Trails - eclipse-efbt/efbt GitHub Wiki

Shared Trails Approach

Introduction

Eclipse Free BIRD Tools follows what we call the ‘Shared Trails’ approach, which is designed to provide detailed lineage for every calculation.

The Shared Trails approach offers a clear, interactive way to trace how data and metadata flow through calculations, from input tables all the way to the final report. By treating each calculation path as a “trail”, users can zoom in on individual steps for debugging, validation, or compliance checks. To enforce the Shared Trails approach we use a methodology we call CoCaLiMo (Collaborative Calculation Lineage Model).

The approach is designed to meet the lineage demands of the most complex regulations, such as COREP Market Risk and Credit Risk RWA calculations.

What Is a Trail?

A trail represents the lineage of a single calculation or data point. It can take two main forms:

Metadata trails

Metadata trails detail the dependencies among functions and their inputs used to compute a report cell, much like Excel formulas specify their inputs. The example below shows a made-up process: each box at the bottom represents a regulatory report cell, linked to input data at the top through metadata trails.

a_meta_data_trail

Below is a focused view of one trail, showing the step-by-step dependencies for a single data point. It is found by following all the lines from that data point.

a_meta_data_trail_greyed

Because Shared Trails treats transformations as pure functions under CoCaLiMo, you can extract this trail as executable code—ideal for targeted debugging and validation.

The information from other trails is still shown here, greyed out. This highlights that the whole system can be considered a superposition of trails, some of which share the same elements (they can overlap).
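As a rough illustration of following the lines from one data point, the sketch below walks a small dependency graph and then intersects two trails to find their shared elements. The graph, the node names, and the `metadata_trail` helper are all hypothetical and are not taken from BIRD, COREP, or the project code.

```python
# Hypothetical dependency graph: each node maps to the nodes it reads from.
# All names are invented for illustration only.
DEPENDS_ON = {
    "cell_A": ["sum_exposures"],
    "cell_B": ["sum_collateral"],
    "sum_exposures": ["filter_loans"],
    "sum_collateral": ["filter_loans"],
    "filter_loans": ["input_loans"],
    "input_loans": [],
}


def metadata_trail(node, graph):
    """Return every node reachable from `node` by following dependency lines."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(graph[current])
    return seen


trail_a = metadata_trail("cell_A", DEPENDS_ON)
trail_b = metadata_trail("cell_B", DEPENDS_ON)

# Elements that appear in both trails: the overlap in the superposition.
shared = trail_a & trail_b
print(sorted(trail_a))
print(sorted(shared))
```

Here the two report cells have separate trails that overlap only at the filtering step and the input table, which is the kind of sharing the greyed-out view makes visible.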

Data trails

Data trails record the actual values produced at each step of a calculation when the system is populated with real data. The example below is from COREP, for one part of FRTB Market Risk; the Excel formula link arrows highlight the similarity to Excel processing. These data trails can be output from a working system, or can be provided by business experts as expected results to guide implementations. They are particularly useful for multistep processes like those we see in market risk (e.g. FRTB) or credit risk processing:

a_frtb_excel
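One way to picture a data trail in code is to record the value produced after every step of a calculation. The sketch below does this for a two-step calculation. The step names, the `run_with_data_trail` helper, and the numbers are assumptions made for illustration; they are not real FRTB risk weights or project code.

```python
def run_with_data_trail(steps, value):
    """Apply each (name, function) step in order, recording every intermediate value."""
    trail = [("input", value)]
    for name, func in steps:
        value = func(value)
        trail.append((name, value))
    return trail


# Hypothetical two-step calculation: weight each row, then sum the rows.
steps = [
    ("apply_weight", lambda rows: [row * 2 for row in rows]),
    ("sum_rows", sum),
]

data_trail = run_with_data_trail(steps, [100, 200, 300])
for name, value in data_trail:
    print(name, value)
```

The recorded list plays the same role as the chain of cells in the spreadsheet view: each entry is one step with the value it produced.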

In a 3D visualisation we can show both metadata trails and data trails at the same time. Here is one for FINREP:

finrep_1

A video of this view is available here: https://youtu.be/PeBMP-O_uvc

We also show one for FRTB (full video: https://www.youtube.com/watch?v=ZN3Cvrv6T8U). Below is the top view, which shows the metadata trail:

a_frtb_meta_data_lineage

A full view, which shows the data trail (best seen in the video):

a_ffrtb_data_lineage

And a side view, which shows row lineage:

a_frtb_row_lineage

Key Benefits

  1. Focused Debugging

By isolating one trail, you can test and debug a specific calculation for one cell without wading through the entire reporting process.

  2. Executable Documentation

When trails are executable code, they serve as live examples, enabling rapid validation of individual steps.

  3. Testable Acceptance Criteria

Analysts can define expected data trails as acceptance tests, ensuring new or revised processes meet precise requirements. This is especially useful for complex regulations like COREP.
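As a sketch of how an expected data trail could act as an acceptance test, the function below compares an analyst's expected trail with the trail a system actually produced and reports the first step that differs. The `first_mismatch` helper, the step names, and the values are hypothetical.

```python
from itertools import zip_longest


def first_mismatch(expected_trail, actual_trail):
    """Return (index, expected, actual) for the first differing step, or None if equal."""
    pairs = zip_longest(expected_trail, actual_trail)
    for index, (expected, actual) in enumerate(pairs):
        if expected != actual:
            return index, expected, actual
    return None


# Hypothetical expected trail supplied by an analyst.
expected = [("net_exposure", 700), ("risk_weighted", 350)]

# Two hypothetical system outputs: one matching, one diverging at the second step.
passing = [("net_exposure", 700), ("risk_weighted", 350)]
failing = [("net_exposure", 700), ("risk_weighted", 375)]

print(first_mismatch(expected, passing))
print(first_mismatch(expected, failing))
```

Reporting the first differing step, rather than only the final value, points directly at the part of the calculation that needs attention.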

Regulatory Reporting as Trail Superposition

Complex reports—such as regulatory filings—are effectively a superposition of many trails. Each report cell is backed by metadata and data trails, and you can select a single cell’s trail as a fully worked example to understand the underlying calculations in detail.

Lineage of Analysis

Shared Trails also links calculation functions to regulatory text or business requirements:

• Function-level mapping ties individual functions to specific regulatory rules.

• Process-level grouping organizes functions into higher-level processes or subprocesses—similar to a business process view—while still preserving the trail details.

This hierarchical approach helps regulatory analysts move smoothly from an overview to granular details.

A high-level process map groups related calculation functions into subprocesses. From here, analysts can drill down into any subprocess to view its metadata trail. Subprocesses and super-processes can be linked to regulatory requirements, so we can link to a high-level part of the regulation such as 'Risk Class Processing' or a low-level part such as 'GIRR delta risk weight'. For example:

a_bpmn_frrtb

Note that we can link here from any box (including the high level processes) to a section of regulation such as the CRR.

Note that this process/subprocess diagram does not drive the calculation flow. Rather, it gives an overview of the functions, grouped in ways that aid understanding and let users compare the processes with their understanding of the regulation.

Complex regulations like COREP are normally understood and described as a subprocess model. For example, credit risk is a subprocess of COREP, the standard approach to credit risk is a subset of credit risk, the simplified approach to credit risk is a subset of the standard approach, the risk weight substitution process is a subset of the simplified approach, and so on.
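The nesting of processes, and the link from each level to regulatory text, can be pictured with a small tree structure. The sketch below is an assumption about how such a hierarchy might be represented; the `Process` class, the function names, and the placeholder requirement strings are hypothetical and do not reflect the project's actual model.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    requirement: str = ""  # placeholder for a link to regulatory text
    functions: list = field(default_factory=list)
    subprocesses: list = field(default_factory=list)


def all_functions(process):
    """Collect the functions of a process and of every nested subprocess."""
    found = list(process.functions)
    for sub in process.subprocesses:
        found.extend(all_functions(sub))
    return found


def path_to(process, function_name, path=()):
    """Return the chain of process names leading to a function, or None."""
    path = path + (process.name,)
    if function_name in process.functions:
        return path
    for sub in process.subprocesses:
        result = path_to(sub, function_name, path)
        if result:
            return result
    return None


# Hypothetical hierarchy using the two example levels named in the text.
girr = Process(
    name="GIRR delta risk weight",
    requirement="<link to regulatory text>",
    functions=["apply_girr_delta_risk_weight"],
)
risk_class = Process(
    name="Risk Class Processing",
    requirement="<link to regulatory text>",
    functions=["select_risk_class"],
    subprocesses=[girr],
)

print(all_functions(risk_class))
print(path_to(risk_class, "apply_girr_delta_risk_weight"))
```

The tree only groups functions for navigation; it does not determine execution order, which matches the note that the diagram does not drive the calculation flow.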

Standards and Models

To manage complexity and maintain consistency, Shared Trails relies on clear computation standards:

CoCaLiMo (Collaborative Calculation Lineage Model)

CoCaLiMo is a functional, data-transformation framework inspired by functional programming (the style underpinning tools like Excel).

CoCaLiMo enforces a standardized trail structure and treats transformations as pure data-set–to–data-set functions (similar to Apache Spark or Python Pandas).

Code that follows this pattern can be run by a simple orchestrator, such as the 100-line Python orchestrator at https://github.com/eclipse-efbt/efbt/blob/develop/birds_nest/pybirdai/process_steps/pybird/orchestration.py
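To make the idea of pure data-set-to-data-set functions driven by an orchestrator more concrete, here is a minimal sketch. It is not the orchestrator linked above and makes no claim about how that code works; the `STEPS` table, the `run` function, and the example data sets are assumptions for illustration, using plain lists of dictionaries in place of a data-frame library.

```python
def filter_active(datasets):
    """Pure function: keep only active loans, leaving the input untouched."""
    return [row for row in datasets["loans"] if row["active"]]


def total_exposure(datasets):
    """Pure function: reduce the active loans to a single-row data set."""
    return [{"total": sum(row["amount"] for row in datasets["active_loans"])}]


# Hypothetical wiring: output name -> (function, names of the input data sets).
STEPS = {
    "active_loans": (filter_active, ["loans"]),
    "report_cell": (total_exposure, ["active_loans"]),
}


def run(target, steps, inputs, results=None):
    """Compute `target` by first computing whatever it depends on."""
    results = dict(inputs) if results is None else results
    if target not in results:
        func, needs = steps[target]
        for name in needs:
            run(name, steps, inputs, results)
        results[target] = func({name: results[name] for name in needs})
    return results[target]


loans = [
    {"amount": 100, "active": True},
    {"amount": 50, "active": False},
    {"amount": 25, "active": True},
]
report_cell = run("report_cell", STEPS, {"loans": loans})
print(report_cell)
```

Because each function only reads its named inputs and returns a new data set, the orchestrator needs nothing beyond the dependency table to decide what to run.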

RPMN (Regulatory Process Modelling Notation)

RPMN is a very lightweight process notation that captures business processes and links them to regulatory requirements. RPMN groups functions into workflows and annotates them with requirement text.

RPMN is a small subset of the BPMN standard (called BPMN Lite) with a small extension for storing regulatory text or requirements. You can see the requirements model here: https://github.com/eclipse-efbt/efbt/blob/feature_release_1.3.0/regdna/model/core_model/images/requirements_text%20class%20diagram.png or the BPMN Lite model here: https://github.com/eclipse-efbt/efbt/blob/feature_1.1.0_release/openregspecs/model/core_model/images/bpmn_lite%20class%20diagram.png

AORTA (An Open Regulatory Testing Architecture)

AORTA is a format for defining tests or worked examples that trace cell-level lineage. Programs following the CoCaLiMo approach can export worked examples in AORTA format. Similarly, Excel tests that express requirements in a standard format can be translated into AORTA files.

A Travel Analogy for Trails

To illustrate how trails work, we consider an example of tests as trails, but the analogy applies to all uses of trails. Imagine a single trail as mapping a simple walking route through a park, or driving through a city:

1. Define the Path

You start at the main entrance (Point A) and follow a series of signs, paths, and landmarks until you exit at the park gate (Point B).

Each step (turn at the fountain, follow the riverbank, cross the bridge) acts like one calculation or function in a data trail. You could also imagine navigating one simple path from A to B in a city with many streets, as follows:

a_frankfurt_trail

2. Record Each Step

By noting down each instruction in order—"From the entrance, walk 200 m to the fountain; turn right and follow the river for 150 m; cross the bridge and exit at the gate"—you create a test that anyone can follow and verify.

3. Isolate Local Knowledge

The route encapsulates the local details (park layout, landmarks) without requiring you to know the entire city. Similarly, a data trail test captures the specific calculation logic without exposing every formula in the full report.

4. Enable Shared Understanding

Like a detailed map, tests serve as a common reference for analysts, developers, and auditors.

They focus discussions on a single, well-defined scenario—just as you’d discuss one path through the park rather than the whole city’s roads.

5. Combine into a Roadmap

If every expert shared their favorite park route, you’d build a comprehensive network of paths. In Shared Trails, each test contributes to a library of examples (AORTA files) that together form a complete roadmap of calculation scenarios linked to regulatory requirements.

Consider, for example, the map below, which a cyclist made by leaving GPS on during bike journeys.

a_london_trail

Imagine if every regulatory expert in London mapped their daily commute, or rather their daily route through complex regulatory processing. The result would be a roadmap that contains the full complexity of the routes between points of interest (report cells and data) but is still decomposable into simple individual journeys that can be discussed in isolation.

6. Support Collaboration and Automation

Analysts define tests in AORTA format.

Developers implement code to satisfy those tests, following the practices of CoCaLiMo.

Auditors and regulators can compare the resulting trails with the expected ones defined in the tests to confirm compliance.

This travel-comparison analogy shows how Shared Trails turns complex calculations into clear, executable tests, fostering transparency, consistency, and teamwork.

Change Management

When we consider a system as a superposition of trails, we can compare two releases, find that (for example) 95% of the trails are the same, and then focus on the ones that changed.
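A comparison between releases could be sketched by fingerprinting each trail and listing only those whose fingerprint differs. The representation of a trail as a list of step names, the `fingerprint` and `changed_trails` helpers, and the release contents are all assumptions made for illustration.

```python
import hashlib
import json


def fingerprint(trail):
    """Stable hash of a trail given as a JSON-serialisable structure."""
    canonical = json.dumps(trail, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_trails(old_release, new_release):
    """Names of trails that were added, removed, or altered between two releases."""
    names = set(old_release) | set(new_release)
    return sorted(
        name
        for name in names
        if name not in old_release
        or name not in new_release
        or fingerprint(old_release[name]) != fingerprint(new_release[name])
    )


# Hypothetical releases: each report cell maps to the ordered steps of its trail.
release_1 = {
    "cell_A": ["input_loans", "filter_loans", "sum_exposures"],
    "cell_B": ["input_loans", "filter_loans", "sum_collateral"],
}
release_2 = {
    "cell_A": ["input_loans", "filter_loans", "sum_exposures"],
    "cell_B": ["input_loans", "filter_loans_v2", "sum_collateral"],
}

print(changed_trails(release_1, release_2))
```

Only the trail for the second cell is reported, so review effort can go to that trail while the unchanged one is skipped.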

Avoiding Entanglement

By isolating calculations into independent trails, with only essential points of overlap, Shared Trails prevents tangled dependencies. This design:

• Enables greater parallel execution, since trails run independently.

• Supports result caching for pure functions—identical inputs always yield identical outputs, improving performance.
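Both points can be illustrated with the Python standard library: `functools.lru_cache` memoises a pure function, and `concurrent.futures` runs independent trails side by side. This is a sketch under assumptions: the `risk_weighted` function, the trail contents, and the integer percentages are invented, and a thread pool is used only to keep the example self-contained (CPU-bound work in CPython would more likely use processes).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


@lru_cache(maxsize=None)
def risk_weighted(amount, weight_percent):
    """Pure function: the same inputs always give the same output, so caching is safe."""
    return amount * weight_percent // 100


def run_trail(trail):
    """Evaluate one trail, given as a tuple of (amount, weight_percent) pairs."""
    return sum(risk_weighted(amount, weight) for amount, weight in trail)


# Two hypothetical trails that overlap on the element (1000, 50).
trails = [
    ((1000, 50), (2000, 20)),
    ((1000, 50), (300, 100)),
]

# The trails do not depend on each other, so they can be submitted together.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(run_trail, trails))

print(results)
print(risk_weighted.cache_info())
```

The shared element is the only point of contact between the two trails, and because it is a pure function its result can be reused rather than recomputed.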

Conclusion

By modelling calculations as Shared Trails, organizations gain transparent, testable, and maintainable lineage for both metadata and data, which can be broken down into bite-sized, concise trails.

This approach enhances debugging, compliance, and collaboration across business analysts, developers, testers and regulators.