An open discussion on design: The Journey from BIRD Logical Data Model To regulatory submissions - eclipse-efbt/efbt GitHub Wiki

This document is under construction; it will be turned into a GitHub Discussion for easier contribution.

Introduction

The BIRD Logical Data Model (LDM) provides a detailed description of financial data, as defined and discussed by the BIRD Data Modelling Group. Read LDM as a superpower of BIRD.

The end goal of BIRD transformations is to populate report cells, such as those in Finrep and Corep templates, and also dataset-style regulatory submissions such as IREF.

There are many different options for getting data from input structures to submittable reports or data sets. We explore many of them here, break them down into the different patterns they use, and offer a means to discuss these openly.

The Eclipse Free BIRD Tools project, governed in a vendor-neutral fashion by the Eclipse Foundation, provides the means to build open solutions that implement the best approach. These are released under an open source license that is commercially friendly, so any organisation can build on them with further open or commercial extensions and adaptations.

The Starting Point of BIRD's Data (Input Concepts)

We can consider three kinds of path, and are of course open to hearing others.

Path 1:

Path 1 takes the BIRD LDM as a detailed description of metadata, and does not expect it to contain data. In this path we first create a more implementation-style model by forward engineering the LDM into a flatter structure, the current BIRD Input Layer being an example.

In this approach the data (banks' data about loans, bonds, etc.) is input into the flatter structure, and from there it starts its journey through transformations and manipulations into the final submittable format. It is possible to have custom implementation models (by providing a custom forward engineering process), but there are some disadvantages to using a custom implementation model that other people don't use.
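As a rough illustration of the forward engineering step in Path 1, the sketch below flattens a small, invented LDM-style hierarchy into one wide record. The entity and field names (`Instrument`, `Loan`, `Bond`, `FlatInstrument`, `flatten`) and the sample values are hypothetical; they are not taken from the BIRD LDM or the BIRD Input Layer.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Hypothetical LDM-style entities: a supertype with two subtypes.
@dataclass
class Instrument:
    instrument_id: str
    currency: str

@dataclass
class Loan(Instrument):
    interest_rate: float

@dataclass
class Bond(Instrument):
    isin: str

# Hypothetical flat structure: one column per attribute of any subtype,
# plus a discriminator column recording which subtype the row came from.
@dataclass
class FlatInstrument:
    instrument_type: str
    instrument_id: str
    currency: str
    interest_rate: Optional[float] = None
    isin: Optional[str] = None

def flatten(instrument: Instrument) -> FlatInstrument:
    """Copy every attribute of the subtype into the wide flat record."""
    values = {f.name: getattr(instrument, f.name) for f in fields(instrument)}
    return FlatInstrument(instrument_type=type(instrument).__name__, **values)

# Placeholder sample data, for illustration only.
rows = [
    flatten(Loan("L1", "EUR", 0.05)),
    flatten(Bond("B1", "USD", "XS0000000000")),
]
```

Columns that do not apply to a given subtype are simply left empty, which is one reason flat structures tend to become wide.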

Path 2:

Path 2 turns the LDM structure itself into an implementation model: data is put into the LDM and launched on a path towards a submittable format. There are some challenges in creating an implementation model from an LDM, because of its use of inheritance and composite keys. Through experiments in Eclipse Free BIRD Tools we have found it is possible, and we describe one approach to dealing with these issues in the wiki articles X and Y.
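One way to picture the two challenges named above is the sketch below, which keeps the inheritance and models a composite key as an immutable value that can index a store. This is an illustrative assumption only, not the approach described in the wiki articles; every name in it (`PartyRoleKey`, `PartyRole`, `Debtor`, `store`, `insert`) is invented.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical composite key: two attributes together identify a row.
@dataclass(frozen=True)
class PartyRoleKey:
    party_id: str
    instrument_id: str

# Hypothetical supertype and subtype, kept as real inheritance.
@dataclass
class PartyRole:
    key: PartyRoleKey

@dataclass
class Debtor(PartyRole):
    is_main_debtor: bool

# frozen=True makes the key hashable, so it can serve as a dictionary key.
store: Dict[PartyRoleKey, PartyRole] = {}

def insert(role: PartyRole) -> None:
    """Add a row, rejecting a second row with the same composite key."""
    if role.key in store:
        raise KeyError(f"duplicate composite key: {role.key}")
    store[role.key] = role

insert(Debtor(PartyRoleKey("P1", "L1"), is_main_debtor=True))
insert(Debtor(PartyRoleKey("P1", "L2"), is_main_debtor=False))
```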

Path 3:

Enriched data. We can consider an implementation model that also contains derived data, such as BIRD's ELDM or EIL. This is also a possible extension to Path 1 or 2.

If using this approach, implementation is a lot easier if users are constrained to an ‘All or nothing’ approach. That is to say, when populating the data model, users provide data for all derived fields for all inputs, or they provide none at all. In the case that they provide none at all it is expected that transformation rules populate these derived fields.

An approach where users provide only some derived fields and not others, or provide them for some positions and not others, makes implementation very hard. This is because transformation rules for derived fields might take other derived fields as input, so it can be complex to decide whether or not to run a derivation rule.
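The 'All or nothing' constraint can be sketched as a simple check run before any transformation. This is a minimal sketch under stated assumptions: the derived field names and the functions `derived_fields_state` and `needs_derivation` are hypothetical, and records are modelled as plain dictionaries.

```python
from typing import Dict, List, Optional

# Hypothetical names of derived fields in an enriched model.
DERIVED_FIELDS = ["carrying_amount", "accrued_interest"]

Record = Dict[str, Optional[float]]

def derived_fields_state(records: List[Record]) -> str:
    """Return 'all' or 'none'; raise if derived fields are only partly populated."""
    flags = [r.get(name) is not None for r in records for name in DERIVED_FIELDS]
    if all(flags):
        return "all"
    if not any(flags):
        return "none"
    raise ValueError("derived fields are only partly populated")

def needs_derivation(records: List[Record]) -> bool:
    """Derivation rules run only when the user supplied no derived fields."""
    return derived_fields_state(records) == "none"
```

With this check the decision to run derivation rules is made once for the whole input, rather than per field and per position.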

The Language of Data (Dictionaries)

One dictionary

Translating dictionaries, translating codes.

Transaction/Contract oriented vs position oriented.

Bank's data model oriented vs regulation oriented.

Challenges with multiple dictionaries.

The Destination of Data (Output Concepts)

Data Sets and DataPoints

Datapoint based vs Row column based.

A Valid starting point, data quality.

Notes: structural validation / business validation.

The Path of Data

Intermediate structures

Enriched vs Not Enriched

Inheritance vs no inheritance.

How many layers of intermediate structures

Corep, Finrep

Resting stops, or final destination?

What's the shape of intermediate structures (normalisation)?

How flat is flat?

LDM flat?

EIL Flat?

Record Type Flat?

Familiarity vs ease of use.

Flat Structures

Examples

How wide are flat structures?

Approaches to transforming from one intermediate structure to another

Approaches to flattening data

Joins FE?

Up and along?

Record types and input forms? Relational EIL

Per reg

How to travel from one layer to another

From layer to layer…functional or assignment based.

Example VTL, where one layer is a function of another

Avoiding mutable state.
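The functional style can be sketched in Python, used here only as a stand-in (this is not VTL syntax). Each layer is computed as a pure function of the previous one, and no input is modified in place. The layer names, field names and the threshold are hypothetical.

```python
from typing import Dict, Tuple

Row = Dict[str, object]
Layer = Tuple[Row, ...]  # a tuple, so a layer cannot be appended to in place

def enrich(input_layer: Layer) -> Layer:
    """Return a new layer with a derived field; the input layer is left untouched."""
    return tuple({**row, "is_large": row["amount"] > 1000} for row in input_layer)

def keep_large(enriched_layer: Layer) -> Layer:
    """Filter step: a new layer holding only the rows flagged as large."""
    return tuple(row for row in enriched_layer if row["is_large"])

input_layer: Layer = (
    {"id": "L1", "amount": 500},
    {"id": "L2", "amount": 5000},
)

# The output layer is simply a composition of functions over the input layer.
output_layer = keep_large(enrich(input_layer))
```

An assignment-based alternative would instead update fields on existing rows, which makes the result depend on the order in which the updates run.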

Filters/Functions

End result flat structures

Large flat structures, small flat structures

Functions

N to 1, N to N fields

Intermediate structures

Tags

LDM as a guide or as a source of input.

Derivation as functions

The role of filters

Simple enrichments vs complex enrichments.

Flat structures per reg/product/report/report cell.

Existing work (Finrep).

Differences between Corep and Finrep.

Tracing the Path (lineage)

Lineage of Analysis

Meta Data Lineage

Variables only vs location

Data Lineage

Planning the journey (Analysis)

Gap analysis?

Manual

Partially Automated

Walking the Path (implementation)

Should design facilitate implementation?

Should design be far away from implementation?

Template reports vs dataset reports

Application of AI

Application of Automation

Automated creation of flat structures

Automated gap analysis

Use of mature frameworks.

Object Orientation.

The Starting Point of Banks' Data (From banks to BIRD / ETL)

Did data reach the correct destination? (Testing)

Reference implementations.

EFBT’s Current process