# Example Illustration
## Example Scenario
A reporting system (data warehouse) receives the following datasets from external sources / OLTP systems.
- Accounts data (frequency: multiple times a day; source: external / firmwide system)
- Products data (frequency: once daily; source: external / firmwide system)
- Transactions data (frequency: once a day per partition; source: OLTP system)
  - The data is custom-partitioned into several buckets (say P1, P2, ..., Px). The partitioning criterion is hard and consistent, meaning a customer will always be in a particular partition.
- Positions data (frequency: once a day per partition; source: OLTP system)
  - The data is custom-partitioned into the same buckets (say P1, P2, ..., Px).
Following are the rules to process each of the datasets.
- Accounts (dimension) data doesn't have dependencies on any other dataset.
- Products (dimension) data doesn't have dependencies on any other dataset.
- Transactions (fact) data processing depends on processed Accounts data (today) and Products data (today), along with the incoming Transactions data (today). The following dependency criteria apply:
  - To process Transactions data of 6-Jun, Accounts data of 6-Jun and Products data of 6-Jun should have been processed. It's okay even if Accounts data of 7-Jun has already been processed; the dependency is "at least up to" the business date.
- Positions (fact) data depends on processed Accounts data (today), Products data (today), Transactions data (today), and Positions data (yesterday). The following dependency criteria apply (see the sketch after this list):
  - To process Positions data of 6-Jun for a partition P4, the following conditions should be met:
    - Accounts data should have been processed at least up to 6-Jun.
    - Products data should have been processed at least up to 6-Jun.
    - Transactions data of partition P4 should have been processed for 6-Jun.
    - Positions data of partition P4 should have been processed for 5-Jun (the previous day).
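As a minimal sketch of what these readiness checks amount to, here is an imperative Java version. The class, field, and method names are hypothetical, not part of DMC; DMC expresses the same conditions declaratively via the rules shown later on this page.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping of how far each dataset has been processed.
class ProcessingState {
    LocalDate accountsProcessedUpTo;                         // latest business date processed for Accounts
    LocalDate productsProcessedUpTo;                         // latest business date processed for Products
    Map<String, Set<LocalDate>> transactionsDone = Map.of(); // partition -> processed business dates
    Map<String, Set<LocalDate>> positionsDone = Map.of();    // partition -> processed business dates

    // Transactions of `day` may run once both dimension datasets have been
    // processed at least up to that business date (a later date also counts,
    // per the 6-Jun / 7-Jun example above).
    boolean canProcessTransactions(LocalDate day) {
        return !accountsProcessedUpTo.isBefore(day)
            && !productsProcessedUpTo.isBefore(day);
    }

    // Positions of `day` for `partition` additionally require the same
    // partition's Transactions for `day` and its Positions for the previous day.
    boolean canProcessPositions(String partition, LocalDate day) {
        return canProcessTransactions(day)
            && transactionsDone.getOrDefault(partition, Set.of()).contains(day)
            && positionsDone.getOrDefault(partition, Set.of()).contains(day.minusDays(1));
    }
}
```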
## Data processing system
1. The data processing system publishes an event whenever a new dataset is received.
2. The data processing system publishes an event when a dataset's processing is completed.
3. The data processing system can listen to webhook notifications or a queue to start data processing.
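The shape of these events isn't specified here. As a rough sketch, assuming a simple name-plus-headers payload (only the CURRENT_BUSINESS_DATE header is taken from the rule examples below; every other name is illustrative):

```java
import java.time.Instant;
import java.util.Map;

// Illustrative event shape only; DMC's actual payload may differ.
record DatasetEvent(String dataset,              // e.g. "RAW_ACCOUNTS" on receipt, "ACCOUNTS" on completion
                    Map<String, String> headers, // e.g. CURRENT_BUSINESS_DATE
                    Instant emittedAt) {

    // Hypothetical helpers mirroring the two publications described above.
    static DatasetEvent received(String rawDataset, String businessDate) {
        return new DatasetEvent(rawDataset, Map.of("CURRENT_BUSINESS_DATE", businessDate), Instant.now());
    }

    static DatasetEvent completed(String processedDataset, String businessDate) {
        return new DatasetEvent(processedDataset, Map.of("CURRENT_BUSINESS_DATE", businessDate), Instant.now());
    }
}
```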
## Illustration of Rules Configured
The following snapshot shows the rules configured in DMC via the REST APIs.
## Illustration of Events Flow
## DMC Rules
The rules will look like the following. (Refresh the page to view the flow animation.)
The processor would have set the following dependencies for each dataset.
- Accounts data processor & Products data processor are simple, with only an in-order processing dependency.

  ```
  event(RAW_ACCOUNTS)
      .set("currentDay").fromHeader("CURRENT_BUSINESS_DATE") // create an alias
      .notify("webhook-URL")
  ```
- Transactions data processor dependency can be expressed as follows. `gte` encodes the "at least up to" rule: dimension data for the same or a later business date satisfies the dependency.

  ```
  event(ACCOUNTS)
      .and().event(PRODUCTS)
      .and().event(RAW_TRANSACTIONS)
      .and().is(ACCOUNTS, currentDay).gte(RAW_TRANSACTIONS, currentDay)
      .and().is(PRODUCTS, currentDay).gte(RAW_TRANSACTIONS, currentDay)
      .notify("webhook-URL-transaction-processor")
  ```
- Positions data processor dependency can be expressed as follows. Note that Transactions uses `eq` (the same business date is required), while the dimension datasets use `gte`; `groupBy` scopes the rule evaluation per partition and snapshot version.

  ```
  event(ACCOUNTS)
      .and().event(PRODUCTS)
      .and().event(TRANSACTIONS)
      .and().event(RAW_POSITIONS)
      .and().is(ACCOUNTS, currentDay).gte(RAW_POSITIONS, currentDay)
      .and().is(PRODUCTS, currentDay).gte(RAW_POSITIONS, currentDay)
      .and().is(TRANSACTIONS, currentDay).eq(RAW_POSITIONS, currentDay)
      .and().groupBy(partition, snapshotVersion)
      .notify("webhook-URL-positions-processor")
  ```
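On the processing side, the `.notify("webhook-URL-...")` calls imply that each processor exposes an HTTP endpoint DMC can call. Below is a minimal sketch of such a receiver using the JDK's built-in `HttpServer`; the port, path, and payload handling are assumptions, since DMC's notification format isn't shown on this page.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;

// Hypothetical endpoint behind "webhook-URL-positions-processor".
public class PositionsWebhook {
    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/positions-processor", exchange -> {
            // The body would carry the matched events and aliases (e.g. currentDay,
            // partition); the exact shape depends on DMC's notification format.
            byte[] body = exchange.getRequestBody().readAllBytes();
            System.out.println("Trigger received: " + new String(body));
            // Kick off Positions processing for the indicated partition/day here.
            exchange.sendResponseHeaders(202, -1); // acknowledge, process asynchronously
            exchange.close();
        });
        server.start();
    }
}
```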