Test - stonezhong/DataManager GitHub Wiki

Applications

Execute SQL

Name:        Execute SQL
Team:        admins
Description: System Application for Executing SQL statements
Location:    hdfs:///etl/apps/execute_sql/1.0.0.0

Then, using a MySQL client, set this application's sys_app_id to 1.

Import Trading Data

Name:        Import Trading Data
Description: Import daily trading data.
Team:        trading
Location:    hdfs:///etl/apps/generate_trading_samples/1.0.0.0

Schedulers

daily-trading

Name:     daily-trading
Category: daily-trading
Context:  {"dt": "{{due.strftime('%Y-%m-%d')}}"}
Team:     trading
Interval: 1 DAY
Start:    2020-12-02 00:00:00
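
The Context field is a template evaluated once per due time. As a rough sketch of what it produces, the snippet below evaluates the same `strftime` expression directly in Python for the first few due times. It assumes that the first due time coincides with Start and that later ones follow at the 1 DAY interval; the helper names `due_times` and `render_context` are hypothetical and not part of DataManager.

```python
from datetime import datetime, timedelta
import json


def due_times(start, interval, count):
    """Yield `count` due times spaced `interval` apart, beginning at `start`.

    Assumption: the scheduler's first due time equals its Start value.
    """
    for i in range(count):
        yield start + i * interval


def render_context(due):
    # Python equivalent of the template {"dt": "{{due.strftime('%Y-%m-%d')}}"}
    return {"dt": due.strftime("%Y-%m-%d")}


start = datetime(2020, 12, 2, 0, 0, 0)
contexts = [render_context(d) for d in due_times(start, timedelta(days=1), 3)]

for context in contexts:
    print(json.dumps(context))
```

Each rendered context supplies the `dt` value that pipelines in the daily-trading category reference as `{{dt}}`.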

Pipelines

import-trading-data

Pipeline settings:

name    : import-trading-data
team    : trading
category: daily-trading
type    : simple-flow

task: import-trading-data-nasdaq

name: import-trading-data-nasdaq
type: other
Application: Import Trading Data
Task Arguments: {"action": "import-data", "market": "NASDAQ", "data_root":"hdfs:///data"}

task: import-trading-data-nyse

name: import-trading-data-nyse
type: other
Application: Import Trading Data
Task Arguments: {"action": "import-data", "market": "NYSE", "data_root":"hdfs:///data"}

task: create-view

name: create-view
type: other
Application: Import Trading Data
Task Arguments: 
{
    "action": "create-view", 
    "loader": {
        "name": "union",
        "args": {
            "dsi_paths": [
                "{{xcom['import-trading-data-nasdaq'].dsi_path}}",
                "{{xcom['import-trading-data-nyse'].dsi_path}}"
            ]
        }
    }
}
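
To illustrate how the `{{xcom[...]}}` placeholders in the create-view arguments might resolve, here is a minimal sketch that substitutes them with a regular expression and then parses the result as JSON. It is a stand-in for whatever template engine the tool actually uses; the `render` helper and the two `dsi_path` values are hypothetical and chosen only for illustration.

```python
import json
import re

# Task arguments of create-view, copied from the settings above.
TEMPLATE = """
{
    "action": "create-view",
    "loader": {
        "name": "union",
        "args": {
            "dsi_paths": [
                "{{xcom['import-trading-data-nasdaq'].dsi_path}}",
                "{{xcom['import-trading-data-nyse'].dsi_path}}"
            ]
        }
    }
}
"""

# Matches placeholders of the form {{xcom['task-name'].attribute}}.
PLACEHOLDER = re.compile(r"\{\{\s*xcom\['([^']+)'\]\.(\w+)\s*\}\}")


def render(template, xcom):
    """Replace each xcom placeholder with the value the named task published."""
    def substitute(match):
        task_name, attribute = match.group(1), match.group(2)
        return str(xcom[task_name][attribute])
    return PLACEHOLDER.sub(substitute, template)


# Hypothetical values that the two upstream import tasks could publish.
xcom = {
    "import-trading-data-nasdaq": {"dsi_path": "example_nasdaq:1.0:1:/2020-12-02"},
    "import-trading-data-nyse": {"dsi_path": "example_nyse:1.0:1:/2020-12-02"},
}

args = json.loads(render(TEMPLATE, xcom))
print(args["loader"]["args"]["dsi_paths"])
```

The point of the sketch is the data flow: create-view can only be rendered after both import tasks have finished and published their `dsi_path`.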

get-top-picks

Name:        get-top-picks
Team:        trading
Category:    daily-trading
Type:        simple-flow
Required assets: tradings:1.0:1:/{{dt}}
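
Asset paths on this page, such as `top_picks:1.0:1:/{{dt}}`, consist of four colon-separated parts. The sketch below splits such a string after `{{dt}}` has been rendered. Reading the parts as dataset name, major version, minor version and path is an assumption, and `AssetPath` and `parse_asset_path` are hypothetical names, not DataManager APIs.

```python
from typing import NamedTuple


class AssetPath(NamedTuple):
    # Field meanings are assumed from the shape of the strings on this page.
    dataset: str
    major_version: str
    minor_version: int
    path: str


def parse_asset_path(text):
    """Split 'name:major:minor:/path' into its four parts."""
    parts = text.split(":", 3)
    if len(parts) != 4:
        raise ValueError(f"expected name:major:minor:/path, got {text!r}")
    dataset, major, minor, path = parts
    return AssetPath(dataset, major, int(minor), path)


asset = parse_asset_path("tradings:1.0:1:/2020-12-02")
print(asset)
```

A string with only two colons, for example one that omits the colon before the path, raises `ValueError` instead of being parsed silently.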

task: get-top-picks

Name:     get-top-picks
Type:     Spark-SQL

step: get-top-picks-step

Name:     get-top-picks-step
Import:   tradings  ==> tradings:1.0:1:/{{dt}}
SQL:
SELECT 
    symbol, sum(amount) as volume
FROM tradings
GROUP BY symbol
ORDER BY sum(amount) DESC
LIMIT 3

Write Output: Yes
Location: hdfs:///data/top_picks/{{dt}}.parquet
Type: parquet
Asset Path: top_picks:1.0:1:/{{dt}}
Data Time: {{dt}} 00:00:00
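
The sort direction decides which three symbols a top-N query like the one in this step returns. `ORDER BY` sorts ascending by default, so with `LIMIT 3` it yields the three smallest totals, while `DESC` yields the three largest. The sketch below shows the difference on made-up rows, using Python's built-in SQLite as a stand-in for Spark SQL; the table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tradings (symbol TEXT, amount INTEGER)")

# Hypothetical sample rows; totals per symbol are
# AAA=150, BBB=500, CCC=10, DDD=300, EEE=20.
rows = [
    ("AAA", 100), ("AAA", 50),
    ("BBB", 500),
    ("CCC", 10),
    ("DDD", 300),
    ("EEE", 20),
]
conn.executemany("INSERT INTO tradings VALUES (?, ?)", rows)

QUERY = """
SELECT symbol, sum(amount) AS volume
FROM tradings
GROUP BY symbol
ORDER BY sum(amount) {direction}
LIMIT 3
"""

# ASC is what a bare ORDER BY does; DESC puts the largest totals first.
ascending = conn.execute(QUERY.format(direction="ASC")).fetchall()
descending = conn.execute(QUERY.format(direction="DESC")).fetchall()

print("ascending: ", ascending)
print("descending:", descending)
conn.close()
```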