Test
Applications
Execute SQL
Name: Execute SQL
Team: admins
Description: System Application for Executing SQL statements
Location: hdfs:///etl/apps/execute_sql/1.0.0.0
Then set sys_app_id to 1 using the MySQL client.
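A minimal sketch of that step, assuming a local MySQL instance; the database name `datamanager`, the table `main_application`, and the `sys_app_id` column are assumptions for illustration, not confirmed by this wiki, so check the actual DataManager schema and connection details first.

```python
# Sketch only: connection details and the table/column layout are assumptions,
# not taken from this wiki -- inspect the DataManager schema before running.
import pymysql

conn = pymysql.connect(host="localhost", user="root",
                       password="changeme", database="datamanager")
try:
    with conn.cursor() as cur:
        # Mark the "Execute SQL" application as the system app (sys_app_id = 1),
        # assuming a hypothetical main_application table with a sys_app_id column.
        cur.execute(
            "UPDATE main_application SET sys_app_id = 1 WHERE name = %s",
            ("Execute SQL",),
        )
    conn.commit()
finally:
    conn.close()
```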
Import Trading Data
Name: Import Trading Data
Description: Import daily trading data.
Team: trading
Location: hdfs:///etl/apps/generate_trading_samples/1.0.0.0
Schedulers
daily-trading
Name: daily-trading
Category: daily-trading
Context: {"dt": "{{due.strftime('%Y-%m-%d')}}"}
Team: trading
Interval: 1 DAY
Start: 2020-12-02 00:00:00
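The Context value is a template that is rendered for each scheduled run, so every run receives a concrete dt. A rough illustration of that substitution, assuming Jinja2-style {{ ... }} rendering against the run's due timestamp (the snippet below is illustrative, not DataManager code):

```python
# Rough illustration of how the scheduler Context resolves per run.
# Assumes Jinja2-style "{{ ... }}" substitution with a `due` datetime;
# the real rendering happens inside DataManager, not in this snippet.
import json
from datetime import datetime, timedelta
from jinja2 import Template

context_template = '{"dt": "{{due.strftime(\'%Y-%m-%d\')}}"}'
start = datetime(2020, 12, 2)          # Start: 2020-12-02 00:00:00
interval = timedelta(days=1)           # Interval: 1 DAY

for i in range(3):
    due = start + i * interval
    context = json.loads(Template(context_template).render(due=due))
    print(due, context)                # {'dt': '2020-12-02'}, '2020-12-03', ...
```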
Pipelines
import-trading-data
Pipeline settings:
name: import-trading-data
team: trading
category: daily-trading
type: simple-flow
task: import-trading-data-nasdaq
name: import-trading-data-nasdaq
type: other
Application: Import Trading Data
Task Arguments: {"action": "import-data", "market": "NASDAQ", "data_root":"hdfs:///data"}
task: import-trading-data-nyse
name: import-trading-data-nyse
type: other
Application: Import Trading Data
Task Arguments: {"action": "import-data", "market": "NYSE", "data_root":"hdfs:///data"}
task: create-view
name: create-view
type: other
Application: Import Trading Data
Task Arguments:
{
"action": "create-view",
"loader": {
"name": "union",
"args": {
"dsi_paths": [
"{{xcom['import-trading-data-nasdaq'].dsi_path}}",
"{{xcom['import-trading-data-nyse'].dsi_path}}"
]
}
}
}
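At run time, each import task publishes the dsi_path of the dataset instance it produced, and the create-view task picks those values up through the {{xcom[...]}} references above before handing them to the union loader. As a rough sketch of what that loader amounts to in PySpark (the function name, view-name handling, and placeholder paths are assumptions, not DataManager's actual API):

```python
# Sketch of a "union" loader: read each upstream dataset instance and expose
# the combined data as a temp view. Names here are illustrative assumptions.
from functools import reduce
from pyspark.sql import SparkSession, DataFrame

def load_union(spark: SparkSession, dsi_paths, view_name: str) -> DataFrame:
    # Each dsi_path is assumed to resolve to a parquet location on HDFS,
    # e.g. the NASDAQ and NYSE instances produced by the upstream tasks.
    dfs = [spark.read.parquet(p) for p in dsi_paths]
    df = reduce(DataFrame.unionByName, dfs)
    df.createOrReplaceTempView(view_name)
    return df

if __name__ == "__main__":
    spark = SparkSession.builder.appName("create-view-sketch").getOrCreate()
    load_union(
        spark,
        dsi_paths=[
            "hdfs:///data/trading/NASDAQ/2020-12-02",   # placeholder paths; the real
            "hdfs:///data/trading/NYSE/2020-12-02",     # values come from xcom at run time
        ],
        view_name="tradings",
    )
```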
get-top-picks
Name: get-top-picks
Team: trading
category: daily-trading
type: simple-flow
required assets: tradings:1.0:1:/{{dt}}
task: get-top-picks
Name: get-top-picks
Type: Spark-SQL
step: get-top-picks-step
Name: get-top-picks-step
Import: tradings ==> tradings:1.0:1:/{{dt}}
SQL:
SELECT
symbol, sum(amount) as volume
FROM tradings
GROUP BY symbol
ORDER BY volume DESC
LIMIT 3
Write Output: Yes
Location: hdfs:///data/top_picks/{{dt}}.parquet
Type: parquet
Asset Path: top_picks:1.0:1:/{{dt}}
Data Time: {{dt}} 00:00:00
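Taken together, the step is roughly equivalent to the PySpark job below. The input path and the rendered dt value are placeholders: in the real pipeline, DataManager resolves the imported asset path and the {{dt}} template itself, and registers the output under the asset path above.

```python
# Rough equivalent of get-top-picks-step, with placeholder paths;
# DataManager resolves the asset path and {{dt}} itself in the real pipeline.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("get-top-picks-sketch").getOrCreate()
dt = "2020-12-02"

# Import: tradings ==> tradings:1.0:1:/{{dt}}  (assumed to be stored as parquet)
tradings = spark.read.parquet(f"hdfs:///data/trading/{dt}")  # placeholder location
tradings.createOrReplaceTempView("tradings")

top_picks = spark.sql("""
    SELECT symbol, sum(amount) AS volume
    FROM tradings
    GROUP BY symbol
    ORDER BY volume DESC
    LIMIT 3
""")

# Write Output: Location hdfs:///data/top_picks/{{dt}}.parquet, Type parquet
top_picks.write.mode("overwrite").parquet(f"hdfs:///data/top_picks/{dt}.parquet")
```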