# Quick Start
- Brief
- The sample Data Application
- Create a Data Repository
- Create a pipeline
- Create a scheduler
- Unpause the pipeline
- Create a SQL pipeline
## Brief

This article discusses how to use the Data Manager UI after you have installed it.
## The sample Data Application

To build the sample data application, you need to SSH into the Data Manager server and run:

```bash
ssh <your-host>
eae dmapps
./build.sh generate_trading_samples
```
The purpose of this step is to let Data Manager know about this data application. To do this, log in to Data Manager, then:

- Click the menu "Application", then click the button "Create"
- set name to `Import Trading Data`
- set team to `trading`
- set description to `Application to generate Trading Samples`
- set Location to `hdfs:///beta/etl/apps/generate_trading_samples/1.0.0.0`
- Click the menu "Data Repositories"
- set name to
main
- set type to
Hadoop File System
- set details to
{ "base_url": "hdfs:///beta/data" }
## Create a pipeline

Now, let's create a pipeline to ingest the trading sample data.
- Click the menu "Pipelines", then click button "Create"
- In "Basic Info" tab
- set name to
import-trading-data
- set team to
trading
- set category to
daily-trading
- set type to
simple-flow
- set name to
- In "Tasks" tab:
- Create task
begin
-- click the button "Add Task"- name is
begin
- type is
Dummy
- name is
- create task
import-trading-data-nasdaq
- name is
import-trading-data-nasdaq
- type is
Application
- Select "Import Trading Data" application
- set arguments to
{ "action": "import-data", "market": "NASDAQ", "base_location":"/", "repo":"main", "dt": "{{dt}}" }
- name is
- create task
import-trading-data-nyse
- name is
import-trading-data-nyse
- type is
Application
- Select "Import Trading Data" application
- set arguments to
{ "action": "import-data", "market": "NYSE", "base_location":"/", "repo":"main", "dt": "{{dt}}" }
- name is
- create task
create-view
- name is
create-view
- type is
Application
- Select "Import Trading Data" application
- set arguments to below
{ "action": "create-view", "dt": "{{dt}}", "repo": "main", "loader": { "name": "union", "args": { "dsi_paths": [ "{{xcom['import-trading-data-nasdaq'].dsi_path}}", "{{xcom['import-trading-data-nyse'].dsi_path}}" ] } } }
- name is
- Create task
- Create task
end
- name is
end
- type is
Dummy
- name is
- Set dependency, so we have
begin --> import-trading-data-nasdaq begin --> import-trading-data-nyse import-trading-data-nyse --> create-view import-trading-data-nyse --> create-view create-view --> end
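Data Manager turns a pipeline into an Airflow DAG (see the DAG link mentioned below), so the dependency graph above amounts to roughly the following. This is a sketch only: the DAG id is assumed to match the pipeline name, and the real generated DAG runs the Application tasks rather than dummy operators.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator

with DAG(
    dag_id="import-trading-data",  # assumption: DAG id matches the pipeline name
    start_date=datetime(2021, 3, 8),
    schedule_interval=None,
) as dag:
    begin = DummyOperator(task_id="begin")
    nasdaq = DummyOperator(task_id="import-trading-data-nasdaq")
    nyse = DummyOperator(task_id="import-trading-data-nyse")
    create_view = DummyOperator(task_id="create-view")
    end = DummyOperator(task_id="end")

    # begin fans out to both import tasks; create-view waits for both.
    begin >> [nasdaq, nyse] >> create_view >> end
```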
After the pipeline is created, you can refresh the page to see the Airflow DAG link.
## Create a scheduler

Now we need to create a scheduler so we can run this pipeline regularly. Click the menu "Schedulers", then click the button "Create":

- set name to `daily-trading`
- set description to `daily trading scheduler`
- set category to `daily-trading`
- set context to `{"dt": "{{due.strftime('%Y-%m-%d')}}"}` (see the sketch below)
- set team to `trading`
- set Interval to `1 DAY`
- set Start to, for example, `2021-03-08 00:00:00`

Then click the button "Save changes".
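As a small sketch of what that context evaluates to, assuming `due` is the datetime at which a scheduled run becomes due and the context is rendered with Jinja-style templating:

```python
from datetime import datetime

# Example: the run that became due at 2021-03-08 00:00:00.
due = datetime(2021, 3, 8)

# The scheduler context {"dt": "{{due.strftime('%Y-%m-%d')}}"} renders to:
context = {"dt": due.strftime("%Y-%m-%d")}
print(context)  # {'dt': '2021-03-08'}
```

Each `{{dt}}` placeholder in the pipeline's task arguments then resolves to `2021-03-08` for that run.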
## Unpause the pipeline

The pipeline is in "paused" status after it has been created, so let's unpause it.

Click the menu "Pipelines", then click "import-trading-data" and click the button "unpause".
## Create a SQL pipeline

Now, let's create another pipeline to do some data transformation using a SQL statement.
- First, click the menu "Pipelines", then click the button "Create"
- In the "Basic Info" tab:
    - set name to `get-top-picks`
    - set team to `trading`
    - set category to `daily-trading`
    - set type to `simple-flow`
    - set "Required assets" to `tradings:1.0:1:/{{dt}}`
- In "Tasks" tab:
- Create task
get-top-pick
-- click the button "Add Task"- type is
Spark-SQL
- Click button "Add Step" in tab Spark-SQL
- set name to
get-top-picks
- import asset
tradings:1.0:1:/{{dt}}
astradings
- set SQL statement to below
SELECT symbol, sum(amount) as volume FROM tradings GROUP BY symbol ORDER BY sum(amount) LIMIT 3
- check "Write Output"
- set Location to
hdfs:///beta/data/top_picks/{{dt}}.parquet
- set Asset Path to
top_picks:1.0:1:/{{dt}}
- set datatime to
{{dt}} 00:00:00
- set type to
parquet
- set Write Mode to
overwrite
- set name to
- type is
- Create task
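For reference, here is a rough PySpark equivalent of what this Spark-SQL step does. It is a sketch only: Data Manager resolves and loads the imported asset itself, so the input path below is hypothetical; the SQL statement and output settings are the ones configured above, with `{{dt}}` rendered as `2021-03-08`.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("get-top-picks").getOrCreate()

# Hypothetical location of the "tradings" asset for dt = 2021-03-08;
# in the real pipeline, Data Manager resolves tradings:1.0:1:/{{dt}} for you.
tradings = spark.read.parquet("hdfs:///beta/data/trading/2021-03-08")
tradings.createOrReplaceTempView("tradings")

# The SQL statement from the step above.
top_picks = spark.sql("""
    SELECT symbol, sum(amount) AS volume
    FROM tradings
    GROUP BY symbol
    ORDER BY sum(amount) DESC
    LIMIT 3
""")

# "Write Output": type parquet, Write Mode overwrite, at the configured Location.
top_picks.write.mode("overwrite").parquet(
    "hdfs:///beta/data/top_picks/2021-03-08.parquet"
)
```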