# Quick Start
- Brief
- The sample Data Application
- Create a Data Repository
- Create a pipeline
- Create a scheduler
- Unpause the pipeline
- Create a SQL pipeline
## Brief

This article discusses how to use the Data Manager UI after you have installed it.
## The sample Data Application

To build the sample data application, you need to SSH into the Data Manager server and run:

```bash
ssh <your-host>
eae dmapps
./build.sh generate_trading_samples
```
The purpose of this step is to let Data Manager know about this data application. To do this, log in to Data Manager, then:

- Click the menu "Application", then click the button "Create"
- set name to `Import Trading Data`
- set team to `trading`
- set description to `Application to generate Trading Samples`
- set Location to `hdfs:///beta/etl/apps/generate_trading_samples/1.0.0.0`
- Click the menu "Data Repositories"
- set name to
main
- set type to
Hadoop File System
- set details to
{ "base_url": "hdfs:///beta/data" }
## Create a pipeline

Now, let's create a pipeline to ingest the trading sample data.
- Click the menu "Pipelines", then click button "Create"
- In "Basic Info" tab
- set name to
import-trading-data
- set team to
trading
- set category to
daily-trading
- set type to
simple-flow
- set name to
- In "Tasks" tab:
- Create task
begin
-- click the button "Add Task"- name is
begin
- type is
Dummy
- name is
- create task
import-trading-data-nasdaq
- name is
import-trading-data-nasdaq
- type is
Application
- Select "Import Trading Data" application
- set arguments to
{ "action": "import-data", "market": "NASDAQ", "base_location":"/", "repo":"main", "dt": "{{dt}}" }
- name is
- create task
import-trading-data-nyse
- name is
import-trading-data-nyse
- type is
Application
- Select "Import Trading Data" application
- set arguments to
{ "action": "import-data", "market": "NYSE", "base_location":"/", "repo":"main", "dt": "{{dt}}" }
- name is
- create task
create-view
- name is
create-view
- type is
Application
- Select "Import Trading Data" application
- set arguments to below
{ "action": "create-view", "dt": "{{dt}}", "repo": "main", "loader": { "name": "union", "args": { "dsi_paths": [ "{{xcom['import-trading-data-nasdaq'].dsi_path}}", "{{xcom['import-trading-data-nyse'].dsi_path}}" ] } } }
- name is
- Create task
- Create task
end
- name is
end
- type is
Dummy
- name is
- Set dependency, so we have
begin --> import-trading-data-nasdaq begin --> import-trading-data-nyse import-trading-data-nyse --> create-view import-trading-data-nyse --> create-view create-view --> end
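Data Manager turns a pipeline into an Airflow DAG (see the DAG link mentioned below), so the dependency graph above amounts to roughly the following. This is a sketch only: the DAG id is assumed to match the pipeline name, and the real generated DAG runs the Application tasks rather than dummy operators.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator

with DAG(
    dag_id="import-trading-data",  # assumption: DAG id matches the pipeline name
    start_date=datetime(2021, 3, 8),
    schedule_interval=None,
) as dag:
    begin = DummyOperator(task_id="begin")
    nasdaq = DummyOperator(task_id="import-trading-data-nasdaq")
    nyse = DummyOperator(task_id="import-trading-data-nyse")
    create_view = DummyOperator(task_id="create-view")
    end = DummyOperator(task_id="end")

    # begin fans out to both import tasks; create-view waits for both.
    begin >> [nasdaq, nyse] >> create_view >> end
```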
After the pipeline is created, you can refresh the page to see the Airflow DAG link.
## Create a scheduler

Now we need to create a scheduler so we can run this pipeline regularly. Click the menu "Schedulers", then click the button "Create":

- set name to `daily-trading`
- set description to `daily trading scheduler`
- set category to `daily-trading`
- set context to `{"dt": "{{due.strftime('%Y-%m-%d')}}"}` (see the sketch below)
- set team to `trading`
- set Interval to `1 DAY`
- set Start to, for example, `2021-03-08 00:00:00`

Then click the button "Save changes".
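As a small sketch of what that context evaluates to, assuming `due` is the datetime at which a scheduled run becomes due and the context is rendered with Jinja-style templating:

```python
from datetime import datetime

# Example: the run that became due at 2021-03-08 00:00:00.
due = datetime(2021, 3, 8)

# The scheduler context {"dt": "{{due.strftime('%Y-%m-%d')}}"} renders to:
context = {"dt": due.strftime("%Y-%m-%d")}
print(context)  # {'dt': '2021-03-08'}
```

Each `{{dt}}` placeholder in the pipeline's task arguments then resolves to `2021-03-08` for that run.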
## Unpause the pipeline

The pipeline is in "paused" status after it has been created, so let's unpause it.

Click the menu "Pipelines", then click "import-trading-data" and click the button "unpause".
## Create a SQL pipeline

Now, let's create another pipeline to do some data transformation using a SQL statement.
- First, click the menu "Pipelines", then click the button "Create"
- In the "Basic Info" tab:
    - set name to `get-top-picks`
    - set team to `trading`
    - set category to `daily-trading`
    - set type to `simple-flow`
    - set "Required assets" to `tradings:1.0:1:/{{dt}}`
- In "Tasks" tab:
- Create task
get-top-pick
-- click the button "Add Task"- type is
Spark-SQL
- Click button "Add Step" in tab Spark-SQL
- set name to
get-top-picks
- import asset
tradings:1.0:1:/{{dt}}
astradings
- set SQL statement to below
SELECT symbol, sum(amount) as volume FROM tradings GROUP BY symbol ORDER BY sum(amount) LIMIT 3
- check "Write Output"
- set Location to
hdfs:///beta/data/top_picks/{{dt}}.parquet
- set Asset Path to
top_picks:1.0:1:/{{dt}}
- set datatime to
{{dt}} 00:00:00
- set type to
parquet
- set Write Mode to
overwrite
- set name to
- type is
- Create task
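For reference, here is a rough PySpark equivalent of what this Spark-SQL step does. It is a sketch only: Data Manager resolves and loads the imported asset itself, so the input path below is hypothetical; the SQL statement and output settings are the ones configured above, with `{{dt}}` rendered as `2021-03-08`.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("get-top-picks").getOrCreate()

# Hypothetical location of the "tradings" asset for dt = 2021-03-08;
# in the real pipeline, Data Manager resolves tradings:1.0:1:/{{dt}} for you.
tradings = spark.read.parquet("hdfs:///beta/data/trading/2021-03-08")
tradings.createOrReplaceTempView("tradings")

# The SQL statement from the step above.
top_picks = spark.sql("""
    SELECT symbol, sum(amount) AS volume
    FROM tradings
    GROUP BY symbol
    ORDER BY sum(amount) DESC
    LIMIT 3
""")

# "Write Output": type parquet, Write Mode overwrite, at the configured Location.
top_picks.write.mode("overwrite").parquet(
    "hdfs:///beta/data/top_picks/2021-03-08.parquet"
)
```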