How to build and deploy data applications - stonezhong/DataManager GitHub Wiki

Step 1: Deploy data-apps to the target host using mordor

Here is my mordor application config:

        "dmapps_beta": {
            "stage"       : "beta",
            "name"        : "dmapps",
            "home_dir"    : "/home/stonezhong/DATA_DISK/projects/DataManager/data-apps",
            "deploy_to"   : [ "dmhost" ],
            "use_python3" : true,
            "config"      : {
                "config.json": "copy"
            }
        },

Here is what the config.json looks like:

{
    "deployer": {
        "class": "spark_etl.deployers.HDFSDeployer",
        "args": [
            {
                "bridge": "spnode1",
                "stage_dir": "/root/.stage"
            }
        ]
    },
    "job_submitter": {
        "class": "spark_etl.job_submitters.livy_job_submitter.LivyJobSubmitter",
        "args": [
            {
                "service_url": "http://10.0.0.18:60008/",
                "username": "root",
                "password": "changeme",
                "bridge": "spnode1",
                "stage_dir": "/root/.stage",
                "run_dir": "hdfs:///beta/etl/runs"
            }
        ]
    },
    "job_run_options": {
        "conf": {
            "spark.yarn.appMasterEnv.PYSPARK_PYTHON": "python3",
            "spark.executorEnv.PYSPARK_PYTHON": "python3"
        }
    },
    "deploy_base": "hdfs:///beta/etl/apps"
}
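The "class" fields in the config above are dotted import paths, which a tool like spark_etl can resolve dynamically at startup. Here is a minimal sketch of how that resolution could work; `resolve_class` and `build_component` are illustrative helpers, not part of spark_etl's actual API:

```python
import importlib
import json

def resolve_class(dotted_path):
    """Split a dotted path like 'pkg.mod.Cls' into module + attribute and import it."""
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)

def build_component(spec):
    """Instantiate the configured class with the positional args from the config."""
    cls = resolve_class(spec["class"])
    return cls(*spec.get("args", []))

# Example usage (assuming config.json is in the current directory):
# with open("config.json") as f:
#     config = json.load(f)
# deployer = build_component(config["deployer"])
# job_submitter = build_component(config["job_submitter"])
```

This pattern keeps the deployer and job submitter pluggable: swapping HDFSDeployer for another deployer is a config change, not a code change.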

Once you have mordor set up, you can run the command below to deploy it:

mordor -a stage -p dmapps --stage beta --update-venv T

Step 2: Build and deploy

ssh dmhost
eae dmapps
./build.sh generate_trading_samples
./build.sh execute_sql
# or you can run the command below to build all apps
./build_all.sh
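Each `build.sh` invocation above builds one application and hands it to the configured deployer. Conceptually, the build step bundles the app's sources into an artifact; the sketch below shows that packaging idea in Python. Note `package_app` is a hypothetical helper for illustration only, and the real build script may produce a different artifact layout:

```python
import zipfile
from pathlib import Path

def package_app(app_dir, out_path):
    """Bundle an application's Python sources into a zip artifact.

    Illustrative only -- the actual build.sh in the repo may include
    other files (manifests, resources) and use a different structure.
    """
    app_dir = Path(app_dir)
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(app_dir.rglob("*.py")):
            # Store paths relative to the app root so the archive is self-contained
            zf.write(path, path.relative_to(app_dir))
    return out_path

# Example: package_app("data-apps/execute_sql", "execute_sql.zip")
```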

After you finish the deployment, ssh to the bridge host to verify it:

[root@spnode1 ~]# hdfs dfs -ls /etl-prod/apps
Found 4 items
drwxr-xr-x   - root supergroup          0 2020-12-18 07:45 /etl-prod/apps/dummy
drwxr-xr-x   - root supergroup          0 2020-12-18 07:45 /etl-prod/apps/execute_sql
drwxr-xr-x   - root supergroup          0 2020-12-18 07:46 /etl-prod/apps/generate_trading_samples
drwxr-xr-x   - root supergroup          0 2020-12-18 07:46 /etl-prod/apps/get_schema

Step 3: Run an application

# run the cli application
./etl.py -a run -p spark_cli --version=1.0.0.0 --cli-mode

# to run a non-cli application, remove the --cli-mode option; you can
# optionally add --input foo.json to feed the application foo.json as input
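An input file for the `--input` option could be prepared like this. The payload fields below are purely illustrative; each application defines its own expected input schema:

```python
import json

# Hypothetical input payload -- the field names here are made up for
# illustration; check the target application's docs for its real schema.
payload = {"action": "generate", "count": 100}

# Write the payload to foo.json, which is then passed via --input foo.json
with open("foo.json", "w") as f:
    json.dump(payload, f, indent=4)
```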