How to set up Data Manager - stonezhong/DataManager GitHub Wiki

Database

  • For dm web app:
    • You must set up a database server
    • You must create a database user
    • You must create a database
  • For airflow:
    • You must set up a database server
    • You must create a database user
    • You must create a database

Here is an example:

# create databases
CREATE SCHEMA `beta-dm` DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
CREATE SCHEMA `beta-airflow` DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

# create user and grant user permission
CREATE USER 'airflow'@'localhost' IDENTIFIED BY 'XYZ';
CREATE USER 'airflow'@'%'         IDENTIFIED BY 'XYZ';
FLUSH PRIVILEGES;

GRANT ALL ON `beta-airflow`.* TO 'airflow'@'localhost';
GRANT ALL ON `beta-airflow`.* TO 'airflow'@'%';
FLUSH PRIVILEGES;

CREATE USER 'dm'@'localhost' IDENTIFIED BY 'XYZ';
CREATE USER 'dm'@'%'         IDENTIFIED BY 'XYZ';
FLUSH PRIVILEGES;

GRANT ALL ON `beta-dm`.* TO 'dm'@'localhost';
GRANT ALL ON `beta-dm`.* TO 'dm'@'%';
FLUSH PRIVILEGES;

Make sure the db.json in your mordor config matches these settings (database name, user, and password).
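
If you set up more than one stage, it can help to generate the statements rather than edit them by hand. Below is a minimal sketch, assuming each stage uses databases named `<stage>-dm` and `<stage>-airflow` as in the CREATE SCHEMA statements above. `gen_db_sql` is a hypothetical helper, not part of Data Manager, and it reuses one password for both users only to keep the example short.

```shell
#!/usr/bin/env bash
# gen_db_sql STAGE PASSWORD: print the database setup statements for one stage.
# Hypothetical helper; the "<stage>-dm" / "<stage>-airflow" naming is an assumption.
gen_db_sql() {
    local stage="$1" password="$2" app
    for app in dm airflow; do
        # \` keeps the backticks literal inside the here-document.
        cat <<EOF
CREATE SCHEMA \`${stage}-${app}\` DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
CREATE USER '${app}'@'localhost' IDENTIFIED BY '${password}';
CREATE USER '${app}'@'%' IDENTIFIED BY '${password}';
GRANT ALL ON \`${stage}-${app}\`.* TO '${app}'@'localhost';
GRANT ALL ON \`${stage}-${app}\`.* TO '${app}'@'%';
EOF
    done
    echo "FLUSH PRIVILEGES;"
}

# Print the statements for the beta stage; review them before running them.
gen_db_sql beta 'XYZ'
```

The output can be reviewed and then piped into the mysql client as an administrative user.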

Prepare

Prepare dev machine

I use Ubuntu 18.04 as my dev machine. Make sure you have Python 3.6 (or above) and Node.js 10.x installed.
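
A quick way to confirm the versions is sketched below. It assumes GNU `sort` with the `-V` (version sort) option, which Ubuntu provides, and it treats 10.0 as a minimum for Node.js, which is an assumption since the text only says "10.x". `version_ge` and `check_tool` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed when version A is greater than or equal to version B.
# Assumption: GNU sort with -V (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME FOUND REQUIRED: report whether FOUND satisfies REQUIRED.
check_tool() {
    local name="$1" found="$2" required="$3"
    if version_ge "$found" "$required"; then
        echo "$name $found: ok"
    else
        echo "$name $found: need $required or above"
        return 1
    fi
}

# python3 --version prints "Python 3.x.y"; node --version prints "v10.x.y".
if command -v python3 >/dev/null; then
    check_tool python "$(python3 --version 2>&1 | awk '{print $2}')" 3.6 || true
fi
if command -v node >/dev/null; then
    check_tool node "$(node --version | sed 's/^v//')" 10.0 || true
fi
```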

Check out the source code

mkdir ~/dm
cd ~/dm
git clone https://github.com/stonezhong/DataManager.git

Create the directory ~/dm/.mordor to hold the mordor configuration. You can use existing examples as a reference:

mkdir ~/dm/.mordor

# modify ~/dm/.mordor/config.json, make sure it fits your environment

Create a virtual environment

mkdir ~/dm/.venv
python3 -m venv ~/dm/.venv
source ~/dm/.venv/bin/activate
pip install pip setuptools --upgrade
pip install wheel mordor2 libsass

Set up the environment variable

# You can put it in ~/.bashrc
export MORDOR_CONFIG_DIR=~/dm/.mordor
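
Before running any mordor command, you may want to confirm that the variable points at a usable directory. The sketch below only checks that `MORDOR_CONFIG_DIR` is set and that the directory contains `config.json`; it does not validate the file's contents. `check_mordor_config` is a hypothetical helper, not part of mordor.

```shell
#!/usr/bin/env bash
# check_mordor_config: verify that MORDOR_CONFIG_DIR is set and that the
# directory it names contains config.json. Hypothetical helper.
check_mordor_config() {
    local dir="${MORDOR_CONFIG_DIR:-}"
    if [ -z "$dir" ]; then
        echo "MORDOR_CONFIG_DIR is not set"
        return 1
    fi
    if [ ! -f "$dir/config.json" ]; then
        echo "missing $dir/config.json"
        return 1
    fi
    echo "mordor config found in $dir"
}

# Report the current state without aborting the shell on failure.
check_mordor_config || true
```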

Prepare target machine

My target machine runs CentOS 7. In this example, the target machine is named dmdemo2.

Install some necessary OS packages

yum install tmux mysql-devel graphviz

Optionally, if you want to use PySpark, you need to install a JRE:

yum install java-1.8.0-openjdk

# on Oracle Linux 8, you can do
yum install jdk1.8

Deploy Data Manager and Data Application to target

First, initialize the target for mordor

source ~/dm/.venv/bin/activate
mordor -a init-host -o beta

Build and deploy

cd ~/dm/DataManager/server
DM_STAGE=beta ./build.sh
mordor -a stage -p dm     -s beta --update-venv
mordor -a stage -p dmapps -s beta --update-venv
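
If you deploy to several stages, the build and stage commands above can be wrapped in a small function. This is a sketch only: `deploy_stage` and the `DRY_RUN` variable are hypothetical names, and the function assumes it is called from ~/dm/DataManager/server with the virtual environment active. By default it prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# deploy_stage STAGE: build and stage both projects for one stage.
# Hypothetical wrapper. With DRY_RUN unset or 1 it only prints the commands;
# set DRY_RUN=0 to execute them.
deploy_stage() {
    local stage="$1" project
    local run="echo"
    [ "${DRY_RUN:-1}" = "0" ] && run=""
    # $run is left unquoted on purpose so that an empty value disappears.
    $run env DM_STAGE="$stage" ./build.sh || return 1
    for project in dm dmapps; do
        $run mordor -a stage -p "$project" -s "$stage" --update-venv || return 1
    done
}

# Dry run for the beta stage: prints the three commands.
deploy_stage beta
```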

Initialize airflow

# This installs and initializes airflow
mordor -a run -p dm -s beta -cmd="-a setup-airflow"

Initialize dm

# This will initialize dm web app's database
mordor -a run -p dm -s beta -cmd="-a setup-dm"

Start

# log in to the target

# To start datamanager web
eae dm
# In a production environment, you can run the following
python -m pip install gunicorn
gunicorn --workers=10 DataCatalog.wsgi -b 0.0.0.0:8888

# In a beta environment, you can run
python manage.py runserver 0.0.0.0:8888

# To start datamanager scheduler
eae dm
./scheduler.py

# start airflow
source ~/airflow/.venv/bin/activate
airflow scheduler -D
airflow webserver -D -p 8080

# If you have a firewall, you need to open the ports
sudo firewall-cmd --zone=public --permanent --add-port=8080/tcp
sudo firewall-cmd --zone=public --permanent --add-port=8888/tcp
sudo firewall-cmd --reload
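
To confirm that the ports were opened, you can compare the firewall's port list against the ports you expect. The sketch below assumes the space-separated `port/protocol` output of `firewall-cmd --list-ports`. `missing_ports` is a hypothetical helper, and it does not understand port ranges such as `8000-9000/tcp`.

```shell
#!/usr/bin/env bash
# missing_ports LISTED WANTED...: print each WANTED entry (for example 8080/tcp)
# that does not appear in LISTED, a space-separated list of port/protocol pairs.
# Hypothetical helper; port ranges in LISTED are not expanded.
missing_ports() {
    local listed=" $1 " port
    shift
    for port in "$@"; do
        case "$listed" in
            *" $port "*) ;;          # present, nothing to report
            *) echo "$port" ;;       # absent
        esac
    done
}

# On the target you would pass the real list, for example:
#   missing_ports "$(sudo firewall-cmd --zone=public --list-ports)" 8080/tcp 8888/tcp
# Sample list for illustration:
missing_ports "8080/tcp 22/tcp" 8080/tcp 8888/tcp   # prints 8888/tcp
```

No output means every expected port is already in the list.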

Build Data Applications

# on target
eae dmapps
./build.sh <app name>

# Note: if you are using OCI Data Flow, you need to:
1. add oci==2.26.0 and oci-core==0.0.6 to the data-apps virtual env
2. add oci-core and oci to all Data Application dependencies
3. add oci to the airflow environment