Install Apache Airflow on AWS
This document describes the process to be followed to install Apache Airflow. Airflow is a platform to programmatically author, schedule, and monitor workflows as DAGs. It ships with a DAG scheduler, a web application (UI), and a powerful CLI. Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks; the Airflow scheduler executes your tasks on an array of workers while following the specified dependencies.
The purpose of this document is to describe the phases and activities involved in installing and configuring Apache Airflow, which is used to programmatically schedule and monitor workflows.
In Scope
• Apache Airflow
• Dags
• RabbitMQ (message broker)
• MySQL
• Linux
- Linux knowledge
- Linux OS/Mac OS/VMware Workstation/AWS
3.1 Login to root by using following command
$ sudo su
3.2 Install the required libraries by running the following commands
$ yum groupinstall "Development Tools"
$ yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel python-devel wget cyrus-sasl-devel.x86_64
3.3 Install Python 2.7.6 by running the following commands
$ cd /opt
$ sudo wget --no-check-certificate https://www.python.org/ftp/python/2.7.6/Python-2.7.6.tar.xz
$ tar xf Python-2.7.6.tar.xz
$ cd Python-2.7.6
$ ./configure --prefix=/usr/local
$ make && make altinstall
$ ls -ltr /usr/local/bin/python*
$ vi ~/.bashrc
Add the following line to ~/.bashrc:
alias python='/usr/local/bin/python2.7'
$ source ~/.bashrc (run this command to apply the changes from ~/.bashrc)
3.4 Install PIP
- Run the install
$ cd /tmp/
$ wget https://bootstrap.pypa.io/ez_setup.py
$ python ez_setup.py
$ unzip setuptools-33.1.1.zip
$ cd setuptools-33.1.1
$ easy_install pip
- Verify the installation
$ which pip
# Should print "/bin/pip"
3.5 Install Airflow
Log in as root and run the following command
$ pip install airflow==1.8.0
If this fails with an error about missing Python headers, run the following command to install the Python development packages
$ yum install python-devel
Type "y" when yum asks for confirmation; it will download the packages required for Airflow. Then re-run the installation commands
$ pip install airflow==1.8.0
$ pip install airflow[celery]==1.8.0
3.6 Install RabbitMQ (log in as root)
$ yum -y update
$ yum install epel-release
$ yum install rabbitmq-server
Start the RabbitMQ server using the following command
$ rabbitmq-server start
Verify the RabbitMQ status with the following command
$ rabbitmqctl status
Install the RabbitMQ web interface by enabling the management plugin
$ rabbitmq-plugins enable rabbitmq_management
Navigate to the RabbitMQ web UI at
http://{Host}:15672
Log in with a RabbitMQ user and password; after logging in you will see the RabbitMQ management dashboard. Use the same user and password in airflow.cfg as part of the broker URL.
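For example, assuming the default guest account and a placeholder host name, the broker setting in the [celery] section of airflow.cfg would look roughly like this:
>> broker_url = amqp://guest:guest@{RABBITMQ_HOST}:5672//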
3.7 Install MySQL dependencies
$ yum install -y mysql-devel python-devel python-setuptools
$ pip install MySQL-python
Install mysql-server using the following command
$ yum install mysql-server
Start the MySQL server using the following command
$ systemctl start mysqld
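If this is a fresh MySQL installation with no root password set yet, you may want to run the standard hardening script that ships with the MySQL server to set one, since the root password is needed in section 3.8:
$ mysql_secure_installation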
3.8 Configuring Airflow
$ export AIRFLOW_HOME=~/airflow
$ airflow initdb
Make the following changes in the Airflow configuration file (airflow/airflow.cfg)
• Change the executor to CeleryExecutor (required for the RabbitMQ/Celery worker setup used in this guide)
>> executor = CeleryExecutor
• Set password authentication to "True" by adding the following changes for the web server
>> authenticate = True
>> auth_backend = airflow.contrib.auth.backends.password_auth
• Point SQL Alchemy to MySQL (if using MySQL)
>> sql_alchemy_conn = mysql://{USERNAME}:{PASSWORD}@{MYSQL_HOST}:3306/airflow
• Set DAGs to be paused at creation. This is a good idea to avoid unwanted runs of a workflow as soon as it is deployed. (Recommended)
>> dags_are_paused_at_creation = True
• Don’t load examples
>> load_examples = False
• Point Celery to MySQL (if using MySQL)
>> celery_result_backend = db+mysql://{USERNAME}:{PASSWORD}@localhost:3306/airflow
• Set the default_queue name used by the CeleryExecutor (optional: mainly if you have a preference for the default queue name or plan to use the same broker for multiple Airflow instances)
>> default_queue = {YOUR_QUEUE_NAME_HERE}
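Taken together with the broker URL from section 3.6, the edited parts of airflow.cfg would look roughly like this (hosts, credentials, and the queue name are placeholders):
[core]
executor = CeleryExecutor
sql_alchemy_conn = mysql://{USERNAME}:{PASSWORD}@{MYSQL_HOST}:3306/airflow
dags_are_paused_at_creation = True
load_examples = False
[webserver]
authenticate = True
auth_backend = airflow.contrib.auth.backends.password_auth
[celery]
broker_url = amqp://guest:guest@{RABBITMQ_HOST}:5672//
celery_result_backend = db+mysql://{USERNAME}:{PASSWORD}@localhost:3306/airflow
default_queue = {YOUR_QUEUE_NAME_HERE}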
• Set up MySQL (if using MySQL)
Log in to the MySQL machine and open a MySQL shell
$ mysql -u root -p (press Enter, then type the MySQL root password)
• Create the airflow database if it doesn’t exist
>> CREATE DATABASE airflow;
• Grant access
>> grant all on airflow.* TO 'USERNAME'@'%' IDENTIFIED BY '{password}';
• Select airflow as the database
>> USE airflow;
• Run initdb (back in the shell) to set up the database tables
$ airflow initdb
• Show the tables using the following command (after logging in to MySQL)
mysql> show tables;
• Create the needed directories under $AIRFLOW_HOME
$ cd $AIRFLOW_HOME
$ mkdir dags
$ mkdir logs
3.9 Controlling Airflow services
Starting Services
- Start the Web Server
nohup airflow webserver $* >> /logs/airflow/webserver.logs &
- Start the Celery Workers
nohup airflow worker $* >> /logs/airflow/worker.logs &
- Start the Scheduler
nohup airflow scheduler >> /logs/airflow/scheduler.logs &
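Since the services are started in the background with nohup, you can check that they are running (and find the process IDs to stop them) with standard commands, for example
$ ps -ef | grep airflow
$ kill {PID of the service you want to stop}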
• Navigate to the Airflow UI
>> http://{HOSTNAME}:8080/admin/
To add a user and password for the web UI, log in to MySQL and run the following insert against the "users" table of the "airflow" database
> INSERT INTO users VALUES (1,'admin','[email protected]','password');
(The login ID and password for the web UI come from this MySQL "users" table.)
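Note that the password_auth backend stores hashed passwords, so a plaintext row inserted directly into MySQL may not authenticate. The approach documented for Airflow 1.8's password backend is to create the user from a Python shell via the PasswordUser model; a minimal sketch (username, e-mail, and password are placeholders):
$ python
# Create a web UI user through the password_auth backend
import airflow
from airflow import models, settings
from airflow.contrib.auth.backends.password_auth import PasswordUser
user = PasswordUser(models.User())
user.username = 'admin'          # placeholder login ID
user.email = '[email protected]'   # placeholder e-mail
user.password = 'password'       # placeholder; stored hashed by the backend
session = settings.Session()
session.add(user)
session.commit()
session.close()
exit()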
• After logging in you will see the main screen, where you can view your DAGs, schedule them, and perform operations on them.
DAG: In Airflow, a DAG is a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies.
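As an illustration, a minimal DAG file placed in the dags directory created in section 3.8 might look roughly like this (the DAG id, schedule, and commands are placeholders):
# ~/airflow/dags/example_hello.py
from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2017, 1, 1),
}

# a DAG with two shell tasks, run once a day
dag = DAG('example_hello', default_args=default_args, schedule_interval='@daily')

t1 = BashOperator(task_id='print_date', bash_command='date', dag=dag)
t2 = BashOperator(task_id='say_hello', bash_command='echo hello', dag=dag)

# say_hello runs only after print_date succeeds
t1.set_downstream(t2)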
(For any queries about installation on CentOS and Ubuntu, follow the link below)
http://site.clairvoyantsoft.com/installing-and-configuring-apache-airflow/