Install Apache Airflow on AWS - N-CP/AIRFLOW GitHub Wiki

1 Introduction

This document describes how to install and configure Apache Airflow. Airflow is a platform to programmatically author, schedule, and monitor workflows. It ships with a DAG scheduler, a web application (UI), and a powerful CLI. Workflows are authored as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies.

1.1 Purpose

The purpose of this document is to describe the phases and activities involved in installing and configuring Apache Airflow, which is used to schedule and monitor workflows programmatically.

1.2 Scope

In Scope

• Apache Airflow

• DAGs

• RabbitMQ (message broker)

• MySQL

• Linux

2 Prerequisites

  1. Linux Knowledge

  2. Linux OS/Mac OS/VMware Workstation/AWS

3 Installation and Working

3.1 Log in as root using the following command

 $ sudo su

3.2 Install the required libraries by running the following commands

 $ yum groupinstall "Development tools"

 $ yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel python-devel wget cyrus-sasl-devel.x86_64

3.3 Install Python 2.7.6 using the following commands

 $ cd /opt

 $ sudo wget --no-check-certificate https://www.python.org/ftp/python/2.7.6/Python-2.7.6.tar.xz

 $ tar xf Python-2.7.6.tar.xz

 $ cd Python-2.7.6

 $ ./configure --prefix=/usr/local

 $ make && make altinstall

 $ ls -ltr /usr/local/bin/python*

 $ vi ~/.bashrc

 alias python='/usr/local/bin/python2.7'    # add this line to ~/.bashrc

 $ source ~/.bashrc                # reload ~/.bashrc in the current shell
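
Shell aliases are not expanded in non-interactive shells, so scripts may still pick up the system interpreter. The sketch below shows one way to confirm the new build by calling it through its full path. The helper name check_python is hypothetical, and the path assumes the --prefix=/usr/local used above.

```shell
# check_python PATH: succeed only if PATH is an executable that reports Python 2.7.x.
# (Hypothetical helper, not part of Python or Airflow.)
check_python() {
  py="$1"
  [ -x "$py" ] || { echo "not found: $py" >&2; return 1; }
  # Python 2 prints its version to stderr, so merge stderr into stdout.
  version=$("$py" --version 2>&1)
  case "$version" in
    "Python 2.7."*) echo "OK: $version" ;;
    *) echo "unexpected version: $version" >&2; return 1 ;;
  esac
}

# Assumed install location from "make altinstall" with --prefix=/usr/local.
check_python /usr/local/bin/python2.7 || echo "Python 2.7 is not installed at the expected path"
```

If the check fails, re-run the configure and make steps and look for build errors before continuing.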

3.4 Install pip

  1. Run the installer

    $ cd /tmp/

    $ wget https://bootstrap.pypa.io/ez_setup.py

    $ python ez_setup.py

 $ unzip setuptools-33.1.1.zip

 $ cd setuptools-33.1.1

 $ easy_install pip

  2. Verify Installation

    $ which pip

    # Should print "/bin/pip"

3.5 Install Airflow

Log in as root and run the following commands

 $ pip install airflow==1.8.0

If it gives the following error

airflow10

then run the following command to install the Python development packages

$ yum install python-devel

It will give the following result

airflow11

Type "y"

airflow12

It will download the packages required for Airflow. Then run the following command again

$ pip install airflow==1.8.0

$ pip install airflow[celery]==1.8.0

3.6 Install RabbitMQ (log in as root)

$ yum -y update

$ yum install epel-release

$ yum install rabbitmq-server

Start the RabbitMQ server using the following command

$ rabbitmq-server start

Verify the RabbitMQ status using the following command

$ rabbitmqctl status

Enable the RabbitMQ web interface (the management plugin) using the command

 $ rabbitmq-plugins enable rabbitmq_management

Navigate to the RabbitMQ web interface

http://{Host}:15672

The following is the web UI of RabbitMQ

airflow17

After logging in you can see the following screen

airflow18

To let Airflow log in to RabbitMQ, set the RabbitMQ user and password in the broker URL (broker_url) in airflow.cfg.
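
As an illustration of the broker URL step, the sketch below creates a dedicated RabbitMQ user and prints the matching broker_url line. The user name, password, and host are hypothetical placeholders. The rabbitmqctl calls are guarded so the script still prints the URL on a machine without RabbitMQ.

```shell
# Hypothetical credentials and host; replace with your own values.
RABBIT_USER="airflow_user"
RABBIT_PASS="change_me"
RABBIT_HOST="localhost"

# Create the user and grant it full access to the default vhost "/".
# Run as root on the RabbitMQ machine; skipped when rabbitmqctl is not installed.
if command -v rabbitmqctl >/dev/null 2>&1; then
  rabbitmqctl add_user "$RABBIT_USER" "$RABBIT_PASS"
  rabbitmqctl set_permissions -p / "$RABBIT_USER" ".*" ".*" ".*"
fi

# The trailing "//" selects the default vhost "/"; 5672 is the default AMQP port.
BROKER_URL="amqp://${RABBIT_USER}:${RABBIT_PASS}@${RABBIT_HOST}:5672//"
echo "broker_url = ${BROKER_URL}"
```

Copy the printed line into the [celery] section of airflow.cfg.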

3.7 Install MySQL dependencies

$ yum install -y mysql-devel python-devel python-setuptools

$ pip install MySQL-python

Install mysql-server using the following command

$ yum install mysql-server

Start the MySQL server using the following command

$ systemctl start mysqld

3.8 Configuring Airflow

$ export AIRFLOW_HOME=~/airflow

$ airflow initdb

Make the following changes in the Airflow configuration file ($AIRFLOW_HOME/airflow.cfg)

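• Change the executor to CeleryExecutor (recommended for production). The Celery workers, RabbitMQ broker, and Celery settings used in the rest of this guide require it

>> executor = CeleryExecutor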

• Enable password authentication by adding the following settings under the [webserver] section

>> authenticate = True

>> auth_backend = airflow.contrib.auth.backends.password_auth

• Point SQLAlchemy to MySQL (if using MySQL)

>> sql_alchemy_conn = mysql://{USERNAME}:{PASSWORD}@{MYSQL_HOST}:3306/airflow

• Pause DAGs when they are created. This avoids unwanted runs of a workflow. (Recommended)

>> dags_are_paused_at_creation = True

• Don’t load examples

>> load_examples = False

• Point Celery to MySQL (if using MySQL)

>> celery_result_backend = db+mysql://{USERNAME}:{PASSWORD}@localhost:3306/airflow

• Set the default_queue name used by the CeleryExecutor (optional; any name works. Useful mainly if you prefer a particular queue name or plan to use the same broker for multiple Airflow instances)

>> default_queue = {YOUR_QUEUE_NAME_HERE}
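
The edits above can also be applied from the command line. The following is a minimal sketch, assuming each key already exists in the file as a "key = value" line; the helper name set_cfg is hypothetical and the connection string uses placeholder credentials. It is demonstrated on a temporary file rather than on the real airflow.cfg.

```shell
# set_cfg KEY VALUE FILE: rewrite every existing "KEY = ..." line in FILE.
# (Hypothetical helper; it does not look at section headers.)
set_cfg() {
  key="$1"; value="$2"; file="$3"
  # Escape characters that are special in a sed replacement: \, & and the | delimiter.
  escaped=$(printf '%s' "$value" | sed -e 's/[\\&|]/\\&/g')
  # "|" is used as the delimiter so that URLs containing "/" need no escaping.
  # A backup of the original file is kept as FILE.bak.
  sed -i.bak "s|^${key}[[:space:]]*=.*|${key} = ${escaped}|" "$file"
}

# Demonstration on a throwaway file with a few default-style entries.
CFG="$(mktemp)"
printf '%s\n' '[core]' \
  'sql_alchemy_conn = sqlite:////tmp/airflow.db' \
  'load_examples = True' \
  'dags_are_paused_at_creation = False' > "$CFG"

set_cfg load_examples False "$CFG"
set_cfg dags_are_paused_at_creation True "$CFG"
set_cfg sql_alchemy_conn "mysql://user:pass@localhost:3306/airflow" "$CFG"
cat "$CFG"
```

Because the helper ignores section headers, it is only safe for keys that appear once in the file.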

• Set up MySQL (if using MySQL)

Log in to the MySQL machine

$ mysql -u root -p (press Enter, then type the password set in the airflow.cfg file)

• Create the airflow database if it doesn’t exist

>> CREATE DATABASE airflow;

• Grant access

>> GRANT ALL ON airflow.* TO '{USERNAME}'@'%' IDENTIFIED BY '{PASSWORD}';

• Select airflow as the database

>> USE airflow;

• Run initdb to set up the database tables

$ airflow initdb

• Show the tables using the following command (while logged in to MySQL)

mysql> show tables;
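
The database creation and grant statements from the steps above can be collected into one file and applied in a single step. This is only a sketch: the user name and password are hypothetical placeholders, and the GRANT ... IDENTIFIED BY form is the one this guide uses, which assumes a MySQL 5.x server.

```shell
# Hypothetical credentials; use the same values as in sql_alchemy_conn.
DB_USER="airflow_user"
DB_PASS="change_me"

# Write the statements to a temporary file so they can be reviewed first.
SQL_FILE="$(mktemp)"
cat > "$SQL_FILE" <<EOF
CREATE DATABASE IF NOT EXISTS airflow;
GRANT ALL ON airflow.* TO '${DB_USER}'@'%' IDENTIFIED BY '${DB_PASS}';
EOF

cat "$SQL_FILE"
# To apply the file on the MySQL machine:
#   mysql -u root -p < "$SQL_FILE"
```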

• Create the needed directories

$ mkdir dags

$ mkdir logs

3.9 Controlling Airflow services

Starting Services

  1. Start Web Server

    nohup airflow webserver $* >> /logs/airflow/webserver.logs &

  2. Start Celery Workers

    nohup airflow worker $* >> /logs/airflow/worker.logs &

  3. Start Scheduler

    nohup airflow scheduler >> /logs/airflow/scheduler.logs &
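
The three start commands can be wrapped in one small script. The sketch below assumes a log directory under the user's home (the LOG_DIR default is an assumption; the commands above write to /logs/airflow instead) and creates it first, since the redirection fails if the directory is missing. The function name start_service is hypothetical.

```shell
# Assumed log location; set LOG_DIR to override.
LOG_DIR="${LOG_DIR:-$HOME/airflow/logs}"
mkdir -p "$LOG_DIR"

# start_service NAME: run "airflow NAME" in the background, appending
# stdout and stderr to a per-service log and recording the process ID.
start_service() {
  name="$1"
  if ! command -v airflow >/dev/null 2>&1; then
    echo "airflow not found on PATH; skipping $name" >&2
    return 1
  fi
  nohup airflow "$name" >> "$LOG_DIR/$name.log" 2>&1 &
  echo "$!" > "$LOG_DIR/$name.pid"
}

for svc in webserver worker scheduler; do
  start_service "$svc" || true
done
```

The recorded PID files make it possible to stop a service later with kill "$(cat "$LOG_DIR/webserver.pid")".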

• Navigate to the Airflow UI

>>http://{HOSTNAME}:8080/admin/

Log in to MySQL and run the following query against the "users" table of the "airflow" database to add a user and password for the web UI

 > INSERT INTO users VALUES (1,'admin','[email protected]','password');

(The login ID and password for the web UI are the ones set in the MySQL "users" table.)

• After logging in you will see the following screen, which lists your DAGs and lets you schedule them and perform operations on them.

airflow28

DAG: In Airflow, a DAG is a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies.
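
To make the definition concrete, the sketch below writes a minimal two-task DAG file into the dags folder. The DAG name, task names, start date, and schedule are all hypothetical, and the folder default assumes AIRFLOW_HOME=~/airflow as set in section 3.8. The Python inside the heredoc uses the Airflow 1.8 import paths.

```shell
# Assumed dags folder; matches the default dags_folder under AIRFLOW_HOME.
DAGS_DIR="${AIRFLOW_HOME:-$HOME/airflow}/dags"
DAG_FILE="$DAGS_DIR/example_two_tasks.py"
mkdir -p "$DAGS_DIR"

# The quoted 'EOF' keeps the shell from expanding anything inside the Python code.
cat > "$DAG_FILE" <<'EOF'
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "airflow",
    "start_date": datetime(2017, 1, 1),
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

# One DAG holding two tasks, scheduled once a day.
dag = DAG("example_two_tasks", default_args=default_args, schedule_interval="@daily")

extract = BashOperator(task_id="extract", bash_command="echo extracting", dag=dag)
load = BashOperator(task_id="load", bash_command="echo loading", dag=dag)

# Dependency: "load" runs only after "extract" has succeeded.
extract.set_downstream(load)
EOF

echo "wrote $DAG_FILE"
```

Once the scheduler has picked up the file, the DAG should appear in the web UI, paused if dags_are_paused_at_creation is set to True.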

(For any questions about installation on CentOS and Ubuntu, see the link below.)

http://site.clairvoyantsoft.com/installing-and-configuring-apache-airflow/
