Airflow on EC2
This is a quick-start tutorial on setting up Airflow in standalone mode on an EC2 instance, first on Ubuntu 20.04 and then on Amazon Linux 2.
Launch an EC2 instance with Ubuntu 20.04. SSH into the EC2 instance and install Airflow.
sudo apt update
sudo apt install python3-pip
# Ubuntu 20.04 ships Python 3.8, so use the matching constraints-3.8.txt file
pip3 install 'apache-airflow==2.2.2' \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.2/constraints-3.8.txt"
airflow standalone
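The constraints URL in the install command encodes both the Airflow version and the Python minor version, and the two should match what is actually installed. As a minimal sketch, the URL can be derived from the running interpreter instead of being typed by hand. The helper name constraint_url is made up for this example; the URL pattern is the one used in the install command.

```shell
# Hypothetical helper: build the constraints URL for a given Airflow version
# ($1) and Python minor version ($2, for example "3.8").
constraint_url() {
  printf 'https://raw.githubusercontent.com/apache/airflow/constraints-%s/constraints-%s.txt\n' "$1" "$2"
}

AIRFLOW_VERSION=2.2.2
# Reduce the interpreter's full version (major.minor.patch) to major.minor.
PYTHON_VERSION="$(python3 --version 2>&1 | cut -d ' ' -f 2 | cut -d '.' -f 1-2)"

# Print the resulting install command instead of running it.
echo "pip3 install 'apache-airflow==${AIRFLOW_VERSION}' --constraint '$(constraint_url "$AIRFLOW_VERSION" "$PYTHON_VERSION")'"
```

The sketch only prints the command so that it can be reviewed first; copy the printed line into the shell to run the installation.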
When it starts, airflow standalone creates a user named admin and prints the automatically generated password on screen. By default the Airflow web server listens on port 8080. We will use Apache2 as a reverse proxy to forward traffic from port 80 to port 8080, so that the web UI can be reached without specifying a port.
sudo apt install apache2
Overwrite /etc/apache2/sites-available/000-default.conf with the following content:
<VirtualHost *:80>
ProxyPreserveHost On
ProxyPass / http://localhost:8080/
ProxyPassReverse / http://localhost:8080/
</VirtualHost>
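If you would rather not edit the file by hand, the same configuration can be generated from the shell. This is a sketch only: write_proxy_vhost is a hypothetical helper, and it writes to whatever path it is given so that the result can be inspected before it is copied into place with sudo.

```shell
# Hypothetical helper: write a reverse-proxy virtual host to file $1 that
# forwards port 80 to a backend on localhost port $2 (default 8080).
write_proxy_vhost() {
  cat > "$1" <<EOF
<VirtualHost *:80>
    ProxyPreserveHost On
    ProxyPass / http://localhost:${2:-8080}/
    ProxyPassReverse / http://localhost:${2:-8080}/
</VirtualHost>
EOF
}

# Generate the file in a scratch location and show it.
vhost_tmp="$(mktemp)"
write_proxy_vhost "$vhost_tmp" 8080
cat "$vhost_tmp"

# To install it (not run here):
#   sudo cp "$vhost_tmp" /etc/apache2/sites-available/000-default.conf
```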
Enable the required Apache proxy modules and restart Apache:
sudo a2enmod proxy && sudo a2enmod proxy_http
sudo service apache2 restart
At this point you should be able to access the Airflow web UI at http://ip-address-of-ec2-instance/, provided the instance's security group allows inbound traffic on port 80.
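If the page does not load, it helps to check from the instance itself whether the problem is in Airflow or in the proxy. The sketch below assumes the /health endpoint of the Airflow 2 web server, which reports the status of the metadatabase and the scheduler as JSON. The function name airflow_health_ok is made up for this example, and the JSON is parsed with the python3 that is already installed.

```shell
# Hypothetical helper: read the JSON body of Airflow's /health endpoint on
# stdin and succeed only if both the metadatabase and the scheduler report
# the status "healthy".
airflow_health_ok() {
  python3 -c '
import json, sys
body = json.load(sys.stdin)
ok = all(body.get(part, {}).get("status") == "healthy"
         for part in ("metadatabase", "scheduler"))
sys.exit(0 if ok else 1)
'
}

# Usage on the instance (not run here):
#   curl -fsS http://localhost:8080/health | airflow_health_ok && echo "direct: ok"
#   curl -fsS http://localhost/health | airflow_health_ok && echo "proxied: ok"
```

If the direct check passes but the proxied one fails, the Apache configuration is the likely culprit. If both pass but the browser still cannot connect, look at the security group.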
Launch an EC2 instance with Amazon Linux 2 AMI (HVM) - Kernel 5.10. SSH into the EC2 instance and install Airflow.
pip3 install 'apache-airflow==2.2.2' \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.2/constraints-3.7.txt"
If you use airflow standalone to start Airflow, you will see the following error message:
Traceback (most recent call last):
File "/home/ec2-user/.local/bin/airflow", line 5, in <module>
from airflow.__main__ import main
File "/home/ec2-user/.local/lib/python3.7/site-packages/airflow/__init__.py", line 34, in <module>
from airflow import settings
File "/home/ec2-user/.local/lib/python3.7/site-packages/airflow/settings.py", line 35, in <module>
from airflow.configuration import AIRFLOW_HOME, WEBSERVER_CONFIG, conf # NOQA F401
File "/home/ec2-user/.local/lib/python3.7/site-packages/airflow/configuration.py", line 1129, in <module>
conf.validate()
File "/home/ec2-user/.local/lib/python3.7/site-packages/airflow/configuration.py", line 224, in validate
self._validate_config_dependencies()
File "/home/ec2-user/.local/lib/python3.7/site-packages/airflow/configuration.py", line 278, in _validate_config_dependencies
f"error: sqlite C library version too old (< {min_sqlite_version}). "
airflow.exceptions.AirflowConfigException: error: sqlite C library version too old (< 3.15.0). See https://airflow.apache.org/docs/apache-airflow/2.2.2/howto/set-up-database.html#setting-up-a-sqlite-database
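Following the Airflow documentation (Setting up a SQLite Database), we build a newer SQLite from source and make it visible to the dynamic linker. Note that the export at the end applies only to the current shell session: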
sudo yum install gcc tcl
wget https://www.sqlite.org/src/tarball/sqlite.tar.gz
tar xzf sqlite.tar.gz
cd sqlite/
./configure
make
sudo make install
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
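Before starting Airflow, it can be worth confirming that Python now loads the newly built library rather than the system one. The following is a minimal sketch: version_ge is a hypothetical helper that relies on sort -V from GNU coreutils, and 3.15.0 is the minimum version quoted in the error message.

```shell
# Hypothetical helper: true when version $1 is greater than or equal to
# version $2. Relies on sort -V (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# SQLite library version that Python actually loads at runtime.
current="$(python3 -c 'import sqlite3; print(sqlite3.sqlite_version)')"

if version_ge "$current" 3.15.0; then
  echo "SQLite $current is new enough for Airflow"
else
  echo "SQLite $current is still older than 3.15.0"
fi
```

If the old version is still reported, check that LD_LIBRARY_PATH is exported in the same shell that runs python3.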
Now you should be able to start Airflow in standalone mode:
airflow standalone
As on Ubuntu, airflow standalone creates a user named admin and prints the automatically generated password on screen, and the web server listens on port 8080 by default. We will use Apache (packaged as httpd on Amazon Linux) as a reverse proxy to forward traffic from port 80 to port 8080, so that the web UI can be reached without specifying a port.
sudo yum install httpd
Create /etc/httpd/conf.d/default-site.conf with the following content:
<VirtualHost *:80>
ProxyPreserveHost On
ProxyPass / http://localhost:8080/
ProxyPassReverse / http://localhost:8080/
</VirtualHost>
Start Apache to begin forwarding traffic (run sudo systemctl enable httpd as well if it should also start at boot):
sudo systemctl start httpd
At this point you should be able to access the Airflow web UI at http://ip-address-of-ec2-instance/, provided the instance's security group allows inbound traffic on port 80.