Install Apache Airflow in a Python virtualenv using reticulate and pip

Premises

Install python3 by following the installation instructions for your OS.

If you do not have pip installed, download get-pip.py from https://bootstrap.pypa.io/get-pip.py and run:
python3 get-pip.py. If you already have it, make sure it is up to date:
python3 -m pip install --upgrade pip.

Assuming you have R installed, install the reticulate package from the R console:
install.packages('reticulate').

Setup

You first need to know your python3 installation folder. To find it:

  • On Unix-based systems: which python3
  • On any system: python3 -c "import sys; print(sys.executable)"

The directory you need for the next steps is called bin.
With the first command, you might have to append /bin to the output if it is not already present.
The second command prints the full path of the interpreter, which may lie below bin; in that case, delete everything after bin from the path.
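The path trimming described above can also be done programmatically. The following is a minimal sketch, not part of the original tutorial: the function name python_bin_dir is hypothetical, and it assumes that the interpreter file sits directly inside the directory you want (bin on Unix-like systems).

```python
import sys
from pathlib import Path


def python_bin_dir(executable: str = sys.executable) -> Path:
    """Return the directory that contains the given interpreter executable.

    Hypothetical helper: assumes the executable lives directly inside
    the directory of interest (bin on Unix-like systems).
    """
    # sys.executable is the interpreter file itself, e.g. /usr/bin/python3;
    # dropping the last path component leaves its containing directory.
    return Path(executable).parent


if __name__ == "__main__":
    # Print the directory of the interpreter that runs this script.
    print(python_bin_dir())
```

Run it with the same python3 you intend to use for the virtualenv, so that the printed directory belongs to that installation.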

In the R console, run
reticulate::virtualenv_create(envname = '<yourpythonenvname>', python = '<the python path you previously identified>').

The official documentation prescribes a set of constraints to follow when installing Airflow. Install it into the environment you just created with:
reticulate::virtualenv_install(envname = '<yourpythonenvname>', packages = 'apache-airflow[celery]==2.4.0', pip_options = '--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.4.0/constraints-3.7.txt"').

By the time you read this, you might have to change 2.4.0 to the latest Airflow version, and 3.7 to the Python version of your virtualenv, which must be one that Airflow release supports.
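Since the constraint URL only varies in its two version numbers, it can be derived rather than edited by hand. The sketch below assumes that the URL pattern of the 2.4.0 example holds for other releases; the names constraint_url and pip_options are hypothetical, and defaulting to the version of the running interpreter is an assumption about what is usually wanted.

```python
import sys

# Pattern taken from the 2.4.0 example; assumed to hold for other releases.
CONSTRAINT_URL_TEMPLATE = (
    "https://raw.githubusercontent.com/apache/airflow/"
    "constraints-{airflow}/constraints-{python}.txt"
)


def constraint_url(airflow_version, python_version=None):
    """Build the constraint file URL for an Airflow and Python version pair."""
    if python_version is None:
        # Assumption: fall back to the major.minor version of this interpreter.
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return CONSTRAINT_URL_TEMPLATE.format(
        airflow=airflow_version, python=python_version
    )


def pip_options(airflow_version, python_version=None):
    """Return the string to pass as pip_options to virtualenv_install."""
    return f'--constraint "{constraint_url(airflow_version, python_version)}"'


if __name__ == "__main__":
    print(pip_options("2.4.0", "3.7"))
```

Check that the resulting URL actually exists before using it, since not every combination of Airflow and Python versions has a constraint file.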

Potential errors

An SSL certificate error might occur when pip fetches the remote constraint file. In that case, download the file under a local name and pass that local path to --constraint, in place of the remote URL, during the virtualenv_install step.
curl -o constraints_airflow_local.txt https://raw.githubusercontent.com/apache/airflow/constraints-2.4.0/constraints-3.7.txt

Set your virtualenv as the active one in reticulate

reticulate::use_python(paste0(reticulate::virtualenv_root(),'/<yourpythonenvname>/bin')).
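To confirm that the intended environment is the one in use, a short Python check can help. This is a sketch under assumptions rather than part of the original procedure: the helper names are hypothetical, and the prefix comparison relies on the environment having been created by venv or a recent virtualenv, which set sys.base_prefix.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Report whether the running interpreter belongs to a virtual environment."""
    # Inside a venv-style environment, sys.prefix points at the environment
    # while sys.base_prefix points at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def is_installed(package: str) -> bool:
    """Report whether a top-level package can be found by the import system."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtualenv active:", in_virtualenv())
    print("airflow importable:", is_installed("airflow"))
```

Run it with the python executable found in the bin directory of your virtualenv; if the environment is set up as described, the printed interpreter path should lie inside that directory.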