Apache Airflow Research - mozzihozzi/DevOps GitHub Wiki

Apache Airflow

Apache Airflow Research

  1. Introduction
  2. Installation
  3. Usage Guide
  4. Command Summary

1. Introduction

1.1. What is Apache Airflow?

https://airflow.apache.org/

  • Open-source platform for scheduling and monitoring workflows
  • Developed at Airbnb
  • Mainly used to manage data processing in big-data environments
  • Workflows are written in Python

1.2. Features

1. Complex processes can be viewed as a flow diagram
2. Tasks are written and managed in Python, with a clean and convenient UI
3. The run time and history of each task are easy to check
4. A specific task can be run on its own when needed
5. Tasks can run in parallel
6. Airflow ships its own independent scheduler, so each user can run workflows independently

In short, Airflow lets users manage and monitor workflows easily.

1.3. dag (Directed Acyclic Graph)

  • The set of tasks that makes up a workflow is called a dag
  • A dag is a graph that moves in one direction only and never loops back on itself
  • The tasks in a dag have an order and dependencies between them
  • Inside a dag, tasks are defined from Operators, and there are many kinds of Operators
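
The "one direction, no cycles" property can be sketched in plain Python without Airflow. The task names below (extract, transform, load, report) are hypothetical, and the standard-library graphlib module (Python 3.9+) stands in for Airflow's own scheduling logic, so treat this as an illustration of the idea only.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each key is a task, each value is the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}


def execution_order(deps):
    """Return one valid run order; raises CycleError if the graph contains a cycle."""
    return list(TopologicalSorter(deps).static_order())


def is_acyclic(deps):
    """True when the dependency graph is a valid dag."""
    try:
        execution_order(deps)
        return True
    except CycleError:
        return False


order = execution_order(dependencies)
print(order)                      # "extract" always comes before "transform"
print(is_acyclic(dependencies))   # True
```

A graph such as {"a": {"b"}, "b": {"a"}} is rejected by is_acyclic because each task waits on the other, which is exactly what a dag rules out.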


1.4. Basic Folder Structure

The Airflow home directory holds airflow.cfg (configuration) and airflow.db (the default SQLite metadata database), together with the directories where dags and logs are stored.

airflow
├── airflow.cfg
├── airflow.db
├── dags
│   ├── dags1.py
│   ├── dags2.py
│   └── ...
├── logs
└── ...
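
As a rough sketch, the layout above can be mapped to concrete paths in Python. The helper name airflow_paths is hypothetical, and the sketch assumes the default layout under AIRFLOW_HOME (falling back to ~/airflow); the dags and logs locations can be changed in airflow.cfg, in which case these paths would differ.

```python
import os
from pathlib import Path


def airflow_paths(environ=None):
    """Map the default layout to concrete paths, honouring AIRFLOW_HOME when it is set."""
    environ = os.environ if environ is None else environ
    home = Path(environ.get("AIRFLOW_HOME", "~/airflow")).expanduser()
    return {
        "home": home,
        "config": home / "airflow.cfg",
        "database": home / "airflow.db",
        "dags": home / "dags",
        "logs": home / "logs",
    }


# Print each expected location and whether it exists on this machine.
for name, path in airflow_paths().items():
    print("{:<9}{}  exists={}".format(name, path, path.exists()))
```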

2. Installation

2.1. Prerequisites

  • python
  • pip
  • venv (optional)

2.2. Installation Steps

Install apache-airflow. By default Airflow uses ~/airflow as its home directory; to change it, run export AIRFLOW_HOME=~/{path} before installing. To use Airflow with GCP (Google Cloud Platform) or Postgres, install the extras with pip install 'apache-airflow[postgres,gcp]'.

pip install apache-airflow

To install with conda, use the following command:

conda install -c conda-forge airflow

Check the Airflow version:

airflow version

Initialize the database that Airflow uses. If no database is configured in airflow.cfg, SQLite is used by default.

airflow initdb
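
To see which database initdb will target, one option is to read the sql_alchemy_conn setting from airflow.cfg. The sketch below parses an assumed excerpt of the file (Airflow 1.10 keeps this key in the [core] section); the sample path and the helper name metadata_db_backend are placeholders, not part of Airflow.

```python
import configparser

# Assumed excerpt of airflow.cfg (Airflow 1.10 layout); the path is a placeholder.
SAMPLE_CFG = """
[core]
sql_alchemy_conn = sqlite:////home/user/airflow/airflow.db
executor = SequentialExecutor
"""


def metadata_db_backend(cfg_text):
    """Return the SQLAlchemy dialect name (e.g. 'sqlite') configured for the metadata DB."""
    # interpolation=None keeps '%' characters in other settings from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    conn = parser.get("core", "sql_alchemy_conn")
    # A SQLAlchemy URL looks like dialect+driver://...; keep only the dialect part.
    return conn.split("://", 1)[0].split("+", 1)[0]


print(metadata_db_backend(SAMPLE_CFG))  # sqlite
```

For a real installation, the same function could be fed the text of the airflow.cfg file in the Airflow home directory.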

Start the web server that serves the UI. The default port is 8080, and the URL is http://localhost:{port}.

airflow webserver -p {port}

Start the Airflow scheduler:

airflow scheduler

3. Usage Guide

3.1. Writing a dag

Airflow defines tasks through Operators. There are many kinds of Operators.

  • DummyOperator
  • BashOperator
  • PythonOperator
  • Dingding Operators
  • Google Cloud Operators
  • Papermill

3.1.1. PythonOperator

PythonOperator parameters

  • task_id : the id that identifies each task; the name must be unique
  • python_callable : the Python function that is actually called
  • provide_context : whether to pass the default context arguments to the Python function when it is called
  • op_kwargs : extra parameters to pass in addition to the default arguments
  • dag : the dag the task belongs to; the variable is usually named dag
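
A minimal sketch of how these parameters fit together, assuming the Airflow 1.10 import paths used elsewhere on this page. The names greet, build_dag, params_example and greet_task are hypothetical. The Airflow imports sit inside build_dag so the callable can be tried without Airflow installed; in a real dag file the result would be assigned at module level (dag = build_dag()).

```python
def greet(name, **context):
    """Callable for PythonOperator: `name` comes from op_kwargs, the rest from the task context."""
    # "ds" is the execution date as YYYY-MM-DD, supplied when provide_context=True.
    run_date = context.get("ds", "unknown date")
    return "Hello, {} (run for {})".format(name, run_date)


def build_dag():
    # Local imports: only needed when Airflow actually loads the dag.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    dag = DAG("params_example", schedule_interval=None, start_date=datetime(2019, 1, 20))
    PythonOperator(
        task_id="greet_task",           # unique within the dag
        python_callable=greet,          # the function itself, not its name as a string
        provide_context=True,           # pass context values such as "ds" as keyword arguments
        op_kwargs={"name": "airflow"},  # extra keyword arguments for greet
        dag=dag,
    )
    return dag


# The callable can be exercised directly, simulating what the operator would pass in.
print(greet("airflow", ds="2019-01-20"))  # Hello, airflow (run for 2019-01-20)
```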

Example dag

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.python_operator import PythonOperator


    def python_task1():
        ...


    def python_task2():
        ...


    # '{dag_filename}' is a placeholder for the dag id; the cron expression runs the dag daily at 12:00.
    dag = DAG('{dag_filename}', description='Simple tutorial DAG',
              schedule_interval='0 12 * * *',
              start_date=datetime(2019, 1, 20), catchup=False)

    dummy_op1 = DummyOperator(task_id='dummy_task', retries=3, dag=dag)

    # Each PythonOperator wraps one of the functions defined above.
    python_op1 = PythonOperator(task_id='python_task1', python_callable=python_task1, dag=dag)
    python_op2 = PythonOperator(task_id='python_task2', python_callable=python_task2, dag=dag)

    # dummy_task runs first, then both Python tasks.
    dummy_op1 >> [python_op1, python_op2]
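
The >> on the last line of the example declares that the tasks on the right run after the task on the left. The toy class below sketches how such an operator can be built with Python's __rshift__; it is an illustration only, not Airflow's implementation, and the Task class is hypothetical.

```python
class Task:
    """Toy stand-in for an Airflow operator; NOT Airflow's implementation."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # `a >> b` or `a >> [b, c]`: register b (and c) as running after a.
        targets = other if isinstance(other, list) else [other]
        for task in targets:
            self.downstream.append(task)
        return other  # returning a single task allows chaining: a >> b >> c


start = Task("dummy_task")
first = Task("python_task1")
second = Task("python_task2")

start >> [first, second]
print([t.task_id for t in start.downstream])  # ['python_task1', 'python_task2']
```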

3.2. Execution_date

3.3. Commands


4. Screen Shots

4.1. UI ๊ตฌ์„ฑ

4.1.1. DAGs View

DAGs View

4.1.2. Tree View

Tree View

4.1.3. Graph View

Graph View

4.1.4. Gantt Chart

Gantt Chart

4.1.5. Task Duration

Task Duration

4.1.6. Code View

Code View

4.1.7. Task Instance Context Menu

Task Instance Context Menu