Airflow
The Complete Hands-On Course
강석우 pko89403@gmail.com
Why
AIRFLOW
What is Airflow?
• Programmatically author, schedule, and monitor data pipelines
• Components: Web Server, Scheduler, Executor, Worker, Metadata Database
• Key concepts: DAG, Operator, Task, TaskInstance, Workflow
Airflow Architecture
• Airflow Webserver: serves the UI dashboard over HTTP
• Airflow Scheduler: a daemon that monitors DAGs and triggers tasks whose dependencies are met
• Airflow Worker: a process that executes the tasks handed to it
• Metadata Database: stores information about the state of tasks
• Executor: a message-queuing process, bound to the scheduler, that determines which worker processes execute the scheduled tasks
[Diagram: Airflow Webserver, Airflow Scheduler, three Workers, Meta DB, Logs, DAGs]
How Airflow works
• 1. The scheduler reads the DAG folder.
• 2. Your DAG is parsed by a process that creates a DagRun based on the DAG's scheduling parameters.
• 3. A TaskInstance is instantiated for each Task that needs to be executed and is flagged “Scheduled” in the metadata database.
• 4. The scheduler gets all TaskInstances flagged “Scheduled” from the metadata database, changes their state to “Queued”, and sends them to the executor to be executed.
• 5. The executor pulls tasks from the queue (depending on your execution setup) and changes their state from “Queued” to “Running”; workers start executing the TaskInstances.
• 6. When a task finishes, the executor changes its state to the final state (success, failed, etc.) in the database, and the scheduler updates the DagRun with the state “Success” or “Failed”. Meanwhile, the web server periodically fetches data from the metadata database to update the UI.
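The state changes in steps 3 to 6 can be sketched as a small state machine. This is a toy model in plain Python, not Airflow's code: `ToyTaskInstance`, `run_task_instance`, and `dag_run_state` are hypothetical names, and only the states named above are modelled.

```python
from enum import Enum


class State(Enum):
    SCHEDULED = "scheduled"
    QUEUED = "queued"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


# Allowed transitions, following steps 3-6 (toy model, not Airflow's own table).
TRANSITIONS = {
    State.SCHEDULED: {State.QUEUED},
    State.QUEUED: {State.RUNNING},
    State.RUNNING: {State.SUCCESS, State.FAILED},
}


class ToyTaskInstance:
    """Hypothetical stand-in for a TaskInstance row in the metadata database."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.state = State.SCHEDULED
        self.history = [self.state]

    def move_to(self, new_state):
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"{self.state.value} -> {new_state.value} is not allowed")
        self.state = new_state
        self.history.append(new_state)


def run_task_instance(ti, work):
    """Queue the task instance, run it, and record its final state."""
    ti.move_to(State.QUEUED)
    ti.move_to(State.RUNNING)
    try:
        work()
    except Exception:
        ti.move_to(State.FAILED)
    else:
        ti.move_to(State.SUCCESS)
    return ti.state


def dag_run_state(task_instances):
    """The run succeeds only if every task instance succeeded."""
    return "success" if all(ti.state is State.SUCCESS for ti in task_instances) else "failed"
```

Calling `run_task_instance(ToyTaskInstance("extract"), some_callable)` walks one task through the whole lifecycle; `some_callable` is any zero-argument function.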
Start Airflow (from install to UI; Airflow 1.10 CLI)
• AIRFLOW_GPL_UNICODE=yes pip install "apache-airflow[celery,crypto,postgres,hive,rabbitmq,redis]"
• airflow initdb
• airflow upgradedb
• ls
• cd airflow
• grep dags_folder airflow.cfg
• mkdir -p /home/airflow/airflow/dags
• ls
• vim airflow.cfg (configuration file)
• load_examples = False
• airflow resetdb
• airflow scheduler
• airflow webserver
Quick tour of Airflow
• airflow list_dags
• airflow list_tasks {dag name} --tree
• airflow test {dag name} python_task {execution date}
• airflow -h
What is a DAG?
• A finite directed graph with no directed cycles
• A DAG represents a collection of tasks to run, organized in a way that reflects their dependencies and relationships
• Each node is a task
• Each edge is a dependency
• How should the workflow be executed?
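One way to see why the "no cycles" rule matters: a valid run order exists only when the dependency graph is acyclic. The sketch below uses Python's standard `graphlib` module (Python 3.9+), not Airflow itself; the task names are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Maps each task to the set of tasks it depends on (its upstream tasks).
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}


def execution_order(deps):
    """Return one valid run order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # The detected cycle is the second element of the exception's args.
        raise ValueError(f"not a DAG: {exc.args[1]}") from exc
```

`execution_order(dependencies)` always places `extract` before `transform`, and `transform` before both `load` and `report`; a graph such as `{"a": {"b"}, "b": {"a"}}` is rejected.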
DAG’s important properties
• Defined in Python files placed in Airflow's DAG_FOLDER (usually ~/airflow/dags)
• dag_id
• description
• start_date
• schedule_interval
• depends_on_past: run a task only if its instance in the previous DagRun completed successfully
• default_args: keyword arguments passed to the constructor of every operator in the DAG
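The effect of default_args can be sketched without Airflow: the dictionary supplies constructor keyword arguments, and anything passed explicitly to an operator takes precedence. `operator_kwargs` is a hypothetical helper used only for this illustration, and the values are arbitrary.

```python
from datetime import datetime, timedelta

# DAG-level defaults shared by every operator (illustrative values).
default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2020, 1, 1),
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}


def operator_kwargs(defaults, **explicit):
    """Merge defaults with explicit arguments; explicit arguments win."""
    merged = dict(defaults)
    merged.update(explicit)
    return merged


# Two operators: one takes the default retries, one overrides it.
extract_kwargs = operator_kwargs(default_args, task_id="extract")
load_kwargs = operator_kwargs(default_args, task_id="load", retries=3)
```

Here `extract_kwargs["retries"]` keeps the shared value while `load_kwargs["retries"]` is overridden, and `default_args` itself is left unchanged.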
What is an Operator?
• Determines what actually gets done.
• Operators are usually (but not always) atomic, meaning they can stand on their own and don't need to share resources with any other operators.
• The definition of a single task
• Should be idempotent (always produces the same result)
• A task is created by instantiating an Operator class
• An operator defines the nature of the task and how it should be executed
• When an operator is instantiated, the task becomes a node in your DAG.
Many Operators
• BashOperator
• PythonOperator
• EmailOperator (sends an email)
• SqlOperator (executes a SQL command)
• All operators inherit from BaseOperator
• 3 types of operators
  • Action operators perform an action (BashOperator, PythonOperator, EmailOperator, ...)
  • Transfer operators move data from one system to another (sqlOperator, sftpOperator)
  • Sensor operators wait for data to arrive at a defined location
Operator ++
• Transfer operators
  • Move data from one system to another
  • Data is pulled from the source, staged on the machine where the executor is running, and then transferred to the target system
  • Don't use them if you are dealing with a large amount of data
• Sensor operators
  • Inherit from BaseSensorOperator
  • Useful for monitoring external processes, such as waiting for files to be uploaded to HDFS or for a partition to appear in Hive
  • Basically long-running tasks
  • A sensor operator has a poke method that is called repeatedly until it returns True (the method used to monitor the external process)
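The poke loop described for sensors can be sketched in plain Python. This is a simplified model, not BaseSensorOperator itself: `ToySensor` and `CountdownSensor` are hypothetical classes, and the interval and timeout are shortened so the example finishes quickly.

```python
import time


class ToySensor:
    """Calls poke() every poke_interval seconds until it returns True or time runs out."""

    def __init__(self, poke_interval=0.01, timeout=1.0):
        self.poke_interval = poke_interval
        self.timeout = timeout

    def poke(self):
        raise NotImplementedError

    def execute(self):
        deadline = time.monotonic() + self.timeout
        while not self.poke():
            if time.monotonic() >= deadline:
                raise TimeoutError("sensor timed out")
            time.sleep(self.poke_interval)
        return True


class CountdownSensor(ToySensor):
    """Hypothetical sensor whose condition becomes true on the third poke."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.pokes = 0

    def poke(self):
        self.pokes += 1
        return self.pokes >= 3
```

A real sensor would replace the counter in `poke` with a check against the external system, for example whether a file exists.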
Make Dependencies in Python
• set_upstream()
• set_downstream()
• << ( = set_upstream )
• >> ( = set_downstream )
[Diagram: A points to B and C; B and C point to D]
• B depends on A
• C depends on A
• D depends on B and C
( Example )
A.set_downstream(B)
A >> B
A >> [B, C] >> D
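How `>>` and `<<` can map onto set_downstream and set_upstream is sketched below with Python's operator overloading. `ToyTask` is a hypothetical class written for this illustration; it mimics the behaviour described above and is not Airflow's implementation.

```python
class ToyTask:
    """Minimal task that records upstream and downstream task ids."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()
        self.downstream = set()

    def set_downstream(self, others):
        for other in _as_list(others):
            self.downstream.add(other.task_id)
            other.upstream.add(self.task_id)

    def set_upstream(self, others):
        for other in _as_list(others):
            other.set_downstream(self)

    def __rshift__(self, others):   # self >> others
        self.set_downstream(others)
        return others

    def __lshift__(self, others):   # self << others
        self.set_upstream(others)
        return others

    def __rrshift__(self, others):  # [a, b] >> self
        self.set_upstream(others)
        return self

    def __rlshift__(self, others):  # [a, b] << self
        self.set_downstream(others)
        return self


def _as_list(item):
    """Accept either a single task or a list/tuple of tasks."""
    return list(item) if isinstance(item, (list, tuple)) else [item]


a, b, c, d = (ToyTask(name) for name in "ABCD")
a >> [b, c] >> d  # A fans out to B and C, which both feed D
```

Returning the right-hand operand from `__rshift__` is what lets the chain continue: `a >> [b, c]` evaluates to the list, and the list then reaches `d` through `__rrshift__`.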
How the Scheduler Works
• DagRun
  • A DAG consists of tasks, and those tasks need to be run
  • When the scheduler parses a DAG, it automatically creates a DagRun, which is an instantiation of the DAG in time according to its start_date and schedule
• Backfill and catchup
• Schedule interval
  • None
  • @once
  • @hourly
  • @daily
  • @weekly
  • @monthly
  • @yearly
  • A cron time string can also be used: * * * * * = minute (0-59), hour (0-23), day of the month (1-31), month (1-12), day of the week (0-7)
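The time-based presets are shorthand for five-field cron expressions. The sketch below expands a preset and labels the fields; `describe_schedule` is a hypothetical helper, and `None` and `@once` are left out because they have no cron equivalent.

```python
# Cron expressions behind the time-based presets.
CRON_PRESETS = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
    "@weekly": "0 0 * * 0",
    "@monthly": "0 0 1 * *",
    "@yearly": "0 0 1 1 *",
}

CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_schedule(schedule_interval):
    """Expand a preset if needed and label the five cron fields."""
    expression = CRON_PRESETS.get(schedule_interval, schedule_interval)
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 cron fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))
```

For example, `describe_schedule("30 6 * * 1")` labels `30` as the minute, `6` as the hour, and `1` as the day of the week.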
Concurrency vs Parallelism
• Concurrent: the system can have two or more actions in progress at the same time
• Parallel: the system can have two or more actions executing simultaneously
• In concurrent systems, multiple actions can be in progress at the same time, though they may not be executing at the same instant
• In parallel systems, multiple actions execute simultaneously
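A small illustration of concurrency without parallelism, using Python's asyncio: both actions are in progress at the same time, but a single thread runs them, so only one executes at any instant. The action names and step counts are arbitrary.

```python
import asyncio


async def action(name, steps, log):
    for step in range(steps):
        log.append(f"{name}{step}")
        # Yield control so the other action can make progress.
        await asyncio.sleep(0)


async def run_concurrently():
    log = []
    # Both actions are started before either one has finished.
    await asyncio.gather(action("A", 2, log), action("B", 2, log))
    return log


log = asyncio.run(run_concurrently())
```

The entries of `log` interleave the two actions rather than listing all of A's steps before B's, which is what "in progress at the same time" means here. Running them on separate CPU cores would make the system parallel as well.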
Database and Executor
• Sequential Executor (default executor, SQLite)
  • The default executor you get when you run Apache Airflow
  • Runs only one task at a time (sequential); useful for debugging
  • It is the only executor that can be used with SQLite, since SQLite doesn't support multiple writers
• Local Executor (PostgreSQL)
  • Can run multiple tasks at a time
  • Uses the Python multiprocessing library and queues to parallelize the execution of tasks
  • Runs tasks by spawning processes in a controlled fashion, in different modes, on the same machine
  • The number of processes to spawn can be tuned with the parallelism parameter
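The role of the parallelism setting can be sketched as a bounded worker pool. Note the swap: this sketch uses threads from `concurrent.futures` instead of the processes the Local Executor spawns, purely to keep the example short and portable; `run_with_parallelism` is a hypothetical helper.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_parallelism(task_count, parallelism):
    """Run dummy tasks with at most `parallelism` at once; return the peak seen."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def task(_):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for real work
        with lock:
            running -= 1

    # The pool size caps how many tasks can be running simultaneously.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        list(pool.map(task, range(task_count)))
    return peak
```

With `parallelism=1` the pool behaves like the Sequential Executor: the peak number of simultaneously running tasks never exceeds one.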
Database and Executor
• Celery Executor
  • Celery is a Python task-queue system
  • A task-queue system handles the distribution of tasks to workers across threads or network nodes
  • Tasks need to be pushed into a broker (RabbitMQ)
  • Celery workers pop them and schedule task executions
  • Recommended for production use of Airflow
  • Allows distributing the execution of task instances to multiple worker nodes (machines)
• Others: Dask, Mesos, Kubernetes, etc.
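The push/pop pattern behind a task queue can be sketched with the standard library alone. In this toy model a `queue.Queue` stands in for the broker and threads stand in for Celery workers on separate machines; `run_on_workers` is a hypothetical function and none of this uses Celery's API.

```python
import queue
import threading


def run_on_workers(tasks, worker_count):
    """Push (name, callable) tasks into a broker queue; workers pop and run them.

    Returns a dict mapping each task name to the worker that executed it.
    """
    broker = queue.Queue()  # stand-in for RabbitMQ
    executed_by = {}
    lock = threading.Lock()

    def worker(worker_name):
        while True:
            item = broker.get()
            if item is None:  # sentinel: no more work for this worker
                break
            task_name, func = item
            func()
            with lock:
                executed_by[task_name] = worker_name

    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for item in tasks:
        broker.put(item)
    for _ in threads:
        broker.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return executed_by
```

Which worker picks up which task is not fixed in advance; the broker only guarantees that each task is handed to exactly one worker.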
Celery Executor, PostgreSQL and RabbitMQ Structure
Executor Architecture
[Diagram, Local Executor (single machine): Meta DB, Web Server, Scheduler + Worker]
[Diagram, Celery Executor: Meta DB, Web Server, Scheduler, Celery, three Workers]
Advanced Concept
• SubDAG
  • Minimises repetitive patterns
  • The main DAG manages all the SubDAGs as normal tasks
  • SubDAGs must be scheduled the same as their parent DAG
• Hooks
  • Interfaces for interacting with external sources such as PostgreSQL, Spark, SFTP, ...
XCOM
• Lets tasks communicate (cross-communication: multiple tasks can exchange messages)
• Principally defined by a key, a value, and a timestamp
• XCom data can be “pushed” or “pulled”
• xcom_push()
  • If a task returns a value, an XCom containing that value is pushed automatically
• xcom_pull()
  • The task gets the message based on parameters such as “key”, “task_ids”, and “dag_id”
• Keys that are automatically given to XCOMs when they are pushed by being returned from
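The push and pull behaviour can be modelled with an in-memory store. This is a sketch, not Airflow's XCom implementation: `ToyXComStore` and `run_task` are hypothetical, and the only detail taken from Airflow is that a returned value is stored under the key `return_value`.

```python
from datetime import datetime, timezone

XCOM_RETURN_KEY = "return_value"  # key used for values pushed by returning from a task


class ToyXComStore:
    """In-memory stand-in for the XCom table in the metadata database."""

    def __init__(self):
        self._rows = []

    def push(self, dag_id, task_id, key, value):
        self._rows.append({
            "dag_id": dag_id,
            "task_id": task_id,
            "key": key,
            "value": value,
            "timestamp": datetime.now(timezone.utc),
        })

    def pull(self, dag_id, task_ids, key=XCOM_RETURN_KEY):
        """Return the most recently pushed matching value, or None."""
        for row in reversed(self._rows):
            if (row["dag_id"], row["task_id"], row["key"]) == (dag_id, task_ids, key):
                return row["value"]
        return None


def run_task(store, dag_id, task_id, func):
    """Run a task callable; a non-None return value is pushed automatically."""
    result = func()
    if result is not None:
        store.push(dag_id, task_id, XCOM_RETURN_KEY, result)
    return result
```

A downstream task would then call `store.pull("etl", "extract")` to read what a task named `extract` in a DAG named `etl` returned; both names are placeholders.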
Branching
• Allows a DAG to choose between different paths according to the result of a specific task
• Use BranchPythonOperator
• When using branching, do not use the depends_on_past property
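The branching decision can be sketched as a callable that returns the task id to follow, with every other candidate path skipped. `resolve_branch` and `choose_path` are hypothetical helpers and the task ids are invented; the sketch models the behaviour, not the operator's code.

```python
def resolve_branch(branch_callable, candidate_task_ids):
    """Return (followed, skipped) task ids for one branching decision."""
    chosen = branch_callable()
    # The callable may return a single task id or several.
    chosen = [chosen] if isinstance(chosen, str) else list(chosen)
    unknown = set(chosen) - set(candidate_task_ids)
    if unknown:
        raise ValueError(f"branch returned unknown task ids: {sorted(unknown)}")
    skipped = [task_id for task_id in candidate_task_ids if task_id not in chosen]
    return chosen, skipped


def choose_path(row_count=0):
    # Hypothetical rule: only load when the upstream task produced rows.
    return "load_rows" if row_count > 0 else "notify_empty"
```

For example, `resolve_branch(lambda: choose_path(10), ["load_rows", "notify_empty"])` follows `load_rows` and skips `notify_empty`.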
Service Level Agreement ( SLAs )
• An SLA is a contract between a service provider and the end user that defines the level of service expected from the service provider
• Defines what the end user will receive (must receive)
• Expressed as a time relative to the execution_date of the task, not its start time (e.g. more than 30 minutes after the execution_date)
• Different from the 'execution_timeout' parameter, which stops the task and marks it as failed
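The difference between "relative to the execution_date" and "relative to the start time" can be shown with plain datetime arithmetic. This is a simplified reading of the bullet above: `sla_missed` is a hypothetical helper, and Airflow's own check also involves the DAG's schedule, which is not reproduced here.

```python
from datetime import datetime, timedelta


def sla_missed(execution_date, finished_at, sla):
    """True if the task had not finished within `sla` of its execution_date."""
    return finished_at > execution_date + sla


execution_date = datetime(2020, 1, 1, 0, 0)
sla = timedelta(minutes=30)

# A task that starts late can miss the SLA even if it runs quickly,
# because the clock starts at execution_date, not at the task's start time.
late_start = execution_date + timedelta(minutes=25)
late_finish = late_start + timedelta(minutes=10)
```

An SLA miss only reports the delay. An execution_timeout, by contrast, is measured from the task's start and ends the task.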

Airflow tutorials hands_on

  • 1. Airflow: The Complete Hands-On Course, 강석우 (pko89403@gmail.com)
  • 2. Why
  • 4. What is Airflow?
      ◦ Programmatically author, schedule, and monitor data pipelines
      ◦ Components: Web Server, Scheduler, Executor, Worker, Metadata Database
      ◦ Key concepts: DAG, Operator, Task, TaskInstance, Workflow
  • 5. Airflow Architecture
      ◦ Airflow Webserver: serves the UI dashboard over HTTP
      ◦ Airflow Scheduler: a daemon
      ◦ Airflow Worker: the process that wraps and executes tasks
      ◦ Metadata Database: stores information about the state of tasks
      ◦ Executor: a message-queuing process that is bound to the scheduler and determines which worker processes execute the scheduled tasks
      ◦ Diagram: Airflow Webserver, Airflow Scheduler, three Workers, Meta DB, Logs, DAGs
  • 6. How does Airflow work?
      1. The scheduler reads the DAG folder.
      2. Your DAG is parsed by a process that creates a DagRun based on the scheduling parameters of your DAG.
      3. A TaskInstance is instantiated for each Task that needs to be executed and is flagged as "Scheduled" in the metadata database.
      4. The scheduler gets all TaskInstances flagged "Scheduled" from the metadata database, changes their state to "Queued", and sends them to the executors to be executed.
      5. Executors pull Tasks from the queue (depending on your execution setup) and change their state from "Queued" to "Running"; Workers start executing the TaskInstances.
      6. When a Task finishes, the executor changes its state to the final state (success, failed, etc.) in the database, and the scheduler updates the DagRun with the state "Success" or "Failed". The web server periodically fetches data from the metadata database to update the UI.
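The state changes in the steps above can be sketched as a small state machine. This is a toy model written only with the standard library, not Airflow's own code: the names `State`, `ALLOWED`, `advance`, and `run_task` are hypothetical, and only the states named on the slide are modelled.

```python
from enum import Enum


class State(Enum):
    SCHEDULED = "scheduled"
    QUEUED = "queued"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


# Transitions described on the slide (a simplification of the real scheduler).
ALLOWED = {
    State.SCHEDULED: {State.QUEUED},
    State.QUEUED: {State.RUNNING},
    State.RUNNING: {State.SUCCESS, State.FAILED},
    State.SUCCESS: set(),
    State.FAILED: set(),
}


def advance(current, new):
    """Return the new state, rejecting transitions the slide does not describe."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new


def run_task(work):
    """Walk one toy task instance through its states and return the history."""
    history = [State.SCHEDULED]
    history.append(advance(history[-1], State.QUEUED))   # scheduler queues it
    history.append(advance(history[-1], State.RUNNING))  # executor picks it up
    try:
        work()
        final = State.SUCCESS
    except Exception:
        final = State.FAILED
    history.append(advance(history[-1], final))
    return history
```

Calling `run_task` with a function that raises ends in `State.FAILED`; any other function ends in `State.SUCCESS`.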
  • 7. Start Airflow (from install to UI)
      ◦ AIRFLOW_GPL_UNIDECODE=yes pip install "apache-airflow[celery,crypto,postgres,hive,rabbitmq,redis]"
      ◦ airflow initdb
      ◦ airflow upgradedb
      ◦ ls
      ◦ cd airflow
      ◦ grep dags_folder airflow.cfg
      ◦ mkdir -p /home/airflow/airflow/dags
      ◦ ls
      ◦ vim airflow.cfg (configuration file)
      ◦ Set load_examples = False
      ◦ airflow resetdb
      ◦ airflow scheduler
      ◦ airflow webserver
  • 8. Quick Tour of Airflow
      ◦ airflow list_dags
      ◦ airflow list_tasks {dag name} --tree
      ◦ airflow test {dag name} python_task {execution date}
      ◦ airflow -h
  • 9. What is a DAG?
      ◦ A finite directed graph with no directed cycles
      ◦ A DAG represents a collection of tasks to run, organized in a way that reflects their dependencies and relationships
      ◦ Each node is a Task
      ◦ Each edge is a dependency
      ◦ It answers the question: how should the workflow be executed?
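A minimal sketch of the "no directed cycles" rule, using Python's standard `graphlib` module (Python 3.9+) rather than Airflow itself. The task names and the helper `execution_order` are made up for illustration; the mapping goes from each task to the set of tasks it depends on.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return one valid run order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # graphlib reports the offending cycle as the second element of args.
        raise ValueError(f"not a DAG, cycle found: {exc.args[1]}") from exc


# Hypothetical pipeline: extract -> transform -> load.
pipeline = {"transform": {"extract"}, "load": {"transform"}}
order = execution_order(pipeline)
```

A graph such as `{"a": {"b"}, "b": {"a"}}` is rejected because neither task could ever run first.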
  • 10. A DAG's important properties
      ◦ Defined in Python files placed in Airflow's DAG_FOLDER (usually ~/airflow/dags)
      ◦ dag_id
      ◦ description
      ◦ start_date
      ◦ schedule_interval
      ◦ depends_on_past: run a task in the next DagRun only if its previous run completed successfully
      ◦ default_args: constructor keyword parameters applied when initializing operators
  • 11. What is an Operator?
      ◦ Determines what actually gets done
      ◦ Operators are usually (but not always) atomic, meaning they can stand on their own and don't need to share resources with any other operators
      ◦ The definition of a single task
      ◦ Should be idempotent (always produces the same result)
      ◦ A Task is created by instantiating an Operator class
      ◦ An operator defines the nature of the task and how it should be executed
      ◦ Once an operator is instantiated, the task becomes a node in your DAG
  • 12. Many Operators
      ◦ BashOperator
      ◦ PythonOperator
      ◦ EmailOperator (sends an email)
      ◦ SqlOperator (executes a SQL command)
      ◦ All operators inherit from BaseOperator
      ◦ Three types of operators:
          ▪ Action operators that perform an action (BashOperator, PythonOperator, EmailOperator, …)
          ▪ Transfer operators that move data from one system to another (sqlOperator, sftpOperator)
          ▪ Sensor operators that wait for data to arrive at a defined location
  • 13. Operator ++
      ◦ Transfer operators
          ▪ Move data from one system to another
          ▪ Data is pulled from the source, staged on the machine where the executor is running, and then transferred to the target system
          ▪ Don't use them if you are dealing with a large amount of data
      ◦ Sensor operators
          ▪ Inherit from BaseSensorOperator
          ▪ Useful for monitoring external processes, such as waiting for files to be uploaded to HDFS or for a partition to appear in Hive
          ▪ Basically long-running tasks
          ▪ A sensor operator has a poke method that is called repeatedly until it returns True (the method used to monitor the external process)
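The poke loop of a sensor can be sketched in plain Python. This is only an illustration of the "call poke repeatedly until it returns True" idea, not the BaseSensorOperator implementation; the function `wait_for` is hypothetical, and its `poke_interval` and `timeout` arguments are named after the sensor parameters but use arbitrary defaults here.

```python
import time


def wait_for(poke, poke_interval=0.01, timeout=1.0):
    """Call poke() until it returns True; raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if poke():
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met before the timeout")
        time.sleep(poke_interval)  # wait before checking the external system again
```

In a real sensor, `poke` would check the external system (for example, whether a file exists); here any zero-argument callable returning a boolean works.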
  • 14. Define dependencies in Python
      ◦ set_upstream()
      ◦ set_downstream()
      ◦ << (same as set_upstream)
      ◦ >> (same as set_downstream)
      ◦ Diagram: tasks A, B, C, D
          ▪ B depends on A
          ▪ C depends on A
          ▪ D depends on B and C
      ◦ Example:
          ▪ A.set_downstream(B)
          ▪ A >> B
          ▪ A >> [B, C] >> D
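To show how `>>` and `<<` can map onto set_downstream and set_upstream, here is a toy `Task` class that overloads the shift operators. It is a simplified stand-in and not Airflow's BaseOperator; the class, the `_as_list` helper, and the `upstream`/`downstream` attributes are assumptions made for this sketch.

```python
def _as_list(tasks):
    """Accept either a single task or a list/tuple of tasks."""
    return list(tasks) if isinstance(tasks, (list, tuple)) else [tasks]


class Task:
    """Toy task that records the ids of its upstream and downstream tasks."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()
        self.downstream = set()

    def set_downstream(self, others):
        for other in _as_list(others):
            self.downstream.add(other.task_id)
            other.upstream.add(self.task_id)

    def set_upstream(self, others):
        for other in _as_list(others):
            other.set_downstream(self)

    def __rshift__(self, others):   # self >> others
        self.set_downstream(others)
        return others               # returning the right side allows chaining

    def __lshift__(self, others):   # self << others
        self.set_upstream(others)
        return others

    def __rrshift__(self, others):  # [a, b] >> self (lists have no >> of their own)
        self.set_upstream(others)
        return self


A, B, C, D = (Task(name) for name in "ABCD")
A >> [B, C] >> D  # B and C depend on A; D depends on B and C
```

After the last line, `D.upstream` contains "B" and "C", and `A.downstream` contains "B" and "C", matching the diagram on the slide.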
  • 15. How the Scheduler Works
      ◦ DagRun
          ▪ A DAG consists of Tasks and needs those tasks to run
          ▪ When the scheduler parses a DAG, it automatically creates a DagRun, which is an instantiation of the DAG in time according to start_date and the schedule
      ◦ Backfill and Catchup
      ◦ Schedule interval
          ▪ None
          ▪ @once
          ▪ @hourly
          ▪ @daily
          ▪ @weekly
          ▪ @monthly
          ▪ @yearly
      ◦ A cron time string can also be used: * * * * * = Minute (0-59), Hour (0-23), Day of the month (1-31), Month (1-12), Day of the week (0-7)
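As a rough illustration of how the presets relate to cron strings, the sketch below expands a preset into its cron equivalent and labels the five fields. The `describe` helper and the field names are hypothetical; the preset-to-cron mapping follows the equivalents listed in the Airflow documentation.

```python
# Cron equivalents of the schedule presets.
PRESETS = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
    "@weekly": "0 0 * * 0",
    "@monthly": "0 0 1 * *",
    "@yearly": "0 0 1 1 *",
}

FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe(schedule):
    """Map a preset or cron string to named fields; None and @once have no cron form."""
    if schedule is None or schedule == "@once":
        return None
    parts = PRESETS.get(schedule, schedule).split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected 5 cron fields, got {len(parts)}: {schedule!r}")
    return dict(zip(FIELDS, parts))
```

For example, `describe("30 6 * * 1")` labels "30" as the minute, "6" as the hour, and "1" as the day of the week.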
  • 16. Concurrency vs Parallelism
      ◦ Concurrent: a system that can support two or more actions in progress at the same time
      ◦ Parallel: a system that can support two or more actions executing simultaneously
      ◦ In concurrent systems, multiple actions can be in progress (though not necessarily executing) at the same time
      ◦ In parallel systems, multiple actions are executed simultaneously
  • 17. Database and Executor
      ◦ Sequential Executor (default executor, SQLite)
          ▪ The default executor you get when you run Apache Airflow
          ▪ Runs only one task at a time (sequentially); useful for debugging
          ▪ It is the only executor that can be used with SQLite, since SQLite doesn't support multiple writers
      ◦ Local Executor (PostgreSQL)
          ▪ Can run multiple tasks at a time
          ▪ Uses the Python multiprocessing library and queues to parallelize the execution of tasks
          ▪ Runs tasks by spawning processes in a controlled fashion, in different modes, on the same machine
          ▪ The number of processes to spawn can be tuned with the parallelism parameter
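The difference between running one task at a time and running several with a bounded number of workers can be sketched as follows. This is not how Airflow's executors are implemented: the sketch uses a thread pool from the standard library for simplicity, whereas the slide describes the Local Executor as spawning processes. The functions `run_sequential` and `run_local` are hypothetical, and `parallelism` here only borrows the name of the Airflow setting.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sequential(tasks):
    """Run the callables one after another, like a one-task-at-a-time executor."""
    return [task() for task in tasks]


def run_local(tasks, parallelism=4):
    """Run the callables with at most `parallelism` workers active at once."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]  # results in submission order


# Hypothetical tasks: each returns the square of its index.
tasks = [lambda n=n: n * n for n in range(5)]
```

Both functions return the same results for the same tasks; only the degree of overlap during execution differs.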
  • 18. Database and Executor
      ◦ Celery Executor
          ▪ Celery is a Python task-queue system
          ▪ A task-queue system handles the distribution of tasks to workers across threads or network nodes
          ▪ Tasks need to be pushed into a broker (RabbitMQ)
          ▪ Celery workers pop them and schedule the task executions
          ▪ Recommended for production use of Airflow
          ▪ Allows distributing the execution of task instances across multiple worker nodes (computers)
      ◦ Also available: Dask, Mesos, Kubernetes, etc.
  • 19. Celery Executor, PostgreSQL and RabbitMQ Structure
  • 20. Executor Architecture
      ◦ Local Executor (single machine): Meta DB, Web Server, Scheduler + Worker
      ◦ Celery Executor: Meta DB, Web Server, Scheduler, Celery, three Workers
  • 21. Advanced Concepts
      ◦ SubDAG
          ▪ Minimizes repetitive patterns
          ▪ The main DAG manages all the SubDAGs as normal tasks
          ▪ SubDAGs must be scheduled the same as their parent DAG
      ◦ Hooks
          ▪ Interfaces for interacting with external sources such as PostgreSQL, Spark, SFTP, …
  • 22. XCom
      ◦ Lets tasks communicate (cross-communication: allows multiple tasks to exchange messages)
      ◦ Principally defined by a key, a value, and a timestamp
      ◦ XCom data can be "pushed" or "pulled"
      ◦ xcom_push()
          ▪ If a task returns a value, an XCom containing that value is automatically pushed
      ◦ xcom_pull()
          ▪ The task gets the message based on parameters such as "key", "task_ids", and "dag_id"
      ◦ Keys that are automatically given to XComs when they are pushed by being returned from
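A toy in-memory store can make the push and pull idea concrete. It is not Airflow's XCom implementation, which persists rows in the metadata database; the class `XComStore` and its method signatures are hypothetical. The default key `return_value` mirrors the key Airflow uses for values returned from a task.

```python
from datetime import datetime, timezone


class XComStore:
    """Toy message store: each entry has a key, a value, and a timestamp."""

    def __init__(self):
        self._rows = {}

    def push(self, dag_id, task_id, value, key="return_value"):
        """Store a value under (dag_id, task_id, key) with the current time."""
        self._rows[(dag_id, task_id, key)] = (value, datetime.now(timezone.utc))

    def pull(self, dag_id, task_id, key="return_value"):
        """Return the stored value, or None if nothing was pushed."""
        row = self._rows.get((dag_id, task_id, key))
        return None if row is None else row[0]


store = XComStore()
store.push("etl", "extract", {"rows": 120})               # e.g. a task's return value
store.push("etl", "extract", "s3://bucket/raw", key="path")  # an explicitly keyed message
```

A downstream task would then call `store.pull("etl", "extract")` or `store.pull("etl", "extract", key="path")` to read the messages; the bucket path is a made-up value.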
  • 23. Branching
      ◦ Allows a DAG to choose between different paths according to the result of a specific task
      ◦ Use BranchPythonOperator
      ◦ When using branching, do not use the depends_on_past property
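The branching idea can be sketched without Airflow: a callable returns the id of the task to follow, and the other candidate tasks are skipped. The helper `resolve_branch`, the task ids, and the `row_count` context key are all hypothetical and only illustrate the decision step.

```python
def resolve_branch(choose, candidates, context):
    """Return a plan marking the chosen task as 'run' and the others as 'skipped'."""
    chosen = choose(context)
    if chosen not in candidates:
        raise ValueError(f"{chosen!r} is not one of the downstream tasks")
    return {
        task_id: ("run" if task_id == chosen else "skipped")
        for task_id in candidates
    }


def choose_by_row_count(context):
    """Hypothetical rule: large inputs get a full load, small ones an incremental load."""
    return "full_load" if context["row_count"] > 1000 else "incremental_load"


plan = resolve_branch(
    choose_by_row_count,
    ["full_load", "incremental_load"],
    {"row_count": 42},
)
```

With a `row_count` of 42, the plan runs `incremental_load` and skips `full_load`.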
  • 24. Service Level Agreements (SLAs)
      ◦ An SLA is a contract between a service provider and the end user that defines the level of service expected from the service provider
      ◦ Defines what the end user will receive (must receive)
      ◦ The SLA time is relative to the task's execution_date, not its start time (for example, more than 30 minutes after the execution_date)
      ◦ Different from the execution_timeout parameter, which stops the task and marks it as failed