SlideShare ist ein Scribd-Unternehmen logo
1 von 35
Downloaden Sie, um offline zu lesen
Building (Better) Data Pipelines
using Apache Airflow
Sid Anand (@r39132)
QCon.AI 2018
1
About Me
2
Work [ed | s] @
Maintainer of
Spare time
Co-Chair for
Apache Airflow
3
What is it?
4
Apache Airflow : What is it?
In a :
Airflow is a platform to
programmatically author, schedule
and monitor workflows (a.k.a. DAGs
or Directed Acyclic Graphs)
Apache Airflow
5
UI Walk-Through
6
Apache Airflow : UI Walk-through
Airflow - Authoring DAGs
7
Airflow: Visualizing a DAG
8
Airflow: Author DAGs in Python! No need to bundle many XML files!
Airflow - Authoring DAGs
9
Airflow: The Tree View offers a view of DAG Runs over time!
Airflow - Authoring DAGs
Airflow - Performance Insights
10
Airflow: Gantt charts reveal the slowest tasks for a run!
11
Airflow: …And we can easily see performance trends over time
Airflow - Performance Insights
Apache Airflow
12
Why use it?
13
Apache Airflow : Why use it?
When would you use a Workflow Scheduler like
Airflow?
• ETL Pipelines
• Machine Learning Pipelines
• Predictive Data Pipelines
• Fraud Detection, Scoring/Ranking, Classification,
Recommender System, etc…
• General Job Scheduling (e.g. Cron)
• DB Back-ups, Scheduled code/config deployment
14
What should a Workflow Scheduler do well?
• Schedule a graph of dependencies
• where Workflow = A DAG of Tasks
• Handle task failures
• Report / Alert on failures
• Monitor performance of tasks over time
• Enforce SLAs
• E.g. Alerting if time or correctness SLAs are not met
• Easily scale for growing load
Apache Airflow : Why use it?
15
What Does Apache Airflow Add?
• Configuration-as-code
• Usability - Stunning UI / UX
• Centralized configuration
• Resource Pooling
• Extensibility
Apache Airflow : Why use it?
Use-Case : Message
Scoring
Batch Pipeline Architecture
16
Use-Case : Message Scoring
17
enterprise A
enterprise B
enterprise C
S3
S3 uploads every 15
minutes
Use-Case : Message Scoring
18
enterprise A
enterprise B
enterprise C
S3
Airflow kicks of a Spark
message scoring job
every hour
Use-Case : Message Scoring
19
enterprise A
enterprise B
enterprise C
S3
Spark job writes scored
messages and stats to
another S3 bucket
S3
Use-Case : Message Scoring
20
enterprise A
enterprise B
enterprise C
S3
This triggers SNS/SQS
messages events
S3
SNS
SQS
Use-Case : Message Scoring
21
enterprise A
enterprise B
enterprise C
S3
An Autoscale Group
(ASG) of Importers spins
up when it detects SQS
messages
S3
SNS
SQS
Importers
ASG
22
enterprise A
enterprise B
enterprise C
S3
The importers rapidly ingest scored
messages and aggregate statistics into
the DB
S3
SNS
SQS
Importers
ASG
DB
Use-Case : Message Scoring
23
enterprise A
enterprise B
enterprise C
S3
Users receive alerts of
untrusted emails &
can review them in
the web app
S3
SNS
SQS
Importers
ASG
DB
Use-Case : Message Scoring
24
enterprise A
enterprise B
enterprise C
S3 S3
SNS
SQS
Importers
ASG
DB
Airflow manages the entire process
Use-Case : Message Scoring
25
Airflow DAG
Apache Airflow
26
Incubating
27
Apache Airflow : Incubating
Timeline
• Airflow was created @ Airbnb in 2015 by Maxime
Beauchemin
• Max launched it @ Hadoop Summit in Summer 2015
• On 3/31/2016, Airflow —> Apache Incubator
Today
• 2400+ Forks
• 7600+ GitHub Stars
• 430+ Contributors
• 150+ companies officially using it!
• 14 Committers/Maintainers <— We’re growing here
Thank You!
28
Apache Airflow
29
Behind the Scenes
30
Airflow is a platform to programmatically author,
schedule and monitor workflows (a.k.a. DAGs)
It ships with a
• DAG Scheduler
• Web application (UI)
• Powerful CLI
• Celery Workers!
Apache Airflow : Behind the Scenes
31
Apache Airflow : Behind the Scenes
Webserver
Scheduler
WorkerWorkerWorker
Meta DB
1. A user schedules / manages
DAGs using the Airflow UI!
2. Airflow’s webserver stores
scheduling metadata in the
metadata DB
3. The scheduler picks up new
schedules and distributes
work over Celery /
RabbitMQ
4. Airflow workers pick up
Airflow tasks over Celery
Celery / RabbitMQ
32
Webserver
Scheduler
WorkerWorkerWorker
Meta DB
1. A user schedules / manages
DAGs using the Airflow UI!
2. Airflow’s webserver stores
scheduling metadata in the
metadata DB
3. The scheduler picks up new
schedules and distributes
work over Celery /
RabbitMQ
4. Airflow workers pick up
Airflow tasks over Celery
Celery / RabbitMQ
Apache Airflow : Behind the Scenes
1. A user schedules / manages
DAGs using the Airflow UI!
2. Airflow’s webserver stores
scheduling metadata in the
metadata DB
3. The scheduler picks up new
schedules and distributes
work over Celery /
RabbitMQ
4. Airflow workers pick up
Airflow tasks over Celery
33
Webserver
Scheduler
WorkerWorkerWorker
Meta DB
Celery / RabbitMQ
Apache Airflow : Behind the Scenes
34
Webserver
Scheduler
WorkerWorkerWorker
Meta DB
1. A user schedules / manages
DAGs using the Airflow UI!
2. Airflow’s webserver stores
scheduling metadata in the
metadata DB
3. The scheduler picks up new
schedules and distributes
work over Celery /
RabbitMQ
4. Airflow workers pick up
Airflow tasks from RabbitMQ
Celery / RabbitMQ
Apache Airflow : Behind the Scenes
Thank You!
35

Weitere ähnliche Inhalte

Was ist angesagt?

Introduction to Apache Airflow
Introduction to Apache AirflowIntroduction to Apache Airflow
Introduction to Apache Airflowmutt_data
 
Apache Airflow Architecture
Apache Airflow ArchitectureApache Airflow Architecture
Apache Airflow ArchitectureGerard Toonstra
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Flink Forward
 
Apache Airflow Introduction
Apache Airflow IntroductionApache Airflow Introduction
Apache Airflow IntroductionLiangjun Jiang
 
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow managementIntro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow managementBurasakorn Sabyeying
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeFlink Forward
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentationIlias Okacha
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotXiang Fu
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021StreamNative
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardParis Data Engineers !
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Databricks
 

Was ist angesagt? (20)

Airflow for Beginners
Airflow for BeginnersAirflow for Beginners
Airflow for Beginners
 
Introduction to Apache Airflow
Introduction to Apache AirflowIntroduction to Apache Airflow
Introduction to Apache Airflow
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
 
Apache Airflow Architecture
Apache Airflow ArchitectureApache Airflow Architecture
Apache Airflow Architecture
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
Apache Airflow Introduction
Apache Airflow IntroductionApache Airflow Introduction
Apache Airflow Introduction
 
Apache airflow
Apache airflowApache airflow
Apache airflow
 
Airflow introduction
Airflow introductionAirflow introduction
Airflow introduction
 
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow managementIntro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
 
Airflow Intro-1.pdf
Airflow Intro-1.pdfAirflow Intro-1.pdf
Airflow Intro-1.pdf
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Airflow 101
Airflow 101Airflow 101
Airflow 101
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Apache airflow
Apache airflowApache airflow
Apache airflow
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 

Ähnlich wie Building Better Data Pipelines using Apache Airflow

Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Sid Anand
 
Airflow @ Agari
Airflow @ Agari Airflow @ Agari
Airflow @ Agari Sid Anand
 
Cloud Native Data Pipelines (QCon Shanghai & Tokyo 2016)
Cloud Native Data Pipelines (QCon Shanghai & Tokyo 2016)Cloud Native Data Pipelines (QCon Shanghai & Tokyo 2016)
Cloud Native Data Pipelines (QCon Shanghai & Tokyo 2016)Sid Anand
 
20171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v0120171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v01Scott Miao
 
Cloud Native Data Pipelines (GoTo Chicago 2017)
Cloud Native Data Pipelines (GoTo Chicago 2017)Cloud Native Data Pipelines (GoTo Chicago 2017)
Cloud Native Data Pipelines (GoTo Chicago 2017)Sid Anand
 
Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyftTao Feng
 
Cloud Native Data Pipelines
Cloud Native Data PipelinesCloud Native Data Pipelines
Cloud Native Data PipelinesBill Liu
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Yohei Onishi
 
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration) SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration) Surendar S
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamDataWorks Summit/Hadoop Summit
 
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics RedefinedApache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics RedefinedEdureka!
 
Airflow techtonic template
Airflow   techtonic templateAirflow   techtonic template
Airflow techtonic templateSampath Kumar
 
Orchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSOrchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSDerrick Qin
 
Why Airflow? & What's new in Airflow 2.3?
Why Airflow? & What's new in Airflow 2.3?Why Airflow? & What's new in Airflow 2.3?
Why Airflow? & What's new in Airflow 2.3?Kaxil Naik
 
Serverless GraphQL for Product Developers
Serverless GraphQL for Product DevelopersServerless GraphQL for Product Developers
Serverless GraphQL for Product DevelopersSashko Stubailo
 
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...Amazon Web Services
 
Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)Sid Anand
 
quickguide-einnovator-4-cloudfoundry
quickguide-einnovator-4-cloudfoundryquickguide-einnovator-4-cloudfoundry
quickguide-einnovator-4-cloudfoundryjorgesimao71
 
Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...
Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...
Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...Dan Halperin
 

Ähnlich wie Building Better Data Pipelines using Apache Airflow (20)

Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)
 
Airflow @ Agari
Airflow @ Agari Airflow @ Agari
Airflow @ Agari
 
Cloud Native Data Pipelines (QCon Shanghai & Tokyo 2016)
Cloud Native Data Pipelines (QCon Shanghai & Tokyo 2016)Cloud Native Data Pipelines (QCon Shanghai & Tokyo 2016)
Cloud Native Data Pipelines (QCon Shanghai & Tokyo 2016)
 
20171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v0120171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v01
 
Cloud Native Data Pipelines (GoTo Chicago 2017)
Cloud Native Data Pipelines (GoTo Chicago 2017)Cloud Native Data Pipelines (GoTo Chicago 2017)
Cloud Native Data Pipelines (GoTo Chicago 2017)
 
Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyft
 
Cloud Native Data Pipelines
Cloud Native Data PipelinesCloud Native Data Pipelines
Cloud Native Data Pipelines
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration) SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache Beam
 
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics RedefinedApache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
 
Airflow techtonic template
Airflow   techtonic templateAirflow   techtonic template
Airflow techtonic template
 
Orchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSOrchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWS
 
Why Airflow? & What's new in Airflow 2.3?
Why Airflow? & What's new in Airflow 2.3?Why Airflow? & What's new in Airflow 2.3?
Why Airflow? & What's new in Airflow 2.3?
 
Serverless GraphQL for Product Developers
Serverless GraphQL for Product DevelopersServerless GraphQL for Product Developers
Serverless GraphQL for Product Developers
 
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
 
Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)
 
quickguide-einnovator-4-cloudfoundry
quickguide-einnovator-4-cloudfoundryquickguide-einnovator-4-cloudfoundry
quickguide-einnovator-4-cloudfoundry
 
Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...
Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...
Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...
 

Mehr von Sid Anand

Building High Fidelity Data Streams (QCon London 2023)
Building High Fidelity Data Streams (QCon London 2023)Building High Fidelity Data Streams (QCon London 2023)
Building High Fidelity Data Streams (QCon London 2023)Sid Anand
 
Building & Operating High-Fidelity Data Streams - QCon Plus 2021
Building & Operating High-Fidelity Data Streams - QCon Plus 2021Building & Operating High-Fidelity Data Streams - QCon Plus 2021
Building & Operating High-Fidelity Data Streams - QCon Plus 2021Sid Anand
 
Low Latency Fraud Detection & Prevention
Low Latency Fraud Detection & PreventionLow Latency Fraud Detection & Prevention
Low Latency Fraud Detection & PreventionSid Anand
 
YOW! Data Keynote (2021)
YOW! Data Keynote (2021)YOW! Data Keynote (2021)
YOW! Data Keynote (2021)Sid Anand
 
Big Data, Fast Data @ PayPal (YOW 2018)
Big Data, Fast Data @ PayPal (YOW 2018)Big Data, Fast Data @ PayPal (YOW 2018)
Big Data, Fast Data @ PayPal (YOW 2018)Sid Anand
 
Cloud Native Predictive Data Pipelines (micro talk)
Cloud Native Predictive Data Pipelines (micro talk)Cloud Native Predictive Data Pipelines (micro talk)
Cloud Native Predictive Data Pipelines (micro talk)Sid Anand
 
Cloud Native Data Pipelines (in Eng & Japanese) - QCon Tokyo
Cloud Native Data Pipelines (in Eng & Japanese)  - QCon TokyoCloud Native Data Pipelines (in Eng & Japanese)  - QCon Tokyo
Cloud Native Data Pipelines (in Eng & Japanese) - QCon TokyoSid Anand
 
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)Sid Anand
 
Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Software Developer and Architecture @ LinkedIn (QCon SF 2014)Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Software Developer and Architecture @ LinkedIn (QCon SF 2014)Sid Anand
 
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)Sid Anand
 
Building a Modern Website for Scale (QCon NY 2013)
Building a Modern Website for Scale (QCon NY 2013)Building a Modern Website for Scale (QCon NY 2013)
Building a Modern Website for Scale (QCon NY 2013)Sid Anand
 
Hands On with Maven
Hands On with MavenHands On with Maven
Hands On with MavenSid Anand
 
Learning git
Learning gitLearning git
Learning gitSid Anand
 
LinkedIn Data Infrastructure Slides (Version 2)
LinkedIn Data Infrastructure Slides (Version 2)LinkedIn Data Infrastructure Slides (Version 2)
LinkedIn Data Infrastructure Slides (Version 2)Sid Anand
 
LinkedIn Data Infrastructure (QCon London 2012)
LinkedIn Data Infrastructure (QCon London 2012)LinkedIn Data Infrastructure (QCon London 2012)
LinkedIn Data Infrastructure (QCon London 2012)Sid Anand
 
Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1Sid Anand
 
Keeping Movies Running Amid Thunderstorms!
Keeping Movies Running Amid Thunderstorms!Keeping Movies Running Amid Thunderstorms!
Keeping Movies Running Amid Thunderstorms!Sid Anand
 
OSCON Data 2011 -- NoSQL @ Netflix, Part 2
OSCON Data 2011 -- NoSQL @ Netflix, Part 2OSCON Data 2011 -- NoSQL @ Netflix, Part 2
OSCON Data 2011 -- NoSQL @ Netflix, Part 2Sid Anand
 
Intuit CTOF 2011 - Netflix for Mobile in the Cloud
Intuit CTOF 2011 - Netflix for Mobile in the CloudIntuit CTOF 2011 - Netflix for Mobile in the Cloud
Intuit CTOF 2011 - Netflix for Mobile in the CloudSid Anand
 
Svccg nosql 2011_v4
Svccg nosql 2011_v4Svccg nosql 2011_v4
Svccg nosql 2011_v4Sid Anand
 

Mehr von Sid Anand (20)

Building High Fidelity Data Streams (QCon London 2023)
Building High Fidelity Data Streams (QCon London 2023)Building High Fidelity Data Streams (QCon London 2023)
Building High Fidelity Data Streams (QCon London 2023)
 
Building & Operating High-Fidelity Data Streams - QCon Plus 2021
Building & Operating High-Fidelity Data Streams - QCon Plus 2021Building & Operating High-Fidelity Data Streams - QCon Plus 2021
Building & Operating High-Fidelity Data Streams - QCon Plus 2021
 
Low Latency Fraud Detection & Prevention
Low Latency Fraud Detection & PreventionLow Latency Fraud Detection & Prevention
Low Latency Fraud Detection & Prevention
 
YOW! Data Keynote (2021)
YOW! Data Keynote (2021)YOW! Data Keynote (2021)
YOW! Data Keynote (2021)
 
Big Data, Fast Data @ PayPal (YOW 2018)
Big Data, Fast Data @ PayPal (YOW 2018)Big Data, Fast Data @ PayPal (YOW 2018)
Big Data, Fast Data @ PayPal (YOW 2018)
 
Cloud Native Predictive Data Pipelines (micro talk)
Cloud Native Predictive Data Pipelines (micro talk)Cloud Native Predictive Data Pipelines (micro talk)
Cloud Native Predictive Data Pipelines (micro talk)
 
Cloud Native Data Pipelines (in Eng & Japanese) - QCon Tokyo
Cloud Native Data Pipelines (in Eng & Japanese)  - QCon TokyoCloud Native Data Pipelines (in Eng & Japanese)  - QCon Tokyo
Cloud Native Data Pipelines (in Eng & Japanese) - QCon Tokyo
 
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
 
Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Software Developer and Architecture @ LinkedIn (QCon SF 2014)Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Software Developer and Architecture @ LinkedIn (QCon SF 2014)
 
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
 
Building a Modern Website for Scale (QCon NY 2013)
Building a Modern Website for Scale (QCon NY 2013)Building a Modern Website for Scale (QCon NY 2013)
Building a Modern Website for Scale (QCon NY 2013)
 
Hands On with Maven
Hands On with MavenHands On with Maven
Hands On with Maven
 
Learning git
Learning gitLearning git
Learning git
 
LinkedIn Data Infrastructure Slides (Version 2)
LinkedIn Data Infrastructure Slides (Version 2)LinkedIn Data Infrastructure Slides (Version 2)
LinkedIn Data Infrastructure Slides (Version 2)
 
LinkedIn Data Infrastructure (QCon London 2012)
LinkedIn Data Infrastructure (QCon London 2012)LinkedIn Data Infrastructure (QCon London 2012)
LinkedIn Data Infrastructure (QCon London 2012)
 
Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1
 
Keeping Movies Running Amid Thunderstorms!
Keeping Movies Running Amid Thunderstorms!Keeping Movies Running Amid Thunderstorms!
Keeping Movies Running Amid Thunderstorms!
 
OSCON Data 2011 -- NoSQL @ Netflix, Part 2
OSCON Data 2011 -- NoSQL @ Netflix, Part 2OSCON Data 2011 -- NoSQL @ Netflix, Part 2
OSCON Data 2011 -- NoSQL @ Netflix, Part 2
 
Intuit CTOF 2011 - Netflix for Mobile in the Cloud
Intuit CTOF 2011 - Netflix for Mobile in the CloudIntuit CTOF 2011 - Netflix for Mobile in the Cloud
Intuit CTOF 2011 - Netflix for Mobile in the Cloud
 
Svccg nosql 2011_v4
Svccg nosql 2011_v4Svccg nosql 2011_v4
Svccg nosql 2011_v4
 

Kürzlich hochgeladen

CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 

Kürzlich hochgeladen (20)

CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 

Building Better Data Pipelines using Apache Airflow

  • 1. Building (Better) Data Pipelines using Apache Airflow Sid Anand (@r39132) QCon.AI 2018 1
  • 2. About Me 2 Work [ed | s] @ Maintainer of Spare time Co-Chair for
  • 4. 4 Apache Airflow : What is it? In a : Airflow is a platform to programmatically author, schedule and monitor workflows (a.k.a. DAGs or Directed Acyclic Graphs)
  • 6. 6 Apache Airflow : UI Walk-through
  • 7. Airflow - Authoring DAGs 7 Airflow: Visualizing a DAG
  • 8. 8 Airflow: Author DAGs in Python! No need to bundle many XML files! Airflow - Authoring DAGs
  • 9. 9 Airflow: The Tree View offers a view of DAG Runs over time! Airflow - Authoring DAGs
  • 10. Airflow - Performance Insights 10 Airflow: Gantt charts reveal the slowest tasks for a run!
  • 11. 11 Airflow: …And we can easily see performance trends over time Airflow - Performance Insights
  • 13. 13 Apache Airflow : Why use it? When would you use a Workflow Scheduler like Airflow? • ETL Pipelines • Machine Learning Pipelines • Predictive Data Pipelines • Fraud Detection, Scoring/Ranking, Classification, Recommender System, etc… • General Job Scheduling (e.g. Cron) • DB Back-ups, Scheduled code/config deployment
  • 14. 14 What should a Workflow Scheduler do well? • Schedule a graph of dependencies • where Workflow = A DAG of Tasks • Handle task failures • Report / Alert on failures • Monitor performance of tasks over time • Enforce SLAs • E.g. Alerting if time or correctness SLAs are not met • Easily scale for growing load Apache Airflow : Why use it?
  • 15. 15 What Does Apache Airflow Add? • Configuration-as-code • Usability - Stunning UI / UX • Centralized configuration • Resource Pooling • Extensibility Apache Airflow : Why use it?
  • 16. Use-Case : Message Scoring Batch Pipeline Architecture 16
  • 17. Use-Case : Message Scoring 17 enterprise A enterprise B enterprise C S3 S3 uploads every 15 minutes
  • 18. Use-Case : Message Scoring 18 enterprise A enterprise B enterprise C S3 Airflow kicks of a Spark message scoring job every hour
  • 19. Use-Case : Message Scoring 19 enterprise A enterprise B enterprise C S3 Spark job writes scored messages and stats to another S3 bucket S3
  • 20. Use-Case : Message Scoring 20 enterprise A enterprise B enterprise C S3 This triggers SNS/SQS messages events S3 SNS SQS
  • 21. Use-Case : Message Scoring 21 enterprise A enterprise B enterprise C S3 An Autoscale Group (ASG) of Importers spins up when it detects SQS messages S3 SNS SQS Importers ASG
  • 22. 22 enterprise A enterprise B enterprise C S3 The importers rapidly ingest scored messages and aggregate statistics into the DB S3 SNS SQS Importers ASG DB Use-Case : Message Scoring
  • 23. 23 enterprise A enterprise B enterprise C S3 Users receive alerts of untrusted emails & can review them in the web app S3 SNS SQS Importers ASG DB Use-Case : Message Scoring
  • 24. 24 enterprise A enterprise B enterprise C S3 S3 SNS SQS Importers ASG DB Airflow manages the entire process Use-Case : Message Scoring
  • 27. 27 Apache Airflow : Incubating Timeline • Airflow was created @ Airbnb in 2015 by Maxime Beauchemin • Max launched it @ Hadoop Summit in Summer 2015 • On 3/31/2016, Airflow —> Apache Incubator Today • 2400+ Forks • 7600+ GitHub Stars • 430+ Contributors • 150+ companies officially using it! • 14 Committers/Maintainers <— We’re growing here
  • 30. 30 Airflow is a platform to programmatically author, schedule and monitor workflows (a.k.a. DAGs) It ships with a • DAG Scheduler • Web application (UI) • Powerful CLI • Celery Workers! Apache Airflow : Behind the Scenes
  • 31. 31 Apache Airflow : Behind the Scenes Webserver Scheduler WorkerWorkerWorker Meta DB 1. A user schedules / manages DAGs using the Airflow UI! 2. Airflow’s webserver stores scheduling metadata in the metadata DB 3. The scheduler picks up new schedules and distributes work over Celery / RabbitMQ 4. Airflow workers pick up Airflow tasks over Celery Celery / RabbitMQ
  • 32. 32 Webserver Scheduler WorkerWorkerWorker Meta DB 1. A user schedules / manages DAGs using the Airflow UI! 2. Airflow’s webserver stores scheduling metadata in the metadata DB 3. The scheduler picks up new schedules and distributes work over Celery / RabbitMQ 4. Airflow workers pick up Airflow tasks over Celery Celery / RabbitMQ Apache Airflow : Behind the Scenes
  • 33. 1. A user schedules / manages DAGs using the Airflow UI! 2. Airflow’s webserver stores scheduling metadata in the metadata DB 3. The scheduler picks up new schedules and distributes work over Celery / RabbitMQ 4. Airflow workers pick up Airflow tasks over Celery 33 Webserver Scheduler WorkerWorkerWorker Meta DB Celery / RabbitMQ Apache Airflow : Behind the Scenes
  • 34. 34 Webserver Scheduler WorkerWorkerWorker Meta DB 1. A user schedules / manages DAGs using the Airflow UI! 2. Airflow’s webserver stores scheduling metadata in the metadata DB 3. The scheduler picks up new schedules and distributes work over Celery / RabbitMQ 4. Airflow workers pick up Airflow tasks from RabbitMQ Celery / RabbitMQ Apache Airflow : Behind the Scenes