Apache Airflow
Sumit Maheshwari Qubole
Bangalore Big Data Meetup @ LinkedIn
27 Aug 2016
Introductory talk on Apache Airflow (Incubator) by Sumit Maheshwari at recent Bangalore Big Data Meetup.


Apache Airflow

  1. Apache Airflow. Sumit Maheshwari, Qubole. Bangalore Big Data Meetup @ LinkedIn, 27 Aug 2016
  2. Agenda
     ● Workflows
     ● Problem statement
     ● Options
     ● Airflow
       ○ Anatomy
       ○ Sample DAG
       ○ Architecture
       ○ Demo
     ● Experiences
  3. Workflows? [Diagram: tasks A, B, C]
  4. [Diagram: workflow graph with tasks A, B, C, D, E, F, G, H]
  5. [Diagram: workflow graph with tasks A, B, C, D, E, F, G, H, n]
  6. Background
     Qubole was looking for a complete workflow solution. We already have a simple (sequential) workflow and a very stable scheduler in-house. Options were:
     1. Extend the in-house workflow into a full-fledged workflow
     2. Oozie
     3. Pinball
     4. Luigi
     5. Briefly
     6. Airflow
  7. In House
     Pros:
     ● Full control
     ● Faster bug fixing
     ● Prioritised Qubole-related features
     Cons:
     ● Ever-growing list of features
     ● Much longer dev & QA cycles
     ● Difficult to keep pace with latest trends
  8. Oozie
     Pros:
     ● Used by thousands of companies
     ● Web APIs, Java APIs, CLI and HTML support
     ● Oldest among all
  9. Oozie
     Cons:
     ● XML
     ● Significant effort in managing: frequent OOM
     ● Difficult to customise
  10. Pinball
      Pros:
      ● Pythonic way of defining DAGs
      ● Extensible and horizontally scalable
      ● Pinterest is already using Pinball to submit commands to Qubole
      Cons:
      ● Hard to understand
      ● “pip install” was broken
      ● Lack of community interest
  11. Luigi
      Pros:
      ● Pythonic way to write DAGs
      ● Pretty stable
      ● Huge community
      ● Built-in support for Hadoop
  12. Luigi
      Cons:
      ● Have to schedule workflows externally
      ● Minimal UI
      ● State persistence via files
      ● No built-in monitoring or alerting
  13. Briefly
      Pros: Very small codebase to understand and modify. Built-in support for Qubole.
      Cons: Too naive for production use
  14. Airflow
      ● Python code base
      ● Callable events
      ● Trigger rules
      ● XComs
      ● Cool UI & rich CLI
      ● Queues & Pools
      ● Zombie cleanup
      ● Growing community
  15. Anatomy
      ● The job definitions, in Python code.
      ● A rich CLI (command line interface) to test, run, backfill, describe and clear parts of your DAGs.
      ● A web application, to explore your DAG definitions, their dependencies, progress, metadata and logs.
      ● A metadata repository that Airflow uses to keep track of task job statuses and other persistent information.
      ● An array of workers, running the jobs' task instances in a distributed fashion.
      ● Scheduler processes, that fire up the task instances that are ready to run.
  16. Sample DAG
  17. Demo
  18. Airflow: Some facts
      ● Small code base: ~20k lines of Python.
      ● Born at Airbnb, open sourced in June 2015 and recently moved to the Apache Incubator.
      ● Under active development, some numbers:
        a. ~1.5-year-old project, 3400 commits, 177 contributors, 20+ commits per week
        b. Companies using Airflow: Airbnb, Agari, Lyft, WePay, Easytaxi, Qubole and many others
        c. 1000+ closed PRs
  19. Airflow: Architecture
      Airflow comes with 4 built-in execution modes:
      ● Sequential
      ● Local
      ● Celery
      ● Mesos
      And it’s very easy to add your own execution mode as well.
  20. Sequential
      ● Default mode
      ● Minimum setup, works with SQLite as well
      ● Processes 1 task at a time
      ● Good for demo purposes only
  21. Local Executor
      ● Spawned by scheduler processes
      ● Vertically scalable
      ● Production grade
      ● Doesn’t need a broker, etc.
  22. Celery Executor
  23. Celery Executor
      ● Vertically and horizontally scalable
      ● Can be monitored (via Flower)
      ● Supports Pools and Queues
  24. Experiences
      Key aspects considered while productionizing Airflow at Qubole:
      ● Availability
      ● Reliability
      ● Security
      ● Usability
  25. Thank You! gitter - @msumit, msumit@apache.org. PS: Qubole is hiring, ping me :)
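
Slides 15 and 20 describe scheduler processes that fire up the task instances that are ready to run, and a sequential mode that processes one task at a time. The idea can be sketched in plain Python. This is only an illustration, not Airflow's implementation: it relies on the standard library's graphlib module (Python 3.9+), and the task names, the dependency graph and the run_sequentially helper are all hypothetical.

```python
from graphlib import TopologicalSorter


def run_sequentially(upstream, run_task):
    """Run every task once, one at a time, never before its upstream tasks.

    `upstream` maps each task name to the set of tasks it depends on.
    `run_task` is any callable that performs the work for one task.
    """
    sorter = TopologicalSorter(upstream)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    order = []
    while sorter.is_active():
        # get_ready() returns the tasks whose upstream tasks have all finished
        for task in sorted(sorter.get_ready()):
            run_task(task)
            sorter.done(task)  # marking it done may unblock downstream tasks
            order.append(task)
    return order


# Hypothetical workflow: B and C depend on A, and D depends on both B and C.
upstream = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
executed = run_sequentially(upstream, lambda task: None)
print(executed)
```

Running one task at a time keeps the sketch simple, in the spirit of the sequential mode. A parallel or distributed mode would instead hand every task returned by get_ready() to a pool of workers and mark each one done as it completes.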