July 29, 2014
Luigi
The past, the present, the future
The history
The long story
builder (2009-2010)
XML madness
Only used for one single project (my Master’s thesis)
The long story
builder2 (2010-2011)
Everything in Python, but insane amounts of boilerplate
Why Luigi?
We wanted to do everything in Python, not XML
How do we use it at Spotify?
Blah
The things we got right
Everything is a directed acyclic graph
Makefile style
Tasks specify what they depend on, not what depends on them
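The Makefile-style idea can be sketched in plain Python. This is a minimal illustration, not Luigi's implementation: the Task base class, the example task names (Extract, Transform, Report) and the build() helper are all hypothetical, and the graph is assumed to be acyclic.

```python
# Hypothetical, simplified stand-in for a Luigi-style task (not Luigi's classes).
class Task:
    def requires(self):
        """Return the tasks this task depends on (upstream only)."""
        return []

    def run(self, log):
        log.append(type(self).__name__)


class Extract(Task):
    pass


class Transform(Task):
    def requires(self):
        return [Extract()]


class Report(Task):
    def requires(self):
        return [Transform(), Extract()]


def build(task, log, done=None):
    """Run `task` after everything it requires, each task type at most once."""
    done = set() if done is None else done
    name = type(task).__name__
    if name in done:
        return
    for dep in task.requires():
        build(dep, log, done)
    task.run(log)
    done.add(name)


log = []
build(Report(), log)
print(log)  # ['Extract', 'Transform', 'Report']
```

Note that Extract never mentions Transform or Report: adding a new downstream task does not require touching anything upstream.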
Do everything in Python
Dependencies often involve algebra that is hard to express in XML
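One kind of "algebra" this could refer to is date arithmetic, for example a weekly job that needs the seven daily inputs ending on a given date. The sketch below uses only the standard library; WeeklyAggregate, DailyLog and required_days are hypothetical names chosen for illustration.

```python
import datetime


def required_days(date, window=7):
    """Dates of the `window` daily inputs that end on `date` (inclusive)."""
    return [date - datetime.timedelta(days=i) for i in range(window)]


class WeeklyAggregate:
    """Hypothetical task: aggregates the seven daily logs ending on `date`."""

    def __init__(self, date):
        self.date = date

    def requires(self):
        # Each dependency is identified here by a (task name, date) pair.
        return [("DailyLog", d) for d in required_days(self.date)]


task = WeeklyAggregate(datetime.date(2014, 7, 29))
deps = task.requires()
print(deps[0], deps[-1])
```

In Python this is a one-line list comprehension; a declarative XML format would need its own mini-language for the same date arithmetic.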
Centralized scheduler
Overview of everything that’s currently running/scheduled
[Diagram: Luigi worker 1 and Luigi worker 2, with task nodes A, B, C, and F, and the Luigi central planner]
Triggering jobs locally is trivial
If the only way is to run things remotely, debugging is super hard
Running things locally makes it a lot easier
No messing around with paths and configuration
(this has a flip side – more on this later)
It’s a library more than a framework
Avoid the “Hollywood principle” (“don’t call us, we’ll call you”) and make it easy to customize
The hairy parts…
Execution is tied to scheduling
You can’t run this task “in the cloud” and go away
Visualization is pretty rudimentary
See how nice Driven looks for instance:
Scheduling isn’t tied to triggering
Need to rely on crontab etc
Could borrow some of the nice parts of Chronos:
What are some ideas for the future?
Separate scheduling and execution
Schedule something to run later/somewhere else

A recent baby step towards this is a very simple fix for running modules dynamically:

$ luigi --module MyModule MyTask --foo xyz --bar 123

The next step would be to do something like

$ luigi --module MyModule MyTask --foo xyz --bar 123 --execute-remotely

A full implementation would include a bunch of command line options to probe status, kill tasks, etc
Separate scheduling and execution (2)
[Diagram: Luigi central scheduler with several workers (Worker, Worker, Worker, Worker, ...)]
On-the-fly dependencies
import luigi

class MyTask(luigi.Task):
    def run(self):
        input = yield OtherTask()  # this could replace requires()
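How a runner might drive such a generator-based run() can be sketched without Luigi at all. This is only an illustration of the mechanism, assuming a hypothetical execute() helper and toy task classes: each yielded task is executed first, and its result is sent back into the generator.

```python
import inspect


def execute(task):
    """Run a task; if run() is a generator, resolve each yielded task
    and send its result back into the generator."""
    result = task.run()
    if not inspect.isgenerator(result):
        return result
    try:
        dep = next(result)  # advance to the first yielded dependency
        while True:
            dep = result.send(execute(dep))  # resume with the dependency's result
    except StopIteration as stop:
        return stop.value  # the generator's return value


class OtherTask:
    """Hypothetical upstream task that simply produces a value."""

    def run(self):
        return 21


class MyTask:
    def run(self):
        upstream = yield OtherTask()  # dependency discovered while running
        return upstream * 2


print(execute(MyTask()))  # 42
```

The appeal of this style is that a task can decide what it needs based on data it has already computed, which a static requires() cannot express.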
Built-in crontab replacement
import datetime
import luigi

@luigi.schedule
class MyTask(luigi.Task):
    param = luigi.DateParameter(default=datetime.date.today())
    def run(self):
        …

The @luigi.schedule decorator would then:
1. Register that my_module.MyTask should be scheduled (by telling the central planner?)
2. Trigger it continuously from somewhere (central planner?)
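The two steps above could look roughly like this in plain Python. Everything here is hypothetical: SCHEDULED is an in-process dictionary standing in for "telling the central planner", and trigger_all() stands in for one tick of whatever loop would do the continuous triggering.

```python
import datetime

# Hypothetical registry standing in for "telling the central planner".
SCHEDULED = {}


def schedule(cls):
    """Class decorator: register the task under its module-qualified name."""
    SCHEDULED[f"{cls.__module__}.{cls.__name__}"] = cls
    return cls


@schedule
class MyTask:
    def __init__(self, param):
        self.param = param

    def run(self):
        return f"ran for {self.param.isoformat()}"


def trigger_all(today):
    """One tick of the trigger loop: run every registered task for `today`."""
    return [cls(today).run() for cls in SCHEDULED.values()]


print(trigger_all(datetime.date(2014, 7, 29)))  # ['ran for 2014-07-29']
```

The sketch passes the date in at trigger time on purpose: a default such as datetime.date.today() in a class body is evaluated once, when the class is defined, so a long-running trigger process would keep reusing a stale date.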
ETA for tasks
Using a persistent task history database, you could train a simple k-NN model to predict how long a task will run
Then use this with the dependency graph to predict when any task will finish
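A rough sketch of both steps, under stated assumptions: each past run is summarized by a single made-up feature (input size), the k-NN prediction is the mean duration of the k closest past runs, and the finish time of a task is its predicted duration plus the latest finish time among the tasks it requires (unlimited workers, everything starting at t=0). The history data and task names are invented for illustration.

```python
def knn_predict(history, x, k=3):
    """Mean duration of the k past runs whose feature is closest to x."""
    nearest = sorted(history, key=lambda run: abs(run[0] - x))[:k]
    return sum(duration for _, duration in nearest) / len(nearest)


def finish_times(deps, durations):
    """Earliest finish time per task, assuming unlimited workers and start at t=0."""
    finish = {}

    def visit(task):
        if task not in finish:
            # A task can start once its slowest dependency has finished.
            start = max((visit(d) for d in deps.get(task, [])), default=0.0)
            finish[task] = start + durations[task]
        return finish[task]

    for task in durations:
        visit(task)
    return finish


# Hypothetical history: (input size in GB, runtime in minutes) per task.
history = {
    "A": [(1, 10.0), (2, 20.0), (3, 30.0), (10, 100.0)],
    "B": [(1, 5.0), (2, 5.0), (3, 5.0)],
    "C": [(1, 2.0), (2, 4.0), (3, 6.0)],
}
deps = {"B": ["A"], "C": ["A", "B"]}  # task -> tasks it requires
todays_size = {"A": 2, "B": 2, "C": 2}

durations = {t: knn_predict(history[t], todays_size[t]) for t in history}
eta = finish_times(deps, durations)
print(eta)  # {'A': 20.0, 'B': 25.0, 'C': 29.0}
```

A real version would need richer features (parameters, day of week, cluster load) and would have to account for a limited number of workers, which this sketch ignores.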
More features in the central planner
Kill a task
Re-launch a task
Launch a new task
Support for other languages
Luigi is written in Python, but the RPC is language-agnostic.
Happy plumbing!
Questions?
Luigi future