Erik Bernhardsson
erikbern@spotify.com
Batch data processing in Python
Focusing mostly on music discovery and large-scale machine learning
Previously managed the “Analytics team” in Stockholm
I’m at Spotify in NYC
Btw, I’m Erik Bernhardsson
Background
Billions of log messages (several TBs) every day
Usage and backend stats, debug information
What we want to do
AB-testing
Music recommendations
Monthly/daily/hourly reporting
Business metric dashboards
We experiment a lot – need quick development cycles
We crunch a lot of data
Why did we build Luigi?
Our second cluster (in 2009):
We like Hadoop
Long story short :)
Our fifth cluster
Running one job is easy
Lots of long-running processes with dependencies
Need monitoring
Handle failures
Go from experimentation to production easily
But what about running 1000s of jobs every day?
But also non-Hadoop stuff
Most things are Python Map/Reduce jobs
Also Pig, Hive
SCP files from one host to another
Train a machine learning model
Put data in Cassandra
In the pre-Luigi world
How not to do workflows
“Streams” is a list of (username, track, artist, timestamp) tuples
Example: Artist Toplist
Streams → Artist Aggregation → Top 10 → Database
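The first two steps of this dataflow can be sketched in plain Python. This is only an illustration of the example, not code from the talk: the sample tuples and the function names `aggregate_artists` and `top_artists` are made up here, and the database step is left out.

```python
from collections import Counter

# Hypothetical sample data in the (username, track, artist, timestamp) layout.
streams = [
    ("alice", "Track 1", "Artist A", "2013-03-05T10:00:00"),
    ("bob", "Track 2", "Artist B", "2013-03-05T10:01:00"),
    ("carol", "Track 3", "Artist A", "2013-03-05T10:02:00"),
]


def aggregate_artists(stream_tuples):
    """Artist aggregation step: count how many streams each artist has."""
    return Counter(artist for _user, _track, artist, _ts in stream_tuples)


def top_artists(counts, n=10):
    """Top 10 step: return the n most streamed artists as (artist, count) pairs."""
    return counts.most_common(n)


toplist = top_artists(aggregate_artists(streams))
```

Each function corresponds to one box in the diagram, which is the unit that later becomes a task.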
Pre-Luigi example of artist toplists
Don’t do this at home
OK, so chain the tasks
Cron nicer, yay!
That’s OK, but don’t leave broken data somewhere
(btw, Luigi gives you atomic file operations locally and in HDFS)
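One common way to avoid leaving broken data behind on a local file system is to write to a temporary file and rename it into place once the write has finished. The sketch below shows that general pattern with the standard library only. It is not Luigi's implementation, and the helper name `atomic_write` is an assumption made for this example.

```python
import os
import tempfile


def atomic_write(path, text):
    """Write text to path so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the target directory so the final
    # rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
        # os.replace swaps the finished file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure, remove the partial temporary file and leave path untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process dies halfway through the write, only the temporary file is affected, so a later step never picks up a truncated output.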
Errors will occur
The second step fails, you fix it, then you want to resume
Don’t run things twice
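A minimal way to get resumability is to treat the existence of a step's output as proof that the step is done. The sketch below assumes each step writes exactly one output file. `run_step` and `demo` are hypothetical names for this illustration.

```python
import os
import tempfile


def run_step(output_path, step):
    """Run step(output_path) only if its output is missing; return True if it ran."""
    if os.path.exists(output_path):
        return False  # already produced by an earlier run, so skip it
    step(output_path)
    return True


def demo():
    """Run the same step twice and report what happened."""
    calls = []

    def write_counts(path):
        calls.append(path)
        with open(path, "w") as out:
            out.write("Artist A\t2\n")

    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "artist_counts.tsv")
        first = run_step(target, write_counts)
        second = run_step(target, write_counts)
    return first, second, len(calls)
```

The check only works if a failed step never leaves a partial output behind, which is why atomic writes matter here.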
To use dataflows as command line tools
Parametrize tasks
You want to run the dataflow for a set of similar inputs
Put tasks in loops
Plumbing sucks
Graph algorithms rock!
Plumbing sucks...
Who’s the world’s second most famous plumber?
Hint: he wears green
A Python framework for dataflow definition and execution
Introducing Luigi
On steroids and PCP
...with a toolbox of mainly Hadoop related stuff
Simple dependency definitions
Emphasis on Hadoop/HDFS integration
Atomic file operations
Dataflow visualization
Command line integration
Main features
Luigi is “kind of like Makefile” in Python
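The Makefile analogy can be sketched with the standard library alone: a task names what it requires and what it outputs, and a builder runs only the tasks whose output is missing. This is a toy model to show the idea, not Luigi's code or API. The `Task` base class, the `build` function and the file names are all assumptions made for this example.

```python
import os
import tempfile
from collections import Counter


class Task:
    """Toy task: dependencies, one output file, and the work to produce it."""

    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        return []

    def output(self):
        raise NotImplementedError

    def run(self):
        raise NotImplementedError

    def complete(self):
        # Like a Makefile target: done when the output file exists.
        return os.path.exists(self.output())


def build(task, log):
    """Depth-first build: finish missing dependencies first, then the task itself."""
    if task.complete():
        return
    for dependency in task.requires():
        build(dependency, log)
    task.run()
    log.append(type(task).__name__)


class Streams(Task):
    def output(self):
        return os.path.join(self.workdir, "streams.tsv")

    def run(self):
        # Hypothetical sample rows: username, track, artist, timestamp.
        with open(self.output(), "w") as out:
            out.write("alice\tTrack 1\tArtist A\t2013-03-05\n")
            out.write("bob\tTrack 2\tArtist B\t2013-03-05\n")
            out.write("carol\tTrack 3\tArtist A\t2013-03-05\n")


class AggregateArtists(Task):
    def requires(self):
        return [Streams(self.workdir)]

    def output(self):
        return os.path.join(self.workdir, "artist_counts.tsv")

    def run(self):
        counts = Counter()
        with open(self.requires()[0].output()) as infile:
            for line in infile:
                counts[line.rstrip("\n").split("\t")[2]] += 1
        with open(self.output(), "w") as outfile:
            for artist, count in sorted(counts.items()):
                outfile.write(f"{artist}\t{count}\n")
```

Calling `build(AggregateArtists(workdir), log)` a second time does nothing, because both outputs already exist.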
Luigi Task
Luigi - Aggregate Artists
Run on the command line:
$ python dataflow.py AggregateArtists
DEBUG: Checking if AggregateArtists() is complete
INFO: Scheduled AggregateArtists()
DEBUG: Checking if Streams() is complete
INFO: Done scheduling tasks
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 1
INFO: [pid 74375] Running AggregateArtists()
INFO: [pid 74375] Done AggregateArtists()
DEBUG: Asking scheduler for work...
INFO: Done
INFO: There are no more tasks to run at this time
Top 10 artists - Wrapped arbitrary Python code
Completing the top list
Basic functionality for exporting to Postgres. Cassandra support is in the works
Database support
Running it all...
DEBUG: Checking if ArtistToplistToDatabase() is complete
INFO: Scheduled ArtistToplistToDatabase()
DEBUG: Checking if Top10Artists() is complete
INFO: Scheduled Top10Artists()
DEBUG: Checking if AggregateArtists() is complete
INFO: Scheduled AggregateArtists()
DEBUG: Checking if Streams() is complete
INFO: Done scheduling tasks
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 3
INFO: [pid 74811] Running AggregateArtists()
INFO: [pid 74811] Done AggregateArtists()
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 2
INFO: [pid 74811] Running Top10Artists()
INFO: [pid 74811] Done Top10Artists()
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 1
INFO: [pid 74811] Running ArtistToplistToDatabase()
INFO: Done writing, importing at 2013-03-13 15:41:09.407138
INFO: [pid 74811] Done ArtistToplistToDatabase()
DEBUG: Asking scheduler for work...
INFO: Done
INFO: There are no more tasks to run at this time
Imagine how cool this would be with real data...
The results
Tasks have implicit __init__
Task Parameters
Generates command line interface with typing and documentation
Class variables with some magic
$ python dataflow.py AggregateArtists --date 2013-03-05
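How class-level parameter declarations can produce both a constructor and a typed command line is sketched below with `argparse`. This is an illustration of the mechanism, not Luigi's implementation: the `Parameter`, `DateParameter` and `Task` classes here are simplified stand-ins written for this example, and only parameters declared directly on the class are picked up.

```python
import argparse
import datetime


class Parameter:
    """Stand-in parameter declaration: a default, a help text and a parser."""

    def __init__(self, default=None, description=""):
        self.default = default
        self.description = description

    def parse(self, text):
        return text


class DateParameter(Parameter):
    def parse(self, text):
        return datetime.datetime.strptime(text, "%Y-%m-%d").date()


class Task:
    @classmethod
    def params(cls):
        # Collect the Parameter objects declared as class variables.
        return {name: value for name, value in vars(cls).items()
                if isinstance(value, Parameter)}

    def __init__(self, **kwargs):
        # The "implicit __init__": every declared parameter becomes an attribute.
        for name, param in self.params().items():
            setattr(self, name, kwargs.get(name, param.default))

    @classmethod
    def from_command_line(cls, argv):
        # One --option per parameter, typed and documented from the declaration.
        parser = argparse.ArgumentParser(prog=cls.__name__)
        for name, param in cls.params().items():
            parser.add_argument("--" + name, type=param.parse,
                                default=param.default, help=param.description)
        return cls(**vars(parser.parse_args(argv)))


class AggregateArtists(Task):
    date = DateParameter(description="day to aggregate, as YYYY-MM-DD")
```

With this sketch, `AggregateArtists.from_command_line(["--date", "2013-03-05"])` yields a task whose `date` attribute is a `datetime.date`.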
Combined usage example
Task Parameters
Running Hadoop MapReduce utilizing Hadoop Streaming or custom jar-files
Running Hive and (soon) Pig queries
Inserting data sets into Postgres
Luigi comes with a toolbox of abstract Tasks for...
...how to run anything, really
Task templates and targets
Writing new ones is as easy as defining an interface and implementing run()
Built-in Hadoop Streaming Python framework
Hadoop MapReduce
Tiny interface – just implement mapper and reducer
Fetches error logs from Hadoop cluster and displays them to the user
Class instance variables can be referenced in MapReduce code, which makes it easy to supply extra data in dictionaries etc. for map side joins
Easy to send along Python modules that might not be installed on the cluster
Support for counters, secondary sort, combiners, distributed cache, etc.
Runs on CPython so you can use your favorite libs (numpy, pandas etc.)
Features
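What "just implement mapper and reducer" means can be shown with a local simulation of the map, sort and reduce phases. This sketch does not use Hadoop or Luigi. The tab-separated input layout and the `run_local` helper are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Emit (artist, 1) for each stream line: username, track, artist, timestamp."""
    _user, _track, artist, _timestamp = line.rstrip("\n").split("\t")
    yield artist, 1


def reducer(key, values):
    """Sum the counts emitted for one artist."""
    yield key, sum(values)


def run_local(lines):
    """Simulate map, shuffle/sort and reduce in a single process."""
    mapped = [pair for line in lines for pair in mapper(line)]
    mapped.sort(key=itemgetter(0))  # stands in for Hadoop's shuffle and sort
    results = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        results.extend(reducer(key, (value for _key, value in group)))
    return results
```

On a cluster, the framework supplies the sort and distribution steps, so only the two small functions are job-specific.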
Built-in Hadoop Streaming Python framework
Hadoop MapReduce
More features
Luigi’s “visualiser”
Dive into any task
Basic multi-processing
Multiple workers
$ python dataflow.py --workers 3 AggregateArtists --date_interval 2013-W08
Great for automated execution
Error notifications
Prevents two identical tasks from running simultaneously
Process Synchronization
[Diagram, shown in four frames: task graphs with nodes A, B, C and F for “Luigi worker 1” and “Luigi worker 2”, plus the “Luigi central planner”]
...what happens
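The idea of a central planner that prevents two identical tasks from running simultaneously can be sketched as a small lock-protected registry: a worker has to claim a task before running it, and only one claim per task succeeds. This is a simplified model, not Luigi's scheduler. The class name and the `claim` and `finish` methods are hypothetical.

```python
import threading


class CentralPlanner:
    """Toy planner that hands each task to at most one worker."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = {}  # task id -> worker currently running it
        self._done = set()  # task ids that have finished

    def claim(self, task_id, worker_id):
        """Return True if worker_id may run task_id now."""
        with self._lock:
            if task_id in self._done or task_id in self._running:
                return False
            self._running[task_id] = worker_id
            return True

    def finish(self, task_id, worker_id):
        """Mark task_id as done, but only for the worker that claimed it."""
        with self._lock:
            if self._running.get(task_id) == worker_id:
                del self._running[task_id]
                self._done.add(task_id)
```

In this model, if both workers need task A, the second claim fails and that worker moves on to other work, such as C, until A is finished.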
Large dataflows
(Screenshot from web interface)
Things Luigi is not
Yes, you can run Python Hadoop jobs in Luigi. But the main focus is workflow management.
Luigi is not trying to replace mrjob
You still need to figure out how each task runs
Luigi does not give you scalability
MapReduce/Pig/Hive/etc. are wonderful tools for doing this and Luigi is more than happy to delegate it to them.
Luigi does not help you transform the data
Although Oozie is kind of annoying
...but it’s sort of like Oozie
                  Oozie   Luigi
Only Hadoop       Yes!
Horrible XML      Yes!
Easy                      Yes!
Fun & powerful            Yes!
“Oozie example”
<workflow-app xmlns='uri:oozie:workflow:0.1' name='processDir'>
<start to='getDirInfo' />
<!-- STEP ONE -->
<action name='getDirInfo'>
<!--writes 2 properties: dir.num-files: returns -1 if dir doesn't exist,
otherwise returns # of files in dir dir.age: returns -1 if dir doesn't exist,
otherwise returns age of dir in days -->
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<main-class>com.navteq.oozie.GetDirInfo</main-class>
<arg>${inputDir}</arg>
<capture-output />
</java>
<ok to="makeIngestDecision" />
<error to="fail" />
</action>
<!-- STEP TWO -->
<decision name="makeIngestDecision">
<switch>
<!-- empty or doesn't exist -->
<case to="end">
${wf:actionData('getDirInfo')['dir.num-files'] lt 0 ||
(wf:actionData('getDirInfo')['dir.age'] lt 1 and
wf:actionData('getDirInfo')['dir.num-files'] lt 24)}
</case>
<!-- # of files >= 24 -->
<case to="ingest">
${wf:actionData('getDirInfo')['dir.num-files'] gt 23 ||
wf:actionData('getDirInfo')['dir.age'] gt 6}
</case>
<default to="sendEmail"/>
</switch>
</decision>
<!--EMAIL-->
<action name="sendEmail">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<main-class>com.navteq.oozie.StandaloneMailer</main-class>
<arg>probedata2@navteq.com</arg>
<arg>gregory.titievsky@navteq.com</arg>
<arg>${inputDir}</arg>
<arg>${wf:actionData('getDirInfo')['dir.num-files']}</arg>
<arg>${wf:actionData('getDirInfo')['dir.age']}</arg>
Instead, focus on ridiculously little boilerplate code
General so you can build whatever on top of it
As well as rapid experimentation cycle
Once things work, trivial to put in production
Luigi does not have 999 features
What we use Luigi for
Hadoop Streaming
Java Hadoop MapReduce
Hive
Pig
Train machine learning models
Import/export data to/from Postgres
Insert data into Cassandra
scp/rsync/ftp data files and reports
Dump and load databases
Others using it with Scala MapReduce and MRJob as well
Be one of the cool kids!
Originated at Spotify
Mainly built by me and Elias Freider
Based on many years of experience with data processing
Open source since September 2012
https://github.com/spotify/luigi
Luigi is open source
• Pig
• EC2
• Scalding
• Cassandra
Future plans!
For more information feel free to reach out at
http://github.com/spotify/luigi
Thank you!
Oh, and we’re hiring – http://spotify.com/jobs
Erik Bernhardsson
erikbern@spotify.com

Weitere ähnliche Inhalte

Was ist angesagt?

H2O World - PySparkling Water - Nidhi Mehta
H2O World - PySparkling Water - Nidhi MehtaH2O World - PySparkling Water - Nidhi Mehta
H2O World - PySparkling Water - Nidhi MehtaSri Ambati
 
Nov HUG 2009: Hadoop Record Reader In Python
Nov HUG 2009: Hadoop Record Reader In PythonNov HUG 2009: Hadoop Record Reader In Python
Nov HUG 2009: Hadoop Record Reader In PythonYahoo Developer Network
 
Prototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.orgPrototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.orgPeter Skomoroch
 
Web Scraping in Python with Scrapy
Web Scraping in Python with ScrapyWeb Scraping in Python with Scrapy
Web Scraping in Python with Scrapyorangain
 
Ganga: an interface to the LHC computing grid
Ganga: an interface to the LHC computing gridGanga: an interface to the LHC computing grid
Ganga: an interface to the LHC computing gridMatt Williams
 
Simple ETL in python 3.5+ with Bonobo - PyParis 2017
Simple ETL in python 3.5+ with Bonobo - PyParis 2017Simple ETL in python 3.5+ with Bonobo - PyParis 2017
Simple ETL in python 3.5+ with Bonobo - PyParis 2017Romain Dorgueil
 
Network Analysis with networkX : Real-World Example-2
Network Analysis with networkX : Real-World Example-2Network Analysis with networkX : Real-World Example-2
Network Analysis with networkX : Real-World Example-2Kyunghoon Kim
 
Migrations With Transmogrifier
Migrations With TransmogrifierMigrations With Transmogrifier
Migrations With TransmogrifierRok Garbas
 
Workflow on Hadoop Using Oozie__HadoopSummit2010
Workflow on Hadoop Using Oozie__HadoopSummit2010Workflow on Hadoop Using Oozie__HadoopSummit2010
Workflow on Hadoop Using Oozie__HadoopSummit2010Yahoo Developer Network
 
DIANA: Recent developments in GooFit
DIANA: Recent developments in GooFitDIANA: Recent developments in GooFit
DIANA: Recent developments in GooFitHenry Schreiner
 
Emphemeral hadoop clusters in the cloud
Emphemeral hadoop clusters in the cloudEmphemeral hadoop clusters in the cloud
Emphemeral hadoop clusters in the cloudgfodor
 
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...InfluxData
 
Theming Plone with Deliverance
Theming Plone with DeliveranceTheming Plone with Deliverance
Theming Plone with DeliveranceRok Garbas
 
Esri International User Conference 2011: Python: Integrating Standard and Thi...
Esri International User Conference 2011: Python: Integrating Standard and Thi...Esri International User Conference 2011: Python: Integrating Standard and Thi...
Esri International User Conference 2011: Python: Integrating Standard and Thi...jasonscheirer
 
Блохин Леонид - "Mist, как часть Hydrosphere"
Блохин Леонид - "Mist, как часть Hydrosphere"Блохин Леонид - "Mist, как часть Hydrosphere"
Блохин Леонид - "Mist, как часть Hydrosphere"Provectus
 
Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?
Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?
Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?Viach Kakovskyi
 
XPath for web scraping
XPath for web scrapingXPath for web scraping
XPath for web scrapingScrapinghub
 

Was ist angesagt? (20)

H2O World - PySparkling Water - Nidhi Mehta
H2O World - PySparkling Water - Nidhi MehtaH2O World - PySparkling Water - Nidhi Mehta
H2O World - PySparkling Water - Nidhi Mehta
 
Nov HUG 2009: Hadoop Record Reader In Python
Nov HUG 2009: Hadoop Record Reader In PythonNov HUG 2009: Hadoop Record Reader In Python
Nov HUG 2009: Hadoop Record Reader In Python
 
Elasticwulf Pycon Talk
Elasticwulf Pycon TalkElasticwulf Pycon Talk
Elasticwulf Pycon Talk
 
Prototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.orgPrototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.org
 
Web Scraping in Python with Scrapy
Web Scraping in Python with ScrapyWeb Scraping in Python with Scrapy
Web Scraping in Python with Scrapy
 
Ganga: an interface to the LHC computing grid
Ganga: an interface to the LHC computing gridGanga: an interface to the LHC computing grid
Ganga: an interface to the LHC computing grid
 
Pybind11 - SciPy 2021
Pybind11 - SciPy 2021Pybind11 - SciPy 2021
Pybind11 - SciPy 2021
 
Simple ETL in python 3.5+ with Bonobo - PyParis 2017
Simple ETL in python 3.5+ with Bonobo - PyParis 2017Simple ETL in python 3.5+ with Bonobo - PyParis 2017
Simple ETL in python 3.5+ with Bonobo - PyParis 2017
 
Network Analysis with networkX : Real-World Example-2
Network Analysis with networkX : Real-World Example-2Network Analysis with networkX : Real-World Example-2
Network Analysis with networkX : Real-World Example-2
 
Migrations With Transmogrifier
Migrations With TransmogrifierMigrations With Transmogrifier
Migrations With Transmogrifier
 
Workflow on Hadoop Using Oozie__HadoopSummit2010
Workflow on Hadoop Using Oozie__HadoopSummit2010Workflow on Hadoop Using Oozie__HadoopSummit2010
Workflow on Hadoop Using Oozie__HadoopSummit2010
 
DIANA: Recent developments in GooFit
DIANA: Recent developments in GooFitDIANA: Recent developments in GooFit
DIANA: Recent developments in GooFit
 
MAVRL Workshop 2014 - pymatgen-db & custodian
MAVRL Workshop 2014 - pymatgen-db & custodianMAVRL Workshop 2014 - pymatgen-db & custodian
MAVRL Workshop 2014 - pymatgen-db & custodian
 
Emphemeral hadoop clusters in the cloud
Emphemeral hadoop clusters in the cloudEmphemeral hadoop clusters in the cloud
Emphemeral hadoop clusters in the cloud
 
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
 
Theming Plone with Deliverance
Theming Plone with DeliveranceTheming Plone with Deliverance
Theming Plone with Deliverance
 
Esri International User Conference 2011: Python: Integrating Standard and Thi...
Esri International User Conference 2011: Python: Integrating Standard and Thi...Esri International User Conference 2011: Python: Integrating Standard and Thi...
Esri International User Conference 2011: Python: Integrating Standard and Thi...
 
Блохин Леонид - "Mist, как часть Hydrosphere"
Блохин Леонид - "Mist, как часть Hydrosphere"Блохин Леонид - "Mist, как часть Hydrosphere"
Блохин Леонид - "Mist, как часть Hydrosphere"
 
Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?
Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?
Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?
 
XPath for web scraping
XPath for web scrapingXPath for web scraping
XPath for web scraping
 

Andere mochten auch

Approximate nearest neighbor methods and vector models – NYC ML meetup
Approximate nearest neighbor methods and vector models – NYC ML meetupApproximate nearest neighbor methods and vector models – NYC ML meetup
Approximate nearest neighbor methods and vector models – NYC ML meetupErik Bernhardsson
 
Las tics en la educacion 1234
Las tics en la educacion 1234Las tics en la educacion 1234
Las tics en la educacion 1234marreju99
 
Luigi Paris.py meetup presentation
Luigi Paris.py meetup presentationLuigi Paris.py meetup presentation
Luigi Paris.py meetup presentationJonàs Bru Monserrat
 
The Echo Nest at Music and Bits, October 21 2009
The Echo Nest at Music and Bits, October 21 2009The Echo Nest at Music and Bits, October 21 2009
The Echo Nest at Music and Bits, October 21 2009Brian Whitman
 
Cut Bait - 10 Years of Dorkbot
Cut Bait - 10 Years of DorkbotCut Bait - 10 Years of Dorkbot
Cut Bait - 10 Years of DorkbotBrian Whitman
 
The echo nest-music_discovery(1)
The echo nest-music_discovery(1)The echo nest-music_discovery(1)
The echo nest-music_discovery(1)Sophia Yeiji Shin
 
Music data is scary, beautiful and exciting
Music data is scary, beautiful and excitingMusic data is scary, beautiful and exciting
Music data is scary, beautiful and excitingBrian Whitman
 
The future music platform
The future music platformThe future music platform
The future music platformBrian Whitman
 
The Echo Nest Remix at Dorkbot NYC, March 4 2009
The Echo Nest Remix at Dorkbot NYC, March 4 2009The Echo Nest Remix at Dorkbot NYC, March 4 2009
The Echo Nest Remix at Dorkbot NYC, March 4 2009Brian Whitman
 
Echo nest-api-boston-2012
Echo nest-api-boston-2012Echo nest-api-boston-2012
Echo nest-api-boston-2012Paul Lamere
 
Luigi Galluccio Booklet 2017
Luigi Galluccio Booklet 2017Luigi Galluccio Booklet 2017
Luigi Galluccio Booklet 2017Luigi Galluccio
 
俞晨杰:Linked in大数据应用和azkaban
俞晨杰:Linked in大数据应用和azkaban俞晨杰:Linked in大数据应用和azkaban
俞晨杰:Linked in大数据应用和azkabanhdhappy001
 
ML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive AnalyticsML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive AnalyticsErik Bernhardsson
 
Azkaban and Pig at LinkedIn
Azkaban and Pig at LinkedInAzkaban and Pig at LinkedIn
Azkaban and Pig at LinkedInRussell Jurney
 
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...David Chen
 
Hadoop ecosystem framework n hadoop in live environment
Hadoop ecosystem framework  n hadoop in live environmentHadoop ecosystem framework  n hadoop in live environment
Hadoop ecosystem framework n hadoop in live environmentDelhi/NCR HUG
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streamingdatamantra
 

Andere mochten auch (20)

Approximate nearest neighbor methods and vector models – NYC ML meetup
Approximate nearest neighbor methods and vector models – NYC ML meetupApproximate nearest neighbor methods and vector models – NYC ML meetup
Approximate nearest neighbor methods and vector models – NYC ML meetup
 
Las tics en la educacion 1234
Las tics en la educacion 1234Las tics en la educacion 1234
Las tics en la educacion 1234
 
Luigi Paris.py meetup presentation
Luigi Paris.py meetup presentationLuigi Paris.py meetup presentation
Luigi Paris.py meetup presentation
 
The Echo Nest at Music and Bits, October 21 2009
The Echo Nest at Music and Bits, October 21 2009The Echo Nest at Music and Bits, October 21 2009
The Echo Nest at Music and Bits, October 21 2009
 
Cut Bait - 10 Years of Dorkbot
Cut Bait - 10 Years of DorkbotCut Bait - 10 Years of Dorkbot
Cut Bait - 10 Years of Dorkbot
 
The echo nest-music_discovery(1)
The echo nest-music_discovery(1)The echo nest-music_discovery(1)
The echo nest-music_discovery(1)
 
Music data is scary, beautiful and exciting
Music data is scary, beautiful and excitingMusic data is scary, beautiful and exciting
Music data is scary, beautiful and exciting
 
The future music platform
The future music platformThe future music platform
The future music platform
 
The Echo Nest Remix at Dorkbot NYC, March 4 2009
The Echo Nest Remix at Dorkbot NYC, March 4 2009The Echo Nest Remix at Dorkbot NYC, March 4 2009
The Echo Nest Remix at Dorkbot NYC, March 4 2009
 
Echo nest-api-boston-2012
Echo nest-api-boston-2012Echo nest-api-boston-2012
Echo nest-api-boston-2012
 
Luigi Galluccio Booklet 2017
Luigi Galluccio Booklet 2017Luigi Galluccio Booklet 2017
Luigi Galluccio Booklet 2017
 
俞晨杰:Linked in大数据应用和azkaban
俞晨杰:Linked in大数据应用和azkaban俞晨杰:Linked in大数据应用和azkaban
俞晨杰:Linked in大数据应用和azkaban
 
Quartz
QuartzQuartz
Quartz
 
ML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive AnalyticsML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive Analytics
 
Azkaban
AzkabanAzkaban
Azkaban
 
Azkaban and Pig at LinkedIn
Azkaban and Pig at LinkedInAzkaban and Pig at LinkedIn
Azkaban and Pig at LinkedIn
 
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
 
Hadoop ecosystem framework n hadoop in live environment
Hadoop ecosystem framework  n hadoop in live environmentHadoop ecosystem framework  n hadoop in live environment
Hadoop ecosystem framework n hadoop in live environment
 
Dataflow with Apache NiFi - Crash Course - HS16SJ
Dataflow with Apache NiFi - Crash Course - HS16SJDataflow with Apache NiFi - Crash Course - HS16SJ
Dataflow with Apache NiFi - Crash Course - HS16SJ
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streaming
 

Ähnlich wie Luigi Presentation at OSCON 2013

Luigi - Batch Data Processing in Python (PyData SV 2013)
Luigi - Batch Data Processing in Python (PyData SV 2013)Luigi - Batch Data Processing in Python (PyData SV 2013)
Luigi - Batch Data Processing in Python (PyData SV 2013)PyData
 
Puppet Camp Dallas 2014: How Puppet Ops Rolls
Puppet Camp Dallas 2014: How Puppet Ops RollsPuppet Camp Dallas 2014: How Puppet Ops Rolls
Puppet Camp Dallas 2014: How Puppet Ops RollsPuppet
 
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Jason Dai
 
Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance PythonIan Ozsvald
 
Deep Learning using Tensorflow and Data Science Experience
Deep Learning using Tensorflow and Data Science ExperienceDeep Learning using Tensorflow and Data Science Experience
Deep Learning using Tensorflow and Data Science ExperienceRoy Cecil
 
#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...
#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...
#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...Paris Open Source Summit
 
Eat whatever you can with PyBabe
Eat whatever you can with PyBabeEat whatever you can with PyBabe
Eat whatever you can with PyBabeDataiku
 
Odsc workshop - Distributed Tensorflow on Hops
Odsc workshop - Distributed Tensorflow on HopsOdsc workshop - Distributed Tensorflow on Hops
Odsc workshop - Distributed Tensorflow on HopsJim Dowling
 
Samsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonSamsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonInsuk (Chris) Cho
 
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptxDowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptxLex Avstreikh
 
Building a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache SparkBuilding a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache SparkDatabricks
 
The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 Dataiku
 
Python on Science ? Yes, We can.
Python on Science ?   Yes, We can.Python on Science ?   Yes, We can.
Python on Science ? Yes, We can.Marcel Caraciolo
 
Accurate and efficient software microbenchmarks
Accurate and efficient software microbenchmarksAccurate and efficient software microbenchmarks
Accurate and efficient software microbenchmarksDaniel Lemire
 
Talk in Google fest 2013
Talk in Google fest 2013Talk in Google fest 2013
Talk in Google fest 2013David Chen
 
Building a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael VaroquauxBuilding a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael VaroquauxPyData
 
TTW FTW: Plone as the new wordpress
TTW FTW: Plone as the new wordpressTTW FTW: Plone as the new wordpress
TTW FTW: Plone as the new wordpressDylan Jay
 

Ähnlich wie Luigi Presentation at OSCON 2013 (20)

Luigi - Batch Data Processing in Python (PyData SV 2013)
Luigi - Batch Data Processing in Python (PyData SV 2013)Luigi - Batch Data Processing in Python (PyData SV 2013)
Luigi - Batch Data Processing in Python (PyData SV 2013)
 
Puppet Camp Dallas 2014: How Puppet Ops Rolls
Puppet Camp Dallas 2014: How Puppet Ops RollsPuppet Camp Dallas 2014: How Puppet Ops Rolls
Puppet Camp Dallas 2014: How Puppet Ops Rolls
 
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
 
Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance Python
 
Deep Learning using Tensorflow and Data Science Experience
Deep Learning using Tensorflow and Data Science ExperienceDeep Learning using Tensorflow and Data Science Experience
Deep Learning using Tensorflow and Data Science Experience
 
#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...
#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...
#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...
 
Eat whatever you can with PyBabe
Eat whatever you can with PyBabeEat whatever you can with PyBabe
Eat whatever you can with PyBabe
 
Odsc workshop - Distributed Tensorflow on Hops
Odsc workshop - Distributed Tensorflow on HopsOdsc workshop - Distributed Tensorflow on Hops
Odsc workshop - Distributed Tensorflow on Hops
 
Samsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonSamsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of Python
 
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptxDowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
 
Building a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache SparkBuilding a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache Spark
 
The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016
 
Python on Science ? Yes, We can.
Python on Science ?   Yes, We can.Python on Science ?   Yes, We can.
Python on Science ? Yes, We can.
 
NYC_2016_slides
NYC_2016_slidesNYC_2016_slides
NYC_2016_slides
 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to python
 
Accurate and efficient software microbenchmarks
Accurate and efficient software microbenchmarksAccurate and efficient software microbenchmarks
Accurate and efficient software microbenchmarks
 
Spring Batch Introduction
Spring Batch IntroductionSpring Batch Introduction
Spring Batch Introduction
 
Talk in Google fest 2013
Talk in Google fest 2013Talk in Google fest 2013
Talk in Google fest 2013
 
Building a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael VaroquauxBuilding a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
 
TTW FTW: Plone as the new wordpress
TTW FTW: Plone as the new wordpressTTW FTW: Plone as the new wordpress
TTW FTW: Plone as the new wordpress
 

Kürzlich hochgeladen

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Luigi Presentation at OSCON 2013