Oozie: Towards a Scalable Workflow
 Management System for Hadoop

                     Mohammad Islam
                                 And
                        Virag Kothari
Accepted Paper
• Workshop in ACM/SIGMOD, May 2012.
• It is a team effort!

Mohammad Islam       Angelo Huang
Mohamed Battisha     Michelle Chiang
Santhosh Srinivasan  Craig Peters

Andreas Neumann      Alejandro Abdelnur
Presentation Workflow

Oozie Tutorial -> Design Decisions -> Results -> Questions? -> Address Question -> END
Installing Oozie

Step 1: Download the Oozie tarball
curl -O http://mirrors.sonic.net/apache/incubator/oozie/oozie-3.1.3-incubating/oozie-3.1.3-incubating-distro.tar.gz

Step 2: Unpack the tarball
tar -xzvf <PATH_TO_OOZIE_TAR>

Step 3: Run the setup script
bin/oozie-setup.sh -hadoop 0.20.200 ${HADOOP_HOME} -extjs /tmp/ext-2.2.zip

Step 4: Start Oozie
bin/oozie-start.sh

Step 5: Check the status of Oozie
bin/oozie admin -oozie http://localhost:11000/oozie -status
Running an Example

•Standalone Map-Reduce job
    $ hadoop jar /usr/joe/hadoop-examples.jar org.myorg.wordcount inputDir outputDir


•Using Oozie

    Example DAG: Start -> MapReduce wordcount; on OK -> End; on ERROR -> Kill

    Workflow.xml (outline):
        <workflow-app name=..>
        <start..>
        <action>
        <map-reduce>
            ……
            ……
        </workflow-app>
Example Workflow
<action name='wordcount'>
  <map-reduce>
    <configuration>
      <property>
        <name>mapred.mapper.class</name>
        <value>org.myorg.WordCount.Map</value>
      </property>
      <property>
        <name>mapred.reducer.class</name>
        <value>org.myorg.WordCount.Reduce</value>
      </property>
      <property>
        <name>mapred.input.dir</name>
        <value>/usr/joe/inputDir</value>
      </property>
      <property>
        <name>mapred.output.dir</name>
        <value>/usr/joe/outputDir</value>
      </property>
    </configuration>
  </map-reduce>
</action>
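The same action can also be generated programmatically. The following is a minimal sketch, not part of the original talk: it uses Python's standard `xml.etree.ElementTree` to build the `<action>` element from a dictionary of properties. The helper name `build_map_reduce_action` is hypothetical; the property names and values mirror the slide, with both paths written as absolute.

```python
import xml.etree.ElementTree as ET


def build_map_reduce_action(name, properties):
    """Build an <action> element holding a <map-reduce> configuration.

    Hypothetical helper for illustration; it only reproduces the
    fragment shown on the slide, not a complete workflow definition.
    """
    action = ET.Element("action", {"name": name})
    map_reduce = ET.SubElement(action, "map-reduce")
    configuration = ET.SubElement(map_reduce, "configuration")
    for key, value in properties.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = value
    return action


# Property names and values as shown on the slide.
WORDCOUNT_PROPERTIES = {
    "mapred.mapper.class": "org.myorg.WordCount.Map",
    "mapred.reducer.class": "org.myorg.WordCount.Reduce",
    "mapred.input.dir": "/usr/joe/inputDir",
    "mapred.output.dir": "/usr/joe/outputDir",
}

action = build_map_reduce_action("wordcount", WORDCOUNT_PROPERTIES)
xml_text = ET.tostring(action, encoding="unicode")
print(xml_text)
```

Each dictionary entry becomes one `<property>` element, which keeps the mapping from the command-line arguments of the standalone job to the workflow configuration easy to see.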
A Workflow Application
Three components required for a Workflow:

1) Workflow.xml:
   Contains job definition


2) Libraries:
   optional ‘lib/’ directory contains .jar/.so files

3) Properties file:
• Parameterization of Workflow xml
• Mandatory property is oozie.wf.application.path
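As an illustration of the properties file, here is a small sketch that renders one as text. It is not from the talk: the helper `render_job_properties`, the HDFS path, and the extra parameter names `inputDir` and `outputDir` are all hypothetical. Only the mandatory key `oozie.wf.application.path` comes from the slide.

```python
def render_job_properties(app_path, extra=None):
    """Render a properties file body with the mandatory application path.

    Hypothetical helper: it refuses to produce a file without
    oozie.wf.application.path, the one property the slide calls mandatory.
    """
    if not app_path:
        raise ValueError("oozie.wf.application.path is mandatory")
    props = {"oozie.wf.application.path": app_path}
    props.update(extra or {})
    return "".join(f"{key}={value}\n" for key, value in props.items())


# The application path and parameter names below are assumptions.
properties_text = render_job_properties(
    "hdfs://localhost:9000/user/joe/wordcount-wf",
    {"inputDir": "/usr/joe/inputDir", "outputDir": "/usr/joe/outputDir"},
)
print(properties_text, end="")
```

Parameters defined this way are the values that the workflow XML can refer to when it is parameterized.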
Workflow Submission
Run Workflow Job

  $ oozie job -run -config job.properties -oozie http://localhost:11000/oozie/
  Workflow ID: 00123-123456-oozie-wrkf-W

Check Workflow Job Status

  $ oozie job -info 00123-123456-oozie-wrkf-W -oozie http://localhost:11000/oozie/
  -----------------------------------------------------------------------
  Workflow Name: test-wf
  App Path: hdfs://localhost:11000/user/your_id/oozie/
  Workflow job status [RUNNING]
   ...
  ------------------------------------------------------------------------
Key Features and Design Decisions
• Multi-tenant
• Security
  – Authenticate every request
  – Pass appropriate token to Hadoop job
• Scalability
  – Vertical: Add extra memory/disk
  – Horizontal: Add machines
Oozie Job Processing

[Diagram: the end user submits a job to the Oozie Server (Oozie security); the Oozie Server accesses Hadoop securely via Kerberos.]
Oozie-Hadoop Security

 •   Oozie is a multi-tenant system
 •   Job can be scheduled to run later
 •   Oozie submits/maintains the Hadoop jobs
 •   Hadoop needs a security token for each
     request

Question: Who should provide the security
token to Hadoop and how?
Oozie-Hadoop Security Contd.

• Answer: Oozie
• How?
  – Hadoop considers Oozie a super-user
  – Hadoop does not check the end user's credentials
  – Hadoop only checks the credentials of the
    Oozie process

• BUT the Hadoop job is executed as the end user.
• Oozie uses the doAs() functionality of Hadoop.
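The impersonation rule described above can be pictured with a toy model. This sketch is not Hadoop's API and does not call doAs(); the names `SUPER_USERS`, `SubmittedJob`, and `submit_as` are hypothetical and only model the rule that a trusted super-user may submit a job that then runs as the end user.

```python
from dataclasses import dataclass

# Assumption: the cluster trusts exactly one super-user, the Oozie process.
SUPER_USERS = frozenset({"oozie"})


@dataclass(frozen=True)
class SubmittedJob:
    submitted_by: str  # identity whose credentials the cluster checked
    runs_as: str       # identity the job executes under


def submit_as(caller, end_user, super_users=SUPER_USERS):
    """Toy model: only a super-user may submit on behalf of someone else."""
    if caller != end_user and caller not in super_users:
        raise PermissionError(f"{caller} may not impersonate {end_user}")
    return SubmittedJob(submitted_by=caller, runs_as=end_user)


job = submit_as("oozie", "joe")
print(job)
```

In this model the cluster never sees the end user's credentials; it checks only the caller, which is why Oozie has to authenticate its own users before submitting anything.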
User-Oozie Security

[Diagram: the end user submits a job to the Oozie Server (Oozie security); the Oozie Server accesses Hadoop securely via Kerberos.]
Why Oozie Security?

• One user should not modify another user’s
  job
• Hadoop doesn’t authenticate the end user
• Oozie has to verify its user before passing
  the job to Hadoop
How does Oozie Support Security?

• Built-in authentication
  – Kerberos
  – Non-secured (default)
• Design Decision
  – Pluggable authentication
  – Easy to include new type of authentication
  – Yahoo supports 3 types of authentication.
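One way to picture pluggable authentication is a registry that maps an authentication type to a function. The sketch below is illustrative only and does not reflect Oozie's actual classes: `register`, `authenticate`, the `simple` and `token` authenticators, and the request fields are all hypothetical.

```python
from typing import Callable, Dict, Optional

# An authenticator takes a request and returns a user name, or None on failure.
Authenticator = Callable[[dict], Optional[str]]

_AUTHENTICATORS: Dict[str, Authenticator] = {}


def register(name):
    """Decorator that plugs a new authentication type into the registry."""
    def decorator(fn):
        _AUTHENTICATORS[name] = fn
        return fn
    return decorator


@register("simple")
def simple_auth(request):
    # Non-secured mode: trust the user name supplied with the request.
    return request.get("user.name")


@register("token")
def token_auth(request):
    # Hypothetical additional type: look the user up by a shared token.
    known_tokens = {"s3cr3t": "joe"}
    return known_tokens.get(request.get("token"))


def authenticate(auth_type, request):
    """Authenticate a request with the configured authentication type."""
    try:
        authenticator = _AUTHENTICATORS[auth_type]
    except KeyError:
        raise ValueError(f"unknown authentication type: {auth_type}") from None
    user = authenticator(request)
    if user is None:
        raise PermissionError("authentication failed")
    return user


print(authenticate("simple", {"user.name": "joe"}))
print(authenticate("token", {"token": "s3cr3t"}))
```

Adding a new authentication type then means registering one more function, without touching the code that handles requests.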
Job Submission to Hadoop

• Oozie is designed to handle thousands of
  jobs at the same time

• Question: Should the Oozie server
  – Submit the Hadoop job directly?
  – Wait for it to finish?


• Answer: No
Job Submission Contd.
• Reason
  – Resource constraints: A single Oozie process
    can’t simultaneously create thousands of threads,
    one for each Hadoop job. (Scaling limitation)
  – Isolation: Running user code on Oozie server
    might de-stabilize Oozie
• Design Decision
  – Create a launcher hadoop job
  – Execute the actual user job from the launcher.
  – Wait asynchronously for the job to finish.
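The launcher pattern can be simulated in a few lines. This is a sketch under loud assumptions: a thread pool stands in for the Hadoop cluster, and `user_job`, `launcher`, and `on_done` are hypothetical names. It shows only the control flow: the server hands the work off, returns immediately, and learns about completion through a callback.

```python
from concurrent.futures import ThreadPoolExecutor


def user_job(name):
    # Stand-in for the actual user job; runs on the "cluster".
    return f"{name}: SUCCEEDED"


def launcher(name):
    # Stand-in for the launcher job: it starts the user job and waits for
    # it, so the server itself never runs user code or blocks on it.
    return user_job(name)


completed = []


def on_done(future):
    # Callback invoked when a launcher finishes; the server records the result.
    completed.append(future.result())


# The executor plays the role of the cluster in this simulation.
with ThreadPoolExecutor(max_workers=4) as cluster:
    for name in ("wordcount-1", "wordcount-2", "wordcount-3"):
        future = cluster.submit(launcher, name)  # returns immediately
        future.add_done_callback(on_done)        # asynchronous notification

print(sorted(completed))
```

Because the waiting happens inside the launcher, the number of jobs in flight is bounded by cluster capacity rather than by threads in the server process.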
Job Submission to Hadoop


[Diagram: the Oozie Server outside the Hadoop cluster; inside the cluster, the Job Tracker, the Launcher Mapper, and the actual M/R job, connected by numbered steps 1 to 5.]
Job Submission Contd.

• Advantages
  – Horizontal scalability: If load increases, add
    machines into Hadoop cluster
  – Stability: Isolation of user code and system
    process
• Disadvantages
  – Each job occupies an extra map slot.
Production Setup

• Total number of nodes: 42K+
• Total number of Clusters: 25+
• Total number of processed jobs ≈ 750K/month
• Data presented from two clusters
• Each of them has nearly 4K nodes
• Total number of users /cluster = 50
Oozie Usage Pattern @ Y!
[Figure: Distribution of Job Types on Production Clusters. Bar chart of percentage (0 to 50) by job type (fs, java, map-reduce, pig) for Cluster #1 and Cluster #2.]

• Pig and Java are the most popular
• Pure Map-Reduce jobs are fewer
Experimental Setup

•   Number of nodes: 7
•   Number of map-slots: 28
•   4 Core, RAM: 16 GB
•   64 bit RHEL
•   Oozie Server
    – 3 GB RAM
    – Internal Queue size = 10 K
    – # Worker Thread = 300
Job Acceptance
[Figure: Workflow Acceptance Rate. Workflows accepted per minute (0 to 1400) against the number of submission threads (2 to 640).]

Observation: Oozie can accept a large number of jobs
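Timeline of an Oozie Job

User submits job -> Oozie submits to Hadoop -> Job completes at Hadoop -> Job completes at Oozie

Preparation overhead: from user submission until Oozie submits to Hadoop
Completion overhead: from job completion at Hadoop until job completion at Oozie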

Total Oozie Overhead = Preparation + Completion
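The overhead definition can be written out as a small calculation. This sketch assumes the reading of the timeline in which preparation runs from user submission to Hadoop submission and completion runs from Hadoop completion to Oozie completion. The class name `JobTimeline` and the timestamps are hypothetical, not measurements from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobTimeline:
    """Four timestamps of one job, in milliseconds since user submission."""
    user_submit: int
    hadoop_submit: int
    hadoop_complete: int
    oozie_complete: int

    @property
    def preparation(self):
        # Time Oozie spends before the job reaches Hadoop.
        return self.hadoop_submit - self.user_submit

    @property
    def completion(self):
        # Time Oozie needs to notice and record that the job finished.
        return self.oozie_complete - self.hadoop_complete

    @property
    def total_overhead(self):
        # Total Oozie Overhead = Preparation + Completion
        return self.preparation + self.completion


# Made-up timestamps for illustration only.
timeline = JobTimeline(
    user_submit=0, hadoop_submit=400, hadoop_complete=60400, oozie_complete=61000
)
print(timeline.preparation, timeline.completion, timeline.total_overhead)
```

The time the job spends running on Hadoop does not count toward the overhead; only the two gaps on either side of it do.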
Oozie Overhead
[Figure: Per Action Overhead. Overhead in milliseconds (0 to 1800) against the number of actions per workflow (1, 5, 10, and 50 actions).]

Observation: Per-action Oozie overhead is lower when multiple
actions are in the same workflow.
Oozie Futures

• Scalability
  – Hot-Hot/Load balancing service
  – Replace SQL DB with Zookeeper
• Improved Usability
• Extend the benchmarking scope
• Monitoring WS API
Take Away ..

• Oozie is
 – Easier to use
 – Scalable
 – Secure and multi-tenant
Q&A




Mohammad K Islam: kamrul@yahoo-inc.com
Virag Kothari: virag@yahoo-inc.com
http://incubator.apache.org/oozie/

Weitere ähnliche Inhalte

Was ist angesagt?

Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas N
Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas NApache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas N
Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas NYahoo Developer Network
 
Apache Oozie Workflow Scheduler - Module 10
Apache Oozie Workflow Scheduler - Module 10Apache Oozie Workflow Scheduler - Module 10
Apache Oozie Workflow Scheduler - Module 10Rohit Agrawal
 
Oozie towards zero downtime
Oozie towards zero downtimeOozie towards zero downtime
Oozie towards zero downtimeDataWorks Summit
 
Oozie @ Riot Games
Oozie @ Riot GamesOozie @ Riot Games
Oozie @ Riot GamesMatt Goeke
 
Clogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewClogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewMadhur Nawandar
 
Data Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieData Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieShareThis
 
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Introduction to Apache Camel
Introduction to Apache CamelIntroduction to Apache Camel
Introduction to Apache CamelFuseSource.com
 
Apache Hive Hook
Apache Hive HookApache Hive Hook
Apache Hive HookMinwoo Kim
 
High-Performance Hibernate Devoxx France 2016
High-Performance Hibernate Devoxx France 2016High-Performance Hibernate Devoxx France 2016
High-Performance Hibernate Devoxx France 2016Vlad Mihalcea
 
W-JAX 2011: OSGi with Apache Karaf
W-JAX 2011: OSGi with Apache KarafW-JAX 2011: OSGi with Apache Karaf
W-JAX 2011: OSGi with Apache KarafJerry Preissler
 
OSGi ecosystems compared on Apache Karaf - Christian Schneider
OSGi ecosystems compared on Apache Karaf - Christian SchneiderOSGi ecosystems compared on Apache Karaf - Christian Schneider
OSGi ecosystems compared on Apache Karaf - Christian Schneidermfrancis
 
Rails 2.0 Presentation
Rails 2.0 PresentationRails 2.0 Presentation
Rails 2.0 PresentationScott Chacon
 
Parallel batch processing with spring batch slideshare
Parallel batch processing with spring batch   slideshareParallel batch processing with spring batch   slideshare
Parallel batch processing with spring batch slideshareMorten Andersen-Gott
 
A Groovy Kind of Java (San Francisco Java User Group)
A Groovy Kind of Java (San Francisco Java User Group)A Groovy Kind of Java (San Francisco Java User Group)
A Groovy Kind of Java (San Francisco Java User Group)Nati Shalom
 

Was ist angesagt? (20)

Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas N
Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas NApache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas N
Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas N
 
Apache Oozie Workflow Scheduler - Module 10
Apache Oozie Workflow Scheduler - Module 10Apache Oozie Workflow Scheduler - Module 10
Apache Oozie Workflow Scheduler - Module 10
 
Advanced Oozie
Advanced OozieAdvanced Oozie
Advanced Oozie
 
Oozie towards zero downtime
Oozie towards zero downtimeOozie towards zero downtime
Oozie towards zero downtime
 
Oozie at Yahoo
Oozie at YahooOozie at Yahoo
Oozie at Yahoo
 
Hadoop Oozie
Hadoop OozieHadoop Oozie
Hadoop Oozie
 
Oozie @ Riot Games
Oozie @ Riot GamesOozie @ Riot Games
Oozie @ Riot Games
 
Clogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewClogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overview
 
Data Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieData Pipeline Management Framework on Oozie
Data Pipeline Management Framework on Oozie
 
Oozie meetup - HA
Oozie meetup - HAOozie meetup - HA
Oozie meetup - HA
 
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
 
October 2014 HUG : Oozie HA
October 2014 HUG : Oozie HAOctober 2014 HUG : Oozie HA
October 2014 HUG : Oozie HA
 
Introduction to Apache Camel
Introduction to Apache CamelIntroduction to Apache Camel
Introduction to Apache Camel
 
Apache Hive Hook
Apache Hive HookApache Hive Hook
Apache Hive Hook
 
High-Performance Hibernate Devoxx France 2016
High-Performance Hibernate Devoxx France 2016High-Performance Hibernate Devoxx France 2016
High-Performance Hibernate Devoxx France 2016
 
W-JAX 2011: OSGi with Apache Karaf
W-JAX 2011: OSGi with Apache KarafW-JAX 2011: OSGi with Apache Karaf
W-JAX 2011: OSGi with Apache Karaf
 
OSGi ecosystems compared on Apache Karaf - Christian Schneider
OSGi ecosystems compared on Apache Karaf - Christian SchneiderOSGi ecosystems compared on Apache Karaf - Christian Schneider
OSGi ecosystems compared on Apache Karaf - Christian Schneider
 
Rails 2.0 Presentation
Rails 2.0 PresentationRails 2.0 Presentation
Rails 2.0 Presentation
 
Parallel batch processing with spring batch slideshare
Parallel batch processing with spring batch   slideshareParallel batch processing with spring batch   slideshare
Parallel batch processing with spring batch slideshare
 
A Groovy Kind of Java (San Francisco Java User Group)
A Groovy Kind of Java (San Francisco Java User Group)A Groovy Kind of Java (San Francisco Java User Group)
A Groovy Kind of Java (San Francisco Java User Group)
 

Ähnlich wie May 2012 HUG: Oozie: Towards a scalable Workflow Management System for Hadoop

Debugging Hive with Hadoop-in-the-Cloud
Debugging Hive with Hadoop-in-the-CloudDebugging Hive with Hadoop-in-the-Cloud
Debugging Hive with Hadoop-in-the-CloudSoam Acharya
 
De-Bugging Hive with Hadoop-in-the-Cloud
De-Bugging Hive with Hadoop-in-the-CloudDe-Bugging Hive with Hadoop-in-the-Cloud
De-Bugging Hive with Hadoop-in-the-CloudDataWorks Summit
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for HadoopJoe Crobak
 
oozieee.pdf
oozieee.pdfoozieee.pdf
oozieee.pdfwwww63
 
Oozie Hug May 2011
Oozie Hug May 2011Oozie Hug May 2011
Oozie Hug May 2011mislam77
 
Oozie hugnov11
Oozie hugnov11Oozie hugnov11
Oozie hugnov11mislam77
 
OC Big Data Monthly Meetup #5 - Session 1 - Altiscale
OC Big Data Monthly Meetup #5 - Session 1 - AltiscaleOC Big Data Monthly Meetup #5 - Session 1 - Altiscale
OC Big Data Monthly Meetup #5 - Session 1 - AltiscaleBig Data Joe™ Rossi
 
Building a Dev/Test Cloud with Apache CloudStack
Building a Dev/Test Cloud with Apache CloudStackBuilding a Dev/Test Cloud with Apache CloudStack
Building a Dev/Test Cloud with Apache CloudStackke4qqq
 
Debugging Hive with Hadoop-in-the-Cloud by David Chaiken of Altiscale
Debugging Hive with Hadoop-in-the-Cloud by David Chaiken of AltiscaleDebugging Hive with Hadoop-in-the-Cloud by David Chaiken of Altiscale
Debugging Hive with Hadoop-in-the-Cloud by David Chaiken of AltiscaleData Con LA
 
Building a Dev/Test Cloud with Apache CloudStack
Building a Dev/Test Cloud with Apache CloudStackBuilding a Dev/Test Cloud with Apache CloudStack
Building a Dev/Test Cloud with Apache CloudStackke4qqq
 
Dmp hadoop getting_start
Dmp hadoop getting_startDmp hadoop getting_start
Dmp hadoop getting_startGim GyungJin
 
Gofer 200707
Gofer 200707Gofer 200707
Gofer 200707oscon2007
 
Running Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformRunning Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformInMobi Technology
 
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"Daniel Bryant
 
DBD::Gofer 200809
DBD::Gofer 200809DBD::Gofer 200809
DBD::Gofer 200809Tim Bunce
 
A tour of Ansible
A tour of AnsibleA tour of Ansible
A tour of AnsibleDevOps Ltd.
 
MEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop MicrosoftMEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop MicrosoftLee Stott
 
Stack kicker devopsdays-london-2013
Stack kicker devopsdays-london-2013Stack kicker devopsdays-london-2013
Stack kicker devopsdays-london-2013Simon McCartney
 

Ähnlich wie May 2012 HUG: Oozie: Towards a scalable Workflow Management System for Hadoop (20)

Debugging Hive with Hadoop-in-the-Cloud
Debugging Hive with Hadoop-in-the-CloudDebugging Hive with Hadoop-in-the-Cloud
Debugging Hive with Hadoop-in-the-Cloud
 
De-Bugging Hive with Hadoop-in-the-Cloud
De-Bugging Hive with Hadoop-in-the-CloudDe-Bugging Hive with Hadoop-in-the-Cloud
De-Bugging Hive with Hadoop-in-the-Cloud
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for Hadoop
 
oozieee.pdf
oozieee.pdfoozieee.pdf
oozieee.pdf
 
Oozie Hug May 2011
Oozie Hug May 2011Oozie Hug May 2011
Oozie Hug May 2011
 
Oozie hugnov11
Oozie hugnov11Oozie hugnov11
Oozie hugnov11
 
Nov 2011 HUG: Oozie
Nov 2011 HUG: Oozie Nov 2011 HUG: Oozie
Nov 2011 HUG: Oozie
 
OC Big Data Monthly Meetup #5 - Session 1 - Altiscale
OC Big Data Monthly Meetup #5 - Session 1 - AltiscaleOC Big Data Monthly Meetup #5 - Session 1 - Altiscale
OC Big Data Monthly Meetup #5 - Session 1 - Altiscale
 
Building a Dev/Test Cloud with Apache CloudStack
Building a Dev/Test Cloud with Apache CloudStackBuilding a Dev/Test Cloud with Apache CloudStack
Building a Dev/Test Cloud with Apache CloudStack
 
Debugging Hive with Hadoop-in-the-Cloud by David Chaiken of Altiscale
Debugging Hive with Hadoop-in-the-Cloud by David Chaiken of AltiscaleDebugging Hive with Hadoop-in-the-Cloud by David Chaiken of Altiscale
Debugging Hive with Hadoop-in-the-Cloud by David Chaiken of Altiscale
 
Os Bunce
Os BunceOs Bunce
Os Bunce
 
Building a Dev/Test Cloud with Apache CloudStack
Building a Dev/Test Cloud with Apache CloudStackBuilding a Dev/Test Cloud with Apache CloudStack
Building a Dev/Test Cloud with Apache CloudStack
 
Dmp hadoop getting_start
Dmp hadoop getting_startDmp hadoop getting_start
Dmp hadoop getting_start
 
Gofer 200707
Gofer 200707Gofer 200707
Gofer 200707
 
Running Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformRunning Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale Platform
 
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"
 
DBD::Gofer 200809
DBD::Gofer 200809DBD::Gofer 200809
DBD::Gofer 200809
 
A tour of Ansible
A tour of AnsibleA tour of Ansible
A tour of Ansible
 
MEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop MicrosoftMEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop Microsoft
 
Stack kicker devopsdays-london-2013
Stack kicker devopsdays-london-2013Stack kicker devopsdays-london-2013
Stack kicker devopsdays-london-2013
 

Mehr von Yahoo Developer Network

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaYahoo Developer Network
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Yahoo Developer Network
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanYahoo Developer Network
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Yahoo Developer Network
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathYahoo Developer Network
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuYahoo Developer Network
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolYahoo Developer Network
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Yahoo Developer Network
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Yahoo Developer Network
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathYahoo Developer Network
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Yahoo Developer Network
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathYahoo Developer Network
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsYahoo Developer Network
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Yahoo Developer Network
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondYahoo Developer Network
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Yahoo Developer Network
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...Yahoo Developer Network
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexYahoo Developer Network
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsYahoo Developer Network
 

Mehr von Yahoo Developer Network (20)

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
 
CICD at Oath using Screwdriver
CICD at Oath using ScrewdriverCICD at Oath using Screwdriver
CICD at Oath using Screwdriver
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
May 2012 HUG: Oozie: Towards a scalable Workflow Management System for Hadoop

  • 1. Oozie: Towards a Scalable Workflow Management System for Hadoop Mohammad Islam And Virag Kothari
  • 2. Accepted Paper • Workshop in ACM/SIGMOD, May 2012. • It is a team effort! Mohammad Islam, Angelo Huang, Mohamed Battisha, Michelle Chiang, Santhosh Srinivasan, Craig Peters, Andreas Neumann, Alejandro Abdelnur
  • 3. Presentation Workflow [diagram: Oozie Tutorial, Design Decisions, Results, Questions?, Address Question, END]
  • 4. Installing Oozie
       Step 1: Download the Oozie tarball
         curl -O http://mirrors.sonic.net/apache/incubator/oozie/oozie-3.1.3-incubating/oozie-3.1.3-incubating-distro.tar.gz
       Step 2: Unpack the tarball
         tar -xzvf <PATH_TO_OOZIE_TAR>
       Step 3: Run the setup script
         bin/oozie-setup.sh -hadoop 0.20.200 ${HADOOP_HOME} -extjs /tmp/ext-2.2.zip
       Step 4: Start Oozie
         bin/oozie-start.sh
       Step 5: Check the status of Oozie
         bin/oozie admin -oozie http://localhost:11000/oozie -status
  • 5. Running an Example
       • Standalone Map-Reduce job
           $ hadoop jar /usr/joe/hadoop-examples.jar org.myorg.wordcount inputDir outputDir
       • Using Oozie
           Example DAG: Start → MapReduce wordcount → End (on OK) or Kill (on ERROR)
           Workflow.xml: <workflow-app name=..> <start..> <action> <map-reduce> …… </workflow-app>
  • 6. Example Workflow
       <action name='wordcount'>
         <map-reduce>
           <configuration>
             <property>
               <name>mapred.mapper.class</name>
               <value>org.myorg.WordCount.Map</value>
             </property>
             <property>
               <name>mapred.reducer.class</name>
               <value>org.myorg.WordCount.Reduce</value>
             </property>
             <property>
               <name>mapred.input.dir</name>
               <value>/usr/joe/inputDir</value>
             </property>
             <property>
               <name>mapred.output.dir</name>
               <value>/usr/joe/outputDir</value>
             </property>
           </configuration>
         </map-reduce>
       </action>
       Slide callouts: mapred.mapper.class = org.myorg.WordCount.Map; mapred.reducer.class = org.myorg.WordCount.Reduce; mapred.input.dir = inputDir; mapred.output.dir = outputDir
  • 7. A Workflow Application Three components required for a Workflow: 1) Workflow.xml: Contains job definition 2) Libraries: optional ‘lib/’ directory contains .jar/.so files 3) Properties file: • Parameterization of Workflow xml • Mandatory property is oozie.wf.application.path
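The properties file described on slide 7 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: `render_job_properties` is a hypothetical helper, and the HDFS URI and the `inputDir`/`outputDir` parameters are placeholders. The only key taken from the slides is `oozie.wf.application.path`.

```python
# Hypothetical helper: render a minimal job.properties for an Oozie workflow.
# Only oozie.wf.application.path comes from the slides; every other key and
# value below is an illustrative placeholder.

REQUIRED_KEY = "oozie.wf.application.path"


def render_job_properties(props):
    """Return the text of a Java-style .properties file, one key=value per line."""
    if not props.get(REQUIRED_KEY):
        raise ValueError(f"missing mandatory property: {REQUIRED_KEY}")
    return "".join(f"{key}={value}\n" for key, value in props.items())


example = {
    # Placeholder HDFS directory holding workflow.xml and the optional lib/.
    REQUIRED_KEY: "hdfs://namenode.example.com:8020/user/joe/wordcount-wf",
    # Placeholder parameters that workflow.xml could reference as ${inputDir}
    # and ${outputDir}.
    "inputDir": "/usr/joe/inputDir",
    "outputDir": "/usr/joe/outputDir",
}

job_properties_text = render_job_properties(example)
print(job_properties_text, end="")
```

Writing `job_properties_text` to a file named job.properties would give the kind of file passed with `-config` when the workflow is submitted.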
  • 8. Workflow Submission
       Run Workflow Job
         $ oozie job -run -config job.properties -oozie http://localhost:11000/oozie/
         Workflow ID: 00123-123456-oozie-wrkf-W
       Check Workflow Job Status
         $ oozie job -info 00123-123456-oozie-wrkf-W -oozie http://localhost:11000/oozie/
         -----------------------------------------------------------------------
         Workflow Name: test-wf
         App Path: hdfs://localhost:11000/user/your_id/oozie/
         Workflow job status [RUNNING]
         ...
         -----------------------------------------------------------------------
  • 9. Key Features and Design Decisions • Multi-tenant • Security – Authenticate every request – Pass appropriate token to Hadoop job • Scalability – Vertical: Add extra memory/disk – Horizontal: Add machines
  • 10. Oozie Job Processing [diagram: Oozie Security, Hadoop Access, Secure Job, Kerberos, Oozie Server, End user]
  • 11. Oozie-Hadoop Security [diagram: Oozie Security, Hadoop Access, Secure Job, Kerberos, Oozie Server, End user]
  • 12. Oozie-Hadoop Security • Oozie is a multi-tenant system • Job can be scheduled to run later • Oozie submits/maintains the hadoop jobs • Hadoop needs security token for each request Question: Who should provide the security token to hadoop and how?
  • 13. Oozie-Hadoop Security Contd. • Answer: Oozie • How? – Hadoop considers Oozie as a super-user – Hadoop does not check end-user credentials – Hadoop only checks the credentials of the Oozie process • BUT the Hadoop job is executed as the end user. • Oozie utilizes the doAs() functionality of Hadoop.
  • 14. User-Oozie Security [diagram: Oozie Security, Hadoop Access, Secure Job, Kerberos, Oozie Server, End user]
  • 15. Why Oozie Security? • One user should not modify another user's job • Hadoop doesn't authenticate the end user • Oozie has to verify its user before passing the job to Hadoop
  • 16. How does Oozie Support Security? • Built-in authentication – Kerberos – Non-secured (default) • Design Decision – Pluggable authentication – Easy to include new type of authentication – Yahoo supports 3 types of authentication.
  • 17. Job Submission to Hadoop • Oozie is designed to handle thousands of jobs at the same time • Question : Should Oozie server – Submit the hadoop job directly? – Wait for it to finish? • Answer: No
  • 18. Job Submission Contd. • Reason – Resource constraints: A single Oozie process can't simultaneously create thousands of threads, one for each Hadoop job. (Scaling limitation) – Isolation: Running user code on the Oozie server might destabilize Oozie • Design Decision – Create a launcher Hadoop job – Execute the actual user job from the launcher. – Wait asynchronously for the job to finish.
  • 19. Job Submission to Hadoop [diagram: Oozie Server outside the Hadoop Cluster; Job Tracker, Launcher Mapper, and Actual M/R Job inside it; arrows numbered 1 to 5]
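The design decision on slide 18 (hand the job to a launcher, then wait asynchronously) can be modelled with a toy Python sketch. This is not Oozie code: the thread pool named `cluster` merely stands in for the Hadoop cluster, `launcher` stands in for the launcher job, and `ToyServer` is a hypothetical class. The point is that `submit` returns at once and completion arrives through a callback, so the server never parks one thread per running job.

```python
import concurrent.futures
import threading
import time

# Toy model, not Oozie code: the "cluster" pool stands in for Hadoop, and each
# launcher() call stands in for the launcher job that runs the user's job there.


def launcher(job_id):
    """Runs on the simulated cluster: execute the user job and report its status."""
    time.sleep(0.01)  # pretend the user job takes a while
    return job_id, "SUCCEEDED"


class ToyServer:
    def __init__(self, cluster):
        self.cluster = cluster
        self.status = {}
        self._lock = threading.Lock()

    def submit(self, job_id):
        """Hand the job to a launcher and return at once; no server thread waits."""
        with self._lock:
            self.status[job_id] = "RUNNING"
        future = self.cluster.submit(launcher, job_id)
        future.add_done_callback(self._on_done)  # asynchronous completion
        return future

    def _on_done(self, future):
        job_id, result = future.result()
        with self._lock:
            self.status[job_id] = result


# Four simulated cluster slots serve twenty submitted jobs.
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as cluster:
    server = ToyServer(cluster)
    futures = [server.submit(f"job-{i}") for i in range(20)]
    concurrent.futures.wait(futures)

final_status = dict(server.status)
print(sum(s == "SUCCEEDED" for s in final_status.values()), "jobs succeeded")
```

In this model, adding capacity means raising `max_workers` on the simulated cluster rather than adding threads to the server, which mirrors the horizontal-scalability argument the slides make for the launcher approach.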
  • 20. Job Submission Contd. • Advantages – Horizontal scalability: If load increases, add machines into Hadoop cluster – Stability: Isolation of user code and system process • Disadvantages – Extra map-slot is occupied by each job.
  • 21. Production Setup • Total number of nodes: 42K+ • Total number of clusters: 25+ • Total number of processed jobs ≈ 750K/month • Data presented from two clusters • Each has nearly 4K nodes • Total number of users per cluster = 50
  • 22. Oozie Usage Pattern @ Y! [bar chart: Distribution of Job Types on Production Clusters; x-axis: job type (fs, java, map-reduce, pig); y-axis: percentage (0 to 50); series: #1 Cluster, #2 Cluster] • Pig and Java are the most popular • The number of pure Map-Reduce jobs is smaller
  • 23. Experimental Setup • Number of nodes: 7 • Number of map-slots: 28 • 4 Core, RAM: 16 GB • 64 bit RHEL • Oozie Server – 3 GB RAM – Internal Queue size = 10 K – # Worker Thread = 300
  • 24. Job Acceptance [chart: Workflow Acceptance Rate; x-axis: number of submission threads (2 to 640); y-axis: workflows accepted per minute (0 to 1400)] Observation: Oozie can accept a large number of jobs
  • 25. Timeline of an Oozie Job [timeline: user submits job, Oozie submits to Hadoop, job completes at Hadoop, job completes at Oozie; Preparation Overhead spans the first interval, Completion Overhead the last] Total Oozie Overhead = Preparation + Completion
  • 26. Oozie Overhead [chart: Per Action Overhead; x-axis: number of actions per workflow (1, 5, 10, 50); y-axis: overhead in milliseconds (0 to 1800)] Observation: Oozie overhead is lower when multiple actions are in the same workflow.
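The overhead formula on slide 25 can be worked through in a few lines of Python. This sketch assumes one reading of the timeline: preparation overhead runs from the user's submission until Oozie submits to Hadoop, and completion overhead runs from completion at Hadoop until completion at Oozie. `JobTimeline` is a hypothetical type and the timestamps are made up, not measurements from the talk.

```python
from dataclasses import dataclass


@dataclass
class JobTimeline:
    """Four timestamps (in milliseconds) along the timeline of one job."""
    user_submits: int      # user submits the job to Oozie
    oozie_submits: int     # Oozie submits the job to Hadoop
    hadoop_completes: int  # job completes at Hadoop
    oozie_completes: int   # job completes at Oozie

    def preparation_overhead(self):
        return self.oozie_submits - self.user_submits

    def completion_overhead(self):
        return self.oozie_completes - self.hadoop_completes

    def total_overhead(self):
        # Total Oozie Overhead = Preparation + Completion
        return self.preparation_overhead() + self.completion_overhead()


# Made-up timestamps for illustration only; not measurements from the talk.
timeline = JobTimeline(user_submits=0, oozie_submits=400,
                       hadoop_completes=60_400, oozie_completes=60_900)
print(timeline.total_overhead())  # 400 + 500 = 900
```

The time the job spends running in Hadoop (60,000 ms in this example) does not count toward the Oozie overhead.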
  • 27. Oozie Futures • Scalability – Hot-Hot/Load balancing service – Replace SQL DB with Zookeeper • Improved Usability • Extend the benchmarking scope • Monitoring WS API
  • 28. Take Away .. • Oozie is – Easier to use – Scalable – Secure and multi-tenant
  • 29. Q&A Mohammad K Islam (kamrul@yahoo-inc.com), Virag Kothari (virag@yahoo-inc.com) http://incubator.apache.org/oozie/