An Introduction to Cloudera’s
    Hadoop Developer Training Course
    Ian Wrigley
    Curriculum Manager




1
Welcome to the Webinar!
     All lines are muted
     Q & A after the presentation
     Ask questions at any time by typing them in the
      WebEx panel
     A recording of this Webinar will be available on
      demand at cloudera.com




2
Topics
     Why Cloudera Training?
     Who Should Attend Developer Training?
     Developer Course Contents
     A Deeper Dive: The New API vs The Old API
     A Deeper Dive: Determining the Optimal Number of
      Reducers
     Conclusion




3
Cloudera’s Training is the Industry Standard
      Big Data professionals from 55% of the Fortune 100 have attended live
       Cloudera training
      Cloudera has trained employees from 100% of the top 20 global
       technology firms to use Hadoop
      Cloudera has trained over 15,000 students
4
Cloudera Training: The Benefits

    1  Broadest Range of Courses
       Cover all the key Hadoop components
    2  Most Experienced Instructors
       Over 15,000 students trained since 2009
    3  Leader in Certification
       Over 5,000 accredited Cloudera professionals
    4  State of the Art Curriculum
       Classes updated regularly as Hadoop evolves
    5  Widest Geographic Coverage
       Most classes offered: 20 countries plus virtual classroom
    6  Most Relevant Platform & Community
       CDH deployed more than all other distributions combined
    7  Depth of Training Material
       Hands-on labs and VMs support live instruction
    8  Ongoing Learning
       Video tutorials and e-learning complement training




5
    "The professionalism and expansive technical knowledge of our classroom
     instructor was incredible. The quality of the training was on par with
     a university."




6
Topics
     Why Cloudera Training?
     Who Should Attend Developer Training?
     Developer Course Contents
     A Deeper Dive: The New API vs The Old API
     A Deeper Dive: Determining the Optimal Number of
      Reducers
     Conclusion




7
Common Attendee Profiles
     Software Developers/Engineers
     Business analysts
     IT managers
     Hadoop system administrators




8
Course Prerequisites
       Programming experience
            Knowledge of Java highly recommended
     Understanding of common computer science
      principles is helpful
     Prior knowledge of Hadoop is not required




9
Who Should Not Attend?
        If you have no programming experience, you’re likely
         to find the course very difficult
             You might consider our Hive and Pig training course instead
        If you will be focused solely on configuring and
         managing your cluster, our Administrator training
         course would probably be a better alternative




10
Topics
      Why Cloudera Training?
      Who Should Attend Developer Training?
      Developer Course Contents
      A Deeper Dive: The New API vs The Old API
      A Deeper Dive: Determining the Optimal Number of
       Reducers
      Conclusion




11
Developer Training: Overview
      The course assumes no pre-existing knowledge of
       Hadoop
      Starts by discussing the motivation for Hadoop
             What problems exist that are difficult (or impossible) to
              solve with existing systems
        Explains basic Hadoop concepts
             The Hadoop Distributed File System (HDFS)
             MapReduce
        Introduces the Hadoop API (Application Programming
         Interface)

12
Developer Training: Overview (cont’d)
        Moves on to discuss more complex Hadoop concepts
             Custom Partitioners
             Custom Writables and WritableComparables
             Custom InputFormats and OutputFormats
        Investigates common MapReduce algorithms
             Sorting, searching, indexing, joining data sets, etc.
        Then covers the Hadoop ‘ecosystem’
             Hive, Pig, Sqoop, Flume, Mahout, Oozie




13
Course Contents




14
Hands-On Exercises
        The course features many Hands-On Exercises
             Analyzing log files
             Unit-testing Hadoop code
             Writing and implementing Combiners
             Writing custom Partitioners
             Using SequenceFiles and file compression
             Creating an inverted index
             Creating custom WritableComparables
             Importing data with Sqoop
             Writing Hive queries
             …and more

15
Certification
      Our Developer course is good preparation for the
       Cloudera Certified Developer for Apache Hadoop
       (CCDH) exam
      A voucher for one attempt at the exam is currently
       included in the course fee




16
Topics
      Why Cloudera Training?
      Who Should Attend Developer Training?
      Developer Course Contents
      A Deeper Dive: The New API vs The Old API
      A Deeper Dive: Determining the Optimal Number of
       Reducers
      Conclusion




17
Chapter Topics

 Writing a MapReduce Program (Basic Programming with the Hadoop Core API)

  The MapReduce flow
  Basic MapReduce API concepts
  Writing MapReduce applications in Java
   – The driver
   – The Mapper
   – The Reducer
  Writing Mappers and Reducers in other languages with the Streaming API
  Speeding up Hadoop development by using Eclipse
  Hands-On Exercise: Writing a MapReduce Program
  Differences between the Old and New MapReduce APIs
  Conclusion

18
What Is The Old API?

 When Hadoop 0.20 was released, a ‘New API’ was introduced
   –Designed to make the API easier to evolve in the future
   –Favors abstract classes over interfaces
 Some developers still use the Old API
     –Until CDH4, the New API was not fully feature-complete
 All the code examples in this course use the New API
    –Old API-based solutions for many of the Hands-On Exercises for this
       course are available in the sample_solutions_oldapi directory




19
New API vs. Old API: Some Key Differences
New API:

import org.apache.hadoop.mapreduce.*

Driver code:

Configuration conf = new Configuration();
Job job = new Job(conf);
job.setJarByClass(Driver.class);
job.setSomeProperty(...);
...
job.waitForCompletion(true);

Mapper:

public class MyMapper extends Mapper {
    public void map(Keytype k, Valuetype v, Context c) {
        ...
        c.write(key, val);
    }
}

Old API:

import org.apache.hadoop.mapred.*

Driver code:

JobConf conf = new JobConf(conf, Driver.class);
conf.setSomeProperty(...);
...
JobClient.runJob(conf);

Mapper:

public class MyMapper extends MapReduceBase implements Mapper {
    public void map(Keytype k, Valuetype v, OutputCollector o, Reporter r) {
        ...
        o.collect(key, val);
    }
}


20
New API vs. Old API: Some Key Differences (cont’d)
New API:

Reducer:

public class MyReducer extends Reducer {
    public void reduce(Keytype k, Iterable<Valuetype> v, Context c) {
        for (Valuetype eachval : v) {
            // process eachval
            c.write(key, val);
        }
    }
}

setup(Context c)    (See later)
cleanup(Context c)  (See later)

Old API:

Reducer:

public class MyReducer extends MapReduceBase implements Reducer {
    public void reduce(Keytype k, Iterator<Valuetype> v, OutputCollector o, Reporter r) {
        while (v.hasNext()) {
            // process v.next()
            o.collect(key, val);
        }
    }
}

configure(JobConf job)
close()
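
To make the Keytype and Valuetype placeholders above concrete, here is a minimal
word-count-style Mapper and Reducer written against the New API. This is an
illustrative sketch rather than one of the course's own examples; each class
would normally live in its own .java file.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: for each input line, emits (word, 1) for every word found.
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\W+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}

// Reducer: sums the per-word counts produced by the Mappers.
public class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}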




21
MRv1 vs MRv2, Old API vs New API

 There is a lot of confusion about the New and Old APIs, and MapReduce
  version 1 and MapReduce version 2
 The chart below should clarify what is available with each version of
  MapReduce

                 Old API    New API
MapReduce v1        ✔          ✔
MapReduce v2        ✔          ✔

 Summary: Code using either the Old API or the New API will run under
  MRv1 and MRv2
    –You will have to recompile the code to move from MRv1 to MRv2, but you
     will not have to change the code itself

22
Topics
      Why Cloudera Training?
      Who Should Attend Developer Training?
      Developer Course Contents
      A Deeper Dive: The New API vs The Old API
      A Deeper Dive: Determining the Optimal Number of
       Reducers
      Conclusion




23
Chapter Topics

 Practical Development Tips and Techniques (Basic Programming with the Hadoop Core API)

    Strategies for debugging MapReduce code
    Testing MapReduce code locally using LocalJobRunner
    Writing and viewing log files
    Retrieving job information with Counters
    Determining the optimal number of Reducers for a job
    Reusing objects
    Creating Map-only MapReduce jobs
    Hands-On Exercise: Using Counters and a Map-Only Job
    Conclusion




24
How Many Reducers Do You Need?

 An important consideration when creating your job is determining the
  number of Reducers to specify
 The default is a single Reducer
 With a single Reducer, one task receives all keys in sorted order
   –This is sometimes advantageous if the output must be in completely
     sorted order
   –Can cause significant problems if there is a large amount of
     intermediate data
        –Node on which the Reducer is running may not have enough disk
          space to hold all intermediate data
        –The Reducer will take a long time to run
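
As a minimal illustration (not taken from the course materials), the number of
Reducers is set on the Job object when using the New API; if the call is
omitted, the single-Reducer default applies. The class name below is
hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Illustrative New API driver: overriding the default of a single Reducer.
public class ReducerCountExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf);                       // New API style used in this deck
        job.setJarByClass(ReducerCountExample.class);
        job.setNumReduceTasks(10);                     // e.g. 10 Reducers instead of the default 1
        // ... set Mapper, Reducer, input/output paths as usual, then submit ...
    }
}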




25
Jobs Which Require a Single Reducer

 If a job needs to output a file where all keys are listed in sorted order, a
  single Reducer must be used
 Alternatively, the TotalOrderPartitioner can be used
    –Uses an externally generated file which contains information about
      intermediate key distribution
    –Partitions data such that all keys which go to the first Reducer are
      smaller than any which go to the second, etc
    –In this way, multiple Reducers can be used
    –Concatenating the Reducers’ output files results in a totally ordered list
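
A minimal sketch of this approach, assuming the New API, a SequenceFile input
with Text keys and values, and an illustrative partition-file path and
sampling parameters (none of these are taken from the course materials):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

// Illustrative driver: totally ordered output across four Reducers.
public class TotalOrderExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf);
        job.setJarByClass(TotalOrderExample.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setNumReduceTasks(4);
        job.setPartitionerClass(TotalOrderPartitioner.class);

        // The sampler reads the input keys, so the map output key type is
        // assumed to match the input key type (Text here).
        Path partitionFile = new Path("/tmp/partitions");   // hypothetical location
        TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), partitionFile);
        InputSampler.Sampler<Text, Text> sampler =
            new InputSampler.RandomSampler<Text, Text>(0.1, 10000);
        InputSampler.writePartitionFile(job, sampler);

        // ... set Mapper, Reducer, output path and types as usual, then submit ...
    }
}

Concatenating the output files part-r-00000 through part-r-00003 in order then
yields a single, totally sorted result.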




26
Jobs Which Require a Fixed Number of Reducers

 Some jobs will require a specific number of Reducers
 Example: a job must output one file per day of the week
    –Key will be the weekday
    –Seven Reducers will be specified
    –A Partitioner will be written which sends one key to each Reducer
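
A minimal sketch of such a Partitioner, assuming (hypothetically) that the
Mapper emits the weekday as a Text index from "0" to "6" and that the driver
calls job.setNumReduceTasks(7):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Illustrative Partitioner: one weekday key per Reducer, so each of the
// seven Reducers writes exactly one output file.
public class WeekdayPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        // The weekday index (0-6) is used directly as the partition number.
        return Integer.parseInt(key.toString()) % numReduceTasks;
    }
}

The driver would register it with job.setPartitionerClass(WeekdayPartitioner.class).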




27
Jobs With a Variable Number of Reducers

 Many jobs can be run with a variable number of Reducers
 Developer must decide how many to specify
    –Each Reducer should get a reasonable amount of intermediate data, but
     not too much
    –Chicken-and-egg problem
 Typical way to determine how many Reducers to specify:
    –Test the job with a relatively small test data set
    –Extrapolate to calculate the amount of intermediate data expected from
      the ‘real’ input data
    –Use that to calculate the number of Reducers which should be specified
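
For example (the figures are invented for illustration): if a 1 GB sample of
the input produces 200 MB of intermediate data, a 500 GB production input can
be expected to produce roughly 100 GB. Aiming for on the order of 1-2 GB of
intermediate data per Reducer would then suggest specifying somewhere between
50 and 100 Reducers.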




28
Jobs With a Variable Number of Reducers (cont’d)

 Note: you should take into account the number of Reduce slots likely to be
  available on the cluster
    –If your job requires one more Reduce slot than there are available, a
      second ‘wave’ of Reducers will run
         –Consisting just of that single Reducer
         –Potentially doubling the amount of time spent on the Reduce phase
    –In this case, increasing the number of Reducers further may cut down
      the time spent in the Reduce phase
         –Two or more waves will run, but the Reducers in each wave will
           have to process less data
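
For example (illustrative numbers): on a cluster with 30 available Reduce
slots, a job configured with 31 Reducers runs a second wave consisting of a
single Reducer, roughly doubling the length of the Reduce phase. Configuring
60 Reducers instead still produces two waves, but each Reducer processes about
half as much data, so the Reduce phase finishes sooner.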




29
Topics
      Why Cloudera Training?
      Who Should Attend Developer Training?
      Developer Course Contents
      A Deeper Dive: The New API vs The Old API
      A Deeper Dive: Determining the Optimal Number of
       Reducers
      Conclusion




30
Conclusion
        Cloudera’s Developer training course is:
             Technical
             Hands-on
             Interactive
             Comprehensive
      Attendees leave the course with the skillset required
       to write, test, and run Hadoop jobs
      The course is a good preparation for the CCDH
       certification exam


31
Questions?
        For more information on Cloudera’s training
         courses, or to book a place on an upcoming course:

         http://university.cloudera.com

        My e-mail address: ian@cloudera.com

        Feel free to ask questions!
             Hit the Q&A button, and type away


32

Weitere ähnliche Inhalte

Was ist angesagt?

What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...Edureka!
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drillJulien Le Dem
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jDataWorks Summit
 
Introduction to Pig | Pig Architecture | Pig Fundamentals
Introduction to Pig | Pig Architecture | Pig FundamentalsIntroduction to Pig | Pig Architecture | Pig Fundamentals
Introduction to Pig | Pig Architecture | Pig FundamentalsSkillspeed
 
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/PigHivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/PigDataWorks Summit/Hadoop Summit
 
Operationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudOperationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudDataWorks Summit/Hadoop Summit
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Modern Data Stack France
 
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared ClustersMercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared ClustersDataWorks Summit
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkSlim Baltagi
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Edureka!
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksDataWorks Summit
 
Performance tuning your Hadoop/Spark clusters to use cloud storage
Performance tuning your Hadoop/Spark clusters to use cloud storagePerformance tuning your Hadoop/Spark clusters to use cloud storage
Performance tuning your Hadoop/Spark clusters to use cloud storageDataWorks Summit
 
Hortonworks Big Data Career Paths and Training
Hortonworks Big Data Career Paths and Training Hortonworks Big Data Career Paths and Training
Hortonworks Big Data Career Paths and Training Aengus Rooney
 
Big Data Introduction - Solix empower
Big Data Introduction - Solix empowerBig Data Introduction - Solix empower
Big Data Introduction - Solix empowerDurga Gadiraju
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemShivaji Dutta
 
End-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache SparkEnd-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache SparkDatabricks
 

Was ist angesagt? (20)

Spark mhug2
Spark mhug2Spark mhug2
Spark mhug2
 
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drill
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4j
 
Introduction to Pig | Pig Architecture | Pig Fundamentals
Introduction to Pig | Pig Architecture | Pig FundamentalsIntroduction to Pig | Pig Architecture | Pig Fundamentals
Introduction to Pig | Pig Architecture | Pig Fundamentals
 
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/PigHivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
 
Operationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudOperationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the Cloud
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
 
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared ClustersMercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
 
Cloudera Impala
Cloudera ImpalaCloudera Impala
Cloudera Impala
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to Spark
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Performance tuning your Hadoop/Spark clusters to use cloud storage
Performance tuning your Hadoop/Spark clusters to use cloud storagePerformance tuning your Hadoop/Spark clusters to use cloud storage
Performance tuning your Hadoop/Spark clusters to use cloud storage
 
Evolving HDFS to a Generalized Storage Subsystem
Evolving HDFS to a Generalized Storage SubsystemEvolving HDFS to a Generalized Storage Subsystem
Evolving HDFS to a Generalized Storage Subsystem
 
Hortonworks Big Data Career Paths and Training
Hortonworks Big Data Career Paths and Training Hortonworks Big Data Career Paths and Training
Hortonworks Big Data Career Paths and Training
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Big Data Introduction - Solix empower
Big Data Introduction - Solix empowerBig Data Introduction - Solix empower
Big Data Introduction - Solix empower
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
End-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache SparkEnd-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache Spark
 

Andere mochten auch

Introduction to Data Analyst Training
Introduction to Data Analyst TrainingIntroduction to Data Analyst Training
Introduction to Data Analyst TrainingCloudera, Inc.
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?sudhakara st
 
Deploying Enterprise-grade Security for Hadoop
Deploying Enterprise-grade Security for HadoopDeploying Enterprise-grade Security for Hadoop
Deploying Enterprise-grade Security for HadoopCloudera, Inc.
 
How Big Data and Hadoop Integrated into BMC ControlM at CARFAX
How Big Data and Hadoop Integrated into BMC ControlM at CARFAXHow Big Data and Hadoop Integrated into BMC ControlM at CARFAX
How Big Data and Hadoop Integrated into BMC ControlM at CARFAXBMC Software
 
Introduction to Apache Spark Developer Training
Introduction to Apache Spark Developer TrainingIntroduction to Apache Spark Developer Training
Introduction to Apache Spark Developer TrainingCloudera, Inc.
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and PigRicardo Varela
 
Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Kevin Weil
 
Cloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera, Inc.
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo pptPhil Young
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Cloudera, Inc.
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 

Andere mochten auch (14)

Intro To Hadoop
Intro To HadoopIntro To Hadoop
Intro To Hadoop
 
Introduction to Data Analyst Training
Introduction to Data Analyst TrainingIntroduction to Data Analyst Training
Introduction to Data Analyst Training
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Deploying Enterprise-grade Security for Hadoop
Deploying Enterprise-grade Security for HadoopDeploying Enterprise-grade Security for Hadoop
Deploying Enterprise-grade Security for Hadoop
 
How Big Data and Hadoop Integrated into BMC ControlM at CARFAX
How Big Data and Hadoop Integrated into BMC ControlM at CARFAXHow Big Data and Hadoop Integrated into BMC ControlM at CARFAX
How Big Data and Hadoop Integrated into BMC ControlM at CARFAX
 
Introduction to Apache Spark Developer Training
Introduction to Apache Spark Developer TrainingIntroduction to Apache Spark Developer Training
Introduction to Apache Spark Developer Training
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
 
Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)
 
Cloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for Hadoop
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo ppt
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 

Ähnlich wie Introduction to Hadoop Developer Training Webinar

MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing ModelMongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing ModelTakahiro Inoue
 
Big-Data Hadoop Training Institutes in Pune | CloudEra Certification courses ...
Big-Data Hadoop Training Institutes in Pune | CloudEra Certification courses ...Big-Data Hadoop Training Institutes in Pune | CloudEra Certification courses ...
Big-Data Hadoop Training Institutes in Pune | CloudEra Certification courses ...mindscriptsseo
 
Datascience Training with Hadoop, Python Machine Learning & Scala, Spark
Datascience Training with Hadoop, Python Machine Learning & Scala, SparkDatascience Training with Hadoop, Python Machine Learning & Scala, Spark
Datascience Training with Hadoop, Python Machine Learning & Scala, SparkSequelGate
 
Build your operator with the right tool
Build your operator with the right toolBuild your operator with the right tool
Build your operator with the right toolRafał Leszko
 
Build Your Kubernetes Operator with the Right Tool!
Build Your Kubernetes Operator with the Right Tool!Build Your Kubernetes Operator with the Right Tool!
Build Your Kubernetes Operator with the Right Tool!Rafał Leszko
 
Meetup Devops-Geneva-19.10.2019
Meetup Devops-Geneva-19.10.2019Meetup Devops-Geneva-19.10.2019
Meetup Devops-Geneva-19.10.2019Hidora
 
UberCloud - From Project to Product
UberCloud - From Project to ProductUberCloud - From Project to Product
UberCloud - From Project to ProductThe UberCloud
 
The UberCloud - From Project to Product - From HPC Experiment to HPC Marketpl...
The UberCloud - From Project to Product - From HPC Experiment to HPC Marketpl...The UberCloud - From Project to Product - From HPC Experiment to HPC Marketpl...
The UberCloud - From Project to Product - From HPC Experiment to HPC Marketpl...Wolfgang Gentzsch
 
Developer joy for distributed teams with CodeReady Workspaces | DevNation Tec...
Developer joy for distributed teams with CodeReady Workspaces | DevNation Tec...Developer joy for distributed teams with CodeReady Workspaces | DevNation Tec...
Developer joy for distributed teams with CodeReady Workspaces | DevNation Tec...Red Hat Developers
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsLynn Langit
 
Hadoop Spark - Reuniao SouJava 12/04/2014
Hadoop Spark - Reuniao SouJava 12/04/2014Hadoop Spark - Reuniao SouJava 12/04/2014
Hadoop Spark - Reuniao SouJava 12/04/2014soujavajug
 
Serverless GraphQL for Product Developers
Serverless GraphQL for Product DevelopersServerless GraphQL for Product Developers
Serverless GraphQL for Product DevelopersSashko Stubailo
 
Kubernetes workshop -_the_basics
Kubernetes workshop -_the_basicsKubernetes workshop -_the_basics
Kubernetes workshop -_the_basicsSjuul Janssen
 
Apigility-powered API's on IBM i
Apigility-powered API's on IBM iApigility-powered API's on IBM i
Apigility-powered API's on IBM ichukShirley
 
Kubernetes: The Next Research Platform
Kubernetes: The Next Research PlatformKubernetes: The Next Research Platform
Kubernetes: The Next Research PlatformBob Killen
 
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]Animesh Singh
 
K8s in 3h - Kubernetes Fundamentals Training
K8s in 3h - Kubernetes Fundamentals TrainingK8s in 3h - Kubernetes Fundamentals Training
K8s in 3h - Kubernetes Fundamentals TrainingPiotr Perzyna
 
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...Srijan Technologies
 

Ähnlich wie Introduction to Hadoop Developer Training Webinar (20)

MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing ModelMongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
 
Big-Data Hadoop Training Institutes in Pune | CloudEra Certification courses ...
Big-Data Hadoop Training Institutes in Pune | CloudEra Certification courses ...Big-Data Hadoop Training Institutes in Pune | CloudEra Certification courses ...
Big-Data Hadoop Training Institutes in Pune | CloudEra Certification courses ...
 
Datascience Training with Hadoop, Python Machine Learning & Scala, Spark
Datascience Training with Hadoop, Python Machine Learning & Scala, SparkDatascience Training with Hadoop, Python Machine Learning & Scala, Spark
Datascience Training with Hadoop, Python Machine Learning & Scala, Spark
 
Build your operator with the right tool
Build your operator with the right toolBuild your operator with the right tool
Build your operator with the right tool
 
Build Your Kubernetes Operator with the Right Tool!
Build Your Kubernetes Operator with the Right Tool!Build Your Kubernetes Operator with the Right Tool!
Build Your Kubernetes Operator with the Right Tool!
 
Meetup Devops-Geneva-19.10.2019
Meetup Devops-Geneva-19.10.2019Meetup Devops-Geneva-19.10.2019
Meetup Devops-Geneva-19.10.2019
 
UberCloud - From Project to Product
UberCloud - From Project to ProductUberCloud - From Project to Product
UberCloud - From Project to Product
 
The UberCloud - From Project to Product - From HPC Experiment to HPC Marketpl...
The UberCloud - From Project to Product - From HPC Experiment to HPC Marketpl...The UberCloud - From Project to Product - From HPC Experiment to HPC Marketpl...
The UberCloud - From Project to Product - From HPC Experiment to HPC Marketpl...
 
Developer joy for distributed teams with CodeReady Workspaces | DevNation Tec...
Developer joy for distributed teams with CodeReady Workspaces | DevNation Tec...Developer joy for distributed teams with CodeReady Workspaces | DevNation Tec...
Developer joy for distributed teams with CodeReady Workspaces | DevNation Tec...
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
Hadoop Spark - Reuniao SouJava 12/04/2014
Hadoop Spark - Reuniao SouJava 12/04/2014Hadoop Spark - Reuniao SouJava 12/04/2014
Hadoop Spark - Reuniao SouJava 12/04/2014
 
Kubexperience intro session
Kubexperience intro sessionKubexperience intro session
Kubexperience intro session
 
Learn by doing
Learn by doingLearn by doing
Learn by doing
 
Serverless GraphQL for Product Developers
Serverless GraphQL for Product DevelopersServerless GraphQL for Product Developers
Serverless GraphQL for Product Developers
 
Kubernetes workshop -_the_basics
Kubernetes workshop -_the_basicsKubernetes workshop -_the_basics
Kubernetes workshop -_the_basics
 
Apigility-powered API's on IBM i
Apigility-powered API's on IBM iApigility-powered API's on IBM i
Apigility-powered API's on IBM i
 
Kubernetes: The Next Research Platform
Kubernetes: The Next Research PlatformKubernetes: The Next Research Platform
Kubernetes: The Next Research Platform
 
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
 
K8s in 3h - Kubernetes Fundamentals Training
K8s in 3h - Kubernetes Fundamentals TrainingK8s in 3h - Kubernetes Fundamentals Training
K8s in 3h - Kubernetes Fundamentals Training
 
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
 

Mehr von Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Mehr von Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Kürzlich hochgeladen

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Kürzlich hochgeladen (20)

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Introduction to Hadoop Developer Training Webinar

  • 1. An Introduction to Cloudera’s Hadoop Developer Training Course Ian Wrigley Curriculum Manager 1
  • 2. Welcome to the Webinar!  All lines are muted  Q & A after the presentation  Ask questions at any time by typing them in the WebEx panel  A recording of this Webinar will be available on demand at cloudera.com 2
  • 3. Topics  Why Cloudera Training?  Who Should Attend Developer Training?  Developer Course Contents  A Deeper Dive: The New API vs The Old API  A Deeper Dive: Determining the Optimal Number of Reducers  Conclusion 3
  • 4. Cloudera’s Training is the Industry Standard. Big Data professionals from 55% of the Fortune 100 have attended live Cloudera training. Cloudera has trained employees from 100% of the top 20 global technology firms to use Hadoop. Cloudera has trained over 15,000 students. 4
  • 5. Cloudera Training: The Benefits 1 Broadest Range of Courses Cover all the key Hadoop components 2 Most Experienced Instructors Over 15,000 students trained since 2009 3 Leader in Certification Over 5,000 accredited Cloudera professionals 4 State of the Art Curriculum Classes updated regularly as Hadoop evolves 5 Widest Geographic Coverage Most classes offered: 20 countries plus virtual classroom 6 Most Relevant Platform & Community CDH deployed more than all other distributions combined 7 Depth of Training Material Hands-on labs and VMs support live instruction 8 Ongoing Learning Video tutorials and e-learning complement training 5
  • 6. The professionalism and expansive technical knowledge of our classroom instructor was incredible. The quality of the training was on par with a university. 6
  • 7. Topics  Why Cloudera Training?  Who Should Attend Developer Training?  Developer Course Contents  A Deeper Dive: The New API vs The Old API  A Deeper Dive: Determining the Optimal Number of Reducers  Conclusion 7
  • 8. Common Attendee Profiles  Software Developers/Engineers  Business analysts  IT managers  Hadoop system administrators 8
  • 9. Course Pre-Requisites  Programming experience  Knowledge of Java highly recommended  Understanding of common computer science principles is helpful  Prior knowledge of Hadoop is not required 9
  • 10. Who Should Not Attend?  If you have no programming experience, you’re likely to find the course very difficult  You might consider our Hive and Pig training course instead  If you will be focused solely on configuring and managing your cluster, our Administrator training course would probably be a better alternative 10
  • 11. Topics  Why Cloudera Training?  Who Should Attend Developer Training?  Developer Course Contents  A Deeper Dive: The New API vs The Old API  A Deeper Dive: Determining the Optimal Number of Reducers  Conclusion 11
  • 12. Developer Training: Overview  The course assumes no pre-existing knowledge of Hadoop  Starts by discussing the motivation for Hadoop  What problems exist that are difficult (or impossible) to solve with existing systems  Explains basic Hadoop concepts  The Hadoop Distributed File System (HDFS)  MapReduce  Introduces the Hadoop API (Application Programming Interface) 12
  • 13. Developer Training: Overview (cont’d)  Moves on to discuss more complex Hadoop concepts  Custom Partitioners  Custom Writables and WritableComparables  Custom InputFormats and OutputFormats  Investigates common MapReduce algorithms  Sorting, searching, indexing, joining data sets, etc.  Then covers the Hadoop ‘ecosystem’  Hive, Pig, Sqoop, Flume, Mahout, Oozie 13
  • 15. Hands-On Exercises  The course features many Hands-On Exercises  Analyzing log files  Unit-testing Hadoop code  Writing and implementing Combiners  Writing custom Partitioners  Using SequenceFiles and file compression  Creating an inverted index  Creating custom WritableComparables  Importing data with Sqoop  Writing Hive queries  …and more 15
  • 16. Certification  Our Developer course is good preparation for the Cloudera Certified Developer for Apache Hadoop (CCDH) exam  A voucher for one attempt at the exam is currently included in the course fee 16
  • 17. Topics  Why Cloudera Training?  Who Should Attend Developer Training?  Developer Course Contents  A Deeper Dive: The New API vs The Old API  A Deeper Dive: Determining the Optimal Number of Reducers  Conclusion 17
  • 18. Chapter Topics: Writing a MapReduce Program (Basic Programming with the Hadoop Core API)  The MapReduce flow  Basic MapReduce API concepts  Writing MapReduce applications in Java – The driver – The Mapper – The Reducer  Writing Mappers and Reducers in other languages with the Streaming API  Speeding up Hadoop development by using Eclipse  Hands-On Exercise: Writing a MapReduce Program  Differences between the Old and New MapReduce APIs  Conclusion © Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 18
  • 19. What Is The Old API?  When Hadoop 0.20 was released, a ‘New API’ was introduced –Designed to make the API easier to evolve in the future –Favors abstract classes over interfaces  Some developers still use the Old API –Until CDH4, the New API was not absolutely feature-complete  All the code examples in this course use the New API –Old API-based solutions for many of the Hands-On Exercises for this course are available in the sample_solutions_oldapi directory © Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 19
  • 20. New API vs. Old API: Some Key Differences
    New API (import org.apache.hadoop.mapreduce.*)
      Driver code:
        Configuration conf = new Configuration();
        Job job = new Job(conf);
        job.setJarByClass(Driver.class);
        job.setSomeProperty(...);
        ...
        job.waitForCompletion(true);
      Mapper:
        public class MyMapper extends Mapper {
          public void map(Keytype k, Valuetype v, Context c) {
            ...
            c.write(key, val);
          }
        }
    Old API (import org.apache.hadoop.mapred.*)
      Driver code:
        JobConf conf = new JobConf(conf, Driver.class);
        conf.setSomeProperty(...);
        ...
        JobClient.runJob(conf);
      Mapper:
        public class MyMapper extends MapReduceBase implements Mapper {
          public void map(Keytype k, Valuetype v, OutputCollector o, Reporter r) {
            ...
            o.collect(key, val);
          }
        }
    © Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 20
  • 21. New API vs. Old API: Some Key Differences (cont’d)
    New API
      Reducer:
        public class MyReducer extends Reducer {
          public void reduce(Keytype k, Iterable<Valuetype> v, Context c) {
            for (Valuetype eachval : v) {
              // process eachval
              c.write(key, val);
            }
          }
        }
      Lifecycle methods: setup(Context c), cleanup(Context c) (see later)
    Old API
      Reducer:
        public class MyReducer extends MapReduceBase implements Reducer {
          public void reduce(Keytype k, Iterator<Valuetype> v, OutputCollector o, Reporter r) {
            while (v.hasNext()) {
              // process v.next()
              o.collect(key, val);
            }
          }
        }
      Lifecycle methods: configure(JobConf job), close()
    © Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 21
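    To tie the driver, Mapper, and Reducer signatures above together, here is a minimal, self-contained word-count job written against the New API. This is an illustrative sketch only, not code from the course or its Hands-On Exercises; the class names WordCountNewApi, TokenMapper, and SumReducer are invented for this example.

      import java.io.IOException;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.Reducer;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class WordCountNewApi {

        public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
          private final static IntWritable ONE = new IntWritable(1);
          private final Text word = new Text();

          @Override
          public void map(LongWritable key, Text value, Context context)
              throws IOException, InterruptedException {
            // Emit (word, 1) for every whitespace-separated token in the line
            for (String token : value.toString().split("\\s+")) {
              if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);   // New API: Context replaces OutputCollector/Reporter
              }
            }
          }
        }

        public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
          @Override
          public void reduce(Text key, Iterable<IntWritable> values, Context context)
              throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {  // New API: Iterable rather than Iterator
              sum += v.get();
            }
            context.write(key, new IntWritable(sum));
          }
        }

        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          Job job = new Job(conf, "word count");   // Job constructor as used in the CDH4-era New API
          job.setJarByClass(WordCountNewApi.class);
          job.setMapperClass(TokenMapper.class);
          job.setReducerClass(SumReducer.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
      }

    The job would be run with two arguments, an input directory and a not-yet-existing output directory; the paths in any such invocation are, of course, specific to your own cluster.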
  • 22. MRv1 vs MRv2, Old API vs New API  There is a lot of confusion about the New and Old APIs, and MapReduce version 1 and MapReduce version 2  The chart below should clarify what is available with each version of MapReduce
                     Old API   New API
      MapReduce v1     ✔          ✔
      MapReduce v2     ✔          ✔
     Summary: Code using either the Old API or the New API will run under MRv1 and MRv2 –You will have to recompile the code to move from MR1 to MR2, but you will not have to change the code itself © Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 22
  • 23. Topics  Why Cloudera Training?  Who Should Attend Developer Training?  Developer Course Contents  A Deeper Dive: The New API vs The Old API  A Deeper Dive: Determining the Optimal Number of Reducers  Conclusion 23
  • 24. Chapter Topics: Practical Development Tips and Techniques (Basic Programming with the Hadoop Core API)  Strategies for debugging MapReduce code  Testing MapReduce code locally using LocalJobRunner  Writing and viewing log files  Retrieving job information with Counters  Determining the optimal number of Reducers for a job  Reusing objects  Creating Map-only MapReduce jobs  Hands-On Exercise: Using Counters and a Map-Only Job  Conclusion © Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 24
  • 25. How Many Reducers Do You Need?  An important consideration when creating your job is to determine the number of Reducers specified  Default is a single Reducer  With a single Reducer, one task receives all keys in sorted order –This is sometimes advantageous if the output must be in completely sorted order –Can cause significant problems if there is a large amount of intermediate data –Node on which the Reducer is running may not have enough disk space to hold all intermediate data –The Reducer will take a long time to run © Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 25
  • 26. Jobs Which Require a Single Reducer  If a job needs to output a file where all keys are listed in sorted order, a single Reducer must be used  Alternatively, the TotalOrderPartitioner can be used –Uses an externally generated file which contains information about intermediate key distribution –Partitions data such that all keys which go to the first Reducer are smaller than any which go to the second, etc –In this way, multiple Reducers can be used –Concatenating the Reducers’ output files results in a totally ordered list © Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 26
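    As a hedged sketch of the approach above (not code from the course), the driver below wires up a TotalOrderPartitioner with an InputSampler. It assumes the Hadoop 2.x / CDH4 package locations under org.apache.hadoop.mapreduce.lib.partition; older releases keep the equivalent classes under org.apache.hadoop.mapred.lib. It also assumes SequenceFile input whose keys are the keys being sorted (identity map and reduce), which is the usual total-sort setup; the class name and sampling parameters are illustrative.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
      import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
      import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

      public class TotalSortSketch {
        public static void main(String[] args) throws Exception {
          // args[0] = SequenceFile input dir, args[1] = output dir, args[2] = partition file path
          Configuration conf = new Configuration();
          Job job = new Job(conf, "total order sort");
          job.setJarByClass(TotalSortSketch.class);
          job.setInputFormatClass(SequenceFileInputFormat.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(Text.class);
          job.setNumReduceTasks(4);                           // several Reducers, still totally ordered
          job.setPartitionerClass(TotalOrderPartitioner.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));

          // Sample roughly 10% of the input keys (up to 10,000 samples from at most 10 splits)
          // and write the partition file that tells the partitioner where to cut the keyspace.
          InputSampler.Sampler<Text, Text> sampler =
              new InputSampler.RandomSampler<Text, Text>(0.1, 10000, 10);
          TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), new Path(args[2]));
          InputSampler.writePartitionFile(job, sampler);

          System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
      }

    Concatenating the four Reducers’ output files in order then yields a totally sorted result, as the slide describes.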
  • 27. Jobs Which Require a Fixed Number of Reducers  Some jobs will require a specific number of Reducers  Example: a job must output one file per day of the week –Key will be the weekday –Seven Reducers will be specified –A Partitioner will be written which sends one key to each Reducer © Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 27
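    A hedged sketch of what such a Partitioner might look like for the weekday example follows; the key layout (weekday name as a Text key) and the class name WeekdayPartitioner are assumptions made for illustration, not course code.

      import java.util.Arrays;
      import java.util.List;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Partitioner;

      public class WeekdayPartitioner extends Partitioner<Text, IntWritable> {
        private static final List<String> DAYS = Arrays.asList(
            "MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY", "SUNDAY");

        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
          // With exactly seven Reducers configured, each weekday lands in its own partition.
          int day = DAYS.indexOf(key.toString().toUpperCase());
          return (day >= 0 ? day : 0) % numPartitions;   // fall back safely if numPartitions != 7
        }
      }

    The driver would then call job.setNumReduceTasks(7) and job.setPartitionerClass(WeekdayPartitioner.class) so that each weekday is handled by its own Reducer.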
  • 28. Jobs With a Variable Number of Reducers  Many jobs can be run with a variable number of Reducers  Developer must decide how many to specify –Each Reducer should get a reasonable amount of intermediate data, but not too much –Chicken-and-egg problem  Typical way to determine how many Reducers to specify: –Test the job with a relatively small test data set –Extrapolate to calculate the amount of intermediate data expected from the ‘real’ input data –Use that to calculate the number of Reducers which should be specified © Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 28
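    The sketch below illustrates that extrapolation with invented numbers; the 1 GB-per-Reducer target and the class name ReducerCountEstimate are assumptions for illustration, not a recommendation from the course.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.mapreduce.Job;

      public class ReducerCountEstimate {
        public static void main(String[] args) throws Exception {
          // Figures observed from a small test run (illustrative assumptions):
          long sampleInputBytes        = 1L   * 1024 * 1024 * 1024;  // 1 GB test input
          long sampleIntermediateBytes = 3L   * 1024 * 1024 * 1024;  // read from the job's counters
          long fullInputBytes          = 100L * 1024 * 1024 * 1024;  // size of the 'real' input

          // Extrapolate the intermediate volume, then aim for roughly 1 GB per Reducer.
          double expansionRatio = (double) sampleIntermediateBytes / sampleInputBytes;
          long expectedIntermediate = (long) (fullInputBytes * expansionRatio);
          int numReducers = (int) Math.max(1, expectedIntermediate / (1024L * 1024 * 1024));

          Job job = new Job(new Configuration(), "full-size run");
          job.setNumReduceTasks(numReducers);   // the rest of the job setup is omitted here
          System.out.println("Requesting " + numReducers + " Reducers");
        }
      }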
  • 29. Jobs With a Variable Number of Reducers (cont’d)  Note: you should take into account the number of Reduce slots likely to be available on the cluster –If your job requires one more Reduce slot than there are available, a second ‘wave’ of Reducers will run –Consisting just of that single Reducer –Potentially doubling the amount of time spent on the Reduce phase –In this case, increasing the number of Reducers further may cut down the time spent in the Reduce phase –Two or more waves will run, but the Reducers in each wave will have to process less data © Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior written consent. 29
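    A small back-of-the-envelope sketch of the ‘wave’ arithmetic, again with invented numbers: with 20 Reduce slots, 21 Reducers force a second, nearly empty wave, while 60 Reducers run as three short waves.

      public class ReduceWaves {
        public static void main(String[] args) {
          int reduceSlots = 20;          // Reduce slots available to the job on the cluster
          int[] choices = {20, 21, 60};  // candidate Reducer counts

          for (int reducers : choices) {
            int waves = (int) Math.ceil((double) reducers / reduceSlots);
            // With a fixed total amount of intermediate data, each Reducer's share shrinks
            // as the count grows, so relative reduce-phase time ~ waves / reducers.
            double relativeTime = (double) waves / reducers;
            System.out.printf("%d reducers -> %d wave(s), relative reduce-phase time ~ %.3f%n",
                reducers, waves, relativeTime);
          }
        }
      }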
  • 30. Topics  Why Cloudera Training?  Who Should Attend Developer Training?  Developer Course Contents  A Deeper Dive: The New API vs The Old API  A Deeper Dive: Determining the Optimal Number of Reducers  Conclusion 30
  • 31. Conclusion  Cloudera’s Developer training course is:  Technical  Hands-on  Interactive  Comprehensive  Attendees leave the course with the skillset required to write, test, and run Hadoop jobs  The course is a good preparation for the CCDH certification exam 31
  • 32. Questions?  For more information on Cloudera’s training courses, or to book a place on an upcoming course: http://university.cloudera.com  My e-mail address: ian@cloudera.com  Feel free to ask questions!  Hit the Q&A button, and type away 32

Editor's Notes

  1. This topic is discussed in further detail in TDG 3e on pages 27-30 (TDG 2e, 25-27). NOTE: The New API / Old API is completely unrelated to MRv1 (MapReduce in CDH3 and earlier) / MRv2 (next-generation MapReduce, also called YARN, which will be available along with MRv1 starting in CDH4). Instructors are advised to avoid confusion by not mentioning MRv2 during this section of class, and if asked about it, to simply say that it’s unrelated to the old/new API and defer further discussion until later.
  2. On this slide, you should point out the similarities as well as the differences between the two APIs. You should emphasize that they are both doing the same thing and that there are just a few differences in how they go about it. You can tell whether a class belongs to the “Old API” or the “New API” based on the package name. The old API contains “mapred” while the new API contains “mapreduce” instead. This is the most important thing to keep in mind, because some classes/interfaces have the same name in both APIs. Consequently, when you are writing your import statements (or generating them with the IDE), you will want to be cautious and use the one that corresponds to whichever API you are using to write your code. The functions of the OutputCollector and Reporter objects have been consolidated into a single Context object. For this reason, the new API is sometimes called the “Context Objects” API (TDG 3e, page 27 or TDG 2e, page 25). NOTE: The “Keytype” and “Valuetype” shown in the map method signature aren’t actual classes defined in the Hadoop API. They are just placeholders for whatever type you use for the key and value (e.g. IntWritable and Text). Also, the generics for the keys and values are not shown in the class definition for the sake of brevity, but they are used in the new API just as they are in the old API.
  3. An example of maintaining sorted order globally across all reducers was given earlier in the course when Partitioners were introduced. NOTE: Worker nodes are configured to reserve a portion (typically 20% to 30%) of their available disk space for storing intermediate data. If too many Mappers are feeding into too few reducers, you can produce more data than the reducer(s) could store. That’s a problem. At any rate, having all your mappers feeding into a single reducer (or just a few reducers) isn’t spreading the work efficiently across the cluster.
  4. Use of the TotalOrderPartitioner is described in detail on pages 274-277 of TDG 3e (TDG 2e, 237-241). It is essentially based on sampling your keyspace so you can divide it up efficiently among several reducers, based on the global sort order of those keys.
  5. But beware that this can be a naïve approach. If processing sales data this way, business-to-business operations (like plumbing supply warehouses) would likely have little or no data for the weekend since they will likely be closed. Conversely, a retail store in a shopping mall will likely have far more data for a Saturday than a Tuesday.
  6. The upper bound on the number of reducers is based on your cluster (machines are configured to have a certain number of “reduce slots” based on the CPU, RAM and other performance characteristics of the machine). The general advice is to choose something a bit less than the max number of reduce slots to allow for speculative execution.
  7. One factor in determining the reducer count is the reduce capacity the developer has access to (or the number of “reduce slots” in either the cluster or the user’s pool). One technique is to make the reducer count a multiple of this capacity. If the developer has access to N slots but picks N+1 reducers, the reduce phase will go into a second “wave”, which will cause that one extra reducer to potentially double the execution time of the reduce phase. However, if the developer chooses 2N or 3N reducers, each wave takes less time but there are more “waves”, so you don’t see a big degradation in job performance if you need an extra wave (or more) due to an additional reducer, a failed task, etc. Suggestion: draw a picture on the whiteboard that shows reducers running in waves, showing cluster slot count, reducer execution times, etc., to tie together the performance issues explained in the last few slides: one reducer will run very slowly on an entire data set; setting the number of reducers to the available slot count maximizes parallelism in a single reducer wave, but a single failure pushes the reduce phase into a second wave and roughly doubles its execution time; setting the number of reducers to a high number means many waves of shorter-running reducers, which scales nicely because you don’t have to know the cluster size and an extra wave costs little, though it may be less efficient for some jobs.