Data Processing with Hadoop
Looking Back, Looking Ahead

Arun C. Murthy
Founder & Architect
@acmurthy (@hortonworks)




                              Page 1
Hello!
• Founder/Architect at Hortonworks Inc.
  – Lead, MapReduce/YARN/Tez
  – Formerly Architect, Hadoop MapReduce, Yahoo
  – Responsible for running Hadoop MapReduce as a service for all
    of Yahoo (~50k-node footprint)


• Apache Hadoop, ASF
  – Former VP, Apache Hadoop, ASF (Chair of Apache Hadoop PMC)
  – Long-term Committer/PMC member (full time for 7 years)
  – Release Manager for hadoop-2.x




                          © Hortonworks Inc. 2013               Page 2
Once upon a time …




  … long, long ago, there was a kingdom we shall call
                   Apache Hadoop
                                               http://2.bp.blogspot.com/-hIp99urgxCk/UAsSFo4i8YI/AAAAAAAAAFg/IzjNDwrBBVg/s1600/magickingdo



Hadoop begat …




 … a two-headed monster on every node in the kingdom;
  each belonged to a different clan and answered to a
                   different master
                   http://4.bp.blogspot.com/_C7CsfdqySYc/TNSKvIwiFcI/AAAAAAAAAbs/2FSU2TV_rRA/s1600/Two-Headed+Monster+-+With+Identifiers+-+Jan+19,+2009_0.jpg


Knights of Bytes - HDFS




… stored data uncompromisingly in directories/files, nary a
                  care about contents
                                                 http://whoiscraigmoser.com/Images/identity/knight.png



Prince of Processing - MapReduce




      He ruled with an iron fist by mapping,
      and then by mercilessly reducing data  http://media.comicvine.com/uploads/14/144886/2868181-sauron.jpg



Peace Reigned

… for a while with the odd change in the direction of the wind




                                                    http://www.get-covers.com/wp-content/uploads/2012/07/Peace.jpg


Slowly, but surely …




Human beings define reality through misery and suffering.
                                            - Agent Smith
              http://api.ning.com/files/*oWmhl7LBlXuodD2itWUUtOautEVfD*pbBn57L8ThCyYIykiTuzkO4lJY1bwaNbJF7GecTDwsVj3EFHpDM-F1y-UW4b3Xsvh/matrix_revolutions_agent_smith_04.bmp


Slowly, but surely …




    … people of the kingdom clamored for more.
     A palpable sense of greed & expectation.
                                              http://sidoxia.files.wordpress.com/2011/11/wall-st-greed-st1.jpg


Signs of Distress




          SQL said some, others said Machine Learning,
            still others said Real-Time Event Processing
                                            http://www.truth-seeker.info/wp-content/uploads/2012/11/distress.jpg


A Meeting at the Summit




MapReduce is dead!
                       Err… not quite.
We need more options! We need more!
                               True…
                     http://4.bp.blogspot.com/-oqr1t6avx6g/TW55kUnmQvI/AAAAAAAAMMk/q9Jc87MSG4g/s400/arab%2Bleague%2Bround%2Btable%2B%2Bbig%2Bgood%2B2011.bmp

A Meeting at the Summit


A common thread, YARN, running through all applications…
                                    Long live the King!




                                               http://whipup.net/wp-content/images/2008/08/yarn.gif


The Edict




     Henceforth, in the Kingdom of King YARN…
     MapReduce has been relegated to the status
         of, merely, one of the applications!
                                            http://www.napavintners.org/images/winery_Labels/EdictWines-800HW.jpg


Reign of King YARN

King YARN came to the throne with promises to return power
to all applications equally, lower performance taxes and
resource management…




                                                 http://images.fineartamerica.com/images-medium-large/the-coronation-the-crown-that-queen-everett.jpg


Oh the Shame!

Well, at least, Prince MapReduce still had powerful allies like
Highness Hive, Powerful Pig, Cheery Cascading…

                                          http://www.gibbsmagazine.com/MPj03414090000%5B1%5D.jpg


Things get worse before better




Unfortunately, things got a lot worse for Prince MapReduce…
                                                   http://www.deviantart.com/download/144412184/Smile__Tomorrow_will_be_worse__by_daGrevis.jpg


Knight Tez




      He did MapReduce, and so much more…
      Smartly aligned himself with the Kingdom of YARN.
                                             http://twomorrows.com/alterego/media/08shiningknight.gif


Knight Tez


Long-term alliances of MapReduce with
Hive, Pig, Cascading, etc. broke up…




                                          … they decided to throw in their
                                          lot with Knight Tez!
                                                                        http://informatica.upg-ploiesti.ro/62689/img/partners.jpg
                                                        http://www.officialpsds.com/images/thumbs/broken-glass-psd44132.png


Happily ever after…




         (nothing cute to say)




On a more serious note…




Every season has a flavor…



   SQL-on-Hadoop is the new black!

  SQL-on-Hadoop will be solved within
       the existing ecosystem



Looking ahead



       What will it be next year?

     Real-time event processing?
         Machine Learning?



Play to our strengths




 Invest in the Apache Hadoop platform
    and the ecosystem (Hive et al).




Seriously…
Technical Details




Hadoop MapReduce – The System




Hadoop MapReduce – The Paradigm



[Diagram: map tasks (m; m0; m1, m2, m3, m4) drawn above reduce tasks (r; r0, r1, r2)]
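The map-then-reduce flow in the diagram can be sketched in a few lines of plain Python. This is a single-process illustration only, not Hadoop's API: the function names (`run_mapreduce`, `word_count_mapper`, `word_count_reducer`), the word-count job, and the choice of three reducers are all assumptions made for the example.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer, num_reducers=3):
    """Run a toy map -> shuffle -> reduce pipeline in one process."""
    # One bucket per (hypothetical) reduce task: key -> list of values.
    partitions = [defaultdict(list) for _ in range(num_reducers)]

    # Map phase: every input record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle: all values for a key are routed to the same reducer.
            partitions[hash(key) % num_reducers][key].append(value)

    # Reduce phase: each reducer folds the values collected for its keys.
    output = {}
    for partition in partitions:
        for key, values in partition.items():
            output[key] = reducer(key, values)
    return output


def word_count_mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.split():
        yield word.lower(), 1


def word_count_reducer(word, counts):
    """Sum the per-occurrence counts for one word."""
    return sum(counts)


lines = ["map then reduce", "reduce then map then reduce"]
counts = run_mapreduce(lines, word_count_mapper, word_count_reducer)
print(counts)
```

In a real cluster the map and reduce tasks run on different nodes and the shuffle moves data over the network; the sketch keeps only the data flow.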




Hadoop YARN

[Architecture diagram]

Components:
  Client (two shown)
  Resource Manager
  Node Manager (three shown), each hosting App Mstr and/or Container processes

Legend (arrows):
  MapReduce Status
  Job Submission
  Node Status
  Resource Request
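A toy model of the roles named in the diagram may make the division of labor concrete. It is not the YARN API: the class names `ResourceManager` and `NodeManager` mirror the diagram labels, while the methods (`submit`, `allocate`, `launch`), the memory sizes, and the "most free memory wins" placement rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """One worker node; tracks free memory and the containers it hosts."""
    name: str
    free_mb: int
    containers: list = field(default_factory=list)

    def launch(self, app_id, role, mb):
        self.free_mb -= mb
        self.containers.append((app_id, role, mb))


@dataclass
class ResourceManager:
    """Cluster-wide arbiter: decides which node hosts each container."""
    nodes: list

    def allocate(self, app_id, role, mb):
        """Place one container on the node with the most free memory.

        Returns the node name, or None if no node has enough memory.
        """
        node = max(self.nodes, key=lambda n: n.free_mb)
        if node.free_mb < mb:
            return None
        node.launch(app_id, role, mb)
        return node.name

    def submit(self, app_id, am_mb, task_mb, num_tasks):
        """Job submission: start the App Master, then serve its requests."""
        placements = [("AppMaster", self.allocate(app_id, "AppMaster", am_mb))]
        for i in range(num_tasks):
            # In this sketch the App Master's resource requests are
            # simply issued one after another.
            node = self.allocate(app_id, "Container", task_mb)
            placements.append((f"task-{i}", node))
        return placements


rm = ResourceManager([
    NodeManager("node1", 4096),
    NodeManager("node2", 4096),
    NodeManager("node3", 2048),
])
placements = rm.submit("app-1", am_mb=1024, task_mb=2048, num_tasks=3)
for role, node in placements:
    print(role, "->", node)
```

The point of the split is that the Resource Manager only hands out containers; what runs inside them is up to each application's own App Master.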
Tez - Core Ideas

Tez Task - <Input, Processor, Output>

[Diagram: Input, Processor and Output boxes grouped together as one Task]

YARN ApplicationMaster to run DAG of Tasks
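The two ideas on this slide, a task built from an input, a processor and an output, and a DAG of such tasks, can be sketched as follows. This is not the Tez API: the `Task` class, `run_dag`, the task names, and the sample data are hypothetical, and the sketch runs in one process using the standard-library `graphlib` module (Python 3.9+) where Tez would schedule tasks through a YARN ApplicationMaster.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable


@dataclass
class Task:
    """A unit of work composed of three pluggable pieces."""
    name: str
    read_input: Callable    # Input: turns upstream results into records
    process: Callable       # Processor: transforms the records
    write_output: Callable  # Output: packages the result for consumers

    def run(self, upstream):
        return self.write_output(self.process(self.read_input(upstream)))


def run_dag(tasks, edges):
    """Run tasks in dependency order.

    `edges` maps a task name to the list of upstream task names it reads.
    """
    graph = {name: edges.get(name, []) for name in tasks}
    results = {}
    # static_order() yields every task after all of its upstream tasks.
    for name in TopologicalSorter(graph).static_order():
        upstream = [results[u] for u in graph[name]]
        results[name] = tasks[name].run(upstream)
    return results


def constant_input(rows):
    """Input for a source task: ignores upstream and returns fixed rows."""
    return lambda upstream: list(rows)


def merged_input(upstream):
    """Input for a downstream task: concatenates all upstream outputs."""
    return [row for part in upstream for row in part]


tasks = {
    "scan_a": Task("scan_a", constant_input([3, 1, 2]), sorted, list),
    "scan_b": Task("scan_b", constant_input([4, 3]), sorted, list),
    "merge": Task("merge", merged_input, sorted, list),
}
edges = {"merge": ["scan_a", "scan_b"]}
results = run_dag(tasks, edges)
print(results["merge"])
```

Because the three pieces are separate, the same processor could be reused with a different input or output without touching the DAG wiring.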

Pig/Hive-MR versus Pig/Hive-Tez

SELECT a.state, COUNT(*)
FROM a JOIN b ON (a.id = b.id)
GROUP BY a.state

[Diagram, left: Pig/Hive - MR, with an I/O Synchronization Barrier]
[Diagram, right: Pig/Hive - Tez, with I/O Pipelining]
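The contrast between a synchronization barrier and pipelining can be illustrated on the join-and-count query from this slide. The sketch below is plain Python under stated assumptions: the tables `a` and `b` hold made-up rows, and `mr_style` / `tez_style` are hypothetical names. The barrier is modeled by fully materializing the join output before counting; pipelining is modeled by a generator consumed row by row. Neither function reflects how Hive or Pig actually compile the query.

```python
from collections import Counter

# Made-up sample tables for the query on the slide.
a = [{"id": 1, "state": "CA"}, {"id": 2, "state": "NY"}, {"id": 3, "state": "CA"}]
b = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 3}]


def join_rows(left, right):
    """Hash join on id; lazily yields a.state for every matching pair."""
    by_id = {}
    for row in left:
        by_id.setdefault(row["id"], []).append(row)
    for row in right:
        for match in by_id.get(row["id"], []):
            yield match["state"]


def count_by_state(states):
    """GROUP BY a.state with COUNT(*)."""
    return dict(Counter(states))


def mr_style(a, b):
    # Stage 1 (join) runs to completion and its whole output is stored
    # before stage 2 (group-by) may start: this list is the barrier.
    intermediate = list(join_rows(a, b))
    return count_by_state(intermediate)


def tez_style(a, b):
    # The group-by consumes join output as it is produced; no full
    # intermediate result is ever stored.
    return count_by_state(join_rows(a, b))


print(mr_style(a, b))
print(tez_style(a, b))
```

Both functions return the same counts; the difference is only in when the second stage may start and whether the intermediate result has to be stored in full.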



Pig/Hive-MR versus Pig/Hive-Tez
SELECT a.state, COUNT(*), AVG(c.price)
FROM a
JOIN b ON (a.id = b.id)
JOIN c ON (a.itemId = c.itemId)
GROUP BY a.state

[Diagram, left: Pig/Hive - MR runs Job 1, Job 2, Job 3 and Job 4, with I/O Synchronization Barriers between jobs]
[Diagram, right: Pig/Hive - Tez runs a Single Job]
Thank You!
 Questions (surely) & Answers (maybe)




@acmurthy



Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

Past, Present and Future of Data Processing in Apache Hadoop

  • 1. Data Processing with Hadoop Looking Back, Looking Ahead Arun C. Murthy Founder & Architect @acmurthy (@hortonworks) Page 1
  • 2. Hello! • Founder/Architect at Hortonworks Inc. – Lead - Map-Reduce/YARN/Tez – Formerly, Architect Hadoop MapReduce, Yahoo – Responsible for running Hadoop MapReduce as a service for all of Yahoo (~50k nodes footprint) • Apache Hadoop, ASF – Former VP, Apache Hadoop, ASF (Chair of Apache Hadoop PMC) – Long-term Committer/PMC member (full time for 7 years) – Release Manager for hadoop-2.x © Hortonworks Inc. 2013 Page 2
  • 3. Once upon a time … … long, long ago, there was a kingdom we shall call Apache Hadoop http://2.bp.blogspot.com/-hIp99urgxCk/UAsSFo4i8YI/AAAAAAAAAFg/IzjNDwrBBVg/s1600/magickingdo
  • 4. Hadoop begat … … a two-headed monster on every node in the kingdom; each belonged to a different clan and answered to a different master http://4.bp.blogspot.com/_C7CsfdqySYc/TNSKvIwiFcI/AAAAAAAAAbs/2FSU2TV_rRA/s1600/Two-Headed+Monster+-+With+Identifiers+-+Jan+19,+2009_0.jpg
  • 5. Knights of Bytes - HDFS … stored data uncompromisingly in directories/files, nary a care about contents http://whoiscraigmoser.com/Images/identity/knight.png
  • 6. Prince of Processing - MapReduce He ruled with an iron fist by mapping, and then by mercilessly reducing data http://media.comicvine.com/uploads/14/144886/2868181-sauron.jpg
  • 7. Peace Reigned … for a while with the odd change in the direction of the wind http://www.get-covers.com/wp-content/uploads/2012/07/Peace.jpg
  • 8. Slowly, but surely … Human beings define reality through misery and suffering. - Agent Smith http://api.ning.com/files/*oWmhl7LBlXuodD2itWUUtOautEVfD*pbBn57L8ThCyYIykiTuzkO4lJY1bwaNbJF7GecTDwsVj3EFHpDM-F1y-UW4b3Xsvh/matrix_revolutions_agent_smith_04.bmp
  • 9. Slowly, but surely … Human beings define reality through misery and suffering. - Agent Smith http://api.ning.com/files/*oWmhl7LBlXuodD2itWUUtOautEVfD*pbBn57L8ThCyYIykiTuzkO4lJY1bwaNbJF7GecTDwsVj3EFHpDM-F1y-UW4b3Xsvh/matrix_revolutions_agent_smith_04.bmp
  • 10. Slowly, but surely … … people of the kingdom clamored for more. A palpable sense of greed & expectation. http://sidoxia.files.wordpress.com/2011/11/wall-st-greed-st1.jpg
  • 11. Signs of Distress SQL said some, others said Machine Learning, still others said Real-Time Event Processing http://www.truth-seeker.info/wp-content/uploads/2012/11/distress.jpg
  • 12. A Meeting at the Summit MapReduce is dead! Err… not quite. We need more options! We need more! True… http://4.bp.blogspot.com/-oqr1t6avx6g/TW55kUnmQvI/AAAAAAAAMMk/q9Jc87MSG4g/s400/arab%2Bleague%2Bround%2Btable%2B%2Bbig%2Bgood%2B2011.bmp
  • 13. A Meeting at the Summit A common thread YARN running through all applications… Long live the King! http://whipup.net/wp-content/images/2008/08/yarn.gif
  • 14. The Edict Henceforth, in the Kingdom of King YARN… MapReduce has been relegated to the status of, merely, one of the applications! http://www.napavintners.org/images/winery_Labels/EdictWines-800HW.jpg
  • 15. Reign of King YARN King YARN came to the throne with promises to return power to all applications equally, lower performance taxes and resource management… http://images.fineartamerica.com/images-medium-large/the-coronation-the-crown-that-queen-everett.jpg
  • 16. Oh the Shame! Well, at least, Prince MapReduce still had powerful allies like Highness Hive, Powerful Pig, Cheery Cascading… http://www.gibbsmagazine.com/MPj03414090000%5B1%5D.jpg
  • 17. Things get worse before better Unfortunately, things got a lot worse for Prince MapReduce… http://www.deviantart.com/download/144412184/Smile__Tomorrow_will_be_worse__by_daGrevis.jpg
  • 18. Knight Tez He did MapReduce, and so much more… Smartly aligned himself to Kingdom YARN. http://twomorrows.com/alterego/media/08shiningknight.gif
  • 19. Knight Tez Long term alliances of MapReduce with Hive, Pig, Cascading etc. broke up… … they decided to throw in their lot with Knight Tez! http://informatica.upg-ploiesti.ro/62689/img/partners.jpg http://www.officialpsds.com/images/thumbs/broken-glass-psd44132.png
  • 20. Happily ever after… (nothing cute to say)
  • 21. On a more serious note…
  • 22. Every season has a flavor… SQL-on-Hadoop is the new black! SQL-on-Hadoop will be solved within the existing ecosystem
  • 23. Looking ahead What will it be next year? Real-time event processing? Machine Learning?
  • 24. Play to our strengths Invest in the Apache Hadoop platform and the ecosystem (Hive et al).
  • 25. Seriously… Technical Details
  • 26. Hadoop MapReduce – The System
  • 27. Hadoop MapReduce – The Paradigm [Diagram: map tasks m0 to m4 and reduce tasks r0 to r2]
  • 28. Hadoop YARN [Diagram: Clients, a Resource Manager, Node Managers, App Mstrs and Containers, with arrows labeled MapReduce Status, Job Submission, Node Status and Resource Request]
  • 29. Tez - Core Ideas Tez Task - <Input, Processor, Output> YARN ApplicationMaster to run DAG of Tasks [Diagram: a Task drawn as Input, Processor, Output]
  • 30. Pig/Hive-MR versus Pig/Hive-Tez SELECT a.state, COUNT(*) FROM a JOIN b ON (a.id = b.id) GROUP BY a.state [Diagram: Pig/Hive - MR with an I/O Synchronization Barrier, versus Pig/Hive - Tez with I/O Pipelining]
  • 31. Pig/Hive-MR versus Pig/Hive-Tez SELECT a.state, COUNT(*), AVERAGE(c.price) FROM a JOIN b ON (a.id = b.id) JOIN c ON (a.itemId = c.itemId) GROUP BY a.state [Diagram: Pig/Hive - MR as Job 1 to Job 4 with I/O Synchronization Barriers between jobs, versus Pig/Hive - Tez as a Single Job]
  • 32. Thank You! Questions (surely) & Answers (maybe) @acmurthy
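
The MR side of slide 30 can be illustrated with a small in-memory sketch. It is not Hadoop, Hive or Pig code: `run_mapreduce`, the mapper and reducer functions, and the sample rows are all hypothetical names invented here. It assumes the query is planned as two chained jobs (a join job, then an aggregation job), which is one plausible plan rather than a statement of what Hive or Pig actually emit. The point is the barrier: the second job cannot start until the first job's output has been fully written out.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """One toy MapReduce job: map each record, group values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # stands in for the shuffle
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Job 1: a JOIN b ON (a.id = b.id), implemented as a reduce-side join.
def join_mapper(record):
    table, row = record  # each record is tagged with its source table
    yield row["id"], (table, row)


def join_reducer(key, values):
    a_rows = [row for table, row in values if table == "a"]
    b_rows = [row for table, row in values if table == "b"]
    for a_row in a_rows:
        for _ in b_rows:  # one output row per matching (a, b) pair
            yield a_row["state"]


# Job 2: GROUP BY a.state with COUNT(*).
def count_mapper(state):
    yield state, 1


def count_reducer(state, ones):
    yield state, sum(ones)


def state_counts(table_a, table_b):
    tagged = [("a", row) for row in table_a] + [("b", row) for row in table_b]
    # Barrier: the whole join result is materialized before job 2 starts.
    # In this sketch it is a Python list; on a cluster it would be written to storage.
    joined = run_mapreduce(tagged, join_mapper, join_reducer)
    return run_mapreduce(joined, count_mapper, count_reducer)


# Hypothetical sample data, for illustration only.
TABLE_A = [
    {"id": 1, "state": "CA"},
    {"id": 2, "state": "CA"},
    {"id": 3, "state": "NY"},
    {"id": 4, "state": "TX"},
]
TABLE_B = [{"id": 1}, {"id": 1}, {"id": 3}]

result = state_counts(TABLE_A, TABLE_B)
print(result)
```

On the Tez side of the same slide, the join and the aggregation would be stages of a single DAG job, so the intermediate `joined` result would not have to be handed from one job to the next.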