Introduction to SARA's Hadoop Hackathon - Dec 7th 2010
SARA Hadoop Hackathon
   Evert.Lammerts@sara.nl
   December 7, 2010
DJOERD HIEMSTRA (UTwente)
EDGAR MEIJ (UvA)

             SARA Hadoop Hackathon, December 7, 2010
2002   Nutch*
2004   MR/GFS**
2006   Hadoop

*  http://nutch.apache.org/
** http://labs.google.com/papers/mapreduce.html
   http://labs.google.com/papers/gfs.html

2010: A Hype in Production

http://wiki.apache.org/hadoop/PoweredBy
Supercomputing
Cloud computing
Grid computing
Cluster computing
GPU computing

http://www.sara.nl/

:-(  Moving the data to the computation: expensive!
:-)  Moving the computation to the data: cheaper!

Ref: Luiz André Barroso and Urs Hölzle, Google Inc.
     The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines

NameNode              JobTracker

DN  TT    DN  TT    DN  TT    DN  TT
DN  TT    DN  TT    DN  TT    DN  TT

DN = DataNode
TT = TaskTracker

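Every worker node in the diagram runs both a DataNode and a TaskTracker. That pairing is what lets a job move the computation to the data: when a map task needs an input block, the scheduler can prefer a TaskTracker on a host whose DataNode already stores a replica of that block, so the block does not have to cross the network. The minimal sketch below illustrates only that preference. The function `pick_tracker`, the host names and the space-separated host lists are hypothetical; Hadoop's actual JobTracker scheduling is more involved (it also considers rack locality, for example).

```shell
#!/bin/sh
# Hypothetical sketch of locality-aware task placement (not Hadoop's real scheduler).
#
# pick_tracker REPLICA_HOSTS FREE_HOSTS
#   REPLICA_HOSTS: space-separated hosts whose DataNode holds a replica of the
#                  task's input block (the kind of information the NameNode keeps).
#   FREE_HOSTS:    space-separated hosts whose TaskTracker has a free slot.
# Prints "<host> local" or "<host> remote"; returns 1 if no TaskTracker is free.
pick_tracker() {
  replicas=$1
  free=$2

  # First choice: a free TaskTracker on a host that already stores the block.
  for host in $free; do
    case " $replicas " in
      *" $host "*) echo "$host local"; return 0 ;;
    esac
  done

  # Fallback: any free TaskTracker; the block must then be read over the network.
  for host in $free; do
    echo "$host remote"
    return 0
  done

  return 1
}

# Example with hypothetical hosts: the block has replicas on dn1, dn3 and dn4.
pick_tracker "dn1 dn3 dn4" "dn2 dn3"   # prints: dn3 local
pick_tracker "dn1 dn3 dn4" "dn2 dn5"   # prints: dn2 remote
```

The fallback branch is the expensive case from the data-versus-computation slide: the task still runs, but its input has to travel to it.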
File -> Map -> Shuffle -> Reduce -> Output

Map:      $ echo "${email#*@}, ${name}"
Shuffle:  $ sort
Reduce:   $ wc -l

Output:
ewi.utwente.nl, 1
gmail.com,      2
nbic.nl,        1
nikhef.nl,      3
sara.nl,        1

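The map/shuffle/reduce pipeline on this slide can be tried on a single machine with ordinary Unix tools. The minimal sketch below assumes input lines of the form `name,email` (the slide does not show the input format) and uses made-up example addresses. One adjustment is needed when running locally: `wc -l` counts every line it receives, whereas the slide's reduce step is meant to count the records of one key at a time, so here `uniq -c` counts each run of equal keys after the sort. The function names `map`, `shuffle` and `reduce` are illustrative only.

```shell
#!/bin/sh
# Single-machine sketch of the slide's File -> Map -> Shuffle -> Reduce -> Output flow.
# Assumption: each input line is "name,email" (hypothetical format).

# Map: key each record by the mail domain (everything after the first '@').
map() {
  while IFS=, read -r name email; do
    printf '%s\t%s\n' "${email#*@}" "$name"
  done
}

# Shuffle: sort on the key so that all records of one domain are adjacent.
# LC_ALL=C gives plain byte-order sorting, independent of the locale.
shuffle() {
  LC_ALL=C sort -k1,1
}

# Reduce: count the records in each run of equal keys and print "domain, count".
reduce() {
  cut -f1 | LC_ALL=C uniq -c | awk '{ printf "%s, %d\n", $2, $1 }'
}

# Example run on made-up records.
printf '%s\n' \
  'alice,alice@example.org' \
  'bob,bob@example.com' \
  'carol,carol@example.org' |
  map | shuffle | reduce
# prints:
# example.com, 1
# example.org, 2
```

A tab rather than the slide's ", " separates key and value so that `cut -f1` can extract the key. On a cluster, Hadoop performs the sort-and-group (shuffle) step itself between the map and reduce phases and runs many map and reduce tasks in parallel.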
From: Hadoop: The Definitive Guide (2nd Edition), Tom White

Today

09.30 - 09.50   Welcome & Introduction
09.50 - 10.15   Map/Reduce @ University of Twente
10.15 - 10.30   Kick-off hackathon
14.00 - 15.00   Optional: SARA tour
10.30 - 17.00   Hackathon
17.00 - 17.30   Results and closing




