Hadoop and the Rise of Big Data

           February 21, 2013
             Donald Miner
            @donaldpminer
        Donald.Miner@emc.com
About Don
Hadoop
•   Distributed platform up to thousands of nodes
•   Data storage and application framework
•   Started at Yahoo!
•   Open source
•   Based on a few Google papers (2003, 2004)
•   Runs on commodity hardware


         I’M HERE TO TELL YOU WHY HADOOP IS AWESOME
Hadoop users
•   Yahoo!                 •   Riot Games
•   Facebook               •   ComScore
•   eBay                   •   Twitter
•   AOL                    •   LinkedIn


           Hadoop Companies
• Cloudera, Hortonworks, EMC/Greenplum, IBM
• Numerous startups
Buzzword glossary
•   Unstructured & Structured Data
•   NoSQL
•   Big Data (volume, velocity, variety)
•   Data Science
•   Cloud computing
Hadoop component overview
• Core components:
  – HDFS (Hadoop Distributed File System)
  – MapReduce (Data analysis framework)
• Ecosystem
  – HBase (key-value store)
  – Pig (high-level data analysis language)
  – Hive (SQL-like data analysis language)
  – ZooKeeper (coordination service; stores small bits of shared metadata)
  – Other stuff
Use cases
• Text processing
    – Indexing, counting, processing
•   Large-scale reports
•   Data science
•   Mixing data sources (data lakes)
•   Ad targeting
•   Image/Video/Audio processing
•   Cybersecurity
HDFS
• Stores files in folders (that’s it)
    – Nobody cares what’s in your files
•   Chunks large files into blocks (~64MB-1GB)
•   Blocks are scattered all over the place
•   3 replicas of each block (better safe than sorry)
•   One NameNode (might be sorry)
    – Knows which computers blocks live on
    – Knows which blocks belong to which files
• One DataNode per computer (slaves!)
    – Stores the blocks themselves
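
A rough sketch of the bookkeeping described above, in plain Python with no Hadoop involved. The function name, the file path, the DataNode names, and the round-robin placement are all assumptions made for illustration; real HDFS uses a rack-aware placement policy.

```python
import itertools
import math

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the low end of the range on the slide
REPLICATION = 3                # three replicas per block, as on the slide


def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return {block_index: [datanode, ...]} for one file.

    Round-robin placement is a simplification for illustration only;
    it is not the placement policy HDFS actually uses.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    num_blocks = math.ceil(file_size / block_size)
    ring = itertools.cycle(datanodes)
    return {b: [next(ring) for _ in range(replication)] for b in range(num_blocks)}


# The NameNode's two lookups: file -> blocks, and block -> computers.
# The path and DataNode names are hypothetical.
namenode = {
    "/logs/day1.log": plan_blocks(200 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"]),
}

for block, hosts in namenode["/logs/day1.log"].items():
    print(block, hosts)
```

A 200 MB file at 64 MB per block needs four blocks, and each block lands on three different computers, so losing any one DataNode still leaves two copies of every block.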
HDFS Demonstration
MapReduce
•   Analyzes data in HDFS where the data is
•   Jobs are split into Mappers and Reducers
•   JobTracker – keeps track of running jobs
•   TaskTracker – one per computer, executes tasks
•   Mappers (you code this)
    – Loads data from HDFS
    – Filter, transform, parse
    – Outputs (key, value) pairs
• Reducers (you code this, too)
    – Groups by the mapper’s output key
    – Aggregate, count, statistics
    – Outputs to HDFS
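
The mapper/reducer split above can be sketched as a word count in plain Python. This only imitates the data flow on one machine; the function names are made up for the example and this is not the Hadoop API.

```python
from collections import defaultdict


def mapper(line):
    """Parse one input line and emit (key, value) pairs."""
    for word in line.lower().split():
        yield word, 1


def reducer(key, values):
    """Aggregate all values that share one key."""
    return key, sum(values)


def run_job(lines):
    """Run mappers, group their output by key, then run reducers."""
    # Shuffle step: the framework does this grouping between the two phases.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


counts = run_job(["the quick brown fox", "the lazy dog", "The end"])
print(counts)
```

Only `mapper` and `reducer` hold problem-specific logic; everything in `run_job` stands in for work the framework would do across many computers.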
MapReduce Demonstration
Hadoop ecosystem
• HDFS and MapReduce don’t do everything
• Pig – high-level language
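        -- Assumed input path and schema, added for illustration only
        logs = LOAD 'logs' AS (userAgent:chararray, timeMicroSec:long);
        grpd = GROUP logs BY userAgent;
        counts = FOREACH grpd GENERATE group,
          AVG(logs.timeMicroSec)/1.0E+06 AS loadTimeSec;
        byCount = ORDER counts BY loadTimeSec DESC;
        top = LIMIT byCount 15;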

• Hive – high-level SQL-like language
      SELECT grp, SUM(col2), COUNT(*) FROM table1 GROUP BY grp;

• HBase – key/value store
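
For readers who do not know SQL, here is a small Python sketch of what the Hive query above computes. The sample rows are invented, and NULL handling is ignored.

```python
from collections import defaultdict

# Hypothetical rows of table1 as (grp, col2) tuples.
table1 = [("a", 10), ("b", 5), ("a", 7), ("b", 1), ("c", 3)]


def group_sum_count(rows):
    """Return {grp: (SUM(col2), COUNT(*))} for the given rows."""
    totals = defaultdict(lambda: [0, 0])
    for grp, col2 in rows:
        totals[grp][0] += col2  # running SUM(col2)
        totals[grp][1] += 1     # running COUNT(*)
    return {grp: tuple(acc) for grp, acc in totals.items()}


result = group_sum_count(table1)
print(result)
```

Hive's appeal is that the one-line query replaces hand-written grouping code like this and runs it as MapReduce jobs over data in HDFS.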
Cool thing #1: Linear Scalability
• HDFS and MapReduce scale linearly
• If you have twice as many computers, things run
  twice as fast
• If you have twice as much data, things take twice
  as long
• If you have twice as many computers, you can
  store twice as much data
• This stays true (some minor caveats)
• DATA LOCALITY!!
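
The scaling rules above boil down to simple arithmetic. A back-of-the-envelope sketch, assuming perfectly linear scaling (the slide itself notes there are caveats); the function names and numbers are made up for the example.

```python
def estimated_runtime(baseline_seconds, data_factor=1.0, node_factor=1.0):
    """Idealized linear model: runtime grows with data and shrinks with nodes."""
    if data_factor <= 0 or node_factor <= 0:
        raise ValueError("factors must be positive")
    return baseline_seconds * data_factor / node_factor


def estimated_capacity(baseline_tb, node_factor=1.0):
    """Idealized linear model: storage grows with the number of nodes."""
    if node_factor <= 0:
        raise ValueError("node_factor must be positive")
    return baseline_tb * node_factor


one_hour = 3600
print(estimated_runtime(one_hour, node_factor=2))                 # twice the computers
print(estimated_runtime(one_hour, data_factor=2))                 # twice the data
print(estimated_runtime(one_hour, data_factor=2, node_factor=2))  # both at once
print(estimated_capacity(100, node_factor=2))                     # twice the computers
```

The third case is the practical takeaway: if the data doubles, doubling the cluster should keep the job at roughly the same runtime.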
Cool thing #2: Schema on Read
• Before: ETL, schema design, tossing out original data
• Now: LOAD DATA → ???? → PROFIT!!
• Data is parsed/interpreted as it is loaded out of HDFS
• What implications does this have?
    – Keep original data around!
    – Have multiple views of the same data!
    – Store first, figure out what to do with it later!
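
A small Python sketch of the schema-on-read idea: the raw lines are stored untouched, and each analysis applies its own parsing when it reads them. The log format and the function names are assumptions for illustration.

```python
# Raw data is stored as-is. Hypothetical format: "timestamp userAgent timeMicroSec".
RAW_LINES = [
    "2013-02-21T10:00:00 Firefox 1200000",
    "2013-02-21T10:00:05 Chrome 800000",
    "2013-02-21T10:01:00 Firefox 400000",
]


def view_load_times(lines):
    """View 1: interpret each line as (userAgent, load time in seconds)."""
    for line in lines:
        _, agent, micros = line.split()
        yield agent, int(micros) / 1e6


def view_hits_per_minute(lines):
    """View 2: interpret the same lines as hits per minute."""
    counts = {}
    for line in lines:
        minute = line.split()[0][:16]  # keep "YYYY-MM-DDTHH:MM"
        counts[minute] = counts.get(minute, 0) + 1
    return counts


print(list(view_load_times(RAW_LINES)))
print(view_hits_per_minute(RAW_LINES))
```

Neither view was planned when the data was written, and adding a third later needs no reload or schema migration because the original lines are still there.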
Cool thing #3: Transparent Parallelism
• RPC?
• Code deployment?
• Network programming?
• Data center fires?
• Distributed stuff?
• Inter-process communication?
• Fault tolerance?
• Message passing?
• Threading?
• Locking?
• With MapReduce, I DON’T CARE
    – … I just have to fit my solution into this tiny box
[Figure: "Solution" fitted into a "MapReduce" box]
Cool thing #4: Cheap
• Commodity hardware (meh)
• Open source (people cost more though)
• Add more hardware later
How to get started
• Install Hadoop in a Linux VM
  – Wait how is this helpful?? Hadoop is distributed!


• Use Google (seriously)

• Some prerequisites: Java, Linux, Data, Time
Stuff Hadoop is good at
•   Batch processing
•   Processing lots of data
•   Outputting lots of data
•   Storing lots of historical data
•   Flexible analysis of data
•   Dealing with unstructured or structured data
Stuff Hadoop is not good at
• Hadoop is a freight truck, not a sports car
• Updating data (think “append-only”)
• Being easy to use
  – Java
  – Administration
• Hadoop is not good storage (don’t throw away
  your EMC stuff!)
QUESTIONS?

BW Tech Meetup: Hadoop and The rise of Big Data

  • 1. Hadoop and the Rise of Big Data
      February 21, 2013
      Donald Miner
      @donaldpminer
      Donald.Miner@emc.com
  • 3. Hadoop
      • Distributed platform up to thousands of nodes
      • Data storage and application framework
      • Started at Yahoo!
      • Open source
      • Based on a few Google papers (2003, 2004)
      • Runs on commodity hardware
      I’M HERE TO TELL YOU WHY HADOOP IS AWESOME
  • 4. Hadoop users
      • Yahoo!
      • Riot Games
      • Facebook
      • ComScore
      • eBay
      • Twitter
      • AOL
      • LinkedIn
      Hadoop Companies
      • Cloudera, Hortonworks, EMC/Greenplum, IBM
      • Numerous startups
  • 5. Buzzword glossary
      • Unstructured & Structured Data
      • NoSQL
      • Big Data (volume, velocity, variety)
      • Data Science
      • Cloud computing
  • 6. Hadoop component overview
      • Core components:
        – HDFS (Hadoop Distributed File System)
        – MapReduce (Data analysis framework)
      • Ecosystem
        – HBase (key-value store)
        – Pig (high-level data analysis language)
        – Hive (SQL-like data analysis language)
        – ZooKeeper (stores metadata)
        – Other stuff
  • 7. Use cases
      • Text processing
        – Indexing, counting, processing
      • Large-scale reports
      • Data science
      • Mixing data sources (data lakes)
      • Ad targeting
      • Image/Video/Audio processing
      • Cybersecurity
  • 8. HDFS
      • Stores files in folders (that’s it)
        – Nobody cares what’s in your files
      • Chunks large files into blocks (~64MB-1GB)
      • Blocks are scattered all over the place
      • 3 replicas of each block (better safe than sorry)
      • One NameNode (might be sorry)
        – Knows which computers blocks live on
        – Knows which blocks belong to which files
      • One DataNode per computer (slaves!)
        – Hosts files
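The block and replica bullets on the HDFS slide can be made concrete with a toy model. This is a minimal sketch, not how HDFS itself is implemented: the function names `split_into_blocks` and `place_replicas` and the DataNode names are hypothetical, the 64 MB block size is the low end of the range on the slide, and the round-robin placement is an assumption made for illustration (real HDFS placement is more involved).

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the low end of the range on the slide
REPLICATION = 3                # "3 replicas of each block"


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the byte length of each block for a file of file_size bytes."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Toy round-robin placement (an assumption, not the HDFS policy).

    Returns a dict mapping block index -> list of distinct DataNode names,
    which is roughly the kind of bookkeeping the NameNode keeps.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(200 * 1024 * 1024)  # a 200 MB file
    nodes = ["node0", "node1", "node2", "node3", "node4"]  # hypothetical names
    for block, holders in place_replicas(len(blocks), nodes).items():
        print(block, blocks[block], holders)
```

A 200 MB file becomes three full 64 MB blocks plus one 8 MB block, and each block ends up on three different machines, so losing any single DataNode loses no data.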
  • 10. MapReduce
      • Analyzes data in HDFS where the data is
      • Jobs are split into Mappers and Reducers
      • JobTracker – keeps track of running jobs
      • TaskTracker – one per computer, executes tasks
      • Mappers (you code this)
        – Loads data from HDFS
        – Filter, transform, parse
        – Outputs (key, value) pairs
      • Reducers (you code this, too)
        – Groups by the mapper’s output key
        – Aggregate, count, statistics
        – Outputs to HDFS
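The mapper and reducer roles on the MapReduce slide can be sketched as a single-process word count. This is only an illustration of the programming model in plain Python, not the Hadoop API: `mapper`, `reducer`, and `run_job` are hypothetical names, and the in-memory grouping stands in for the shuffle that the framework performs between the two phases.

```python
from collections import defaultdict


def mapper(line):
    """Parse one input record and emit (key, value) pairs."""
    for word in line.lower().split():
        yield word, 1


def reducer(key, values):
    """Aggregate every value that was emitted for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, group-by-key, and reduce in a single process."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # stands in for the framework's shuffle
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    print(run_job(["the quick brown fox", "the lazy dog"]))
```

Only `mapper` and `reducer` contain problem-specific logic. Everything in `run_job` is the part a framework would take over and spread across machines.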
  • 12. Hadoop ecosystem
      • HDFS and MapReduce don’t do everything
      • Pig – high-level language
          grpd = GROUP logs BY userAgent;
          counts = FOREACH grpd GENERATE group, AVG(logs.timeMicroSec)/1.0E+06 AS loadTimeSec;
          byCount = ORDER counts BY loadTimeSec DESC;
          top = limit byCount 15;
      • Hive – high-level SQL language
          SELECT grp, SUM(col2), COUNT(*) FROM table1 GROUP BY grp;
      • HBase – key/value store
  • 13. Cool thing #1: Linear Scalability
      • HDFS and MapReduce scale linearly
      • If you have twice as many computers, things run twice as fast
      • If you have twice as much data, things run twice as slow
      • If you have twice as many computers, you can store twice as much data
      • This stays true (some minor caveats)
      • DATA LOCALITY!!
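The scaling rules on the linear scalability slide amount to simple proportional arithmetic. A minimal sketch, assuming ideal linear scaling with none of the caveats the slide mentions; `estimated_runtime` and its parameters are hypothetical names used only for illustration.

```python
def estimated_runtime(base_seconds, data_factor=1.0, node_factor=1.0):
    """Idealized linear-scaling estimate.

    base_seconds: measured runtime on the current cluster and data set.
    data_factor:  how many times more data there will be (2.0 = twice as much).
    node_factor:  how many times more computers there will be.
    """
    return base_seconds * data_factor / node_factor


if __name__ == "__main__":
    base = 600  # a hypothetical 10-minute job
    print(estimated_runtime(base, node_factor=2))                 # twice the computers
    print(estimated_runtime(base, data_factor=2))                 # twice the data
    print(estimated_runtime(base, data_factor=2, node_factor=2))  # both at once
```

Doubling the data and the computers together leaves the estimate unchanged, which is the practical meaning of linear scaling: growth in data can be absorbed by adding hardware.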
  • 14. Cool thing #2: Schema on Read
      Before: ETL, schema design, tossing out original data
      NOW: LOAD DATA -> ???? -> PROFIT!!
      Data is parsed/interpreted as it is loaded out of HDFS
      What implications does this have?
      • Keep original data around!
      • Have multiple views of the same data!
      • Store first, figure out what to do with it later!
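Schema on read can be illustrated by keeping raw lines untouched and applying a different parse each time they are read. This is a sketch under assumptions: the log format, the sample records, and the function names `load_times_by_agent` and `hits_by_user` are invented for the example (the user agent and microsecond fields echo the Pig script on slide 12), and a Python list stands in for files in HDFS.

```python
# Raw records are stored exactly as they arrived; no schema is imposed on write.
# Hypothetical format: timestamp, user, user agent, load time in microseconds.
RAW_LINES = [
    "2013-02-21T10:00:00 alice Firefox 350000",
    "2013-02-21T10:00:05 bob Chrome 120000",
    "2013-02-21T10:01:00 alice Chrome 80000",
]


def load_times_by_agent(lines):
    """View 1: interpret each line as (user agent, load time in seconds)."""
    for line in lines:
        _timestamp, _user, agent, micros = line.split()
        yield agent, int(micros) / 1e6


def hits_by_user(lines):
    """View 2: read the same lines, but keep only a per-user hit count."""
    counts = {}
    for line in lines:
        user = line.split()[1]
        counts[user] = counts.get(user, 0) + 1
    return counts


if __name__ == "__main__":
    print(list(load_times_by_agent(RAW_LINES)))
    print(hits_by_user(RAW_LINES))
```

Both views read the same stored bytes, so a third view can be added later without reloading or reshaping the data.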
  • 15. Cool thing #3: Transparent Parallelism
      RPC? Code deployment? Network programming? Data center fires? Distributed stuff?
      Inter-process communication? Fault tolerance? Message passing? Threading? Locking?
      With MapReduce, I DON’T CARE … I just have to fit my solution into this tiny box
      [Diagram labels: Solution, MapReduce]
  • 16. Cool thing #4: Cheap
      • Commodity hardware (meh)
      • Open source (people cost more though)
      • Add more hardware later
  • 17. How to get started
      • Install Hadoop in a Linux VM
        – Wait how is this helpful?? Hadoop is distributed!
      • Use Google (seriously)
      • Some prerequisites: Java, Linux, Data, Time
  • 18. Stuff Hadoop is good at
      • Batch processing
      • Processing lots of data
      • Outputting lots of data
      • Storing lots of historical data
      • Flexible analysis of data
      • Dealing with unstructured or structured data
  • 19. Stuff Hadoop is not good at
      • Hadoop is a freight truck, not a sports car
      • Updating data (think “append-only”)
      • Being easy to use
        – Java
        – Administration
      • Hadoop is not good storage (don’t throw away your EMC stuff!)
  • 20. QUESTIONS?
      Hadoop and the Rise of Big Data
      February 21, 2013
      Donald Miner
      @donaldpminer
      Donald.Miner@emc.com