Hadoop and the Rise of Big Data

           February 21, 2013
             Donald Miner
            @donaldpminer
        Donald.Miner@emc.com
About Don
Hadoop
•   Distributed platform up to thousands of nodes
•   Data storage and application framework
•   Started at Yahoo!
•   Open source
•   Based on a few Google papers (GFS, 2003; MapReduce, 2004)
•   Runs on commodity hardware


         I’M HERE TO TELL YOU WHY HADOOP IS AWESOME
Hadoop users
•   Yahoo!                 •   Riot Games
•   Facebook               •   ComScore
•   eBay                   •   Twitter
•   AOL                    •   LinkedIn


           Hadoop Companies
• Cloudera, Hortonworks, EMC/Greenplum, IBM
• Numerous startups
Buzzword glossary
•   Unstructured & Structured Data
•   NoSQL
•   Big Data (volume, velocity, variety)
•   Data Science
•   Cloud computing
Hadoop component overview
• Core components:
  – HDFS (Hadoop Distributed File System)
  – MapReduce (Data analysis framework)
• Ecosystem
  – HBase (key-value store)
  – Pig (high-level data analysis language)
  – Hive (SQL-like data analysis language)
  – ZooKeeper (coordination service; stores small bits of shared metadata)
  – Other stuff
Use cases
• Text processing
    – Indexing, counting, processing
•   Large-scale reports
•   Data science
•   Mixing data sources (data lakes)
•   Ad targeting
•   Image/Video/Audio processing
•   Cybersecurity
HDFS
• Stores files in folders (that’s it)
    – Nobody cares what’s in your files
•   Chunks large files into blocks (~64MB-1GB)
•   Blocks are scattered all over the place
•   3 replicas of each block by default (better safe than sorry)
•   One NameNode (might be sorry)
    – Knows which computers blocks live on
    – Knows which blocks belong to which files
• One DataNode per computer (slaves!)
    – Hosts blocks
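
The bookkeeping on this slide can be sketched in a few lines of Python. This is a toy model, not HDFS code: the function names are made up for illustration, the 64 MB block size is just the low end of the range above, and replicas are placed round-robin, whereas real HDFS uses a rack-aware placement policy.

```python
import itertools

BLOCK_SIZE = 64 * 1024 * 1024   # 64 MB, low end of the range on the slide
REPLICATION = 3                 # replicas per block, as on the slide


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_blocks(filename, file_size, datanodes, replication=REPLICATION):
    """Toy NameNode bookkeeping: file -> block ids, block id -> DataNodes.

    Placement is plain round-robin, an assumption made for illustration only.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    file_to_blocks = {filename: []}
    block_to_nodes = {}
    start = itertools.cycle(range(len(datanodes)))
    for index, _size in enumerate(split_into_blocks(file_size)):
        block_id = f"{filename}#blk{index}"
        first = next(start)
        # Pick `replication` distinct nodes, starting at a rotating offset.
        nodes = [datanodes[(first + k) % len(datanodes)] for k in range(replication)]
        file_to_blocks[filename].append(block_id)
        block_to_nodes[block_id] = nodes
    return file_to_blocks, block_to_nodes
```

The two dictionaries are the two things the NameNode knows: which blocks belong to which file, and which computers each block lives on.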
HDFS Demonstration
MapReduce
•   Analyzes data in HDFS where the data is
•   Jobs are split into Mappers and Reducers
•   JobTracker – keeps track of running jobs
•   TaskTracker – one per computer, executes tasks
•   Mappers (you code this)
    – Loads data from HDFS
    – Filter, transform, parse
    – Outputs (key, value) pairs
• Reducers (you code this, too)
    – Groups by the mapper’s output key
    – Aggregate, count, statistics
    – Outputs to HDFS
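
To make the mapper/reducer split concrete, here is a word count run in a single Python process. It is only a sketch of the programming model: `run_job` is a hypothetical stand-in for what the framework does between the two phases, and nothing here touches HDFS, the JobTracker, or the real Hadoop API.

```python
from collections import defaultdict


def mapper(line):
    """Parse one input line and emit (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def reducer(key, values):
    """Aggregate all values that share one key."""
    return key, sum(values)


def run_job(lines):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)   # the "shuffle": group by mapper key
    return dict(reducer(key, values) for key, values in sorted(groups.items()))
```

Only `mapper` and `reducer` are the parts you would write yourself; the grouping in the middle is what the framework handles for you.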
MapReduce Demonstration
Hadoop ecosystem
• HDFS and MapReduce don’t do everything
• Pig – high-level language
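        -- Assumed input: the path and schema here are hypothetical
        logs = LOAD 'logs.tsv' AS (userAgent:chararray, timeMicroSec:long);
        grpd = GROUP logs BY userAgent;
        counts = FOREACH grpd GENERATE group,
          AVG(logs.timeMicroSec)/1.0E+06 AS loadTimeSec;
        byCount = ORDER counts BY loadTimeSec DESC;
        top = LIMIT byCount 15;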

• Hive – high-level SQL-like language
      SELECT grp, SUM(col2), COUNT(*) FROM table1 GROUP BY grp;

• HBase – key/value store
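
For readers who do not know Pig, the script above computes roughly the following, written here as plain single-machine Python. The function name and the `(user_agent, time_micro_sec)` tuple layout are assumptions for illustration; Pig would run the same logic as MapReduce jobs across the cluster.

```python
from collections import defaultdict


def slowest_user_agents(logs, limit=15):
    """logs: iterable of (user_agent, time_micro_sec) tuples."""
    times = defaultdict(list)
    for user_agent, time_micro_sec in logs:          # GROUP logs BY userAgent
        times[user_agent].append(time_micro_sec)
    counts = [
        (agent, sum(values) / len(values) / 1.0e6)   # AVG(...)/1.0E+06
        for agent, values in times.items()
    ]
    by_count = sorted(counts, key=lambda row: row[1], reverse=True)  # ORDER ... DESC
    return by_count[:limit]                          # LIMIT ... 15
```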
Cool thing #1: Linear Scalability
• HDFS and MapReduce scale linearly
• If you have twice as many computers, things run
  twice as fast
• If you have twice as much data, things take twice
  as long
• If you have twice as many computers, you can
  store twice as much data
• This stays true (some minor caveats)
• DATA LOCALITY!!
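
The claims on this slide boil down to simple arithmetic. The sketch below is an idealized model only: the per-node throughput figure is an invented placeholder, and it ignores the caveats mentioned above (startup overhead, skew, network shuffles).

```python
def job_seconds(data_gb, nodes, gb_per_node_per_second=0.05):
    """Idealized runtime: work is spread evenly and nodes read local data."""
    return data_gb / (nodes * gb_per_node_per_second)


def usable_capacity_tb(nodes, tb_per_node, replication=3):
    """Usable HDFS capacity once every block is stored `replication` times."""
    return nodes * tb_per_node / replication
```

Doubling `nodes` halves `job_seconds` and doubles `usable_capacity_tb`; doubling `data_gb` doubles `job_seconds`.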
Cool thing #2: Schema on Read
       Before:
       ETL, schema design, tossing out original data



               NOW:
LOAD DATA                  ????        PROFIT!!
 Data is parsed/interpreted as it is loaded out of HDFS
               What implications does this have?
                      Keep original data around!
                      Have multiple views of the same data!
                      Store first, figure out what to do with it later!
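
A small Python sketch of what "multiple views of the same data" can look like. The log lines, field order, and function names are all hypothetical; the point is only that the raw lines are stored untouched and each reader applies its own schema when it reads them.

```python
RAW_LINES = [
    "2013-02-21T10:00:00\tMozilla/5.0\t/index.html\t152000",
    "2013-02-21T10:00:02\tcurl/7.29\t/api/items\t48000",
]  # hypothetical tab-separated log lines, stored exactly as they arrived


def page_load_view(line):
    """View 1: who loaded what, and how long it took (seconds)."""
    _timestamp, user_agent, path, micros = line.split("\t")
    return {"user_agent": user_agent, "path": path, "load_sec": int(micros) / 1e6}


def traffic_view(line):
    """View 2: same bytes, different schema; only the hour and the path."""
    timestamp, _user_agent, path, _micros = line.split("\t")
    return {"hour": timestamp[:13], "path": path}
```

Neither view required deciding on a schema before the data was stored, and a third view can be added later without reloading anything.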
Cool thing #3: Transparent Parallelism
RPC? Code deployment? Network programming? Data center fires?
Distributed stuff? Inter-process communication? Fault tolerance?
Message passing? Threading? Locking?

      With MapReduce, I DON’T CARE
      … I just have to fit my solution into this tiny box

[Diagram labels: Solution, MapReduce]
Cool thing #4: Cheap
• Commodity hardware (meh)
• Open source (people cost more though)
• Add more hardware later
How to get started
• Install Hadoop in a Linux VM
  – Wait how is this helpful?? Hadoop is distributed!


• Use Google (seriously)

• Some prerequisites: Java, Linux, Data, Time
Stuff Hadoop is good at
•   Batch processing
•   Processing lots of data
•   Outputting lots of data
•   Storing lots of historical data
•   Flexible analysis of data
•   Dealing with unstructured or structured data
Stuff Hadoop is not good at
• Hadoop is a freight truck, not a sports car
• Updating data (think “append-only”)
• Being easy to use
  – Java
  – Administration
• Hadoop is not good storage (don’t throw away
  your EMC stuff!)
QUESTIONS?

BW Tech Meetup: Hadoop and the Rise of Big Data

  • 1. Hadoop and the Rise of Big Data February 21, 2013 Donald Miner @donaldpminer Donald.Miner@emc.com
  • 3. Hadoop • Distributed platform up to thousands of nodes • Data storage and application framework • Started at Yahoo! • Open source • Based on a few Google papers (2003, 2004) • Runs on commodity hardware I’M HERE TO TELL YOU WHY HADOOP IS AWESOME
  • 4. Hadoop users • Yahoo! • Riot Games • Facebook • ComScore • eBay • Twitter • AOL • LinkedIn Hadoop Companies • Cloudera, Hortonworks, EMC/Greenplum, IBM • Numerous startups
  • 5. Buzzword glossary • Unstructured & Structured Data • NoSQL • Big Data (volume, velocity, variety) • Data Science • Cloud computing
  • 6. Hadoop component overview • Core components: – HDFS (Hadoop Distributed File System) – MapReduce (Data analysis framework) • Ecosystem – HBase (key-value store) – Pig (high-level data analysis language) – Hive (SQL-like data analysis language) – ZooKeeper (stores metadata) – Other stuff
  • 7. Use cases • Text processing – Indexing, counting, processing • Large-scale reports • Data science • Mixing data sources (data lakes) • Ad targeting • Image/Video/Audio processing • Cybersecurity
  • 8. HDFS • Stores files in folders (that’s it) – Nobody cares what’s in your files • Chunks large files into blocks (~64MB-1GB) • Blocks are scattered all over the place • 3 replicas of each block (better safe than sorry) • One NameNode (might be sorry) – Knows which computers blocks live on – Knows which blocks belong to which files • One DataNode per computer (slaves!) – Hosts the blocks
  • 10. MapReduce • Analyzes data in HDFS where the data is • Jobs are split into Mappers and Reducers • JobTracker – keeps track of running jobs • TaskTracker – one per computer, executes tasks • Mappers (you code this) – Loads data from HDFS – Filter, transform, parse – Outputs (key, value) pairs • Reducers (you code this, too) – Groups by the mapper’s output key – Aggregate, count, statistics – Outputs to HDFS
  • 12. Hadoop ecosystem • HDFS and MapReduce don’t do everything • Pig – high-level language
        grpd = GROUP logs BY userAgent;
        counts = FOREACH grpd GENERATE group, AVG(logs.timeMicroSec)/1.0E+06 AS loadTimeSec;
        byCount = ORDER counts BY loadTimeSec DESC;
        top = limit byCount 15;
    • Hive – high-level SQL language
        SELECT grp, SUM(col2), COUNT(*) FROM table1 GROUP BY grp;
    • HBase – key/value store
  • 13. Cool thing #1: Linear Scalability • HDFS and MapReduce scale linearly • If you have twice as many computers, things run twice as fast • If you have twice as much data, things run twice as slow • If you have twice as many computers, you can store twice as much data • This stays true (some minor caveats) • DATA LOCALITY!!
  • 14. Cool thing #2: Schema on Read • Before: ETL, schema design, tossing out original data • Now: LOAD DATA → ???? → PROFIT!! • Data is parsed/interpreted as it is read out of HDFS • What implications does this have? – Keep original data around! – Have multiple views of the same data! – Store first, figure out what to do with it later!
  • 15. Cool thing #3: Transparent Parallelism • RPC? Code deployment? Network programming? Data center fires? Distributed stuff? Inter-process communication? Fault tolerance? Message passing? Threading? Locking? • With MapReduce, I DON’T CARE … I just have to fit my solution into this tiny box [diagram labels: Solution, MapReduce]
  • 16. Cool thing #4: Cheap • Commodity hardware (meh) • Open source (people cost more though) • Add more hardware later
  • 17. How to get started • Install Hadoop in a Linux VM – Wait how is this helpful?? Hadoop is distributed! • Use Google (seriously) • Some prerequisites: Java, Linux, Data, Time
  • 18. Stuff Hadoop is good at • Batch processing • Processing lots of data • Outputting lots of data • Storing lots of historical data • Flexible analysis of data • Dealing with unstructured or structured data
  • 19. Stuff Hadoop is not good at • Hadoop is a freight truck, not a sports car • Updating data (think “append-only”) • Being easy to use – Java – Administration • Hadoop is not good storage (don’t throw away your EMC stuff!)
  • 20. QUESTIONS? Hadoop and the Rise of Big Data February 21, 2013 Donald Miner @donaldpminer Donald.Miner@emc.com
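
Slide 8 says HDFS chunks files into blocks, keeps 3 replicas of each, and lets the NameNode remember where everything lives. The sketch below is a toy model of that bookkeeping, not HDFS code: the function name `plan_blocks`, the 64 MB default, and the round-robin placement are all assumptions made for illustration (real HDFS placement is more involved).

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the low end of the range on slide 8
REPLICATION = 3                # "3 replicas of each block"


def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file of file_size bytes into blocks and pick DataNodes for each.

    Returns a list of (block_index, block_length, [datanode, ...]) tuples.
    Placement is plain round-robin, which is an assumption of this sketch.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    num_blocks = (file_size + block_size - 1) // block_size  # ceiling division
    plan = []
    for i in range(num_blocks):
        # The last block is usually shorter than the others.
        length = min(block_size, file_size - i * block_size)
        replicas = [datanodes[(i + r) % len(datanodes)] for r in range(replication)]
        plan.append((i, length, replicas))
    return plan


def namenode_view(filename, plan):
    """Build the two lookups slide 8 attributes to the NameNode."""
    file_to_blocks = {filename: [idx for idx, _, _ in plan]}
    block_to_nodes = {idx: nodes for idx, _, nodes in plan}
    return file_to_blocks, block_to_nodes
```

For a 200 MB file on four DataNodes this yields four blocks (three full, one of 8 MB), each listed on three different machines.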
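
Slide 10 splits a job into Mappers that emit (key, value) pairs and Reducers that aggregate everything sharing a key. A minimal single-process sketch of that flow, using word count as the example, might look like this. It is plain Python rather than the Hadoop API, and `mapper`, `reducer`, and `run_job` are hypothetical names chosen for this illustration.

```python
from collections import defaultdict


def mapper(line):
    """Parse one input line and emit a (word, 1) pair per word."""
    for word in line.lower().split():
        yield (word, 1)


def reducer(key, values):
    """Aggregate all values that were emitted under the same key."""
    return (key, sum(values))


def run_job(lines, mapper, reducer):
    """Run map, group-by-key, and reduce in one process.

    In Hadoop the grouping step happens across machines; here it is
    just a dictionary of lists.
    """
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))
```

Calling `run_job(["big data", "big hadoop"], mapper, reducer)` returns `{"big": 2, "data": 1, "hadoop": 1}`. Only the two small functions change from job to job, which is the "tiny box" slide 15 refers to.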
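
The Pig script on slide 12 groups log records by user agent, averages the load time in seconds, sorts descending, and keeps the top 15. For readers who do not know Pig, roughly the same computation in plain Python is sketched below. The record shape (dictionaries with `userAgent` and `timeMicroSec` keys) is an assumption, since the slide does not show how `logs` is loaded.

```python
from collections import defaultdict


def slowest_user_agents(logs, limit=15):
    """Return up to `limit` (userAgent, average load time in seconds) pairs,
    slowest first.
    """
    # GROUP logs BY userAgent
    times_by_agent = defaultdict(list)
    for record in logs:
        times_by_agent[record["userAgent"]].append(record["timeMicroSec"])

    # FOREACH grpd GENERATE group, AVG(logs.timeMicroSec)/1.0E+06
    averages = [
        (agent, sum(times) / len(times) / 1.0e6)
        for agent, times in times_by_agent.items()
    ]

    # ORDER counts BY loadTimeSec DESC, then LIMIT
    averages.sort(key=lambda pair: pair[1], reverse=True)
    return averages[:limit]
```

The Pig version is shorter, and Pig runs it as MapReduce jobs over data in HDFS, whereas this sketch only works on what fits in one machine's memory.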
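
Slide 13's scaling claims can be written down as one idealized formula: runtime is proportional to data size and inversely proportional to the number of computers. The sketch below encodes only that idealization. The function name and the per-node throughput figure are made up for illustration, and it ignores the "minor caveats" the slide mentions.

```python
def estimated_runtime_hours(data_gb, nodes, gb_per_node_per_hour=100.0):
    """Idealized linear-scaling estimate: hours = data / (nodes * throughput).

    gb_per_node_per_hour is a hypothetical figure, not a measured one.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return data_gb / (nodes * gb_per_node_per_hour)
```

Under this model, doubling `nodes` halves the estimate and doubling `data_gb` doubles it, which is the behavior the slide describes.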
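
Slide 14's "schema on read" means the raw data is stored untouched and each analysis applies its own parsing when it reads. A small sketch of the idea, with an invented log format and invented function names, shows two different views over the same stored lines:

```python
# Raw records are kept exactly as they arrived (format invented for this sketch).
RAW_LINES = [
    "2013-02-21 10:00:01 alice GET /index.html 200",
    "2013-02-21 10:00:05 bob GET /missing 404",
    "2013-02-22 09:30:00 alice GET /missing 404",
]


def view_requests(lines):
    """View 1: per-request records with user, path, and numeric status."""
    for line in lines:
        _date, _time, user, _method, path, status = line.split()
        yield {"user": user, "path": path, "status": int(status)}


def view_errors_per_day(lines):
    """View 2: count of error responses (status >= 400) per date."""
    counts = {}
    for line in lines:
        date, _time, _user, _method, _path, status = line.split()
        if int(status) >= 400:
            counts[date] = counts.get(date, 0) + 1
    return counts
```

Neither view required a schema up front, and a third view can be added later without reloading or transforming the stored lines.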