© 2011 Radiant Advisors, All Rights Reserved.   1
Go check out: Data Processing with Hadoop: Scalable and Cost
Effective, Doug Cutting, Apache Hadoop Co-founder, April 26th, 2011

This is the keynote presentation from Chicago Data Summit. Doug
Cutting takes us through the creation of Apache Hadoop, Hadoop's
adoption, the key advantages of Hadoop, and answers several
questions from attendees.
	
  
http://www.cloudera.com/videos/chicago_data_summit_keynote_data_processing_with_hadoop_scalable_and_cost_effective_doug_cutting_apache_hadoop_co-founder_hadoop




http://hadoop.apache.org/

The project includes these subprojects:

•  Hadoop Common: The common utilities that support the other Hadoop
   subprojects.
•  Hadoop Distributed File System (HDFS™): A distributed file system that provides
   high-throughput access to application data.
•  Hadoop MapReduce: A software framework for distributed processing of large
   data sets on compute clusters.
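
To make the MapReduce description above more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases in plain Python. It does not use Hadoop itself: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, and a real job would run these phases in parallel across the cluster over data stored in HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory data set."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling `word_count` on a list of text lines returns a dictionary of per-word totals. The point of the split is that each `map_phase` call and each `reduce_phase` call is independent of the others, which is what lets a framework distribute them over many machines.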

Other Hadoop-related projects at Apache include:

•  Avro™: A data serialization system.
•  Cassandra™: A scalable multi-master database with no single points of failure.
•  Chukwa™: A data collection system for managing large distributed systems.
•  HBase™: A scalable, distributed database that supports structured data storage for
   large tables.
•  Hive™: A data warehouse infrastructure that provides data summarization and ad
   hoc querying.
•  Mahout™: A scalable machine learning and data mining library.
•  Pig™: A high-level data-flow language and execution framework for parallel
   computation.
•  ZooKeeper™: A high-performance coordination service for distributed applications.




Reference: http://en.wikipedia.org/wiki/Apache_Hadoop




Reference: Hadoop in Action, Chuck Lam, Manning Publications 2011.

A Hadoop cluster is a set of commodity machines networked together in
one location.

While not strictly necessary, machines in a Hadoop cluster are usually
relatively homogeneous x86 Linux boxes.

And they’re almost always located in the same data center, often in
the same rack.

Data storage and processing all occur within this “cloud” of machines.

Different users can submit computing “jobs” to Hadoop from individual
clients.




Reference: InformationWeek, Charles Babcock, 06/22/2010. Designed
for cloud computing, the Hadoop data management system handles
petabytes of data at a time, pairing Google's MapReduce with a
distributed file management system for use on large clusters.

Image Gallery: Yahoo's Hadoop Implementation

http://www.informationweek.com/news/galleries/software/info_management/225700411?pgno=1




http://www.informationweek.com/news/galleries/software/info_management/225700411?pgno=8

Pig Parallel Programming Language
Olga Natkovich, Pig engineering manager, and Alan Gates, Pig lead
architect and a Pig contributor. Pig is a parallel programming language
developed by Yahoo Research, the firm's central research unit, which
allows Yahoo to easily perform procedural data processing tasks on top
of Hadoop. It is the standard pipeline processing solution at Yahoo!

SQL Example:
------------------------------------------------------------------------------
SELECT user, COUNT(*) FROM excite-small.log GROUP BY user;
------------------------------------------------------------------------------

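In Pig this becomes:
------------------------------------------------------------------------------
-- load the log with named fields, then group the records by user
log = LOAD 'excite-small.log' AS (user, time, query);
grpd = GROUP log BY user;
-- count the records in each group and print the result
cntd = FOREACH grpd GENERATE group, COUNT(log);
DUMP cntd;
------------------------------------------------------------------------------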
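
For readers who know neither SQL nor Pig, the same load, group, and count dataflow can be sketched in plain Python. This is only an illustration. The field names (user, time, query) come from the Pig LOAD statement above; the tab delimiter is an assumption matching Pig's default loader, and the helper names `load` and `count_by_user` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class LogRecord(NamedTuple):
    user: str
    time: str
    query: str


def load(lines: Iterable[str]) -> List[LogRecord]:
    """Parse tab-delimited lines into (user, time, query) records."""
    records = []
    for line in lines:
        user, time, query = line.rstrip("\n").split("\t", 2)
        records.append(LogRecord(user, time, query))
    return records


def count_by_user(records: Iterable[LogRecord]) -> Counter:
    """Equivalent of GROUP BY user followed by COUNT(*)."""
    return Counter(record.user for record in records)
```

As in the Pig version, each step produces a named intermediate result that the next step consumes, which is what makes the style procedural rather than declarative.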




Apache Hive Page:

Hive is a data warehouse system for Hadoop that facilitates easy data
summarization, ad-hoc queries, and the analysis of large datasets
stored in Hadoop compatible file systems. Hive provides a mechanism
to project structure onto this data and query the data using a SQL-like
language called HiveQL. At the same time this language also allows
traditional map/reduce programmers to plug in their custom mappers
and reducers when it is inconvenient or inefficient to express this logic in
HiveQL.
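
The idea of projecting structure onto data that is already stored can be sketched in a few lines of Python. This is not Hive and not HiveQL: the `PAGE_VIEWS` schema, the tab-delimited layout, and the helper names `project` and `total_by` are all assumptions made for illustration. The sketch only shows that the schema is applied when the raw lines are read, not when they are written.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical schema: column names and converters applied at read time.
Schema = List[Tuple[str, Callable[[str], object]]]

PAGE_VIEWS: Schema = [("user", str), ("page", str), ("duration_s", int)]


def project(lines: Iterable[str], schema: Schema, sep: str = "\t") -> Iterator[Dict[str, object]]:
    """Yield one typed row per raw line; the stored data itself is never rewritten."""
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        yield {name: convert(raw) for (name, convert), raw in zip(schema, fields)}


def total_by(rows: Iterable[Dict[str, object]], key: str, value: str) -> Dict[object, int]:
    """Roughly: SELECT key, SUM(value) FROM rows GROUP BY key."""
    totals: Dict[object, int] = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + int(row[value])
    return totals
```

Because the raw lines are left untouched, a different schema could later be projected onto the same data without reloading it.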




Apache HBase Page: http://hbase.apache.org/
http://indoos.wordpress.com/2010/08/16/hadoop-ecosystem-world-map/
Sanjay Sharma’s Weblog
August 16, 2010
Hadoop Ecosystem World-Map
While preparing the keynote for the recently held HUG India meetup on 31st July, I
decided to keep my session short, but useful and relevant to the lined-up
sessions on hiho, JAQL and Visual hive. I have always been a keen student of
geography (and still take pride in it!) and thought it would be great to draw a visual
geographical map of the Hadoop ecosystem. Here is what I came up with, along with a
nice little story behind it:
1. How did it all start? Huge data on the web!
2. Nutch was built to crawl this web data.
3. Huge data had to be saved: HDFS was born!
4. How to use this data?
5. The MapReduce framework was built for coding and running analytics: Java, or any
language via streaming/pipes.
6. How to get in unstructured data (web logs, click streams, Apache logs, server logs):
fuse, webdav, Chukwa, Flume, Scribe.
7. Hiho and Sqoop for loading data into HDFS: RDBMSs can join the Hadoop
bandwagon!
8. High-level interfaces required over low-level MapReduce programming: Pig, Hive,
Jaql.
9. BI tools with advanced UI reporting (drill-down etc.): Intellicus.
10. Workflow tools over MapReduce processes and high-level languages.
11. Monitor and manage Hadoop, run jobs/Hive, view HDFS (high-level view): Hue,
Karmasphere, Eclipse plugin, Cacti, Ganglia.
12. Support frameworks: Avro (serialization), ZooKeeper (coordination).
13. More high-level interfaces/uses: Mahout, Elastic MapReduce.
14. OLTP also possible: HBase.





Dallas TDWI Meeting Dec. 2012: Hadoop

  • 1. © 2011 Radiant Advisors, All Rights Reserved. 1
  • 6. Go check out: Data Processing with Hadoop: Scalable and Cost Effective, Doug Cutting, Apache Hadoop Co-founder, April 26th, 2011
       This is the keynote presentation from Chicago Data Summit. Doug Cutting takes us through the creation of Apache Hadoop, Hadoop's adoption, the key advantages of Hadoop, and answers several questions from attendees.
       http://www.cloudera.com/videos/chicago_data_summit_keynote_data_processing_with_hadoop_scalable_and_cost_effective_doug_cutting_apache_hadoop_co-founder_hadoop
  • 7. http://hadoop.apache.org/
       The project includes these subprojects:
         •  Hadoop Common: The common utilities that support the other Hadoop subprojects.
         •  Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
         •  Hadoop MapReduce: A software framework for distributed processing of large data sets on compute clusters.
       Other Hadoop-related projects at Apache include:
         •  Avro™: A data serialization system.
         •  Cassandra™: A scalable multi-master database with no single points of failure.
         •  Chukwa™: A data collection system for managing large distributed systems.
         •  HBase™: A scalable, distributed database that supports structured data storage for large tables.
         •  Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
         •  Mahout™: A scalable machine learning and data mining library.
         •  Pig™: A high-level data-flow language and execution framework for parallel computation.
         •  ZooKeeper™: A high-performance coordination service for distributed applications.
  • 8. Reference: http://en.wikipedia.org/wiki/Apache_Hadoop
  • 9. Reference: Hadoop in Action, Chuck Lam, Manning Publications, 2011.
       A Hadoop cluster is a set of commodity machines networked together in one location. While not strictly necessary, machines in a Hadoop cluster are usually relatively homogeneous x86 Linux boxes. And they’re almost always located in the same data center, often in the same rack. Data storage and processing all occur within this “cloud” of machines. Different users can submit computing “jobs” to Hadoop from individual clients.
  • 12. Reference: InformationWeek, Charles Babcock, 06/22/2010
       Designed for cloud computing, the Hadoop data management system handles petabytes of data at a time, pairing Google's MapReduce with a distributed file management system for use on large clusters.
       Image Gallery: Yahoo's Hadoop Implementation
       http://www.informationweek.com/news/galleries/software/info_management/225700411?pgno=1
  • 30. http://www.informationweek.com/news/galleries/software/info_management/225700411?pgno=8
       Pig Parallel Programming Language
       Olga Natkovich, Pig engineering manager, and Alan Gates, Pig lead architect and a Pig contributor.
       Pig is a parallel programming language developed by Yahoo Research, the firm's central research unit, which allows Yahoo to easily perform procedural data processing tasks on top of Hadoop. It is the standard pipeline processing solution at Yahoo!
       SQL example:
       ------------------------------------------------------------------------------
       SELECT user, COUNT(*) FROM excite-small.log GROUP BY user;
       ------------------------------------------------------------------------------
       In Pig this becomes:
       ------------------------------------------------------------------------------
       log = LOAD 'excite-small.log' AS (user, time, query);
       grpd = GROUP log BY user;
       -- The slide text cuts off after the GROUP step; the line below is added
       -- so that the script matches the COUNT(*) in the SQL above.
       cntd = FOREACH grpd GENERATE group, COUNT(log);
       ------------------------------------------------------------------------------
  • 31. Apache Hive Page: Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
  • 32. Apache HBase Page: http://hbase.apache.org/
  • 34. http://indoos.wordpress.com/2010/08/16/hadoop-ecosystem-world-map/
       Sanjay Sharma’s Weblog, August 16, 2010
       Hadoop Ecosystem World-Map
       While preparing for the keynote for the recently held HUG India meetup on 31st July, I decided that I will try to keep my session short, but useful and relevant to the lined up sessions on hiho, JAQL and Visual hive. I have always been a keen student of geography (still take pride in it!) and thought it would be great to draw a visual geographical map of the Hadoop ecosystem. Here is what I came up with, with a nice little story behind it:
         1. How did it all start? Huge data on the web!
         2. Nutch was built to crawl this web data.
         3. The huge data had to be saved: HDFS was born!
         4. How to use this data?
         5. The MapReduce framework was built for coding and running analytics: Java, or any language via streaming/pipes.
         6. How to get unstructured data in (web logs, click streams, Apache logs, server logs): fuse, webdav, Chukwa, Flume, Scribe.
         7. Hiho and Sqoop for loading data into HDFS: RDBMS can join the Hadoop bandwagon!
         8. High-level interfaces required over low-level MapReduce programming: Pig, Hive, Jaql.
         9. BI tools with advanced UI reporting (drilldown, etc.): Intellicus.
         10. Workflow tools over MapReduce processes and high-level languages.
         11. Monitor and manage Hadoop, run jobs/Hive, view HDFS (high-level view): Hue, Karmasphere, Eclipse plugin, Cacti, Ganglia.
         12. Support frameworks: Avro (serialization), ZooKeeper (coordination).
         13. More high-level interfaces/uses: Mahout, Elastic MapReduce.
         14. OLTP is also possible: HBase.
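
The SQL and Pig snippets on slide 30 both group log records by user and count them. As a rough illustration of how that same computation can be phrased as map, shuffle, and reduce steps (the model that Hadoop MapReduce is built around), here is a minimal single-process sketch in Python. The function names, the tab-separated (user, time, query) record layout, and the sample rows are all assumptions made for illustration. It does not use any Hadoop or Pig API and is not how either system is implemented.

```python
from collections import defaultdict


def map_phase(lines):
    """Map step: emit a (user, 1) pair for every log line.

    Assumes each line is tab-separated as user, time, query
    (a hypothetical layout chosen for this sketch).
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:  # skip blank lines and lines with no user
            yield fields[0], 1


def shuffle_phase(pairs):
    """Shuffle step: collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the values for each key, giving a count per user."""
    return {user: sum(ones) for user, ones in groups.items()}


def count_by_user(lines):
    """Run the three steps in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical sample records, not taken from the real excite-small.log.
SAMPLE_LOG = [
    "u1\t970916105432\thadoop",
    "u2\t970916001949\tpig latin",
    "u1\t970916105500\thive",
]

if __name__ == "__main__":
    for user, count in sorted(count_by_user(SAMPLE_LOG).items()):
        print(user, count)
```

In a real cluster the map and reduce steps would run in parallel on many machines and the framework would handle the shuffle. The sketch only shows the shape of the computation that the one-line SQL query and the short Pig script express at a higher level.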