Adam Muise – Solution Architect, Hortonworks

HADOOP 101: AN INTRODUCTION TO HADOOP WITH THE HORTONWORKS SANDBOX
  
Who are we?
  
Who is Hortonworks?
  
We do Hadoop:
- 100% Open Source – Democratized Access to Data
- The leaders of Hadoop’s development
- Drive Innovation in the platform – We lead the roadmap
- Community driven, Enterprise Focused
  
We do Hadoop successfully.
- Support
- Training
- Professional Services
  
Enter the Hadoop.
(Image: http://www.fabulouslybroke.com/2011/05/ninja-elephants-and-other-awesome-stories/)
  
Hadoop was created because traditional technologies never cut it for Internet properties like Google, Yahoo, Facebook, Twitter, and LinkedIn.
  
Traditional architecture didn’t scale enough…
[Diagram: rows of App servers on top of three separate DB + SAN stacks]
  
Databases can become bloated and useless.
  
Traditional architectures cost too much at that volume…
[Diagram: $/TB, “$upercomputing”, “$pecial Hardware”]
  
So what is the answer?
  
If you could design a system that would handle this, what would it look like?
  
It would probably need a highly resilient, self-healing, cost-efficient, distributed file system…
[Diagram: a grid of nine Storage nodes]
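
To make the “resilient, self-healing distributed file system” idea concrete, here is a minimal Python sketch, assuming a toy in-memory cluster. `ToyCluster`, `BLOCK_SIZE`, and the least-loaded placement rule are hypothetical and much simpler than HDFS, which uses far larger blocks and a rack-aware placement policy. It shows the core pattern: a file is split into blocks, each block is copied to several nodes (3 here, matching HDFS’s usual default replication factor), and when a node fails the surviving copies are re-replicated.

```python
"""Toy replicated block store: a sketch of the ideas behind a distributed
file system such as HDFS, not HDFS's real placement or recovery logic."""

BLOCK_SIZE = 4   # bytes per block; tiny for the demo (real HDFS blocks are much larger)
REPLICATION = 3  # copies kept of each block (HDFS's usual default is also 3)


class ToyCluster:
    def __init__(self, node_names):
        # node name -> {block_id: block bytes}
        self.nodes = {name: {} for name in node_names}
        self.next_block_id = 0

    def _least_loaded(self, exclude):
        """Return the live node holding the fewest blocks, skipping `exclude`."""
        candidates = [n for n in self.nodes if n not in exclude]
        return min(candidates, key=lambda n: (len(self.nodes[n]), n))

    def _target_copies(self):
        # We cannot keep more copies than there are live nodes.
        return min(REPLICATION, len(self.nodes))

    def replicas(self, block_id):
        """Names of the live nodes that currently hold `block_id`."""
        return [n for n, blocks in self.nodes.items() if block_id in blocks]

    def put(self, data: bytes) -> list:
        """Split `data` into fixed-size blocks and store each on distinct nodes."""
        block_ids = []
        for start in range(0, len(data), BLOCK_SIZE):
            block_id = self.next_block_id
            self.next_block_id += 1
            holders = set()
            while len(holders) < self._target_copies():
                node = self._least_loaded(holders)
                self.nodes[node][block_id] = data[start:start + BLOCK_SIZE]
                holders.add(node)
            block_ids.append(block_id)
        return block_ids

    def fail_node(self, name):
        """Drop a node, then re-copy its blocks from surviving replicas ("self-healing")."""
        lost_blocks = self.nodes.pop(name)
        for block_id in lost_blocks:
            holders = set(self.replicas(block_id))
            if not holders:
                raise RuntimeError(f"all replicas of block {block_id} are gone")
            source = next(iter(holders))
            while len(holders) < self._target_copies():
                target = self._least_loaded(holders)
                self.nodes[target][block_id] = self.nodes[source][block_id]
                holders.add(target)

    def get(self, block_ids) -> bytes:
        """Reassemble a file by reading each block from any live replica."""
        return b"".join(self.nodes[self.replicas(b)[0]][b] for b in block_ids)


if __name__ == "__main__":
    cluster = ToyCluster(["node1", "node2", "node3", "node4"])
    ids = cluster.put(b"hello hadoop!")
    cluster.fail_node("node1")                         # lose a machine...
    print(cluster.get(ids))                            # b'hello hadoop!'
    print([len(cluster.replicas(b)) for b in ids])     # [3, 3, 3, 3]
```

Picking the least-loaded node keeps the sketch deterministic and roughly balanced; a production file system would also weigh racks, free disk space, and network topology.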
  
It would probably need a completely parallel processing framework that took tasks to the data…
[Diagram: a grid of nodes, each pairing Processing with Storage]
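
A parallel framework that “took tasks to the data” is the idea behind MapReduce. The following is a toy word count in plain Python, not Hadoop’s MapReduce API: `BLOCKS_BY_NODE` is a hypothetical layout of input blocks, and threads stand in for worker nodes. Each map task is scheduled against the block already stored on its node, the intermediate `(word, 1)` pairs are shuffled by key, and a reduce step sums them.

```python
"""Toy MapReduce word count that illustrates "taking tasks to the data".
Threads stand in for worker nodes; this is not the Hadoop MapReduce API."""
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster layout: the input blocks each node already stores.
BLOCKS_BY_NODE = {
    "node1": ["the quick brown fox"],
    "node2": ["the lazy dog"],
    "node3": ["the fox jumps"],
}


def map_task(node, block):
    """Runs on `node` against its local `block`; emits (word, 1) pairs."""
    return [(word, 1) for word in block.split()]


def reduce_task(word, counts):
    """Combines every value emitted for one key."""
    return word, sum(counts)


def word_count(blocks_by_node):
    # 1. Schedule one map task per block on the node that already holds it,
    #    so only the (small) task moves, not the (large) data.
    schedule = [(node, block)
                for node, blocks in blocks_by_node.items()
                for block in blocks]

    # 2. Map tasks are independent, so they can all run in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(lambda task: map_task(*task), schedule))

    # 3. Shuffle: group intermediate values by key.
    groups = defaultdict(list)
    for pairs in mapped:
        for word, one in pairs:
            groups[word].append(one)

    # 4. Reduce each key's group (sorted here only for readable output).
    return dict(reduce_task(w, c) for w, c in sorted(groups.items()))


if __name__ == "__main__":
    print(word_count(BLOCKS_BY_NODE))
    # {'brown': 1, 'dog': 1, 'fox': 2, 'jumps': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because no map task depends on another, adding nodes (and blocks) adds map tasks that run side by side; only the much smaller shuffled pairs cross the network.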
  
It would probably run on commodity hardware, virtualized machines, and common OS platforms.
[Diagram: a grid of nodes, each pairing Processing with Storage]
  
It would probably be open source so innovation could happen as quickly as possible.
  
It would need a critical mass of users.
  
[Diagram: Apache Hadoop, with HDFS, YARN, MapReduce, Tez, Pig, Hive, HCatalog, HBase, Storm, Sqoop, Flume, Falcon, Knox, and Ambari]
  
[Diagram: Hortonworks Data Platform, with HDFS, YARN, MapReduce, Tez, Pig, Hive, HCatalog, HBase, Storm, Sqoop, Flume, Falcon, Knox, and Ambari]
  
We are going to learn how to work with Hadoop in less than an hour.
  
To do this, we need to install Hadoop, right?
  
Nope.	
  
Enter the Hortonworks Sandbox.
  
The Sandbox is ‘Hadoop in a Can’. It contains one copy of each of the master and worker node processes used in a cluster, all running in a single virtual node.
  
[Diagram: the Processing and Storage nodes of a cluster collapsed into a single Linux VM with one Processing and one Storage instance]

Getting started with the Sandbox VM:
- Pick your flavor of VM at http://www.hortonworks.com/sandbox
- Start the Sandbox VM
- Find the IP address the VM displays
- Go to that address in a browser (for example, http://172.16.130.131)
- Register
- Click on ‘Start Tutorials’
- In the left-hand navigation, click on ‘HCatalog, Basic Pig & Hive Commands’
  
In this tutorial we will:
- Land files in HDFS
- Assign metadata with HCatalog
- Use SQL with Hive
- Learn to process data with Pig
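
The four tutorial steps can be mimicked end to end with the Python standard library, as a rough sketch of what each tool contributes rather than how the Sandbox tutorial does it. Everything here is a stand-in: `RAW_FILE` is hypothetical sample data playing a file landed in HDFS, `SCHEMA` plays an HCatalog table definition that is applied only when the data is read, `sqlite3` plays Hive’s SQL engine, and `dataflow_totals` imitates a Pig script’s FILTER, GROUP, and SUM steps.

```python
"""Toy version of the tutorial's flow using only the Python standard library.
Not Hadoop code: a plain string stands in for a file landed in HDFS, SCHEMA
plays the role of an HCatalog table definition, sqlite3 stands in for Hive's
SQL, and the final step mimics a Pig-style dataflow."""
import sqlite3
from collections import defaultdict

# 1. "Land" a raw file. Nothing checks its structure when it is written.
#    (Hypothetical sample data.)
RAW_FILE = "2014-02-24,toronto,3\n2014-02-24,ottawa,5\n2014-02-25,toronto,4\n"

# 2. Metadata kept separately from the data (HCatalog's role):
#    column names and types are applied only when the file is read.
SCHEMA = [("day", str), ("city", str), ("visits", int)]


def read_with_schema(raw, schema):
    """Parse each comma-delimited line, casting fields per the schema."""
    rows = []
    for line in raw.splitlines():
        fields = line.split(",")
        rows.append(tuple(cast(value) for (_, cast), value in zip(schema, fields)))
    return rows


def sql_totals(rows):
    # 3. SQL over the same rows (Hive's role; sqlite3 is only a stand-in).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (day TEXT, city TEXT, visits INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT city, SUM(visits) FROM visits GROUP BY city ORDER BY city"
    ).fetchall()
    conn.close()
    return result


def dataflow_totals(rows, min_visits):
    # 4. Pig-style dataflow: FILTER, then GROUP by city, then SUM each group.
    filtered = [r for r in rows if r[2] >= min_visits]
    grouped = defaultdict(list)
    for _day, city, visits in filtered:
        grouped[city].append(visits)
    return {city: sum(v) for city, v in sorted(grouped.items())}


if __name__ == "__main__":
    rows = read_with_schema(RAW_FILE, SCHEMA)
    print(sql_totals(rows))             # [('ottawa', 5), ('toronto', 7)]
    print(dataflow_totals(rows, 4))     # {'ottawa': 5, 'toronto': 4}
```

The point of keeping `SCHEMA` apart from `RAW_FILE` is that the same landed file can be read by both the SQL step and the dataflow step without being rewritten.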
  
Try the other tutorials.
  
Hadoop is the new Modern Data Architecture for the Enterprise.
  
There is NO second place

Hortonworks	
  

…the Bull Elephant of Hadoop Innovation
  
© Hortonworks Inc. 2012: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION



2014 feb 24_big_datacongress_hadoopsession1_hadoop101
