Marc Cluet, Lynx Consultants

How Hadoop Works
  
What we’ll cover
- Understand Hadoop in detail
- See how Hadoop works operationally
- Be able to start asking the right questions of your data
  
Lynx Consultants © 2013
  
Hadoop Distributions
- Cloudera CDH
- Hortonworks
- MapR
  
Hadoop Components
- HDFS
- HBase
- MapRed
- YARN
  
Hadoop Components
- HDFS
  - Hadoop Distributed File System
  - Everything else sits on top of it
  - Keeps 3 copies (replicas) of every block by default
- HBase
- MapRed
- YARN
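
The replication point is easiest to see as arithmetic. The Python sketch below counts the blocks a file occupies and the raw disk it consumes across the cluster. The helper name is hypothetical and is not a Hadoop API. The sketch assumes a 64 MB block size (the Hadoop 1.x default for dfs.block.size; Hadoop 2 raised it to 128 MB) and the default replication factor of 3.

```python
MB = 1024 * 1024

def hdfs_footprint(file_size, block_size=64 * MB, replication=3):
    """Return (number of HDFS blocks, raw bytes stored across the cluster).

    HDFS splits a file into fixed-size blocks; the last block only takes
    as much disk as it actually holds. Every block is stored `replication` times.
    """
    blocks = (file_size + block_size - 1) // block_size  # ceiling division
    raw_bytes = file_size * replication
    return blocks, raw_bytes

# A 200 MB file: blocks of 64 + 64 + 64 + 8 MB, each stored 3 times.
blocks, raw = hdfs_footprint(200 * MB)
print(blocks, raw // MB)  # 4 600
```

So a 200 MB file costs 600 MB of raw disk. This is why datanodes want "lots of cheap disk".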
  
Hadoop Components
- HDFS
- HBase
  - Hadoop's schemaless database
  - Key-value store
  - Sits on top of HDFS
- MapRed
- YARN
  
Hadoop Components
- HDFS
- HBase
- MapRed
  - Hadoop Map/Reduce (version 1)
  - Non-pluggable, archaic
  - Requires HDFS for temporary storage
- YARN
  
Hadoop Components
- HDFS
- HBase
- MapRed
- YARN
  - Hadoop Map/Reduce version 2.0 (MRv2 runs on the YARN resource manager)
  - Pluggable: you can add your own processing frameworks
  - Fast and less memory-hungry
  
Hadoop Component Breakdown
- All these components split into:
  - client/server
  - master/slave scenarios
- We will now look at the breakdown of each individual component
  
Hadoop Components Breakdown
- HDFS
  - Master Namenode
    - Keeps track of where every file's blocks are allocated on the Datanodes
    - Re-replicates data if one of the datanodes goes down
    - Is rack aware
  - Secondary Namenode
    - Performs housekeeping for the namenode: periodically checkpoints the metadata by merging the edit log into the fsimage
    - Does not necessarily need a server of its own
  - Datanode
    - Stores the data
    - Works best with non-RAID disks, for extra I/O throughput
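
Rack awareness matters most when the namenode chooses where a new block's replicas go. The default HDFS placement policy works as follows:

- The first replica goes on the writer's node.
- The second replica goes on a node in a different rack.
- The third replica goes on another node in that same remote rack.

The Python sketch below models only that shape. The topology dictionary and function name are hypothetical. The real namenode also weighs free space and load, and the sketch needs at least three datanodes.

```python
import random

def place_replicas(topology, writer, rng=random):
    """Pick 3 datanodes for a new block, following the shape of HDFS's default
    rack-aware policy. Simplified: ignores free space, load and decommissioning.

    topology: hypothetical dict of rack name -> list of datanode hostnames
    writer:   datanode the client is writing from (holds the first replica)
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    first = writer

    # Second replica: a node on a different rack, so the block survives a rack
    # failure. Falls back to any other node on a single-rack cluster.
    remote = [n for n in rack_of if rack_of[n] != rack_of[first]]
    second = rng.choice(remote or [n for n in rack_of if n != first])

    # Third replica: another node on the second replica's rack, so only one
    # copy has to cross between racks.
    same_rack = [n for n in topology[rack_of[second]] if n not in (first, second)]
    third = rng.choice(same_rack or [n for n in rack_of if n not in (first, second)])
    return [first, second, third]

topology = {
    "rack1": ["dn1", "dn2", "dn3"],
    "rack2": ["dn4", "dn5", "dn6"],
}
# First replica on dn2 (rack1); the other two on two different rack2 nodes.
print(place_replicas(topology, "dn2"))
```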
  
Hadoop Components Breakdown
- HDFS
  - How to access
    - Clients connect with the Hadoop client to hdfs://namenode:8020
    - Supports the basic Unix-style file commands (hadoop fs -ls, -cat, -mkdir, -rm, ...)
  - Configuration files
    - /etc/hadoop/conf/core-site.xml
      - Defines the core configuration, such as the HDFS namenode address, and default parameters
    - /etc/hadoop/conf/hdfs-site.xml
      - Defines namenode- or datanode-specific configuration, such as where files are stored
    - /etc/hadoop/conf/slaves
      - Defines the list of servers available in this cluster
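
The client finds the namenode through core-site.xml. The Python sketch below reads that address using only the standard library. It assumes the usual configuration/property/name/value layout of Hadoop's *-site.xml files. It checks fs.defaultFS (the Hadoop 2 name) before the older fs.default.name, and falls back to port 8020. The helper names and the sample file are illustrative.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

def read_hadoop_conf(xml_text):
    """Return the <property> name/value pairs of a Hadoop *-site.xml file as a dict."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            conf[name.strip()] = (value or "").strip()
    return conf

def namenode_address(conf):
    """Extract (host, port) of the HDFS namenode from core-site.xml settings."""
    # fs.defaultFS is the Hadoop 2 name; fs.default.name is the older, deprecated one.
    uri = conf.get("fs.defaultFS") or conf.get("fs.default.name")
    if uri is None:
        raise KeyError("no default filesystem configured")
    parsed = urlparse(uri)
    return parsed.hostname, parsed.port or 8020  # 8020 is the default namenode port

CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:8020</value>
  </property>
</configuration>"""

print(namenode_address(read_hadoop_conf(CORE_SITE)))  # ('namenode', 8020)
```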
  
Hadoop Components Breakdown
- HBase
  - Master
    - Controls the HBase cluster, knows where the data is allocated, and provides a client listening socket using Thrift and/or a RESTful API
  - Regionserver
    - HBase node that stores part of the data in one or more regions; this is roughly equivalent to sharding
  - Thrift / REST
    - Interfaces to connect to HBase
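
The sharding analogy can be made concrete. HBase splits a table into regions, each covering a contiguous range of sorted row keys. The region that owns a row is the last one whose start key is less than or equal to that row key. The Python sketch below models only that lookup, using bisect. The region boundaries and server names are made up. A real client learns this mapping from HBase's META catalog table rather than from a local list.

```python
import bisect

class RegionMap:
    """Route row keys to regionservers by sorted start key (HBase-style range sharding)."""

    def __init__(self, regions):
        # regions: list of (start_key, regionserver) pairs; sorted here by start key.
        regions = sorted(regions)
        if not regions or regions[0][0] != b"":
            raise ValueError("the first region must start at the empty key")
        self._starts = [start for start, _ in regions]
        self._servers = [server for _, server in regions]

    def server_for(self, row_key):
        # The owning region is the last one whose start key is <= row_key.
        i = bisect.bisect_right(self._starts, row_key) - 1
        return self._servers[i]

regions = RegionMap([(b"", "regionserver1"), (b"g", "regionserver2"), (b"p", "regionserver3")])
print(regions.server_for(b"melon"))  # regionserver2
```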
  
Hadoop Components Breakdown
- HBase
  - How to access
    - Through the HBase client or a Thrift client
    - Through the RESTful API
  - Configuration files
    - /etc/hbase/conf/hbase-site.xml
      - Defines all the basic configuration for accessing HBase
    - /etc/hbase/conf/hbase-policy.xml
      - Defines the security (ACL) policy
    - /etc/hbase/conf/regionservers
      - Lists all the regionservers available to this cluster
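
For RESTful access, the Python sketch below builds a row URL and decodes a JSON response. It does not perform the HTTP call itself. The sketch assumes the JSON representation documented for the HBase REST gateway: row keys, column names, and cell values are base64-encoded, and the value sits in each cell's "$" field. It also assumes the gateway's default port of 8080. Check both against your HBase version. The host, table, and sample response are hypothetical.

```python
import base64
import json
from urllib.parse import quote

def row_url(host, table, row, port=8080):
    """Build the REST URL for fetching a whole row (request it with Accept: application/json)."""
    return f"http://{host}:{port}/{quote(table, safe='')}/{quote(row, safe='')}"

def _b64(text):
    return base64.b64decode(text).decode("utf-8")

def decode_row(json_text):
    """Decode an HBase REST JSON response into {row_key: {column: value}}."""
    result = {}
    for row in json.loads(json_text).get("Row", []):
        cells = {_b64(cell["column"]): _b64(cell["$"]) for cell in row.get("Cell", [])}
        result[_b64(row["key"])] = cells
    return result

# Hypothetical response for row "row1", column "cf:a", value "value1".
SAMPLE_RESPONSE = (
    '{"Row":[{"key":"cm93MQ==","Cell":'
    '[{"column":"Y2Y6YQ==","timestamp":1,"$":"dmFsdWUx"}]}]}'
)
print(row_url("hbase-rest", "users", "row1"))  # http://hbase-rest:8080/users/row1
print(decode_row(SAMPLE_RESPONSE))             # {'row1': {'cf:a': 'value1'}}
```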
  
Hadoop Components Breakdown
- MapRed
  - JobTracker
    - Schedules the Map/Reduce jobs and assigns their tasks to TaskTrackers
    - Tracks the status of all running jobs and tasks
    - Keeps a record of completed jobs through the HistoryServer
  - TaskTracker
    - Executes the tasks of a Map/Reduce job
    - Very CPU and memory intensive
    - Stores intermediate (map output) results on local disk, where the reducers fetch them, and reports progress to the JobTracker
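
To show where the intermediate data lives, here is a single-process Python simulation of the classic word count job. This is not Hadoop's Java MapReduce API. It only mirrors the three phases:

- Map runs per input split, as on a TaskTracker.
- Shuffle groups the intermediate pairs by key.
- Reduce combines each group.

The input splits are made up.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit (word, 1) for every word in one input split."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Shuffle/sort: group intermediate values by key, sorted by key,
    as reducers see them after fetching map output."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(groups):
    """Reduce: combine all the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

splits = [["the cat sat"], ["the dog sat", "the end"]]  # two hypothetical input splits
intermediate = [pair for split in splits for pair in map_phase(split)]
counts = reduce_phase(shuffle(intermediate))
print(counts)  # {'cat': 1, 'dog': 1, 'end': 1, 'sat': 2, 'the': 3}
```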
  
Hadoop Components Breakdown
- MapRed
  - How to access
    - Through the Hadoop client
    - Through any MapRed client such as Pig or Hive
    - Your own Java code
  - Configuration files
    - /etc/hadoop/conf/mapred-site.xml
      - Defines how to contact this MapRed cluster
    - /etc/hadoop/conf/mapred-queue-acls.xml
      - Defines the ACL structure for accessing MapRed; normally not necessary
    - /etc/hadoop/conf/slaves
      - Defines the list of TaskTrackers in this cluster
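
The slaves file (like HBase's regionservers file) is a plain list with one hostname per line. The Hadoop start scripts ssh into each listed host to launch its daemons. As far as we know, those scripts ignore '#' comments and blank lines. The Python sketch below parses such a file the same way. The function name and hostnames are hypothetical.

```python
def read_host_list(text):
    """Parse a Hadoop-style host list such as conf/slaves (or HBase's
    conf/regionservers): one hostname per line, '#' starts a comment,
    blank lines are ignored."""
    hosts = []
    for line in text.splitlines():
        host = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if host:
            hosts.append(host)
    return hosts

SLAVES = """# TaskTrackers / Datanodes
dn1.example.com
dn2.example.com   # second rack

"""
print(read_host_list(SLAVES))  # ['dn1.example.com', 'dn2.example.com']
```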
  
Hadoop Components Breakdown
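- YARN
  - Same master/slave structure as MapRed (ResourceManager and NodeManagers take the place of the JobTracker and TaskTrackers); MapRed v2 runs on top of YARN
  - Configuration files
    - /etc/hadoop/conf/yarn-site.xml
      - All the required configuration for YARN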
  
Hadoop Cluster Breakdown
- Namenode Server
  - HDFS Namenode
  - HBase Master
- Secondary Namenode Server
  - HDFS Secondary Namenode
- JobTracker Server
  - MapRed JobTracker
  - MapRed History Server
  
Hadoop Cluster Breakdown
- Datanode Server
  - HDFS Datanode
  - HBase RegionServer
  - MapRed TaskTracker
  
Hadoop Hardware Requirements
- Namenode Server
  - Redundant power supplies
  - RAID1 drives
  - Enough memory (16 GB)
- Secondary Namenode Server
  - Almost none
  
Hadoop Hardware Requirements
- JobTracker Server
  - Redundant power supplies
  - RAID1 drives
  - Enough memory (16 GB)
- Datanode Server
  - Lots of cheap disk (no RAID)
  - Lots of memory (32 GB)
  - Lots of CPU
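
Because every block is stored three times, raw datanode disk shrinks quickly once replication is applied. The Python sketch below is a rough capacity estimate for identical datanodes. The node and disk counts are hypothetical. The 25% kept free for the OS, logs, and MapRed intermediate data is an assumed planning figure, not a Hadoop default.

```python
def usable_hdfs_capacity_tb(nodes, disks_per_node, disk_tb,
                            replication=3, non_hdfs_fraction=0.25):
    """Rough usable HDFS capacity in TB for a set of identical datanodes.

    non_hdfs_fraction: assumed share of each disk kept free for the OS,
    logs and MapRed intermediate data (which is not stored in HDFS).
    """
    raw_tb = nodes * disks_per_node * disk_tb
    return raw_tb * (1 - non_hdfs_fraction) / replication

# 10 datanodes x 4 disks x 2 TB = 80 TB raw -> 60 TB for HDFS -> 20 TB of data.
print(usable_hdfs_capacity_tb(10, 4, 2))  # 20.0
```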
  
Hadoop Default Ports
- HDFS
  - 8020: HDFS Namenode
  - 50010: HDFS Datanode data transfer
- MapRed
  - No defaults
- HBase
  - 60010: Master (web UI; the Master RPC port defaults to 60000)
  - 60020: Regionserver
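
When a client cannot reach a service, checking whether these ports accept TCP connections is a quick first step. The Python sketch below uses only the standard socket module. The port map repeats the defaults listed on this slide. The function name is hypothetical, and a successful connect only shows that something is listening, not that it is healthy.

```python
import socket

# Default ports listed on the slide above.
DEFAULT_PORTS = {
    "HDFS Namenode": 8020,
    "HDFS Datanode data transfer": 50010,
    "HBase Master web UI": 60010,
    "HBase Regionserver": 60020,
}

def check_ports(host, ports, timeout=2.0):
    """Return {service: True/False}: whether a TCP connection to each port succeeds."""
    status = {}
    for service, port in ports.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                status[service] = True
        except OSError:  # refused, timed out, unresolvable host, ...
            status[service] = False
    return status
```

On a real cluster you would point it at each role's host, for example check_ports("namenode", {"HDFS Namenode": 8020}), and only check the ports of the daemons that run there.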
  
Hadoop HDFS Workflow
  
Hadoop MapRed Workflow
  
Hadoop MapRed Workflow
  
Flume
- Transports streams of data from point A to point B
- Source
  - Where the data is read from
- Channel
  - How the data is buffered
- Sink
  - Where the data is written to
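
The source, channel, and sink roles can be sketched in a few lines of Python. A Flume event is a byte body plus optional headers. Here a list of lines stands in for the source, a bounded in-memory queue for the channel, and a list for the sink. The class and function names are hypothetical and are not Flume's Java API.

```python
from collections import deque

def make_event(line, **headers):
    """A Flume-style event: a byte payload (body) plus optional string headers."""
    return {"headers": dict(headers), "body": line.encode("utf-8")}

class MemoryChannel:
    """Bounded in-memory buffer between a source and a sink."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self._events = deque()

    def put(self, event):
        if len(self._events) >= self.capacity:
            raise OverflowError("channel is full")
        self._events.append(event)

    def take(self):
        """Return the oldest event, or None when the channel is empty."""
        return self._events.popleft() if self._events else None

def run_agent(source, channel, sink):
    """Read each line from the source, buffer it in the channel, then drain the channel into the sink."""
    for line in source:                    # source: where the data is read from
        channel.put(make_event(line))      # channel: how the data is buffered
        event = channel.take()
        while event is not None:           # sink: where the data is written to
            sink.append(event)
            event = channel.take()
    return sink

written = run_agent(["line 1", "line 2"], MemoryChannel(capacity=10), [])
print([e["body"] for e in written])  # [b'line 1', b'line 2']
```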
  
Flume
- Flume is fault tolerant
- Sources keep a pointer to their read position
  - With some exceptions, most sources are in a known state
- Channels can be fault tolerant
  - A channel written to disk can recover from where it left off
- Sinks can be redundant
  - More than one sink for the same data
  - Data is serialised using Avro
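
Channel recovery rests on transactions: a sink takes a batch of events, commits if the write succeeds, and rolls back if it fails, so the batch stays in the channel and is retried. The Python sketch below models that take/commit/rollback cycle. The class and function names are hypothetical. The result is at-least-once delivery: a batch that was partly written before a failure is written again.

```python
from collections import deque

class TransactionalChannel:
    """Channel whose takes only become permanent on commit (Flume-style transactions)."""

    def __init__(self, events=()):
        self._events = deque(events)
        self._in_flight = []  # taken but not yet committed

    def take(self, batch_size):
        while self._events and len(self._in_flight) < batch_size:
            self._in_flight.append(self._events.popleft())
        return list(self._in_flight)

    def commit(self):
        self._in_flight.clear()

    def rollback(self):
        # Put uncommitted events back at the front, in their original order.
        self._events.extendleft(reversed(self._in_flight))
        self._in_flight.clear()

    def __len__(self):
        return len(self._events)

def deliver(channel, write, batch_size=2, max_failures=3):
    """Drain the channel through `write`, retrying a batch after a failed write."""
    failures = 0
    while len(channel):
        batch = channel.take(batch_size)
        try:
            write(batch)
        except OSError:
            channel.rollback()  # the events stay in the channel for the next attempt
            failures += 1
            if failures > max_failures:
                raise
        else:
            channel.commit()
```

For example, if the first write to HDFS fails, the same batch is simply taken again on the next loop, and nothing is lost.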
  
Flume
  
Flume
- Configuration files
  - /etc/flume-ng/conf/flume.conf
    - Defines the agent configuration with its source, channel, and sink
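
flume.conf is a properties-style file. In it, an agent lists its sources, channels, and sinks, and each component is then configured and wired by name. A source names its channels (plural); a sink names exactly one channel. The Python sketch below holds a minimal example for a hypothetical agent1 that tails a log file into HDFS through a file channel, plus a small check that the wiring refers to defined channels. The agent name, component names, and paths are illustrative. The parser is simplified: it handles only "key = value" lines and # comments.

```python
FLUME_CONF = """
# Hypothetical agent "agent1": tail an application log into HDFS.
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app.log
agent1.sources.src1.channels = ch1

agent1.channels.ch1.type = file

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode:8020/flume/events
agent1.sinks.sink1.channel = ch1
"""

def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props

def check_wiring(props, agent):
    """List problems in how the agent's sources and sinks reference its channels."""
    def names(kind):
        return props.get(f"{agent}.{kind}", "").split()

    channels = set(names("channels"))
    problems = []
    for src in names("sources"):
        # A source may fan out to several channels (space-separated list).
        src_channels = props.get(f"{agent}.sources.{src}.channels", "").split()
        if not src_channels:
            problems.append(f"source {src}: no channel configured")
        for ch in src_channels:
            if ch not in channels:
                problems.append(f"source {src}: channel {ch!r} is not defined")
    for sink in names("sinks"):
        # A sink drains exactly one channel.
        ch = props.get(f"{agent}.sinks.{sink}.channel")
        if ch not in channels:
            problems.append(f"sink {sink}: channel {ch!r} is not defined")
    return problems

print(check_wiring(parse_properties(FLUME_CONF), "agent1"))  # []
```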
  
Flume
  
Hadoop Recommended Reads
  
Hadoop References
- Hadoop
  - http://hadoop.apache.org/docs/stable/cluster_setup.html
  - http://rc.cloudera.com/cdh/4/hadoop/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html
  - http://pig.apache.org/docs/r0.7.0/setup.html
  - http://wiki.apache.org/hadoop/NameNodeFailover
- HBase
  - http://hbase.apache.org/book/book.html
- Flume
  - http://archive.cloudera.com/cdh4/cdh/4/flume-ng/FlumeUserGuide.html
  
Questions?
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Kürzlich hochgeladen (20)

How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Hadoop operations

  • 1. How Hadoop Works
      Marc Cluet – Lynx Consultants
  • 2. What we’ll cover
      - Understand Hadoop in detail
      - See how Hadoop works operationally
      - Be able to start asking the right questions of your data
      Lynx Consultants © 2013
  • 3. Hadoop Distributions
      - Cloudera CDH
      - Hortonworks
      - MapR
  • 4. Hadoop Components
      - HDFS
      - HBase
      - MapRed
      - YARN
  • 5. Hadoop Components
      - HDFS
          - Hadoop Distributed File System
          - Everything else sits on top of it
          - Keeps 3 copies of every block by default
      - HBase
      - MapRed
      - YARN
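To make the "3 copies of every block" point concrete, the sketch below estimates how much raw disk a single file consumes once HDFS has split it into blocks and replicated each block. It assumes a 128 MB block size, which is the `dfs.blocksize` default in Hadoop 2 (older releases used 64 MB), and a replication factor of 3. The function name is hypothetical.

```python
import math

MB = 1024 * 1024

def hdfs_footprint(file_bytes: int, block_size: int = 128 * MB, replication: int = 3):
    """Return (blocks, block_replicas, raw_bytes) for one file stored in HDFS.

    block_size and replication are assumed defaults; check your hdfs-site.xml.
    """
    if file_bytes < 0 or block_size <= 0 or replication <= 0:
        raise ValueError("file size must be >= 0, block size and replication > 0")
    # The file is cut into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_bytes / block_size)
    # Every block is stored `replication` times, on different datanodes.
    block_replicas = blocks * replication
    # A partial last block only occupies its real size on disk, so the raw
    # footprint is the file size times the replication factor.
    raw_bytes = file_bytes * replication
    return blocks, block_replicas, raw_bytes

if __name__ == "__main__":
    # A 1 GB file: 8 blocks, 24 block replicas, 3 GB of raw disk.
    print(hdfs_footprint(1024 * MB))
```

The same arithmetic explains why Datanodes need "lots of cheap disk": every byte written by a client costs three bytes of raw storage.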
  • 6. Hadoop Components
      - HDFS
      - HBase
          - Hadoop's schemaless database
          - Key-value store
          - Sits on top of HDFS
      - MapRed
      - YARN
  • 7. Hadoop Components
      - HDFS
      - HBase
      - MapRed
          - Hadoop Map/Reduce
          - Non-pluggable, archaic
          - Requires HDFS for temporary storage
      - YARN
  • 8. Hadoop Components
      - HDFS
      - HBase
      - MapRed
      - YARN
          - Hadoop Map/Reduce version 2.0 (YARN is the resource manager it runs on)
          - Pluggable: you can add your own processing frameworks
          - Faster and less memory-hungry than MapRed
  • 9. Hadoop Component Breakdown
      - Each of these components splits into
          - client/server roles
          - master/slave roles
      - We will now break down each component individually
  • 10. Hadoop Components Breakdown
      - HDFS
          - Master Namenode
              - Keeps track of where every file's blocks are allocated on the Datanodes
              - Re-replicates data if one of the Datanodes goes down
              - Is rack-aware
          - Secondary Namenode
              - Performs housekeeping for the Namenode (periodically merges the edit log into the filesystem image)
              - Does not necessarily need a server separate from the Namenode
          - Datanode
              - Stores the data
              - Best run on plain, non-RAID (JBOD) disks for extra I/O throughput
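Rack awareness is easiest to see in how a block's replicas are placed. The sketch below follows the default policy for replication factor 3 described in the HDFS architecture guide:

- The first replica goes on the writer's node, if the writer is a datanode.
- The second replica goes on a node in a different rack.
- The third replica goes on another node in that same remote rack.

The topology dict and the function name are hypothetical. The real Namenode also weighs free space and load, which this sketch ignores.

```python
import random

def place_replicas(topology, writer=None, rng=random):
    """Choose 3 datanodes for a new block using a rack-aware policy.

    topology: dict mapping rack name -> list of datanode hostnames (hypothetical).
    writer:   hostname of the writing client, if it runs on a datanode.
    rng:      anything with choice() and sample(), e.g. random.Random(seed).
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    if not rack_of:
        raise ValueError("topology has no datanodes")

    # 1st replica: the writer's own node if it is a datanode, else any node.
    first = writer if writer in rack_of else rng.choice(sorted(rack_of))

    # 2nd and 3rd replicas: two different nodes on one remote rack, so losing
    # a whole rack never loses every copy of the block.
    remote_racks = [rack for rack in sorted(topology)
                    if rack != rack_of[first] and len(topology[rack]) >= 2]
    if not remote_racks:
        raise ValueError("need a remote rack with at least two datanodes")
    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(sorted(topology[remote_rack]), 2)
    return [first, second, third]
```

This layout still survives the loss of a whole rack. Putting the last two replicas on the same remote rack, rather than on a third rack, cuts inter-rack write traffic.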
  • 11. Hadoop Components Breakdown
      - HDFS
          - How to access
              - Clients connect with the Hadoop client to hdfs://namenode:8020
              - Supports the basic Unix-style file commands (ls, cat, mkdir, rm and so on)
          - Configuration files
              - /etc/hadoop/conf/core-site.xml
                  - Defines the core configuration, such as the HDFS Namenode address and default parameters
              - /etc/hadoop/conf/hdfs-site.xml
                  - Defines Namenode- and Datanode-specific settings, such as where each stores its files
              - /etc/hadoop/conf/slaves
                  - Lists the servers available in this cluster
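The *-site.xml files share one simple format: a `<configuration>` element holding `<property>` entries, each with a `<name>` and a `<value>`. The sketch below uses only the standard library to read such a file and extract the Namenode address. It looks up `fs.defaultFS`, the Hadoop 2 name of the key that older releases call `fs.default.name`. The helper names and the sample file are illustrative.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

def load_hadoop_conf(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            conf[name.strip()] = (value or "").strip()
    return conf

def namenode_address(conf):
    """Return (host, port) of the Namenode from fs.defaultFS or fs.default.name."""
    uri = conf.get("fs.defaultFS") or conf.get("fs.default.name")
    if uri is None:
        raise KeyError("fs.defaultFS is not set")
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs":
        raise ValueError(f"not an HDFS URI: {uri}")
    # 8020 is the Namenode's default RPC port when the URI omits it.
    return parsed.hostname, parsed.port or 8020

# Illustrative core-site.xml matching the hdfs://namenode:8020 example above.
SAMPLE_CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:8020</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(namenode_address(load_hadoop_conf(SAMPLE_CORE_SITE)))
```

The same `load_hadoop_conf` helper works unchanged on hdfs-site.xml, mapred-site.xml or yarn-site.xml, because they all use this property format.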
  • 12. Hadoop Components Breakdown
      - HBase
          - Master
              - Controls the HBase cluster and knows where the data is allocated
          - Regionserver
              - HBase worker node; serves a subset of the data, split into regions (roughly equivalent to sharding)
          - Thrift / REST
              - Gateways that provide a client listening socket for connecting to HBase over Thrift and/or a RESTful API
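The "equivalent to sharding" remark can be made precise. An HBase table is split into regions, each covering a contiguous, sorted range of row keys, and each region is served by one regionserver. The sketch below finds the regionserver for a row key by binary search over the regions' start keys. This is roughly what a client does after reading the region list from the catalog (META) table. The class name and the region layout are hypothetical.

```python
from bisect import bisect_right

class RegionMap:
    """Row-key ranges of one table, each served by a regionserver.

    regions: list of (start_key, server) sorted by start_key. The first
    region must start at b"" so that every possible row key is covered.
    A region covers [its start_key, the next region's start_key).
    """

    def __init__(self, regions):
        if not regions or regions[0][0] != b"":
            raise ValueError("first region must start at the empty key")
        self.start_keys = [start for start, _ in regions]
        if self.start_keys != sorted(self.start_keys):
            raise ValueError("regions must be sorted by start key")
        self.servers = [server for _, server in regions]

    def server_for(self, row_key: bytes) -> str:
        # Row keys compare as raw bytes; pick the rightmost region whose
        # start key is <= row_key.
        index = bisect_right(self.start_keys, row_key) - 1
        return self.servers[index]

if __name__ == "__main__":
    table = RegionMap([(b"", "rs1"), (b"g", "rs2"), (b"p", "rs3")])
    print(table.server_for(b"orange"))
```

Because rows are sorted by key, a range scan only touches the regionservers whose ranges overlap it. This is why row-key design matters so much in HBase.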
  • 13. Hadoop Components Breakdown
      - HBase
          - How to access
              - Through the HBase client (natively, or via Thrift)
              - Through the RESTful API
          - Configuration files
              - /etc/hbase/conf/hbase-site.xml
                  - Defines all the basic configuration for accessing HBase
              - /etc/hbase/conf/hbase-policy.xml
                  - Defines the security ACLs (memory tuning lives in hbase-site.xml and hbase-env.sh)
              - /etc/hbase/conf/regionservers
                  - Lists all the regionservers available to this cluster
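For the RESTful API, HBase's REST gateway answers `GET /<table>/<row>` with JSON. In that JSON, row keys, column names and cell values are Base64-encoded. The sketch below uses only the standard library. The gateway address `http://resthost:8080`, the table and row names, and the assumption that values are UTF-8 text are all placeholders. Only `fetch_row` needs a running gateway; the URL building and decoding work offline.

```python
import base64
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

def row_url(base_url, table, row_key):
    """Build the REST URL for one row, e.g. http://resthost:8080/users/row1."""
    return f"{base_url.rstrip('/')}/{quote(table, safe='')}/{quote(row_key, safe='')}"

def decode_row_json(payload):
    """Turn the gateway's JSON into {row_key: {column: value}}.

    Assumes keys, columns and values are UTF-8 text once Base64-decoded.
    """
    result = {}
    for row in json.loads(payload)["Row"]:
        key = base64.b64decode(row["key"]).decode("utf-8")
        cells = result.setdefault(key, {})
        for cell in row["Cell"]:
            column = base64.b64decode(cell["column"]).decode("utf-8")
            cells[column] = base64.b64decode(cell["$"]).decode("utf-8")
    return result

def fetch_row(base_url, table, row_key, timeout=10):
    """GET one row from a running REST gateway, asking for JSON."""
    request = Request(row_url(base_url, table, row_key),
                      headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return decode_row_json(response.read().decode("utf-8"))
```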
  • 14. Hadoop Components Breakdown
      - MapRed
          - JobTracker
              - Splits each Map/Reduce job into tasks and schedules them on the TaskTrackers
              - Tracks the status and progress of every running job
              - Keeps track of all previous results through the HistoryServer
          - TaskTracker
              - Executes the map and reduce tasks of a job
              - Very CPU- and memory-intensive
              - Stores intermediate (map) output on its local disk, where reduce tasks fetch it, and reports progress to the JobTracker
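The division of labour between map tasks, the shuffle and reduce tasks is easiest to follow on the classic word-count job. This is a single-process simulation of the data flow only, with hypothetical function names and no Hadoop APIs:

1. Each map task emits (word, 1) pairs.
2. The shuffle groups the pairs by key, in sorted key order.
3. Each reduce call sums the values for one key.

```python
from collections import defaultdict
from itertools import chain

def map_task(line):
    """Map phase: emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle/sort phase: group all values by key, keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_task(key, values):
    """Reduce phase: combine all values seen for one key."""
    return key, sum(values)

def word_count(lines):
    # Each line stands in for one input split handled by its own map task.
    intermediate = chain.from_iterable(map_task(line) for line in lines)
    return [reduce_task(key, values) for key, values in shuffle(intermediate)]

if __name__ == "__main__":
    print(word_count(["the quick fox", "The lazy dog", "the end"]))
```

In a real cluster, the intermediate pairs are exactly the data that makes TaskTrackers CPU-, memory- and disk-hungry. They are written on the map side and pulled across the network by the reducers.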
  • 15. Hadoop Components Breakdown
      - MapRed
          - How to access
              - Through the Hadoop client
              - Through any MapRed client such as Pig or Hive
              - Through your own Java code
          - Configuration files
              - /etc/hadoop/conf/mapred-site.xml
                  - Defines how to contact this MapRed cluster
              - /etc/hadoop/conf/mapred-queue-acls.xml
                  - Defines the ACL structure for accessing MapRed (normally not necessary)
              - /etc/hadoop/conf/slaves
                  - Lists the TaskTrackers in this cluster
  • 16. Hadoop Components Breakdown
      - YARN
          - Same master/slave structure as MapRed: a ResourceManager and NodeManagers take the place of the JobTracker and TaskTrackers, and MapRed v2 runs on top of YARN
          - Configuration files
              - /etc/hadoop/conf/yarn-site.xml
                  - All required configuration for YARN
  • 17. Hadoop Cluster Breakdown
      - Namenode Server
          - HDFS Namenode
          - HBase Master
      - Secondary Namenode Server
          - HDFS Secondary Namenode
      - JobTracker Server
          - MapRed JobTracker
          - MapRed History Server
  • 18. Hadoop Cluster Breakdown
      - Datanode Server
          - HDFS Datanode
          - HBase RegionServer
          - MapRed TaskTracker
  • 19. Hadoop Hardware Requirements
      - Namenode Server
          - Redundant power supplies
          - RAID1 drives
          - Enough memory (16 GB)
      - Secondary Namenode Server
          - Modest CPU and disk, but roughly as much memory as the Namenode for checkpointing
  • 20. Hadoop Hardware Requirements
      - JobTracker Server
          - Redundant power supplies
          - RAID1 drives
          - Enough memory (16 GB)
      - Datanode Server
          - Lots of cheap disk (no RAID)
          - Lots of memory (32 GB)
          - Lots of CPU
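For sizing the Datanode tier, the sketch below divides raw disk by the replication factor and keeps some space free for intermediate Map/Reduce output and the OS. The 25% headroom default and the example node counts are assumptions for illustration, not figures from the slides.

```python
def usable_hdfs_capacity_tb(datanodes, disks_per_node, disk_tb,
                            replication=3, headroom=0.25):
    """Estimate how much user data (TB) the cluster can hold.

    headroom: fraction of raw disk kept free for temporary/intermediate
    data and the OS (an assumed rule of thumb; tune it for your workload).
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    if replication < 1:
        raise ValueError("replication must be at least 1")
    raw_tb = datanodes * disks_per_node * disk_tb
    return raw_tb * (1 - headroom) / replication

if __name__ == "__main__":
    # 10 Datanodes x 12 disks x 2 TB = 240 TB raw.
    print(usable_hdfs_capacity_tb(10, 12, 2))
```

Because every block is stored several times, adding cheap disks to more nodes is usually more effective than buying expensive RAID arrays.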
  • 21. Hadoop Default Ports
      - HDFS
          - 8020: HDFS Namenode (RPC)
          - 50010: HDFS Datanode data transfer
      - MapRed
          - No defaults
      - HBase (releases before 1.0)
          - 60010: Master web UI
          - 60020: Regionserver RPC
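When bringing a cluster up, a quick way to confirm that each daemon is listening is to try a TCP connection to the default ports listed above. The sketch below uses only the standard library. The host names you pass in are placeholders. The port table only covers the ports named on this slide, and HBase 1.0 and later moved to 16xxx ports.

```python
import socket

# Default ports from the slide (HBase entries apply to releases before 1.0).
DEFAULT_PORTS = {
    "hdfs-namenode": 8020,
    "hdfs-datanode-transfer": 50010,
    "hbase-master": 60010,
    "hbase-regionserver": 60020,
}

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_daemons(hosts_by_role):
    """hosts_by_role: {role: host}, roles being keys of DEFAULT_PORTS."""
    return {role: port_open(host, DEFAULT_PORTS[role])
            for role, host in hosts_by_role.items()}

if __name__ == "__main__":
    # Placeholder host names; replace with your own servers.
    print(check_daemons({"hdfs-namenode": "namenode", "hbase-master": "namenode"}))
```

A successful connection only proves something is listening. Checking that it is the right daemon still requires its client or web UI.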
  • 22. Hadoop HDFS Workflow
  • 23. Hadoop MapRed Workflow
  • 24. Hadoop MapRed Workflow
  • 25. Flume
      - Transports streams of data from point A to point B
      - Source
          - Where the data is read from
      - Channel
          - How the data is buffered
      - Sink
          - Where the data is written
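The source, channel, sink model can be illustrated with a small in-memory sketch:

- A source pushes events into a bounded channel.
- A sink drains the channel in batches.
- A full channel makes the source hold on to its events and retry.

This mirrors the shape of a Flume agent with a memory channel. The classes are hypothetical Python stand-ins, not Flume's Java API.

```python
from collections import deque

class MemoryChannel:
    """Bounded FIFO buffer between a source and a sink."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        """Accept an event unless the channel is full."""
        if len(self.events) >= self.capacity:
            return False  # full: the source keeps the event and retries later
        self.events.append(event)
        return True

    def take(self, max_events):
        """Remove and return up to max_events events in arrival order."""
        batch = []
        while self.events and len(batch) < max_events:
            batch.append(self.events.popleft())
        return batch


class ListSource:
    """Stand-in for a real source (log file, socket) reading from a list."""

    def __init__(self, records):
        self.records = list(records)
        self.position = 0  # pointer to the next record not yet in the channel

    def exhausted(self):
        return self.position >= len(self.records)

    def poll(self, channel):
        """Push records into the channel until it is full or input runs out."""
        while not self.exhausted() and channel.put(self.records[self.position]):
            self.position += 1


class ListSink:
    """Stand-in for a real sink (HDFS, HBase) that appends to a list."""

    def __init__(self):
        self.written = []

    def process(self, channel, batch_size):
        batch = channel.take(batch_size)
        self.written.extend(batch)
        return len(batch)


def run_agent(records, capacity=3, batch_size=2):
    """Drive source -> channel -> sink until every record has been written."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    source, channel, sink = ListSource(records), MemoryChannel(capacity), ListSink()
    while not source.exhausted() or channel.events:
        source.poll(channel)
        sink.process(channel, batch_size)
    return sink.written
```

The channel decouples the two sides. A slow sink only fills the buffer, and the source simply stops advancing its pointer until there is room again.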
  • 26. Flume
      - Flume is fault tolerant
      - Sources keep a pointer to their read position
          - With some exceptions, most sources are in a known state
      - Channels can be fault tolerant
          - A channel written to disk can recover from where it left off
      - Sinks can be redundant
          - More than one sink for the same data
          - Data is serialised and deduplicated using Avro
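Fault tolerance here rests on two ideas: a source that remembers its read position, and a channel persisted to disk, so that an agent restart resumes where it left off. The sketch below makes several simplifying assumptions. The class is hypothetical, a single JSON file stands in for the disk, and there is no concurrency. Flume's real file channel uses checkpoints and data logs instead.

```python
import json
import os
import tempfile

class DurableChannel:
    """Channel whose pending events and source offset survive a restart."""

    def __init__(self, path):
        self.path = path
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                state = json.load(f)
        else:
            state = {"offset": 0, "pending": []}
        self.offset = state["offset"]    # how far the source has read
        self.pending = state["pending"]  # events not yet accepted by the sink

    def _save(self):
        # Write a temp file, then rename it, so a crash never leaves half a file.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump({"offset": self.offset, "pending": self.pending}, f)
        os.replace(tmp, self.path)

    def ingest(self, records):
        """Source side: take every record after the saved offset."""
        new = records[self.offset:]
        self.pending.extend(new)
        self.offset += len(new)
        self._save()

    def deliver(self, sink):
        """Sink side: drop each event only after the sink call returned."""
        while self.pending:
            sink(self.pending[0])  # may raise; the event then stays pending
            self.pending.pop(0)
            self._save()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        channel = DurableChannel(os.path.join(tmp, "channel.json"))
        channel.ingest(["line 1", "line 2"])
        channel.deliver(print)
```

Events are removed only after the sink succeeds. A crash between the sink call and the save can therefore deliver an event twice, which is the usual at-least-once trade-off.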
  • 28. Flume
      - Configuration files
          - /etc/flume-ng/conf/flume.conf
              - Defines the agent configuration: source, channel and sink
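flume.conf is a Java-properties file. Each agent first lists its sources, channels and sinks, then configures each component by name, for example `a1.sources.r1.type = netcat`. The wiring is done by the source's `channels` property and the sink's `channel` property, as in the single-node example in the Flume user guide.

The sketch below parses such a file with plain string handling and reports how one agent is wired. The agent name `a1` and the sample configuration are illustrative. The reader ignores properties-file features like `:` separators and line continuations.

```python
def parse_properties(text):
    """Minimal .properties reader: 'key = value' lines, '#'/'!' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props

def agent_wiring(props, agent):
    """Summarise one agent: source/sink types and the channels they use."""
    def names(kind):
        return props.get(f"{agent}.{kind}", "").split()

    sources = {
        name: (props.get(f"{agent}.sources.{name}.type"),
               props.get(f"{agent}.sources.{name}.channels", "").split())
        for name in names("sources")
    }
    channels = {name: props.get(f"{agent}.channels.{name}.type")
                for name in names("channels")}
    sinks = {
        name: (props.get(f"{agent}.sinks.{name}.type"),
               props.get(f"{agent}.sinks.{name}.channel"))
        for name in names("sinks")
    }
    return {"sources": sources, "channels": channels, "sinks": sinks}

# Illustrative single-node agent: netcat source -> memory channel -> logger sink.
SAMPLE_FLUME_CONF = """
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.channels.c1.type = memory
a1.sinks.k1.type = logger

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
"""

if __name__ == "__main__":
    print(agent_wiring(parse_properties(SAMPLE_FLUME_CONF), "a1"))
```

Note the asymmetry: a source may feed several channels (`channels`, plural), while a sink drains exactly one (`channel`, singular).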
  • 30. Hadoop Recommended Reads
  • 31. Hadoop References
      - Hadoop
          - http://hadoop.apache.org/docs/stable/cluster_setup.html
          - http://rc.cloudera.com/cdh/4/hadoop/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html
          - http://pig.apache.org/docs/r0.7.0/setup.html
          - http://wiki.apache.org/hadoop/NameNodeFailover
      - HBase
          - http://hbase.apache.org/book/book.html
      - Flume
          - http://archive.cloudera.com/cdh4/cdh/4/flume-ng/FlumeUserGuide.html