SlideShare ist ein Scribd-Unternehmen logo
1 von 55
Downloaden Sie, um offline zu lesen
PRESENTATION TITLE GOES HEREHadoop 2 : New and Noteworthy
Sujee Maniyam, ElephantScale
sujee@ElephantScale.com
http://ElephantScale.com
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
SNIA Legal Notice
!   The material contained in this tutorial is copyrighted by the SNIA unless
otherwise noted.
!   Member companies and individual members may use this material in
presentations and literature under the following conditions:
!   Any slide or slides used must be reproduced in their entirety without modification
!   The SNIA must be acknowledged as the source of any material used in the body of
any document containing material from these presentations.
!   This presentation is a project of the SNIA Education Committee.
!   Neither the author nor the presenter is an attorney and nothing in this
presentation is intended to be, or should be construed as legal advice or an
opinion of counsel. If you need legal advice or a legal opinion please
contact your attorney.
!   The information presented herein represents the author's personal opinion
and current understanding of the relevant issues involved. The author, the
presenter, and the SNIA do not assume any responsibility or liability for
damages arising out of any reliance on or use of this information.
NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.
2
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Abstract
!   Hadoop 2 : New And Noteworthy Features
!   This session will appeal to Data Center Managers, Development
Managers, and those that are looking for an overview of ‘whats
new’ in Hadoop 2 platform. The session will highlight some of the
notable features in Hadoop 2.
3
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Quick Poll
!   How many of you are NEW to Hadoop?
!   How many of you are USING Hadoop?
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Hadoop Timeline
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Hadoop Versions – J
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Hadoop Versions – Simplified
Hadoop 1 Hadooop 2
1.2.1 (aug 2013) 2.2.0 : (oct 2013)
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Feature Matrix
Component Feature V1 v2
HDFS NameNode High Availability X
Namenode federation X
Snapshots X
NFS v3 access to HDFS X
Improved IO X
Processing MapReduce v1 X
YARN (MapReduce v2) X
Other Kerberos security X X
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
NEXT
!   NameNode High Availability
!   Federation
!   Snapshots
!   NFS
!   Improved IO
9
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
HDFS Architecture (V1)
10
Name Node
Data Node Data NodeData NodeData Node
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Name Node High Availability
!   HDFS has (had) a ONE NameNode/ many Datanode
design
!   This leads to ‘Single Point of Failure’ (SPOF) for Name
Node
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
NameNode Is Very Important In A
Cluster
12
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Is Hadoop NN Failure A Big Deal?
!   At Yahoo study
!   18 month study
!   22 failure on 25 clusters
!   0.58 failures per cluster per year
!   Only half of them would have benefited from HA
!   à 0.23 failure / year / cluster
! http://www.slideshare.net/Hadoop_Summit/hdfs-
namenode-high-availability
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Still Needs To Be Fixed
!   Downtime may be acceptable for batch workloads
!   But not acceptable for running real time workloads like
HBase that depend on HDFS
!   Downtime (even minutes) is not acceptable
!   Make Hadoop more Enterprise friendly
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
How Do We Fix A Single
NameNode Failure?
!   Have two Namenodes !
!   One ACTIVE and another PASSIVE
!   When Active NN fails, Passive one will take over
!   Fail over can be automated
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
HDFS Architecture (v1)
16
Name Node
Data Node Data NodeData NodeData Node
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
NameNode HA (V2)
17
Name Node
1
(active)
Data Node Data NodeData NodeData Node
Name Node
2
(passive)
Shared
storage
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
NameNode HA : Shared Storage
(c) ElephantScale.com, 2014
18
Name Node
1
(active)
Data Node Data NodeData NodeData Node
Name Node
2
(passive)
Filer
Option 1) external filer
Option 2) Quorum Journal
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Namenode HA
!   Namenode meta data is written to a shared storage
(external filer or Quorum Journal Manager)
!   Only ONE active NN can write to shared storage
!   Passive NN reads and replays meta data from shared
storage
!   When Active NN fails, passive NN is promoted to active
!   Can be manual or automatic
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
NameNode HA Setup
20
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
NEXT
!   NameNode High Availability
!   Federation
!   Snapshots
!   NFS
!   Improved IO
21
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Namenode Federation
!   Namenode stores meta data in memory
!   For large (very large) clusters, NN could exhaust
memory
!   Spread meta-data over mulitiple namenodes
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
HDFS Federation
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
HDFS Federation
!   Now the namespace is divided
!   /hbase à NN1
!   /user à NN2
!   /hive à NN3
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
HDFS Federation
!   Namespace is partitioned into ‘block pools’
!   Datanodes are shared across cluster
!   They store blocks for different pools
!   Datanodes send heart-beats to all NNs
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
NEXT
!   NameNode High Availability
!   Federation
!   Snapshots
!   NFS
!   Improved IO
26
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
HDFS Snapshots
!   Wait, doesn’t HDFS makes replicas?
!   Yes
!   But it doesn’t save you from :
hdfs dfs –rm –r /data
!   ‘Trash’ feature only works for CLI utilities
!   You can delete files using API.. Poof gone
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
HDFS Snapshots
!   Recover from user errors, other disasters
!   Peroidic snapshots
!   E.g : daily backups… keep them for 15 days
!   Snapshotting is
!   Efficient (no data duplication, copy on write)
!   Fast
!   snapshot part of file system (not the whole thing)
! http://cdn.oreillystatic.com/en/assets/1/event/100/HDFS
%20Snapshots%20and%20Beyond%20Presentation.pdf
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
NEXT
!   NameNode High Availability
!   Federation
!   Snapshots
!   NFS
!   Improved IO
29
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
NFS Access to HDFS
!   HDFS is a userland file system
!   Not a kernel file system
!   So most linux programs can not read/write data to HDFS
!   We use ‘hdfs’ command line utils
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
NFS Access to HDFS
!   HDFS supports NFS protocol starting with v2
!   NFS is done via gateway machine
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
NEXT
!   NameNode High Availability
!   Federation
!   Snapshots
!   NFS
!   Improved IO
32
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
HDFS Improved IO
!   Lots of performance fixes from v1 à v2
!   Quick comparison
!   Multi threaded random-read
!   HDFS v1 : 264 MB/sec
!   HDFS v2 : 1395 MB /sec ( 5x !)
Source :
http://www.slideshare.net/cloudera/hdfs-update-lipcon-federal-big-data-apache-
hadoop-forum
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
V2 Features
! HDFS
!   Processing
!   YARN
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
MapReduce V1
!   MRV1 proved itself as a reliable batch processing
framework!
!   One Job Tracker (master) and many task tracker
(workers)
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
MapReduce Architecture
36
Job Tracker
Task Tracker Task TrackerTask TrackerTask Tracker
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
MRV1 Limitations
!   Only supports one programming paradigm
!   Batch processing
!   Alternate processing is hard to (or not possible)
implement on top of MRV1
!   Real time processing
!   In-memory data
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
MRV1 Limitations
!   Single Job Tracker (JT) à single point of failure
!   JT Failure kills all running jobs (and queued jobs)
!   JT started hit scalability limitations for very large clusters
!   4,000 nodes
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Looking Ahead
HDFS
MRV1
1) Processing
2) Resource
management
HDFS
YARN
(resource management)
mapreduce other
Hadoop v1 Hadoop v2
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved. 40
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Yarn
!   MRV1 did
!   Resource Management
!   And Processing
!   Separate both out
!   Yarn for resource management
!   Mapreduce / other frameworks for processing
!   Now mapreduce is ‘just another app’
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Yarn Architecture
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
YARN Architecture
!   resource manager : manages the resource for entire
cluster
!   node manager : manages resources a single node
!   Containers : resource buckets ( 2 cpu + 8 G RAM)
!   application masters : one for each application
!   batch mapreduce, storm …etc
!   Manages application scheduling and execution
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Adoption of YARN
!   Standard on Hadoop v2
!   Already running at Yahoo at scale
!   Lot of applications are already moving to YARN
architecture
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Apps on Yarn
HDFS
YARN
Batch
(mapreduce)
Streaming
(storm, S4)
In-memory
(spark)
Graph
(giraph)
realtime
(hbase)
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Apps on YARN
!   Storm : real time event processing
!   Giraph : graph processing (in memory)
!   Spark : in-memory, iterative processing
!   Hbase
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
MapReduce on YARN
!   MapReduce is NOT going anywhere
!   Works very well for batch processing
!   Proven
!   Lots of code out there
!   No more single JobTracker
!   Each MapReduce job runs an Application
!   So failure one AppMaster only causes that job to fail
!   Other jobs are insulated
!   Better performance
!   MR jobs scale / utilize cluster better in Yarn (1.5 x – 2x )
(c) ElephantScale.com, 2014
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
MapReduce on YARN
(c) ElephantScale.com, 2014
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Writing A YARN Application
! http://hadoop.apache.org/docs/stable/hadoop-yarn/
hadoop-yarn-site/WritingYarnApplications.html
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
So Which Hadoop Should I Use?
!   If you are starting now…
!   Hadoop 2
!   Already using Hadoop 1
!   Worth the upgrade (new features / performance)
!   How do I migrate?
!   Recommended : Standup a separate v2 cluster and migrate data
over
!   In place update? (yeek!)
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Hadoop Distributions
Distribution Hadoop v1 Hadoop v2
Cloudera CDH 3.x / CDH 4.x CDH 5.x
Horton Works HDP 1.x HDP 2.x
Pivotal HD
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Future…
!   HDFS
!   Mirroring across data centers
!   Work well with SSD (solid state drives / flash drives)
!   YARN
!   Better containers (not just JVMs)
!   Performance
!   Make Resource Manager HA
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Thanks & Questions?
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Attribution & Feedback
54
Please send any questions or comments regarding this SNIA
Tutorial to tracktutorials@snia.org
The SNIA Education Committee thanks the following
individuals for their contributions to this Tutorial.
Authorship History
Sujee Maniyam (Sept 2014)
Additional Contributors
Joseph White : Review & Feedback
Hadoop 2 : New and Noteworthy
© 2013 Storage Networking Industry Association. All Rights Reserved.
Backup Slides
55

Weitere ähnliche Inhalte

Was ist angesagt?

Hive at Yahoo: Letters from the trenches
Hive at Yahoo: Letters from the trenchesHive at Yahoo: Letters from the trenches
Hive at Yahoo: Letters from the trenchesDataWorks Summit
 
Architecting a Fraud Detection Application with Hadoop
Architecting a Fraud Detection Application with HadoopArchitecting a Fraud Detection Application with Hadoop
Architecting a Fraud Detection Application with HadoopDataWorks Summit
 
TriHUG Feb: Hive on spark
TriHUG Feb: Hive on sparkTriHUG Feb: Hive on spark
TriHUG Feb: Hive on sparktrihug
 
Introduction to Data Analyst Training
Introduction to Data Analyst TrainingIntroduction to Data Analyst Training
Introduction to Data Analyst TrainingCloudera, Inc.
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0Adam Muise
 
Spark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesSpark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesDataWorks Summit/Hadoop Summit
 
Introduction to Spark on Hadoop
Introduction to Spark on HadoopIntroduction to Spark on Hadoop
Introduction to Spark on HadoopCarol McDonald
 
Hadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduceHadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduceUwe Printz
 
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...Uwe Printz
 
Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)Uwe Printz
 
Keynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentKeynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentContinuent
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystemsunera pathan
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep divet3rmin4t0r
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 
AWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by Intel
AWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by IntelAWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by Intel
AWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by IntelAmazon Web Services
 
Intro to Apache Spark by Marco Vasquez
Intro to Apache Spark by Marco VasquezIntro to Apache Spark by Marco Vasquez
Intro to Apache Spark by Marco VasquezMapR Technologies
 

Was ist angesagt? (20)

Hive at Yahoo: Letters from the trenches
Hive at Yahoo: Letters from the trenchesHive at Yahoo: Letters from the trenches
Hive at Yahoo: Letters from the trenches
 
Architecting a Fraud Detection Application with Hadoop
Architecting a Fraud Detection Application with HadoopArchitecting a Fraud Detection Application with Hadoop
Architecting a Fraud Detection Application with Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
TriHUG Feb: Hive on spark
TriHUG Feb: Hive on sparkTriHUG Feb: Hive on spark
TriHUG Feb: Hive on spark
 
Introduction to Data Analyst Training
Introduction to Data Analyst TrainingIntroduction to Data Analyst Training
Introduction to Data Analyst Training
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
 
Spark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesSpark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different Rules
 
Introduction to Spark on Hadoop
Introduction to Spark on HadoopIntroduction to Spark on Hadoop
Introduction to Spark on Hadoop
 
Hadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduceHadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduce
 
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
 
Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)
 
Keynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentKeynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at Continuent
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
AWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by Intel
AWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by IntelAWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by Intel
AWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by Intel
 
Intro to Apache Spark by Marco Vasquez
Intro to Apache Spark by Marco VasquezIntro to Apache Spark by Marco Vasquez
Intro to Apache Spark by Marco Vasquez
 

Andere mochten auch

Big Data: Querying complex JSON data with BigInsights and Hadoop
Big Data:  Querying complex JSON data with BigInsights and HadoopBig Data:  Querying complex JSON data with BigInsights and Hadoop
Big Data: Querying complex JSON data with BigInsights and HadoopCynthia Saracco
 
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL Cynthia Saracco
 
Big Data: Getting started with Big SQL self-study guide
Big Data:  Getting started with Big SQL self-study guideBig Data:  Getting started with Big SQL self-study guide
Big Data: Getting started with Big SQL self-study guideCynthia Saracco
 
Big Data: HBase and Big SQL self-study lab
Big Data:  HBase and Big SQL self-study lab Big Data:  HBase and Big SQL self-study lab
Big Data: HBase and Big SQL self-study lab Cynthia Saracco
 
Big Data: Big SQL and HBase
Big Data:  Big SQL and HBase Big Data:  Big SQL and HBase
Big Data: Big SQL and HBase Cynthia Saracco
 
Big Data: Working with Big SQL data from Spark
Big Data:  Working with Big SQL data from Spark Big Data:  Working with Big SQL data from Spark
Big Data: Working with Big SQL data from Spark Cynthia Saracco
 
Introduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopIntroduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopCloudera, Inc.
 
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | EdurekaHadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | EdurekaEdureka!
 
Big Data: SQL on Hadoop from IBM
Big Data:  SQL on Hadoop from IBM Big Data:  SQL on Hadoop from IBM
Big Data: SQL on Hadoop from IBM Cynthia Saracco
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 

Andere mochten auch (10)

Big Data: Querying complex JSON data with BigInsights and Hadoop
Big Data:  Querying complex JSON data with BigInsights and HadoopBig Data:  Querying complex JSON data with BigInsights and Hadoop
Big Data: Querying complex JSON data with BigInsights and Hadoop
 
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
 
Big Data: Getting started with Big SQL self-study guide
Big Data:  Getting started with Big SQL self-study guideBig Data:  Getting started with Big SQL self-study guide
Big Data: Getting started with Big SQL self-study guide
 
Big Data: HBase and Big SQL self-study lab
Big Data:  HBase and Big SQL self-study lab Big Data:  HBase and Big SQL self-study lab
Big Data: HBase and Big SQL self-study lab
 
Big Data: Big SQL and HBase
Big Data:  Big SQL and HBase Big Data:  Big SQL and HBase
Big Data: Big SQL and HBase
 
Big Data: Working with Big SQL data from Spark
Big Data:  Working with Big SQL data from Spark Big Data:  Working with Big SQL data from Spark
Big Data: Working with Big SQL data from Spark
 
Introduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopIntroduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache Hadoop
 
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | EdurekaHadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
 
Big Data: SQL on Hadoop from IBM
Big Data:  SQL on Hadoop from IBM Big Data:  SQL on Hadoop from IBM
Big Data: SQL on Hadoop from IBM
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 

Ähnlich wie Hadoop2 new and noteworthy SNIA conf

field_guide_to_hadoop_pentaho
field_guide_to_hadoop_pentahofield_guide_to_hadoop_pentaho
field_guide_to_hadoop_pentahoMartin Ferguson
 
Dallas TDWI Meeting Dec. 2012: Hadoop
Dallas TDWI Meeting Dec. 2012: HadoopDallas TDWI Meeting Dec. 2012: Hadoop
Dallas TDWI Meeting Dec. 2012: Hadooplamont_lockwood
 
Sam fineberg big_data_hadoop_storage_options_3v9-1
Sam fineberg big_data_hadoop_storage_options_3v9-1Sam fineberg big_data_hadoop_storage_options_3v9-1
Sam fineberg big_data_hadoop_storage_options_3v9-1Pramod Gosavi
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop BasicsSonal Tiwari
 
Introduction of Big data and Hadoop
Introduction of Big data and Hadoop Introduction of Big data and Hadoop
Introduction of Big data and Hadoop Arohi Khandelwal
 
Hadoop tutorial-pdf.pdf
Hadoop tutorial-pdf.pdfHadoop tutorial-pdf.pdf
Hadoop tutorial-pdf.pdfSheetal Jain
 
THE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATHE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATarak Tar
 
THE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATHE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATarak Tar
 
OPERATING SYSTEM .pptx
OPERATING SYSTEM .pptxOPERATING SYSTEM .pptx
OPERATING SYSTEM .pptxAltafKhadim
 
Hadoop a Highly Available and Secure Enterprise Data Warehousing solution
Hadoop a Highly Available and Secure Enterprise Data Warehousing solutionHadoop a Highly Available and Secure Enterprise Data Warehousing solution
Hadoop a Highly Available and Secure Enterprise Data Warehousing solutionEdureka!
 
HDFS presented by VIJAY
HDFS presented by VIJAYHDFS presented by VIJAY
HDFS presented by VIJAYthevijayps
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopVigen Sahakyan
 
Big data and hadoop product page
Big data and hadoop product pageBig data and hadoop product page
Big data and hadoop product pageJanu Jahnavi
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)outstanding59
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldRichard McDougall
 

Ähnlich wie Hadoop2 new and noteworthy SNIA conf (20)

field_guide_to_hadoop_pentaho
field_guide_to_hadoop_pentahofield_guide_to_hadoop_pentaho
field_guide_to_hadoop_pentaho
 
Dallas TDWI Meeting Dec. 2012: Hadoop
Dallas TDWI Meeting Dec. 2012: HadoopDallas TDWI Meeting Dec. 2012: Hadoop
Dallas TDWI Meeting Dec. 2012: Hadoop
 
Sam fineberg big_data_hadoop_storage_options_3v9-1
Sam fineberg big_data_hadoop_storage_options_3v9-1Sam fineberg big_data_hadoop_storage_options_3v9-1
Sam fineberg big_data_hadoop_storage_options_3v9-1
 
Hadoop Tutorial for Beginners
Hadoop Tutorial for BeginnersHadoop Tutorial for Beginners
Hadoop Tutorial for Beginners
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop Basics
 
big data
big databig data
big data
 
Introduction of Big data and Hadoop
Introduction of Big data and Hadoop Introduction of Big data and Hadoop
Introduction of Big data and Hadoop
 
Hadoop tutorial-pdf.pdf
Hadoop tutorial-pdf.pdfHadoop tutorial-pdf.pdf
Hadoop tutorial-pdf.pdf
 
THE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATHE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATA
 
THE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATHE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATA
 
Big data overview by Edgars
Big data overview by EdgarsBig data overview by Edgars
Big data overview by Edgars
 
OPERATING SYSTEM .pptx
OPERATING SYSTEM .pptxOPERATING SYSTEM .pptx
OPERATING SYSTEM .pptx
 
Hadoop description
Hadoop descriptionHadoop description
Hadoop description
 
Hadoop a Highly Available and Secure Enterprise Data Warehousing solution
Hadoop a Highly Available and Secure Enterprise Data Warehousing solutionHadoop a Highly Available and Secure Enterprise Data Warehousing solution
Hadoop a Highly Available and Secure Enterprise Data Warehousing solution
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
 
HDFS presented by VIJAY
HDFS presented by VIJAYHDFS presented by VIJAY
HDFS presented by VIJAY
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Big data and hadoop product page
Big data and hadoop product pageBig data and hadoop product page
Big data and hadoop product page
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworld
 

Mehr von Sujee Maniyam

Reference architecture for Internet of Things
Reference architecture for Internet of ThingsReference architecture for Internet of Things
Reference architecture for Internet of ThingsSujee Maniyam
 
Building secure NoSQL applications nosqlnow_conf_2014
Building secure NoSQL applications nosqlnow_conf_2014Building secure NoSQL applications nosqlnow_conf_2014
Building secure NoSQL applications nosqlnow_conf_2014Sujee Maniyam
 
Launching your career in Big Data
Launching your career in Big DataLaunching your career in Big Data
Launching your career in Big DataSujee Maniyam
 
Hadoop security landscape
Hadoop security landscapeHadoop security landscape
Hadoop security landscapeSujee Maniyam
 
Spark Intro @ analytics big data summit
Spark  Intro @ analytics big data summitSpark  Intro @ analytics big data summit
Spark Intro @ analytics big data summitSujee Maniyam
 
Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Sujee Maniyam
 
Iphone client-server app with Rails backend (v3)
Iphone client-server app with Rails backend (v3)Iphone client-server app with Rails backend (v3)
Iphone client-server app with Rails backend (v3)Sujee Maniyam
 

Mehr von Sujee Maniyam (8)

Reference architecture for Internet of Things
Reference architecture for Internet of ThingsReference architecture for Internet of Things
Reference architecture for Internet of Things
 
Hadoop to spark-v2
Hadoop to spark-v2Hadoop to spark-v2
Hadoop to spark-v2
 
Building secure NoSQL applications nosqlnow_conf_2014
Building secure NoSQL applications nosqlnow_conf_2014Building secure NoSQL applications nosqlnow_conf_2014
Building secure NoSQL applications nosqlnow_conf_2014
 
Launching your career in Big Data
Launching your career in Big DataLaunching your career in Big Data
Launching your career in Big Data
 
Hadoop security landscape
Hadoop security landscapeHadoop security landscape
Hadoop security landscape
 
Spark Intro @ analytics big data summit
Spark  Intro @ analytics big data summitSpark  Intro @ analytics big data summit
Spark Intro @ analytics big data summit
 
Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2
 
Iphone client-server app with Rails backend (v3)
Iphone client-server app with Rails backend (v3)Iphone client-server app with Rails backend (v3)
Iphone client-server app with Rails backend (v3)
 

Kürzlich hochgeladen

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 

Kürzlich hochgeladen (20)

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

Hadoop2 new and noteworthy SNIA conf

  • 1. PRESENTATION TITLE GOES HEREHadoop 2 : New and Noteworthy Sujee Maniyam, ElephantScale sujee@ElephantScale.com http://ElephantScale.com
  • 2. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. SNIA Legal Notice !   The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. !   Member companies and individual members may use this material in presentations and literature under the following conditions: !   Any slide or slides used must be reproduced in their entirety without modification !   The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations. !   This presentation is a project of the SNIA Education Committee. !   Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. !   The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK. 2
  • 3. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Abstract !   Hadoop 2 : New And Noteworthy Features !   This session will appeal to Data Center Managers, Development Managers, and those that are looking for an overview of ‘whats new’ in Hadoop 2 platform. The session will highlight some of the notable features in Hadoop 2. 3
  • 4. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Quick Poll !   How many of you are NEW to Hadoop? !   How many of you are USING Hadoop?
  • 5. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Hadoop Timeline
  • 6. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Hadoop Versions – J
  • 7. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Hadoop Versions – Simplified Hadoop 1 Hadooop 2 1.2.1 (aug 2013) 2.2.0 : (oct 2013)
  • 8. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Feature Matrix Component Feature V1 v2 HDFS NameNode High Availability X Namenode federation X Snapshots X NFS v3 access to HDFS X Improved IO X Processing MapReduce v1 X YARN (MapReduce v2) X Other Kerberos security X X
  • 9. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. NEXT !   NameNode High Availability !   Federation !   Snapshots !   NFS !   Improved IO 9
  • 10. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. HDFS Architecture (V1) 10 Name Node Data Node Data NodeData NodeData Node
  • 11. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Name Node High Availability !   HDFS has (had) a ONE NameNode/ many Datanode design !   This leads to ‘Single Point of Failure’ (SPOF) for Name Node
  • 12. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. NameNode Is Very Important In A Cluster 12
  • 13. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Is Hadoop NN Failure A Big Deal? !   At Yahoo study !   18 month study !   22 failure on 25 clusters !   0.58 failures per cluster per year !   Only half of them would have benefited from HA !   à 0.23 failure / year / cluster ! http://www.slideshare.net/Hadoop_Summit/hdfs- namenode-high-availability
  • 14. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Still Needs To Be Fixed !   Downtime may be acceptable for batch workloads !   But not acceptable for running real time workloads like HBase that depend on HDFS !   Downtime (even minutes) is not acceptable !   Make Hadoop more Enterprise friendly
  • 15. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. How Do We Fix A Single NameNode Failure? !   Have two Namenodes ! !   One ACTIVE and another PASSIVE !   When Active NN fails, Passive one will take over !   Fail over can be automated
  • 16. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. HDFS Architecture (v1) 16 Name Node Data Node Data NodeData NodeData Node
  • 17. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. NameNode HA (V2) 17 Name Node 1 (active) Data Node Data NodeData NodeData Node Name Node 2 (passive) Shared storage
  • 18. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. NameNode HA : Shared Storage (c) ElephantScale.com, 2014 18 Name Node 1 (active) Data Node Data NodeData NodeData Node Name Node 2 (passive) Filer Option 1) external filer Option 2) Quorum Journal
  • 19. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Namenode HA !   Namenode meta data is written to a shared storage (external filer or Quorum Journal Manager) !   Only ONE active NN can write to shared storage !   Passive NN reads and replays meta data from shared storage !   When Active NN fails, passive NN is promoted to active !   Can be manual or automatic
  • 20. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. NameNode HA Setup 20
  • 21. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. NEXT !   NameNode High Availability !   Federation !   Snapshots !   NFS !   Improved IO 21
  • 22. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Namenode Federation !   Namenode stores meta data in memory !   For large (very large) clusters, NN could exhaust memory !   Spread meta-data over mulitiple namenodes
  • 23. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. HDFS Federation
  • 24. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. HDFS Federation !   Now the namespace is divided !   /hbase à NN1 !   /user à NN2 !   /hive à NN3
  • 25. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. HDFS Federation !   Namespace is partitioned into ‘block pools’ !   Datanodes are shared across cluster !   They store blocks for different pools !   Datanodes send heart-beats to all NNs
  • 26. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. NEXT !   NameNode High Availability !   Federation !   Snapshots !   NFS !   Improved IO 26
  • 27. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. HDFS Snapshots !   Wait, doesn’t HDFS makes replicas? !   Yes !   But it doesn’t save you from : hdfs dfs –rm –r /data !   ‘Trash’ feature only works for CLI utilities !   You can delete files using API.. Poof gone
  • 28. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. HDFS Snapshots !   Recover from user errors, other disasters !   Peroidic snapshots !   E.g : daily backups… keep them for 15 days !   Snapshotting is !   Efficient (no data duplication, copy on write) !   Fast !   snapshot part of file system (not the whole thing) ! http://cdn.oreillystatic.com/en/assets/1/event/100/HDFS %20Snapshots%20and%20Beyond%20Presentation.pdf
  • 29. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. NEXT !   NameNode High Availability !   Federation !   Snapshots !   NFS !   Improved IO 29
  • 30. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. NFS Access to HDFS !   HDFS is a userland file system !   Not a kernel file system !   So most linux programs can not read/write data to HDFS !   We use ‘hdfs’ command line utils
  • 31. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. NFS Access to HDFS !   HDFS supports NFS protocol starting with v2 !   NFS is done via gateway machine
  • 32. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. NEXT !   NameNode High Availability !   Federation !   Snapshots !   NFS !   Improved IO 32
  • 33. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. HDFS Improved IO !   Lots of performance fixes from v1 à v2 !   Quick comparison !   Multi threaded random-read !   HDFS v1 : 264 MB/sec !   HDFS v2 : 1395 MB /sec ( 5x !) Source : http://www.slideshare.net/cloudera/hdfs-update-lipcon-federal-big-data-apache- hadoop-forum
  • 34. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. V2 Features ! HDFS !   Processing !   YARN
  • 35. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. MapReduce V1 !   MRV1 proved itself as a reliable batch processing framework! !   One Job Tracker (master) and many task tracker (workers)
  • 36. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. MapReduce Architecture 36 Job Tracker Task Tracker Task TrackerTask TrackerTask Tracker
  • 37. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. MRV1 Limitations !   Only supports one programming paradigm !   Batch processing !   Alternate processing is hard to (or not possible) implement on top of MRV1 !   Real time processing !   In-memory data
  • 38. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. MRV1 Limitations !   Single Job Tracker (JT) à single point of failure !   JT Failure kills all running jobs (and queued jobs) !   JT started hit scalability limitations for very large clusters !   4,000 nodes
  • 39. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Looking Ahead HDFS MRV1 1) Processing 2) Resource management HDFS YARN (resource management) mapreduce other Hadoop v1 Hadoop v2
  • 40. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. 40
  • 41. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Yarn !   MRV1 did !   Resource Management !   And Processing !   Separate both out !   Yarn for resource management !   Mapreduce / other frameworks for processing !   Now mapreduce is ‘just another app’
  • 42. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Yarn Architecture
  • 43. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. YARN Architecture !   resource manager : manages the resource for entire cluster !   node manager : manages resources a single node !   Containers : resource buckets ( 2 cpu + 8 G RAM) !   application masters : one for each application !   batch mapreduce, storm …etc !   Manages application scheduling and execution
  • 44. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Adoption of YARN !   Standard on Hadoop v2 !   Already running at Yahoo at scale !   Lot of applications are already moving to YARN architecture
  • 45. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Apps on Yarn HDFS YARN Batch (mapreduce) Streaming (storm, S4) In-memory (spark) Graph (giraph) realtime (hbase)
  • 46. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Apps on YARN !   Storm : real time event processing !   Giraph : graph processing (in memory) !   Spark : in-memory, iterative processing !   Hbase
  • 47. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. MapReduce on YARN !   MapReduce is NOT going anywhere !   Works very well for batch processing !   Proven !   Lots of code out there !   No more single JobTracker !   Each MapReduce job runs an Application !   So failure one AppMaster only causes that job to fail !   Other jobs are insulated !   Better performance !   MR jobs scale / utilize cluster better in Yarn (1.5 x – 2x ) (c) ElephantScale.com, 2014
  • 48. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. MapReduce on YARN (c) ElephantScale.com, 2014
  • 49. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Writing A YARN Application ! http://hadoop.apache.org/docs/stable/hadoop-yarn/ hadoop-yarn-site/WritingYarnApplications.html
  • 50. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. So Which Hadoop Should I Use? !   If you are starting now… !   Hadoop 2 !   Already using Hadoop 1 !   Worth the upgrade (new features / performance) !   How do I migrate? !   Recommended : Standup a separate v2 cluster and migrate data over !   In place update? (yeek!)
  • 51. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Hadoop Distributions Distribution Hadoop v1 Hadoop v2 Cloudera CDH 3.x / CDH 4.x CDH 5.x Horton Works HDP 1.x HDP 2.x Pivotal HD
  • 52. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Future… !   HDFS !   Mirroring across data centers !   Work well with SSD (solid state drives / flash drives) !   YARN !   Better containers (not just JVMs) !   Performance !   Make Resource Manager HA
  • 53. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Thanks & Questions?
  • 54. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Attribution & Feedback 54 Please send any questions or comments regarding this SNIA Tutorial to tracktutorials@snia.org The SNIA Education Committee thanks the following individuals for their contributions to this Tutorial. Authorship History Sujee Maniyam (Sept 2014) Additional Contributors Joseph White : Review & Feedback
  • 55. Hadoop 2 : New and Noteworthy © 2013 Storage Networking Industry Association. All Rights Reserved. Backup Slides 55