Apache Hadoop
Foundations of Scalability
Konstantin V. Shvachko
November, 2013
WANdisco, Chief Architect
- NonStop Hadoop

- Free Training

Founder of AltoStor and AltoScale
Hadoop, HDFS at Yahoo! & eBay. Since 2005

Data structures and algorithms for large-scale distributed storage systems
Apache Hadoop committer and member of PMC

The domain of computing is changing, and so is computing itself
The history of computing started a long time ago
Fascination with numbers

Vast universe with simple strict rules


Computing devices


Crunch numbers

The Internet

Universe of words, fuzzy rules


Different type of computing


Understand meaning of things


Human thinking


Errors & deviations are a part of study

Computer History Museum, Mountain View

Words vs. Numbers
From Big Numbers to Big Data
In 1997 IBM built Deep Blue

Played a chess match against the champion G. Kasparov


Human race defeated


Strict rules for Chess


Fast deep analyses of current state

In 2011 IBM built the Watson computer to play Jeopardy

Questions and hints in human terms


Natural language processing


Reborn as diagnostics machine:

Big Data
Computations that need the power of many computers

Large datasets: hundreds of TBs, PBs


Or use of thousands of CPUs in parallel


Or both

Cluster as a computer

What is a PB?
1 KB = 1000 Bytes
1 MB = 1000 KB
1 GB = 1000 MB
1 TB = 1000 GB
1 PB = 1000 TB
???? = 1000 PB

Examples – Science
Fundamental physics: Large Hadron Collider (LHC)

Smashing high-energy protons at nearly the speed of light


1 PB of event data per sec, most filtered out


15 PB of data per year


160 PB of disk + 90 PB of tape storage

Math: Big Numbers

2 quadrillionth (10^15) digit of π is 0


Pure CPU workload: 12 days of cluster time


208 years of CPU-time on a cluster with 7600 CPU cores


Patient records, Sensors, Drug design


Examples – Web
Search engine

Webmap - Map of the Internet


2008 @ Yahoo, 1500 nodes, 5 PB raw storage


Internet Search Index


Traditional Big Data applications

Behavioural Analysis

Recommendation engine: You may buy this too


Intelligence: fraud detection


Sentiment analysis: who will win elections


Matching interests: you should like him / her

The Sorting Problem
Turns into a Big Data problem as the data set grows
Classic in-memory sorting

Complexity: number of comparisons

Bubble Sort: O(n^2)

Merge Sort: O(n log n)



External sorting

Cannot load all data in memory


16 GB RAM vs. 200 GB file


Complexity: + disk IOs (bytes read or written)
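
External sorting is commonly done by sorting memory-sized chunks, spilling each sorted run to disk, and then merging the runs. The following is a minimal sketch of that idea, not a method taken from the slides; the function name `external_sort` and the `chunk_size` parameter are assumptions for illustration, and records are assumed to be text lines without embedded newlines.

```python
import heapq
import tempfile

def external_sort(lines, chunk_size):
    """Yield lines in sorted order while holding at most chunk_size lines in memory."""
    runs = []    # temporary files, each containing one sorted run
    chunk = []   # lines currently buffered in memory

    def spill():
        # Sort the in-memory chunk and write it to disk as one sorted run.
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(line + "\n" for line in sorted(chunk))
        run.seek(0)
        runs.append(run)
        chunk.clear()

    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            spill()
    if chunk:
        spill()

    # k-way merge: heapq.merge reads the runs lazily, one line per run at a time.
    readers = [(l.rstrip("\n") for l in run) for run in runs]
    yield from heapq.merge(*readers)

    for run in runs:
        run.close()

print(list(external_sort(["pear", "apple", "fig", "kiwi", "date"], chunk_size=2)))
```

Every record is written once and read once per merge pass, which is why disk IO joins the comparison count as a cost measure.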

Distributed sorting

Cannot load data on a single server


12 drives * 2 TB = 24 TB disk space vs. 200 TB data set


Complexity: + network transfers

Need a lot of computers
How to make them work together
A reliable, scalable, high performance distributed computing system
Apache Hadoop is an ecosystem of tools for processing “Big Data”

Started in 2005 by D. Cutting and M. Cafarella


Scaled by Yahoo! Hadoop team from a few nodes to thousands (4K-node cluster)

Consists of two main components: Providing unified cluster view
1. HDFS – a distributed file system

File system API connecting thousands of drives

2. MapReduce – a framework for distributed computations

Splitting jobs into parts executable on one node


Scheduling and monitoring of job execution

Today used everywhere: Becoming a standard of distributed computing
Hadoop is an open source project

Hadoop: Architecture Principles
Linear scalability: more nodes can do more work within the same time

Linear on data size:


Linear on compute resources:

Move computation to data

Minimize expensive data transfers


Data are large, programs are small

Reliability and Availability: Commodity hardware

1 drive fails every 3 years => Probability of failing today 1/1000


How many drives per day fail on a 1000-node cluster with 10 drives per node?

Sequential data processing: avoid random reads / writes

Simple computational model

hides complexity in efficient execution framework
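
The drive-failure question in the list above can be answered with simple arithmetic. This sketch assumes that drives fail independently and uses the slide's rounded daily probability of 1/1000 (three years is roughly 1095 days); the function name is hypothetical.

```python
# Back-of-the-envelope estimate of daily drive failures on a cluster.
# Assumption: every drive fails independently with the same daily probability.

def expected_daily_failures(nodes, drives_per_node, p_fail_per_day):
    """Expected number of drives failing per day (linearity of expectation)."""
    return nodes * drives_per_node * p_fail_per_day

exact_p = 1 / (3 * 365)   # one failure per drive every 3 years
rounded_p = 1 / 1000      # the rounded figure used on the slide

failures = expected_daily_failures(1000, 10, rounded_p)
print(f"expected failures per day: {failures:.1f}")
```

With 10,000 drives in the cluster, about 10 drives are expected to fail every day, so failure handling has to be routine rather than exceptional.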

The Hadoop Family
Ecosystem of tools for processing BigData

HDFS: distributed file system
YARN, MapReduce: computational framework
ZooKeeper: distributed coordination
HBase: key-value store
Pig: dataflow language, SQL
Hive: data warehouse, SQL
Oozie: complex job workflow
BigTop: packaging and testing

Distributed Computation

2004 Jeffrey Dean, Sanjay Ghemawat. Google.


“MapReduce: Simplified Data Processing on Large Clusters”

Parallel Computational Model

Examples of computational models
• Turing or Post machines. Programming languages – C++, Java
• Finite automaton, lambda calculus


Split large input data into small enough pieces, process in parallel

Distributed Execution Framework

Compilers, interpreters


Scheduling, Processing, Coordination


Failure recovery

Functional Programming
Map: a higher-order function

applies a given function to each element of a list


returns the list of results

Map( f(x), X[1:n] ) -> [ f(X[1]), …, f(X[n]) ]
Example. Map( x^2, [0,1,2,3,4,5] ) = [0,1,4,9,16,25]

Functional Programming: reduce
Reduce / fold: a higher-order function

Iterates given function over a list of elements


Applies function to previous result and current element


Return single result

Example. Reduce( x + y, [0,1,2,3,4,5] ) = (((((0 + 1) + 2) + 3) + 4) + 5) = 15

Reduce( x * y, [0,1,2,3,4,5] ) = (((((0 * 1) * 2) * 3) * 4) * 5) = 0
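
The map and reduce examples above can be tried directly in Python. This is a small sketch using the built-in `map` and `functools.reduce`; the helper name `square` is an assumption for illustration.

```python
from functools import reduce

def square(x):
    return x * x

xs = [0, 1, 2, 3, 4, 5]

# Map: apply a function to each element and collect the list of results.
squares = list(map(square, xs))

# Reduce / fold: combine the previous result with the current element.
total = reduce(lambda acc, x: acc + x, xs)
product = reduce(lambda acc, x: acc * x, xs)

print(squares, total, product)
```

The product is zero because the list starts with 0, which is a reminder that a fold depends on every element it visits.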


Example: Sum of Squares
Composition of

a map followed by


a reduce applied to the results of the map

Square Pyramid Number
1 + 4 + … + n^2 = n(n+1)(2n+1) / 6


Map( x^2, [1,2,3,4,5] ) = [1,4,9,16,25]


Reduce( x + y, [1,4,9,16,25] ) = ((((1 + 4) + 9) + 16) + 25) = 55

Map easily parallelizable

Compute x^2 for 1,2,3 on one node and for 4,5 on another

Reduce notoriously sequential

Need all squares at one node to compute the total sum.
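
The split between a parallel map and a sequential reduce can be sketched as follows. The two partitions stand in for the two nodes mentioned above; here they are processed one after another in a single process, which is an assumption made to keep the example self-contained.

```python
from functools import reduce

def map_squares(partition):
    """Map step: square every element of one partition."""
    return [x * x for x in partition]

# Two partitions, as on the slide: 1,2,3 on one node and 4,5 on another.
partitions = [[1, 2, 3], [4, 5]]

# Each partition is mapped independently, so this loop could run in parallel.
mapped = [map_squares(p) for p in partitions]

# The reduce step needs all squares in one place to compute the total.
all_squares = [sq for part in mapped for sq in part]
total = reduce(lambda a, b: a + b, all_squares)

# Cross-check against the square pyramid number n(n+1)(2n+1)/6 for n = 5.
n = 5
closed_form = n * (n + 1) * (2 * n + 1) // 6
print(total, closed_form)
```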

Computational Model
Map-Reduce is a Parallel Computational Model
Map-Reduce algorithm = job

Operates with key-value pairs: (k, V)

Primitive types, Strings or more complex Structures

Map-Reduce job input and output are collections of pairs {(k, V)}

MR Job is defined by 2 functions
map: (k1, v1) → {(k2, v2)}
reduce: (k2, {v2}) → {(k3, v3)}

Job Workflow


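[Figure: job workflow. Map tasks emit intermediate pairs (C, 3), (V, 1), (C, 2), (V, 2), (C, 3), (V, 1); the reduce step sums them into (C, 8) and (V, 4).]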

The Algorithm

Map (null, word)
    nC = Consonants(word)
    nV = Vowels(word)
    Emit("Consonants", nC)
    Emit("Vowels", nV)
Reduce(key, {n1, n2, …})
    nRes = n1 + n2 + …
    Emit(key, nRes)
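
The algorithm above can be simulated in a single process. The sketch below is not Hadoop code: `run_job` is a hypothetical in-memory stand-in for the framework that groups map output by key before calling the reducer, and treating only a, e, i, o, u as vowels is an assumption.

```python
from collections import defaultdict

VOWELS = set("aeiou")  # assumption: "y" counts as a consonant

def map_word(_key, word):
    """Map: emit the consonant and vowel counts of one word."""
    letters = [c for c in word.lower() if c.isalpha()]
    n_vowels = sum(1 for c in letters if c in VOWELS)
    yield ("Consonants", len(letters) - n_vowels)
    yield ("Vowels", n_vowels)

def reduce_counts(key, values):
    """Reduce: sum all counts emitted for one key."""
    yield (key, sum(values))

def run_job(records, mapper, reducer):
    """Toy framework: map every record, group by key, reduce each group."""
    groups = defaultdict(list)
    for key, value in records:
        for k2, v2 in mapper(key, value):
            groups[k2].append(v2)
    output = []
    for k2 in sorted(groups):   # keys reach the reducer in sorted order
        output.extend(reducer(k2, groups[k2]))
    return output

words = ["data", "node", "block"]
result = run_job([(None, w) for w in words], map_word, reduce_counts)
print(result)
```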

Computation Framework
Job is executed on a cluster of computers
Two virtual clusters: HDFS and MapReduce

Physically tightly coupled


Designed to work together

The Hadoop Distributed File System


Reliable storage layer



View data as files and directories

MapReduce as Computation Framework

Job scheduling


Resource management


Lifecycle coordination


Task execution module







HDFS Architecture Principles
The name space is a hierarchy of files and directories
Files are divided into blocks (typically 128 MB)

Namespace (metadata) is decoupled from data

Fast namespace operations, not slowed down by data streaming

Single NameNode keeps the entire name space in RAM
DataNodes store data blocks on local drives
Blocks are replicated on 3 DataNodes for redundancy and availability
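
The block and replication figures above translate into a simple storage estimate. This sketch assumes a 128 MB block means 128 × 2^20 bytes and that the last block of a file occupies only the bytes it actually holds; the function name `hdfs_footprint` is hypothetical.

```python
BLOCK_SIZE = 128 * 1024**2   # assumed block size in bytes
REPLICATION = 3              # replicas per block, as stated above

def hdfs_footprint(file_size, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (blocks, block replicas, raw bytes stored) for one file."""
    blocks = -(-file_size // block_size)   # ceiling division on integers
    replicas = blocks * replication
    raw_bytes = file_size * replication    # a partial last block is not padded
    return blocks, replicas, raw_bytes

blocks, replicas, raw_bytes = hdfs_footprint(1024**3)   # a 1 GiB file
print(blocks, replicas, raw_bytes)
```

The NameNode tracks the file and its blocks in RAM, while the replicas are spread over the DataNodes' local drives.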

MapReduce Framework
Job Input is a file or a set of files in a distributed file system (HDFS)

Input is split into blocks of roughly the same size


Blocks are replicated to multiple nodes


Block holds a list of key-value pairs

Map task is scheduled to one of the nodes containing the block

Map task input is node-local


Map task result is node-local

Map task results are grouped: one group per reducer
Each group is sorted
Reduce task is scheduled to a node

Reduce task transfers the targeted groups from all mapper nodes


Computes and stores results in a separate HDFS file

Job Output is a set of files in HDFS. With #files = #reducers

Map Reduce Example: Mean





Input: large text file
Output: average length of words in the file µ

Example: µ({dogs, like, cats}) = 4

Mean Mapper
Map input is the set of words {w} in the partition

Key = null

Value = w

Map computes

Number of words in the partition


Total length of the words


Map output

<"count", #words>

<"length", #totalLength>

Map (null, w)
    Emit("count", 1)
    Emit("length", length(w))

Single Mean Reducer
Reduce input

{<key, {value}>}, where

key = "count", "length"

value is an integer

Reduce computes

Total number of words: N = sum of all "count" values

Total length of words: L = sum of all "length" values

Reduce Output

<"count", N>

<"length", L>

The result

µ = L / N

Reduce(key, {n1, n2, …})
    nRes = n1 + n2 + …
    Emit(key, nRes)
Analyze ()
    print("mean = " + L/N)
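
The mean job can be sketched end to end in plain Python. The grouping step inside the hypothetical `run_mean_job` stands in for the framework's shuffle; everything runs in one process, which is an assumption for illustration.

```python
from collections import defaultdict

def mean_map(_key, word):
    """Map: one 'count' and one 'length' pair per word."""
    yield ("count", 1)
    yield ("length", len(word))

def sum_reduce(key, values):
    """Reduce: sum all values that share a key."""
    return key, sum(values)

def run_mean_job(words):
    groups = defaultdict(list)
    for w in words:
        for k, v in mean_map(None, w):
            groups[k].append(v)                 # stand-in for the shuffle
    totals = dict(sum_reduce(k, vs) for k, vs in groups.items())
    return totals["length"] / totals["count"]   # the Analyze step: L / N

mean = run_mean_job(["dogs", "like", "cats"])
print("mean =", mean)
```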

MapReduce Implementation
Single master JobTracker shepherds the distributed herd of TaskTrackers
1. Job scheduling and resource allocation

2. Job monitoring and job lifecycle coordination
3. Cluster health and resource tracking

Job is defined

Program: myJob.jar file


Configuration: job.xml


Input, output paths

JobClient submits the job to the JobTracker

Calculates and creates splits based on the input


Writes myJob.jar and job.xml to HDFS

MapReduce Implementation
JobTracker divides the job into tasks: one map task per split.

Assigns a TaskTracker for each task, collocated with the split

TaskTrackers execute tasks and report status to the JobTracker

TaskTracker can run multiple map and reduce tasks


Map and Reduce Slots

Failed attempts reassigned to other TaskTrackers

Job execution status and results reported back to the client
Scheduler lets many jobs run in parallel

Example: Standard Deviation

Standard deviation: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$


Input: large text file
Output: standard deviation σ of word lengths

Example: σ({dogs, like, cats}) = 0

How many jobs?

Standard Deviation: Hint






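$\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 = \frac{1}{N}\sum_{i=1}^{N}x_i^2 - \left(\frac{1}{N}\sum_{i=1}^{N}x_i\right)^2$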






Standard Deviation Mapper
Map input is the set of words {w} in the partition

Key = null

Value = w

Map computes

Number of words in the partition


Total length of the words ∑length(w)


The sum of lengths squared ∑length(w)^2

Map output

<"count", #words>

<"length", #totalLength>

<"squared", #sumLengthSquared>

Map (null, w)
    Emit("count", 1)
    Emit("length", length(w))
    Emit("squared", length(w)^2)

Standard Deviation Reducer
Reduce input

{<key, {value}>}, where


key = "count", "length", "squared"

value is an integer

Reduce(key, {n1, n2, …})
    nRes = n1 + n2 + …
    Emit(key, nRes)

Reduce computes

Total number of words: N = sum of all "count" values

Total length of words: L = sum of all "length" values

Sum of length squares: S = sum of all "squared" values

Reduce Output

<"count", N>

<"length", L>

<"squared", S>

The result

Analyze ()
    print("mean = " + L/N)
    print("σ = " + sqrt(S/N - (L*L) / (N*N)))

σ = sqrt(S / N - µ^2)
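
The single-job approach to the standard deviation can be sketched as below. The summing reducer is folded into a dictionary update, and the job runs in one process; both are simplifications assumed for illustration, and `run_std_job` is a hypothetical name.

```python
import math
from collections import defaultdict

def std_map(_key, word):
    """Map: emit the count, the length and the squared length of one word."""
    n = len(word)
    yield ("count", 1)
    yield ("length", n)
    yield ("squared", n * n)

def run_std_job(words):
    totals = defaultdict(int)
    for w in words:
        for k, v in std_map(None, w):
            totals[k] += v                     # the summing reducer, per key
    N, L, S = totals["count"], totals["length"], totals["squared"]
    mean = L / N
    variance = S / N - mean * mean             # sigma^2 = S/N - mu^2
    # Guard against a tiny negative value caused by floating-point rounding.
    return mean, math.sqrt(max(variance, 0.0))

mean, sigma = run_std_job(["dogs", "like", "cats"])
print("mean =", mean, "sigma =", sigma)
```

One pass over the data is enough because the three sums N, L and S can be accumulated independently and combined at the end.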

Combiner, Partitioner
Combiners perform local aggregation before the shuffle & sort phase

Optimization to reduce data transfers during shuffle


In the Mean example, reduces the transfer of many key-value pairs to only two

Partitioners assign intermediate (map) key-value pairs to reducers

Responsible for dividing up the intermediate key space


Not used with single Reducer
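
The effect of a combiner on the Mean example can be sketched in a few lines. The functions `combine` and `partition_for` are hypothetical illustrations, not Hadoop APIs; the byte-sum partitioner is an arbitrary deterministic choice made for this sketch.

```python
from collections import Counter

def mean_map(word):
    """Map output for one word in the Mean example."""
    return [("count", 1), ("length", len(word))]

def combine(pairs):
    """Combiner: local aggregation on the mapper node, one sum per key."""
    sums = Counter()
    for key, value in pairs:
        sums[key] += value
    return sorted(sums.items())

def partition_for(key, num_reducers):
    """Hypothetical partitioner: map an intermediate key to a reducer index."""
    return sum(key.encode("utf-8")) % num_reducers

words_in_partition = ["dogs", "like", "cats", "and", "birds"]
raw_pairs = [pair for w in words_in_partition for pair in mean_map(w)]
combined = combine(raw_pairs)

print(len(raw_pairs), "pairs before combining,", len(combined), "after")
print({key: partition_for(key, 2) for key, _ in combined})
```

Because addition is associative and commutative, summing locally and then summing again in the reducer gives the same totals as summing everything in the reducer.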




[Figure: data flow with Combiner, Partitioner, and Shuffle & sort stages]





Distributed Sorting
Sort a dataset, which cannot be entirely stored on one node.

Set of files. 100 byte records.


The first 10 bytes of each record is the key and the rest is the value.


Ordered list of files: f_1, …, f_N


Each file f_i is sorted, and


If i < j then for any keys k ∈ f_i and r ∈ f_j: k ≤ r


Concatenation of files in the given order must form a completely sorted record set

Naïve MapReduce Sorting
If the output could be stored on one node
The input to any Reducer is always sorted by key

Shuffle sorts Map outputs

One identity Mapper and one identity Reducer would do the trick

Identity: <k,v> → <k,v>















Sorting with Multiple Maps
Multiple identity Mappers and one identity Reducer – same result

Does not work for multiple Reducers













Sorting: Generalization
Define a hash function, such that

h: {k} → [1,N]


Preserves the order: k ≤ s → h(k) ≤ h(s)


h(k) is a fixed-size prefix of string k (first 2 bytes)

Identity Mapper
With a specialized Partitioner

Computes the hash of the key h(k) and assigns <k,v> to reducer R_h(k)

Identity Reducer

Number of reducers is N: R_1, …, R_N


Inputs for R_i are all pairs whose key k has h(k) = i


R_i is an identity reducer, which writes output to HDFS file f_i


Hash function choice guarantees that
keys from f_i are less than keys from f_j if i < j

The algorithm was implemented to win Gray’s Terasort Benchmark in 2008
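
The order-preserving partitioning idea can be sketched in one process. The function below scales the first two bytes of a key to a reducer index; that scaling rule, the 0-based indices and the sample keys are assumptions for illustration and not the benchmark implementation.

```python
def prefix_partition(key, num_reducers):
    """Order-preserving 'hash': scale the 2-byte key prefix to 0..num_reducers-1."""
    prefix = int.from_bytes(key[:2].ljust(2, b"\x00"), "big")   # 0..65535
    return prefix * num_reducers // 65536

def distributed_sort(records, num_reducers):
    """Identity map + partitioner, then a per-reducer sort (done by the shuffle)."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in records:
        buckets[prefix_partition(key, num_reducers)].append((key, value))
    # Each inner list plays the role of one output file f_i.
    return [sorted(bucket, key=lambda kv: kv[0]) for bucket in buckets]

records = [(b"\xf0z", 1), (b"\x10a", 2), (b"\x80m", 3), (b"\x40q", 4), (b"\xc0b", 5)]
files = distributed_sort(records, 4)

# Concatenating the files in order gives a completely sorted record set.
merged = [kv for f in files for kv in f]
print(merged == sorted(records, key=lambda kv: kv[0]))
```

A fixed prefix only balances the reducers when keys are spread evenly over the prefix range; skewed keys would send most records to a few reducers.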

Scalability Challenges
Single NameNode of HDFS
Why Is High Availability Important?
Scheduled downtime dominates Unscheduled
- OS maintenance

- Configuration changes
Reasons for Unscheduled Downtime
- 60 incidents in 500 days on 30,000 nodes
- 24 Full GC – the majority

- System bugs / Bad application / Insufficient resources
- “Data Availability and Durability with HDFS”
Lack of Availability due to Performance Problems
- A handful of nodes can saturate NameNode

Hadoop-2 Active-Standby Architecture
Provides failover to a Standby when Active Node fails
Single Active NameNode shares journal with StandbyNode via
shared storage:


WANdisco Active-Active Architecture
Fully replicated NameNodes available for reads and writes
Multiple equal-role NameNodes share namespace state via
Coordination Engine


WANdisco: Scaling Across Data Centers
Continuous availability, and Disaster Recovery over a WAN
Wide Area Network replication
Metadata – online

Data – offline

What is Apache HBase
A distributed key-value store for real-time access to semi-structured data
Table: big, sparse, loosely structured
Collection of rows, sorted by row keys


Rows can have arbitrary number of columns



Table is split Horizontally into Regions



Dynamic Table partitioning



Region Servers serve regions to applications

Columns grouped into Column families

Vertical partition of tables









Distributed Cache:

Regions are loaded in nodes’ RAM


Real-time access to data
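
Because rows are sorted by row key and the table is split horizontally, finding the region for a row reduces to a search over region start keys. The sketch below illustrates that idea only; the start keys, server names and the function `locate_region` are hypothetical and do not reflect the HBase client API.

```python
import bisect

# Hypothetical layout: region i holds rows in [start_keys[i], start_keys[i + 1]).
start_keys = ["", "g", "n", "t"]                 # sorted region start keys
region_servers = ["rs1", "rs2", "rs1", "rs3"]    # hypothetical assignment

def locate_region(row_key):
    """Return (region index, region server) responsible for a row key."""
    idx = bisect.bisect_right(start_keys, row_key) - 1
    return idx, region_servers[idx]

for key in ["apple", "g", "melon", "zebra"]:
    print(key, locate_region(key))
```

Splitting a region adds one start key to the sorted list, which is what makes dynamic partitioning cheap to express.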

HBase Challenge
Failure of a region requires failover

Regions reassigned to other Region Servers


Clients failover and reconnect to new servers

Regions in high demand

Many client connections to one server introduce a bottleneck

Good idea to replicate popular regions on multiple Region Servers

Open Problem: consistent updates

Solution: Coordinated updates

Giraffa File System
A distributed highly scalable file system using HDFS and HBase
RAM - namespace size limitation

Giraffa is a distributed,
highly available file system

Utilizes features of
HDFS and HBase

New open source project
in experimental stage

Giraffa Requirements
Availability – the primary goal

Load balancing of metadata traffic
Same data streaming speed to / from DataNodes


Continuous Availability: No SPOF

Cluster operability, management

Cost of running larger clusters same as a smaller one

More files & more data


Federated HDFS



25 PB

120 PB

1 EB = 1000 PB

Files + blocks

200 million

1 billion

100 billion

Concurrent Clients



1 million

Giraffa Architecture
[Figure: Giraffa architecture. Labels: Namespace Service; Namespace Table (path, attrs, block[], DN[][]); Block Management Processor; Block Management Layer. The Giraffa client gets files and blocks from the namespace and streams data to or from the block layer.]







Thank you

Contact: Samantha Leggat | t: 925.396.1194 |
WANdisco, Bishop Ranch 8, 5000 Executive Pkwy, Suite 270, San Ramon, CA 94583

More Related Content

What's hot

Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopRan Ziv
Seminar_Report_hadoopVarun Narang
Overview of Hadoop and HDFS
Overview of Hadoop and HDFSOverview of Hadoop and HDFS
Overview of Hadoop and HDFSBrendan Tierney
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoopjoelcrabb
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : BeginnersShweta Patnaik
Large Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduceLarge Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduceHortonworks
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
Hadoop, HDFS and MapReduce
Hadoop, HDFS and MapReduceHadoop, HDFS and MapReduce
Hadoop, HDFS and MapReducefvanvollenhoven
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hari Shankar Sreekumar
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop EcosystemJ Singh

What's hot (20)

Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
Unit 1
Unit 1Unit 1
Unit 1
Overview of Hadoop and HDFS
Overview of Hadoop and HDFSOverview of Hadoop and HDFS
Overview of Hadoop and HDFS
002 Introduction to hadoop v3
002   Introduction to hadoop v3002   Introduction to hadoop v3
002 Introduction to hadoop v3
2. hadoop fundamentals
2. hadoop fundamentals2. hadoop fundamentals
2. hadoop fundamentals
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
Hadoop Fundamentals I
Hadoop Fundamentals IHadoop Fundamentals I
Hadoop Fundamentals I
Big Data and Hadoop - An Introduction
Big Data and Hadoop - An IntroductionBig Data and Hadoop - An Introduction
Big Data and Hadoop - An Introduction
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
Large Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduceLarge Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduce
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
Hadoop, HDFS and MapReduce
Hadoop, HDFS and MapReduceHadoop, HDFS and MapReduce
Hadoop, HDFS and MapReduce
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop Ecosystem

Viewers also liked

MongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big DataMongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big DataMongoDB
Cassandra and Spark: Optimizing for Data Locality
Cassandra and Spark: Optimizing for Data LocalityCassandra and Spark: Optimizing for Data Locality
Cassandra and Spark: Optimizing for Data LocalityRussell Spitzer
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataWANdisco Plc
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Spark Summit Seidel
Performance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksPerformance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksMarian Marinov
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit

Viewers also liked (10)

MongoDB and hadoop
MongoDB and hadoopMongoDB and hadoop
MongoDB and hadoop
MongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big DataMongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big Data
Cassandra and Spark: Optimizing for Data Locality
Cassandra and Spark: Optimizing for Data LocalityCassandra and Spark: Optimizing for Data Locality
Cassandra and Spark: Optimizing for Data Locality
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Performance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksPerformance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networks
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
Screw thread measurements and Gear measurement
Screw thread measurements and Gear measurementScrew thread measurements and Gear measurement
Screw thread measurements and Gear measurement

Similar to Apache Hadoop: Foundations of Scalability

Bigdata processing with Spark
Bigdata processing with SparkBigdata processing with Spark
Bigdata processing with SparkArjen de Vries
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...Reynold Xin
Hadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesHadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesKelly Technologies
Hadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesHadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesKelly Technologies
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Andrey Vykhodtsev
Inroduction to Big Data
Inroduction to Big DataInroduction to Big Data
Inroduction to Big DataOmnia Safaan
Processing Big Data: An Introduction to Data Intensive Computing
Processing Big Data: An Introduction to Data Intensive ComputingProcessing Big Data: An Introduction to Data Intensive Computing
Processing Big Data: An Introduction to Data Intensive ComputingCollin Bennett
Hadoop and Mapreduce Introduction
Hadoop and Mapreduce IntroductionHadoop and Mapreduce Introduction
Hadoop and Mapreduce Introductionrajsandhu1989
Hadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreHadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreKelly Technologies
Sf NoSQL MeetUp: Apache Hadoop and HBase
Sf NoSQL MeetUp: Apache Hadoop and HBaseSf NoSQL MeetUp: Apache Hadoop and HBase
Sf NoSQL MeetUp: Apache Hadoop and HBaseCloudera, Inc.
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-servicesSreenu Musham
Hadoop bigdata overview
Hadoop bigdata overviewHadoop bigdata overview
Hadoop bigdata overviewharithakannan
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and DeploymentCisco Canada
Cisco connect toronto 2015 big data sean mc keown
Cisco connect toronto 2015 big data  sean mc keownCisco connect toronto 2015 big data  sean mc keown
Cisco connect toronto 2015 big data sean mc keownCisco Canada
Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.Konstantin V. Shvachko

Similar to Apache Hadoop: Foundations of Scalability (20)

Bigdata processing with Spark
Bigdata processing with SparkBigdata processing with Spark
Bigdata processing with Spark
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
Hadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesHadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesHadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologies
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Inroduction to Big Data
Inroduction to Big DataInroduction to Big Data
Inroduction to Big Data
Processing Big Data: An Introduction to Data Intensive Computing
Processing Big Data: An Introduction to Data Intensive ComputingProcessing Big Data: An Introduction to Data Intensive Computing
Processing Big Data: An Introduction to Data Intensive Computing
Hadoop and Mapreduce Introduction
Hadoop and Mapreduce IntroductionHadoop and Mapreduce Introduction
Hadoop and Mapreduce Introduction
Hadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreHadoop institutes-in-bangalore
Hadoop institutes-in-bangalore
Sf NoSQL MeetUp: Apache Hadoop and HBase
Sf NoSQL MeetUp: Apache Hadoop and HBaseSf NoSQL MeetUp: Apache Hadoop and HBase
Sf NoSQL MeetUp: Apache Hadoop and HBase
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-services
Hadoop bigdata overview
Hadoop bigdata overviewHadoop bigdata overview
Hadoop bigdata overview
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and Deployment
Cisco connect toronto 2015 big data sean mc keown
Cisco connect toronto 2015 big data  sean mc keownCisco connect toronto 2015 big data  sean mc keown
Cisco connect toronto 2015 big data sean mc keown
Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Big data concepts
Big data conceptsBig data concepts
Big data concepts

More from WANdisco Plc

Forrester On Using Subversion to Optimize Globally Distributed Development
Forrester On Using Subversion to Optimize Globally Distributed DevelopmentForrester On Using Subversion to Optimize Globally Distributed Development
Forrester On Using Subversion to Optimize Globally Distributed DevelopmentWANdisco Plc
03.13.13 WANDisco SVN Training: Advanced Branching & Merging
03.13.13 WANDisco SVN Training: Advanced Branching & Merging03.13.13 WANDisco SVN Training: Advanced Branching & Merging
03.13.13 WANDisco SVN Training: Advanced Branching & MergingWANdisco Plc
02.28.13 WANDisco SVN Training: Getting Info Out of SVN
02.28.13 WANDisco SVN Training: Getting Info Out of SVN02.28.13 WANDisco SVN Training: Getting Info Out of SVN
02.28.13 WANDisco SVN Training: Getting Info Out of SVNWANdisco Plc
02.19.13 WANDisco SVN Training: Branching Options for Development
02.19.13 WANDisco SVN Training: Branching Options for Development02.19.13 WANDisco SVN Training: Branching Options for Development
02.19.13 WANDisco SVN Training: Branching Options for DevelopmentWANdisco Plc
Hadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big DataHadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big DataWANdisco Plc
uberSVN introduction by WANdisco
uberSVN introduction by WANdiscouberSVN introduction by WANdisco
uberSVN introduction by WANdiscoWANdisco Plc
WANdisco Subversion Support Services
WANdisco Subversion Support ServicesWANdisco Subversion Support Services
WANdisco Subversion Support ServicesWANdisco Plc
Make Subversion Agile
Make Subversion AgileMake Subversion Agile
Make Subversion AgileWANdisco Plc
Subversion in 2010 and Beyond
Subversion in 2010 and BeyondSubversion in 2010 and Beyond
Subversion in 2010 and BeyondWANdisco Plc
Forrester Research on Optimizing Globally Distributed Software Development Us...
Forrester Research on Optimizing Globally Distributed Software Development Us...Forrester Research on Optimizing Globally Distributed Software Development Us...
Forrester Research on Optimizing Globally Distributed Software Development Us...WANdisco Plc
Forrester Research on Globally Distributed Development Using Subversion
Forrester Research on Globally Distributed Development Using SubversionForrester Research on Globally Distributed Development Using Subversion
Forrester Research on Globally Distributed Development Using SubversionWANdisco Plc

Apache Hadoop: Foundations of Scalability

  • 1. Apache Hadoop: Foundations of Scalability. Konstantin V. Shvachko, November 2013
  • 2. Author: WANdisco, Chief Architect - NonStop Hadoop - Free Training. Founder of AltoStor and AltoScale. Hadoop, HDFS at Yahoo! & eBay since 2005. Data structures and algorithms for large-scale distributed storage systems. Apache Hadoop committer and member of the PMC.
  • 3. Computing. The domain of computing is changing, and so is computing itself. The history of computing started a long time ago. Fascination with numbers - Vast universe with simple, strict rules - Computing devices - Crunch numbers. The Internet - Universe of words, fuzzy rules - Different type of computing - Understand meaning of things - Human thinking - Errors & deviations are a part of study. Computer History Museum, San Jose.
  • 4. Words vs. Numbers: From Big Numbers to Big Data. In 1997 IBM built the Deep Blue supercomputer - Playing a chess game with the champion G. Kasparov - Human race defeated - Strict rules for chess - Fast, deep analysis of the current state. In 2011 IBM built the Watson computer to play Jeopardy - Questions and hints in human terms - Natural language processing - Reborn as a diagnostics machine: Oncology.
  • 5. Big Data. Computations that need the power of many computers - Large datasets: hundreds of TBs, PBs - Or use of thousands of CPUs in parallel - Or both. Cluster as a computer. What is a PB? 1 KB = 1000 Bytes, 1 MB = 1000 KB, 1 GB = 1000 MB, 1 TB = 1000 GB, 1 PB = 1000 TB, ???? = 1000 PB
  • 6. Examples – Science. Fundamental physics: Large Hadron Collider (LHC) - Smashing high-energy protons at close to the speed of light - 1 PB of event data per sec, most filtered out - 15 PB of data per year - 160 PB of disk + 90 PB of tape storage. Math: Big Numbers - 2 quadrillionth (10^15) digit of π is 0 - Pure CPU workload: 12 days of cluster time - 208 years of CPU-time on a cluster with 7600 CPU cores. Healthcare - Patient records, Sensors, Drug design - Genome
  • 7. Examples – Web. Search engine - Webmap: Map of the Internet - 2008 @ Yahoo, 1500 nodes, 5 PB raw storage - Internet Search Index - Traditional Big Data applications. Behavioural Analysis - Recommendation engine: You may buy this too - Intelligence: fraud detection - Sentiment analysis: who will win elections - Matching interests: you should like him / her
  • 8. The Sorting Problem. Turns into a Big Data problem as the data set grows.
    Classic in-memory sorting - Complexity: number of comparisons
      Algorithm     Worst        Average      Space
      Bubble Sort   O(n^2)       O(n^2)       In-place
      Quicksort     O(n^2)       O(n log n)   In-place
      Merge Sort    O(n log n)   O(n log n)   Double
    External sorting - Cannot load all data in memory - 16 GB RAM vs. 200 GB file - Complexity: + disk IOs (bytes read or written)
    Distributed sorting - Cannot load data on a single server - 12 drives * 2 TB = 24 TB disk space vs. 200 TB data set - Complexity: + network transfers
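
The external sorting case on slide 8 can be illustrated with a small sketch. It assumes the data is a stream of newline-free text lines and that only `chunk_size` lines fit in memory at once; the function name `external_sort` and the use of temporary files are illustrative choices, not something the slides prescribe. Each chunk is sorted in memory and written out as a sorted run, then the runs are combined with a k-way merge.

```python
import heapq
import tempfile


def external_sort(lines, chunk_size):
    """Yield lines in sorted order, holding at most chunk_size lines in memory per run."""
    runs = []    # one sorted temporary file ("run") per chunk
    chunk = []   # in-memory buffer for the current chunk

    def flush():
        # Sort the buffered chunk and spill it to disk as a sorted run.
        chunk.sort()
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(line + "\n" for line in chunk)
        run.seek(0)
        runs.append(run)
        chunk.clear()

    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    try:
        # k-way merge: reads each run sequentially, one line at a time.
        streams = [(line.rstrip("\n") for line in run) for run in runs]
        yield from heapq.merge(*streams)
    finally:
        for run in runs:
            run.close()


print(list(external_sort(["dogs", "like", "cats", "birds", "ants"], chunk_size=2)))
```

The cost now includes the bytes written to and read back from the runs, which is the "+ disk IOs" term on the slide.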
  • 9. Hadoop. Need a lot of computers. How to make them work together?
  • 10. Hadoop. A reliable, scalable, high performance distributed computing system. Apache Hadoop is an ecosystem of tools for processing “Big Data” - Started in 2005 by D. Cutting and M. Cafarella - Scaled by the Yahoo! Hadoop team from a few nodes to thousands (4K-node cluster). Consists of two main components providing a unified cluster view: 1. HDFS: a distributed file system • File system API connecting thousands of drives 2. MapReduce: a framework for distributed computations • Splitting jobs into parts executable on one node • Scheduling and monitoring of job execution. Today used everywhere: becoming a standard of distributed computing. Hadoop is an open source project.
  • 11. Hadoop: Architecture Principles. Linear scalability: more nodes can do more work within the same time - Linear on data size - Linear on compute resources. Move computation to data - Minimize expensive data transfers - Data are large, programs are small. Reliability and Availability: Commodity hardware - 1 drive fails every 3 years => probability of failing today is about 1/1000 - How many drives per day fail on a 1000-node cluster with 10 drives per node? Sequential data processing: avoid random reads / writes. Simple computational model - hides complexity in an efficient execution framework.
  • 12. The Hadoop Family. Ecosystem of tools for processing Big Data:
      HDFS              Distributed file system
      YARN, MapReduce   Computational framework
      ZooKeeper         Distributed coordination
      HBase             Key-value store
      Pig               Dataflow language, SQL
      Hive              Data warehouse, SQL
      Oozie             Complex job workflow
      BigTop            Packaging and testing
  • 14. MapReduce. MapReduce - 2004, Jeffrey Dean, Sanjay Ghemawat, Google - “MapReduce: Simplified Data Processing on Large Clusters”. Parallel Computational Model - Examples of computational models • Turing or Post machines. Programming languages: C++, Java • Finite automaton, lambda calculus - Split large input data into small enough pieces, process in parallel. Distributed Execution Framework - Compilers, interpreters - Scheduling, Processing, Coordination - Failure recovery.
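
The drive-failure question on slide 11 is simple arithmetic. A minimal sketch, assuming failures are independent and spread evenly over a 3-year drive lifetime (a simplification; the variable names are illustrative):

```python
# Expected number of drive failures per day under a uniform-failure assumption.
DRIVE_LIFETIME_DAYS = 3 * 365   # "1 drive fails every 3 years"
NODES = 1000
DRIVES_PER_NODE = 10

p_fail_today = 1 / DRIVE_LIFETIME_DAYS    # roughly 1/1000, as the slide rounds it
total_drives = NODES * DRIVES_PER_NODE    # 10,000 drives in the cluster
expected_failures_per_day = total_drives * p_fail_today

print(f"P(a given drive fails today) ~ {p_fail_today:.5f}")
print(f"Expected failed drives per day: {expected_failures_per_day:.1f}")
```

With the slide's rounded probability of 1/1000 the answer is about 10 drives per day, so on a cluster of this size drive failure is a routine daily event rather than an exception.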
  • 15. Functional Programming. Map: a higher-order function - applies a given function to each element of a list - returns the list of results. Map( f(x), X[1:n] ) -> [ f(X[1]), …, f(X[n]) ]. Example: Map( x^2, [0,1,2,3,4,5] ) = [0,1,4,9,16,25]
  • 16. Functional Programming: reduce. Map: a higher-order function - applies a given function to each element of a list - returns the list of results. Map( f(x), X[1:n] ) -> [ f(X[1]), …, f(X[n]) ]. Example: Map( x^2, [0,1,2,3,4,5] ) = [0,1,4,9,16,25]. Reduce / fold: a higher-order function - Iterates a given function over a list of elements - Applies the function to the previous result and the current element - Returns a single result. Example: Reduce( x + y, [0,1,2,3,4,5] ) = (((((0 + 1) + 2) + 3) + 4) + 5) = 15
  • 17. Functional Programming. Map: a higher-order function - applies a given function to each element of a list - returns the list of results. Map( f(x), X[1:n] ) -> [ f(X[1]), …, f(X[n]) ]. Example: Map( x^2, [0,1,2,3,4,5] ) = [0,1,4,9,16,25]. Reduce / fold: a higher-order function - Iterates a given function over a list of elements - Applies the function to the previous result and the current element - Returns a single result. Example: Reduce( x + y, [0,1,2,3,4,5] ) = (((((0 + 1) + 2) + 3) + 4) + 5) = 15. Reduce( x * y, [0,1,2,3,4,5] ) = ?
  • 18. Functional Programming. Map: a higher-order function - applies a given function to each element of a list - returns the list of results. Map( f(x), X[1:n] ) -> [ f(X[1]), …, f(X[n]) ]. Example: Map( x^2, [0,1,2,3,4,5] ) = [0,1,4,9,16,25]. Reduce / fold: a higher-order function - Iterates a given function over a list of elements - Applies the function to the previous result and the current element - Returns a single result. Example: Reduce( x + y, [0,1,2,3,4,5] ) = (((((0 + 1) + 2) + 3) + 4) + 5) = 15. Reduce( x * y, [0,1,2,3,4,5] ) = 0
  • 19. Example: Sum of Squares. Composition of - a map followed by - a reduce applied to the results of the map. Square Pyramid Number: 1 + 4 + … + n^2 = n(n+1)(2n+1) / 6. Example - Map( x^2, [1,2,3,4,5] ) = [1,4,9,16,25] - Reduce( x + y, [1,4,9,16,25] ) = ((((1 + 4) + 9) + 16) + 25) = 55. Map is easily parallelizable - Compute x^2 for 1,2,3 on one node and for 4,5 on another. Reduce is notoriously sequential - Need all squares at one node to compute the total sum.
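
The map and reduce examples from slides 15 to 19 translate directly to Python's built-in `map` and `functools.reduce`. This is only a single-process illustration of the functional idea, not Hadoop code. The final lines mimic the two-node split described on slide 19.

```python
from functools import reduce

xs = [1, 2, 3, 4, 5]

# Map: apply x^2 to every element.
squares = list(map(lambda x: x * x, xs))

# Reduce / fold: combine the previous result with the current element.
total = reduce(lambda acc, x: acc + x, squares)
product = reduce(lambda acc, x: acc * x, [0, 1, 2, 3, 4, 5])

# Closed form for the square pyramidal number: n(n+1)(2n+1)/6.
n = len(xs)
closed_form = n * (n + 1) * (2 * n + 1) // 6

# The map step splits cleanly: "node A" squares 1..3, "node B" squares 4..5.
# Because + is associative, partial sums can also be computed per node.
partial_a = reduce(lambda acc, x: acc + x, map(lambda x: x * x, [1, 2, 3]))
partial_b = reduce(lambda acc, x: acc + x, map(lambda x: x * x, [4, 5]))

print(squares, total, closed_form, product, partial_a + partial_b)
```

The product is 0 because the list contains 0, which answers the question posed on slide 17.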
  • 20. Computational Model. Map-Reduce is a Parallel Computational Model. Map-Reduce algorithm = job. Operates with key-value pairs: (k, V) - Primitive types, Strings or more complex Structures. Map-Reduce job input and output are collections of pairs {(k, V)}. An MR job is defined by 2 functions: map: (k1, v1) → {(k2, v2)}; reduce: (k2, {v2}) → {(k3, v3)}
  • 21. Job Workflow (figure). Map output per word: dogs → (C, 3), (V, 1); like → (C, 2), (V, 2); cats → (C, 3), (V, 1). Reduce output: (C, 8), (V, 4).
  • 22. The Algorithm. Map(null, word): nC = Consonants(word); nV = Vowels(word); Emit(“Consonants”, nC); Emit(“Vowels”, nV). Reduce(key, {n1, n2, …}): nRes = n1 + n2 + …; Emit(key, nRes)
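
The consonant and vowel counting job from slides 21 and 22 can be simulated in a single process. The sketch below is not Hadoop API code: `map_word`, `reduce_counts`, and `run_job` are hypothetical names, and the in-memory dictionary stands in for the shuffle that groups values by key.

```python
from collections import defaultdict

VOWELS = set("aeiou")


def map_word(word):
    """Map(null, word): emit the consonant and vowel counts of one word."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    n_vowels = sum(1 for ch in letters if ch in VOWELS)
    yield ("Consonants", len(letters) - n_vowels)
    yield ("Vowels", n_vowels)


def reduce_counts(key, values):
    """Reduce(key, {n1, n2, ...}): sum all values emitted for the key."""
    return key, sum(values)


def run_job(words):
    # Shuffle stand-in: group every emitted value under its key.
    groups = defaultdict(list)
    for word in words:
        for key, value in map_word(word):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in sorted(groups.items()))


result = run_job(["dogs", "like", "cats"])
print(result)  # {'Consonants': 8, 'Vowels': 4}
```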
  • 23. Computation Framework. Job is executed on a cluster of computers. Two virtual clusters: HDFS and MapReduce - Physically tightly coupled - Designed to work together. The Hadoop Distributed File System - Reliable storage layer - View data as files and directories. MapReduce as Computation Framework - Job scheduling - Resource management - Lifecycle coordination - Task execution module. (Figure: JobTracker and NameNode as masters; each worker node runs a TaskTracker with Tasks and a DataNode with Blocks.)
  • 24. HDFS Architecture Principles. The name space is a hierarchy of files and directories. Files are divided into blocks (typically 128 MB). Namespace (metadata) is decoupled from data - Fast namespace operations, not slowed down by data streaming. Single NameNode keeps the entire name space in RAM. DataNodes store data blocks on local drives. Blocks are replicated on 3 DataNodes for redundancy and availability.
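
To make the block and replication numbers on slide 24 concrete, here is a small back-of-the-envelope sketch. It assumes the 128 MB block size means 128 MiB and a replication factor of 3, as stated on the slide; the helper names are illustrative.

```python
BLOCK_SIZE = 128 * 1024 ** 2   # 128 MB block (interpreted here as 128 MiB)
REPLICATION = 3                # each block is stored on 3 DataNodes


def block_count(file_size):
    """Number of blocks for a file: ceiling of size / block size (0 for an empty file)."""
    return -(-file_size // BLOCK_SIZE)


def replica_count(file_size):
    """Total block replicas that DataNodes hold for the file."""
    return block_count(file_size) * REPLICATION


one_gib = 1024 ** 3
print(block_count(one_gib), replica_count(one_gib))   # 8 24
```

The NameNode only tracks the metadata for these blocks in RAM; the block contents live on the DataNodes' local drives.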
  • 25. MapReduce Framework. Job Input is a file or a set of files in a distributed file system (HDFS) - Input is split into blocks of roughly the same size - Blocks are replicated to multiple nodes - Each block holds a list of key-value pairs. Map task is scheduled to one of the nodes containing the block - Map task input is node-local - Map task result is node-local. Map task results are grouped: one group per reducer. Each group is sorted. Reduce task is scheduled to a node - Reduce task transfers the targeted groups from all mapper nodes - Computes and stores results in a separate HDFS file. Job Output is a set of files in HDFS, with #files = #reducers.
  • 26. Map Reduce Example: Mean. Mean: $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$. Input: large text file. Output: average length of words in the file, µ. Example: µ({dogs, like, cats}) = 4
  • 27. Mean Mapper. Map input is the set of words {w} in the partition - Key = null, Value = w. Map computes - Number of words in the partition - Total length of the words ∑length(w). Map output - <“count”, #words> - <“length”, #totalLength>. Map(null, w): Emit(“count”, 1); Emit(“length”, length(w))
  • 28. Single Mean Reducer. Reduce input - {<key, {value}>}, where - key = “count”, “length” - value is an integer. Reduce computes - Total number of words: N = sum of all “count” values - Total length of words: L = sum of all “length” values. Reduce output - <“count”, N> - <“length”, L>. The result - µ = L/N. Reduce(key, {n1, n2, …}): nRes = n1 + n2 + …; Emit(key, nRes). Analyze(): read(“part-r-00000”); print(“mean = ” + L/N)
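
The mean job from slides 26 to 28 can be sketched as a single-process simulation. Each inner list stands in for one input partition handled by one map task; `mean_map`, `sum_reduce`, and `word_length_mean` are hypothetical names, and the division at the end plays the role of the slide's `Analyze()` step.

```python
from collections import defaultdict


def mean_map(word):
    """Map(null, w): emit one count and the word's length."""
    yield ("count", 1)
    yield ("length", len(word))


def sum_reduce(key, values):
    """Reduce(key, {n1, n2, ...}): sum the values for the key."""
    return key, sum(values)


def word_length_mean(partitions):
    shuffled = defaultdict(list)
    for partition in partitions:          # one map task per partition
        for word in partition:
            for key, value in mean_map(word):
                shuffled[key].append(value)
    totals = dict(sum_reduce(key, values) for key, values in shuffled.items())
    # Analyze step: mean = L / N
    return totals["length"] / totals["count"]


mean = word_length_mean([["dogs", "like"], ["cats"]])
print("mean =", mean)  # mean = 4.0
```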
  • 29. MapReduce Implementation. Single master JobTracker shepherds the distributed herd of TaskTrackers: 1. Job scheduling and resource allocation 2. Job monitoring and job lifecycle coordination 3. Cluster health and resource tracking. Job is defined by - Program: myJob.jar file - Configuration: job.xml - Input, output paths. JobClient submits the job to the JobTracker - Calculates and creates splits based on the input - Writes myJob.jar and job.xml to HDFS.
  • 30. MapReduce Implementation. JobTracker divides the job into tasks: one map task per split - Assigns a TaskTracker for each task, collocated with the split. TaskTrackers execute tasks and report status to the JobTracker - A TaskTracker can run multiple map and reduce tasks - Map and Reduce Slots. Failed attempts are reassigned to other TaskTrackers. Job execution status and results are reported back to the client. Scheduler lets many jobs run in parallel.
  • 31. Example: Standard Deviation. Standard deviation: $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}$. Input: large text file. Output: standard deviation σ of word lengths. Example: σ({dogs, like, cats}) = 0. How many jobs?
  • 32. Standard Deviation: Hint. $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \mu^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2$
  • 33. Standard Deviation Mapper. Map input is the set of words {w} in the partition - Key = null, Value = w. Map computes - Number of words in the partition - Total length of the words ∑length(w) - The sum of lengths squared ∑length(w)^2. Map output - <“count”, #words> - <“length”, #totalLength> - <“squared”, #sumLengthSquared>. Map(null, w): Emit(“count”, 1); Emit(“length”, length(w)); Emit(“squared”, length(w)^2)
  • 34. Standard Deviation Reducer. Reduce input - {<key, {value}>}, where - key = “count”, “length”, “squared” - value is an integer. Reduce(key, {n1, n2, …}): nRes = n1 + n2 + …; Emit(key, nRes). Reduce computes - Total number of words: N = sum of all “count” values - Total length of words: L = sum of all “length” values - Sum of length squares: S = sum of all “squared” values. Reduce output - <“count”, N> - <“length”, L> - <“squared”, S>. The result - µ = L/N - σ = sqrt(S/N - µ^2). Analyze(): read(“part-r-00000”); print(“mean = ” + L/N); print(“σ = ” + sqrt(S/N - (L*L) / (N*N)))
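
The hint on slide 32 means a single job is enough: the three sums N, L, and S can be accumulated in one pass, and σ is derived afterwards. A minimal single-process sketch follows; the function names are illustrative, and the `max(..., 0.0)` guard against tiny negative rounding errors is an addition not present on the slides.

```python
import math


def std_map(word):
    """Map(null, w): emit count, length, and squared length for one word."""
    n = len(word)
    return {"count": 1, "length": n, "squared": n * n}


def std_reduce(partials):
    """Sum each key over all map outputs: N, L and S."""
    totals = {"count": 0, "length": 0, "squared": 0}
    for partial in partials:
        for key in totals:
            totals[key] += partial[key]
    return totals


def analyze(totals):
    """Derive the mean and standard deviation from N, L and S."""
    n, l, s = totals["count"], totals["length"], totals["squared"]
    mu = l / n
    variance = s / n - mu * mu            # S/N - (L/N)^2
    return mu, math.sqrt(max(variance, 0.0))


mu, sigma = analyze(std_reduce(std_map(w) for w in ["dogs", "like", "cats"]))
print(mu, sigma)  # 4.0 0.0
```

Note that the variance term is S/N minus (L/N) squared; written without parentheses as `L*L / N*N`, most languages would evaluate it as L².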
  • 35. Combiner, Partitioner. Combiners perform local aggregation before the shuffle & sort phase - Optimization to reduce data transfers during shuffle - In the Mean example, reduces the transfer of many key-value pairs to only two. Partitioners assign intermediate (map) key-value pairs to reducers - Responsible for dividing up the intermediate key space - Not used with a single Reducer. (Figure: Input Data → Map → Combiner → Partitioner → Shuffle & sort → Reduce → Output.)
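
The effect of a combiner in the Mean example can be shown with a short sketch. It assumes one map task emitting two pairs per word, as on slide 27; `map_partition` and `combine` are hypothetical names, and the combiner applies the same summation as the reducer, which is valid here because addition is associative and commutative.

```python
from collections import Counter


def map_partition(words):
    """Raw map output for one partition: two pairs per word."""
    pairs = []
    for word in words:
        pairs.append(("count", 1))
        pairs.append(("length", len(word)))
    return pairs


def combine(pairs):
    """Local aggregation on the map node, before anything crosses the network."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return sorted(local.items())


partition = ["dogs", "like", "cats", "and", "birds"]
raw = map_partition(partition)
combined = combine(raw)

print(len(raw), "pairs before combining")   # 10 pairs before combining
print(combined)                             # [('count', 5), ('length', 20)]
```

However many words a partition holds, only two pairs per map task reach the shuffle.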
  • 36. Distributed Sorting. Sort a dataset which cannot be entirely stored on one node. Input: - Set of files. 100-byte records. - The first 10 bytes of each record are the key and the rest is the value. Output: - Ordered list of files: f_1, …, f_N - Each file f_i is sorted, and - If i < j then for any keys k ∈ f_i and r ∈ f_j, k ≤ r - Concatenation of the files in the given order must form a completely sorted record set.
  • 37. Naïve MapReduce Sorting. If the output could be stored on one node. The input to any Reducer is always sorted by key - Shuffle sorts Map outputs. One identity Mapper and one identity Reducer would do the trick - Identity: <k,v> → <k,v>. (Figure: Input → Map → Shuffle → Reduce → Output; input records dogs, like, cats leave the single Reducer as cats, dogs, like.)
  • 38. Sorting with Multiple Maps. Multiple identity Mappers and one identity Reducer: same result - Does not work for multiple Reducers. (Figure: Input Data → several Maps → Shuffle → Reduce → Output Data, with the records dogs, like, cats spread across the Maps.)
  • 39. Sorting: Generalization. Define a hash function such that - h: {k} → [1,N] - Preserves the order: k ≤ s → h(k) ≤ h(s) - h(k) is a fixed-size prefix of string k (2 first bytes). Identity Mapper with a specialized Partitioner - Computes the hash of the key h(k) and assigns <k,v> to reducer R_h(k). Identity Reducer - Number of reducers is N: R_1, …, R_N - Inputs for R_i are all pairs whose key has h(k) = i - R_i is an identity reducer, which writes output to HDFS file f_i - Hash function choice guarantees that keys from f_i are less than keys from f_j if i < j. The algorithm was implemented to win Gray’s Terasort Benchmark in 2008.
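
The order-preserving partitioner from slide 39 can be sketched as follows. This is a single-process illustration, not the benchmark implementation: it assumes keys are byte strings, maps the 2-byte prefix linearly onto reducer indices (0-based here, where the slide uses 1..N), and uses the hypothetical names `partition_index` and `distributed_sort`. The random 10-byte keys mirror the record format on slide 36.

```python
import random


def partition_index(key: bytes, num_reducers: int) -> int:
    """Order-preserving h(k): scale the 2-byte key prefix onto [0, num_reducers)."""
    prefix = int.from_bytes(key[:2].ljust(2, b"\x00"), "big")   # 0 .. 65535
    return prefix * num_reducers // 65536


def distributed_sort(records, num_reducers):
    # Partitioner: route each <k, v> to reducer R_h(k).
    buckets = [[] for _ in range(num_reducers)]
    for key, value in records:
        buckets[partition_index(key, num_reducers)].append((key, value))
    # The shuffle delivers each reducer's input sorted by key; sorted() stands in for it.
    return [sorted(bucket, key=lambda record: record[0]) for bucket in buckets]


rng = random.Random(0)
records = [(bytes(rng.randrange(256) for _ in range(10)), i) for i in range(1000)]
files = distributed_sort(records, num_reducers=4)

# Concatenating the output files in order yields a fully sorted record set.
merged_keys = [key for output_file in files for key, _ in output_file]
print(merged_keys == sorted(key for key, _ in records))
```

A linear mapping of the prefix balances reducers only when keys are roughly uniformly distributed; for skewed keys the partition boundaries would have to be chosen differently, for example by sampling the input.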
  • 41. Single NameNode of HDFS: Why Is High Availability Important? Scheduled downtime dominates unscheduled - OS maintenance - Configuration changes. Reasons for unscheduled downtime - 60 incidents in 500 days on 30,000 nodes - 24 Full GC: the majority - System bugs / Bad application / Insufficient resources - “Data Availability and Durability with HDFS”. Lack of availability due to performance problems - A handful of nodes can saturate the NameNode.
  • 42. Hadoop-2 Active-Standby Architecture. Provides failover to a Standby when the Active node fails. Single Active NameNode shares the journal with the StandbyNode via shared storage: NFS, QJM.
  • 43. WANdisco Active-Active Architecture. Fully replicated NameNodes available for reads and writes. Multiple equal-role NameNodes share namespace state via the Coordination Engine - Proposals, Agreements - Coordinated updates.
  • 44. WANdisco: Scaling Across Data Centers. Continuous availability and Disaster Recovery over a WAN. Wide Area Network replication - Metadata: online - Data: offline.
  • 45. What is Apache HBase. A distributed key-value store for real-time access to semi-structured data. Table: big, sparse, loosely structured. Collection of rows, sorted by row keys - Rows can have an arbitrary number of columns. Table is split horizontally into Regions - Dynamic table partitioning. Region Servers serve regions to applications. Columns are grouped into Column families - Vertical partition of tables. Distributed Cache: - Regions are loaded in nodes’ RAM - Real-time access to data. (Figure: HBase Master, NameNode, and JobTracker as masters; each worker node runs a TaskTracker, a RegionServer, and a DataNode.)
  • 46. HBase Challenge. Failure of a Region Server requires failover - Regions are reassigned to other Region Servers - Clients fail over and reconnect to the new servers. Regions in high demand - Many client connections to one server introduce a bottleneck. Good idea to replicate popular regions on multiple Region Servers - Open problem: consistent updates. Solution: coordinated updates.
  • 47. Giraffa File System. A distributed, highly scalable file system using HDFS and HBase. Challenge: RAM - namespace size limitation. Giraffa is a distributed, highly available file system. Utilizes features of HDFS and HBase. New open source project in experimental stage.
  • 48. Giraffa Requirements. Availability: the primary goal - Load balancing of metadata traffic - Same data streaming speed to / from DataNodes - Continuous Availability: No SPOF. Cluster operability, management - Cost of running larger clusters same as a smaller one. More files & more data:
                           HDFS          Federated HDFS   Giraffa
      Space                25 PB         120 PB           1 EB = 1000 PB
      Files + blocks       200 million   1 billion        100 billion
      Concurrent clients   40,000        100,000          1 million
  • 49. Giraffa Architecture. 1. Giraffa client gets files and blocks from HBase 2. Block Manager handles block operations 3. Stream data to or from DataNodes. (Figure: App with NamespaceAgent; Namespace Service on HBase holding the Namespace Table: path, attrs, block[], DN[][]; Block Management Processor; Block Management Layer of BMs over DataNodes.)
  • 50. Thank you Contact: Samantha Leggat | t: 925.396.1194 | WANdisco, Bishop Ranch 8, 5000 Executive Pkwy, Suite 270, San Ramon, CA 94583