A session focused on ramping you up on what Hadoop is, how it works and what it's capable of. We will also look at what Hadoop 2.x and YARN bring to the table and some future projects in the Hadoop space to keep an eye on.
6. MapReduce v1 Limitations
Scalability: Maximum cluster size is 4,000 nodes and maximum concurrent tasks is 40,000.
Availability: JobTracker failure kills all queued and running jobs.
Resources Partitioned into Map and Reduce: Hard partitioning of Map and Reduce slots led to low resource utilization.
No Support for Alternate Paradigms / Services: Only MapReduce batch jobs, nothing else.
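To make the utilization point concrete, here is a minimal sketch comparing a node with fixed Map and Reduce slots against a node whose capacity is one shared pool. The function names and the numbers are illustrative assumptions, not Hadoop APIs, and real YARN containers are sized by memory and CPU rather than counted as equal units.

```python
def slot_utilization(map_slots, reduce_slots, pending_maps, pending_reduces):
    """Fraction of slots in use when Map and Reduce slots are hard-partitioned."""
    # A map task can only occupy a map slot, a reduce task only a reduce slot.
    running = min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)
    return running / (map_slots + reduce_slots)


def pooled_utilization(total_units, pending_maps, pending_reduces):
    """Fraction of capacity in use when any task may take any free unit."""
    running = min(total_units, pending_maps + pending_reduces)
    return running / total_units


# Hypothetical node: 8 map slots + 4 reduce slots, with a map-only backlog.
partitioned = slot_utilization(8, 4, pending_maps=12, pending_reduces=0)
pooled = pooled_utilization(12, pending_maps=12, pending_reduces=0)
print(f"partitioned: {partitioned:.2f}, pooled: {pooled:.2f}")
```

With a map-only backlog the reduce slots sit idle in the partitioned model, while the pooled model can put every unit to work.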
7. Apache Hadoop 1.0: Single Use System
Hadoop 1.0 is a single-use system for batch apps. Layers shown:
Pig, Hive
MapReduce (cluster resource management and data processing)
HDFS (redundant, reliable storage)
9. YARN: Yet Another Resource Negotiator
YARN replaces MapReduce as the cluster resource management layer.
YARN will be the de facto distributed operating system for Big Data.
10. YARN: Taking Hadoop Beyond Batch
Store DATA in one place. Interact with that data in MULTIPLE WAYS, with Predictable Performance and Quality of Service.
Applications Run Natively IN Hadoop:
BATCH (MapReduce), INTERACTIVE (Tez), ONLINE (HBase), STREAMING (DataTorrent), GRAPH (Giraph)
YARN (cluster resource management)
HDFS2 (redundant, reliable storage)
11. YARN: Applications
Running all on the same Hadoop cluster to give applications access to all the same source data!
MapReduce v2, Stream Processing, Master-Worker, Online, In-Memory, Apache Storm
12. YARN: Moving Quickly
Timeline (2010, 2011, 2012, 2013, 2014, Today): Conceived at Yahoo!; Alpha Releases (2.0); Beta Releases (2.1); GA Released (2.2); Version 2.3; Version 2.4.
100,000+ nodes, 400,000+ jobs daily. 10 million+ hours of compute daily.
14. YARN: How It Works
Diagram components: Client; ResourceManager with Scheduler; four NodeManagers; one ApplicationMaster; three Containers.
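The interaction between these components can be sketched as a toy model: a client submits an application to the ResourceManager, the scheduler places an ApplicationMaster in a container on some NodeManager, and the ApplicationMaster then asks for further containers for its tasks. The sketch below is an illustration under simplifying assumptions (memory is the only resource, first-fit scheduling, invented class and method names); it is not the actual YARN API.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """One worker node; tracks free memory and the containers it hosts."""
    name: str
    free_mb: int
    containers: list = field(default_factory=list)

    def launch(self, app_id, role, mb):
        self.free_mb -= mb
        self.containers.append((app_id, role, mb))


class ResourceManager:
    """Toy ResourceManager with a first-fit scheduler (an assumption)."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.next_app = 1

    def _schedule(self, app_id, role, mb):
        # First node with enough free memory gets the container.
        for node in self.nodes:
            if node.free_mb >= mb:
                node.launch(app_id, role, mb)
                return node.name
        return None

    def submit(self, am_mb):
        """Client entry point: start an ApplicationMaster container."""
        app_id = f"app_{self.next_app}"
        self.next_app += 1
        if self._schedule(app_id, "ApplicationMaster", am_mb) is None:
            raise RuntimeError("no capacity for ApplicationMaster")
        return app_id

    def allocate(self, app_id, count, mb):
        """Called on behalf of the ApplicationMaster to get task containers."""
        placements = []
        for _ in range(count):
            placed = self._schedule(app_id, "task", mb)
            if placed is None:
                break  # cluster is full; the AM would retry later
            placements.append(placed)
        return placements


nodes = [NodeManager("nm1", 4096), NodeManager("nm2", 4096)]
rm = ResourceManager(nodes)
app = rm.submit(am_mb=1024)
print(app, rm.allocate(app, count=4, mb=2048))
```

In this run only three of the four requested task containers fit, which mirrors how an ApplicationMaster may receive fewer containers than it asked for and has to wait for capacity.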
15. YARN: What Has Changed?
YARN                                          MRv1
RM ResourceManager, AM ApplicationMaster      JT JobTracker
Scheduler                                     Scheduler
NM NodeManager                                TT TaskTracker
Container                                     Map, Reduce
Diagram: YARN (ResourceManager with Scheduler; NodeManagers hosting an ApplicationMaster and Containers) beside MRv1 (JobTracker with Scheduler; TaskTrackers running Map and Reduce).
16. 6 Benefits of YARN
Scale
New programming models and services
Improved cluster utilization
Agility
Backwards compatible with MapReduce v1
Mixed workloads on the same source of data
18. SQL on Hadoop
Speed: Deliver interactive query performance.
SQL: Support an array of SQL semantics for analytic applications running against Hadoop.
Scale: SQL interface to Hadoop designed for queries that scale from Terabytes to Petabytes.
19. Next Gen SQL on Hadoop
Hive on Apache Tez (Hortonworks)
Hive on Apache Spark (Cloudera)
Cloudera Impala (Cloudera)
Apache Drill (MapR)
20. HOYA: HBase (NoSQL) on YARN
Dynamic Scaling: On-demand cluster size. Increase and decrease the size with load.
Easier Deployment: APIs to create, start, stop and delete HBase clusters.
Availability: Recover from Region Server loss with a new container.
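Both dynamic scaling and recovery from a lost Region Server can be viewed as the same reconciliation step: compare the desired number of Region Server containers with the number currently running and request or release the difference. The function below is a hypothetical sketch of that idea only; `reconcile` is an invented name and does not correspond to any Hoya API.

```python
def reconcile(desired, running):
    """Return (containers_to_request, containers_to_release).

    desired: target number of Region Server containers for the current load.
    running: number of Region Server containers currently alive.
    """
    if running < desired:
        # Scale up, or replace Region Servers that were lost.
        return desired - running, 0
    # Scale down (or do nothing when the counts already match).
    return 0, running - desired


print(reconcile(desired=5, running=3))  # load grew, or two servers were lost
print(reconcile(desired=3, running=5))  # load dropped
```

Treating a lost Region Server as a shortfall against the desired count means recovery needs no special case: the next reconciliation simply requests a new container.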
21. Microsoft REEF
Retainable Evaluator Execution Framework
Machine Learning: Framework well suited for building machine learning jobs.
Scalable / Fault Tolerant: Makes it easy to implement scalable, fault-tolerant runtime environments for a range of computational models.
Maintain State: Users can build jobs that utilize data from where it's needed and also maintain state after jobs are done.