Virtual machines are a mainstay in the enterprise, while Apache Hadoop is normally run on bare-metal machines. This talk walks through the convergence of the two and the use of virtual machines for running Apache Hadoop. We describe the results from various tests and benchmarks which show that the overhead of using VMs is small, a small price to pay for the advantages offered by virtualization. The second half of the talk compares multi-tenancy with VMs versus multi-tenancy with Hadoop's Capacity Scheduler. We follow with a comparison of resource management in vSphere and the finer-grained resource management and scheduling in NextGen MapReduce. NextGen MapReduce supports a general notion of a container (such as a process, JVM, virtual machine, etc.) in which tasks are run. We examine the role of such first-class VM support in Hadoop.
3. Say What?
• VMs will just add overhead, due to I/O virtualization
• VMs run on SAN, we're all about local disks
• Hadoop does its own cluster management
• It'll do resource management in 2.0
• And even HA is coming to Hadoop
• And… what is the point, anyway?
4. But you've been asking…
• Can I virtualize my Hadoop, so that I can make it easier, quicker to get a cluster up and running?
• Is it possible to run Hadoop on those spare machine cycles I have on hundreds/thousands of nodes?
• Can I make my system more available by using some of the standard HA features?
5. And the savvy are asking…
• Can I avoid having to install special hardware for the master services, like name-node, job-tracker?
• Can I dynamically change the size of the cluster to use more resources?
• Can I use VM isolation to increase security or guard against resource-intensive neighbors?
• Is it feasible to provision virtual-clusters, giving out one each to a business unit?
6. Virtualization, in VMware's vSphere
[Diagram: guests (TCP/IP, file system) each run on a monitor above the VMkernel (scheduler, memory manager, virtual NIC, virtual SCSI, virtual switch, file system, NIC drivers, I/O drivers), which runs on the physical hardware.]
• Monitor emulates physical devices: CPU, memory, I/O
• CPU is controlled by the scheduler and virtualized by the monitor
• Memory is allocated by the VMkernel and virtualized by the monitor
• Network and I/O devices are emulated and proxied through native device drivers
7. Ok, so first what about the concerns?
• Use your SAN? … if you want to.
– SAN Storage: $2 - $10/Gigabyte; $1M gets 0.5 Petabytes, 1,000,000 IOPS, 1 Gbyte/sec
– NAS Filers: $1 - $5/Gigabyte; $1M gets 1 Petabyte, 400,000 IOPS, 2 Gbyte/sec
– Local Storage: $0.05/Gigabyte; $1M gets 20 Petabytes, 10,000,000 IOPS, 800 Gbytes/sec
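The capacity figures on this slide can be checked with simple arithmetic. The sketch below assumes the "$1M gets" numbers use the low end of each quoted price range and decimal units (1 PB = 10^6 GB); both are assumptions inferred from the numbers, not stated on the slide, and the names are illustrative.

```python
# Capacity that a fixed budget buys at a given price per gigabyte.
# Assumption: the slide uses the low end of each price range.
BUDGET_USD = 1_000_000
GB_PER_PB = 1_000_000  # assumption: decimal units, 1 PB = 10^6 GB

price_per_gb = {
    "SAN": 2.00,    # low end of $2 - $10/GB
    "NAS": 1.00,    # low end of $1 - $5/GB
    "Local": 0.05,  # single quoted price
}


def petabytes_for_budget(budget_usd, usd_per_gb):
    """Return how many petabytes the budget buys at the given $/GB."""
    return budget_usd / usd_per_gb / GB_PER_PB


capacity_pb = {
    name: petabytes_for_budget(BUDGET_USD, price)
    for name, price in price_per_gb.items()
}

for name, pb in capacity_pb.items():
    print(f"{name}: {pb:g} PB per $1M")
```

Under these assumptions the results match the slide's 0.5, 1 and 20 Petabytes; at the high end of the SAN and NAS ranges the same budget would buy proportionally less.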
8. Hadoop Using Local Disks
[Diagram: a Hadoop virtual machine (Task Tracker, Datanode) and another workload on a virtualization host; three Ext4 file systems, each on its own VMDK; the OS image VMDK on shared storage.]
9. Hadoop Perf in a VM (Ratio is elapsed time to physical, Lower Is Better)
[Chart: ratio to native (y-axis, 0 to 1.2); series: 1 VM, 2 VMs.]
10. Evolution of Hadoop on VMs
[Diagram: current Hadoop (combined storage/compute) next to three VM-based layouts, with compute VMs T1 and T2 shown over storage VMs.]
• Hadoop in VM
– VM lifecycle determined by Datanode
– NOT Elastic
– Limited to Hadoop Multi-Tenancy
• Separate Storage
– Separate compute from data
– Elastic compute
– Enable shared workloads
– Raise utilization
• Separate Compute Clusters
– Separate virtual clusters per tenant
– Stronger VM-grade security and resource isolation
– Enable deployment of multiple Hadoop runtime versions
11. 1. Hadoop Task Tracker and Data Node in a VM
[Diagram: a virtual Hadoop node (Task Tracker with slots, Datanode on a VMDK) beside another workload on a virtualization host. Callouts: "Add/Remove Slots?" and "Grow/Shrink by tens of GB?"]
Grow/Shrink of a VM is one approach
13. But State makes it hard to power-off a node
[Diagram: a virtual Hadoop node (Task Tracker with two slots, Datanode on a VMDK) beside another workload on a virtualization host.]
Powering off the Hadoop VM would lose the Datanode
14. Adding a node needs data…
[Diagram: two virtual Hadoop nodes (each a Task Tracker with two slots and a Datanode on its own VMDK) beside another workload on a virtualization host.]
Adding a node would require TBs of data replication
19. Demo: Shrink/Expand Cluster
Setup: 1 Datanode, 2 NodeManagers and 2 web servers on each physical host
[Diagram: 8 Web Server VMs, 8 NodeManager VMs and 4 Datanode VMs.]
20. Demo: Shrink/Expand Cluster
When web load is high in the daytime, we can suspend some NodeManagers and power on more web servers.
[Diagram: 8 Web Server VMs, 8 NodeManager VMs and 4 Datanode VMs.]
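The shrink/expand policy from the demo can be sketched as a small planning function. This is only an illustration, not the demo's actual tooling: the per-host budget of four elastic VMs is read off the setup slide, while `plan_host`, `HostPlan` and the minimum of one NodeManager per host are hypothetical. The Datanode is always kept running, since powering it off would lose its data.

```python
from dataclasses import dataclass

# From the demo setup: 2 NodeManagers + 2 web servers per physical host.
ELASTIC_VMS_PER_HOST = 4


@dataclass
class HostPlan:
    datanodes: int
    nodemanagers: int
    web_servers: int


def plan_host(web_load_high: bool, min_nodemanagers: int = 1) -> HostPlan:
    """Decide how many VMs of each kind should be running on one host.

    Hypothetical policy: under high web load, suspend NodeManagers down to
    `min_nodemanagers` and give the freed capacity to web servers.
    """
    if web_load_high:
        nodemanagers = min_nodemanagers
    else:
        nodemanagers = ELASTIC_VMS_PER_HOST // 2
    return HostPlan(
        datanodes=1,  # stateful, never suspended
        nodemanagers=nodemanagers,
        web_servers=ELASTIC_VMS_PER_HOST - nodemanagers,
    )


night = plan_host(web_load_high=False)
day = plan_host(web_load_high=True)
print(night)
print(day)
```

Keeping the Datanode outside the elastic budget is the design point: only the stateless compute VMs are suspended or resumed.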
24. Expand Hadoop Ecosystem
• Hortonworks goal
– Expand Hadoop ecosystem
– Provide first class support of various platforms
• Hadoop should run well on VMs
• VMs offer several advantages as presented earlier
• Take advantage of vSphere for HA
25. VMware-Hortonworks Joint Engineering
• First class support for VMs
– Topology plugins (HADOOP-8468)
• 2 VMs can be on same host
– Pick closer data
– Schedule tasks closer
– Don't put two replicas on same host
– MR-tmp on HDFS using block pools
• Elastic Compute-VMs will not need local disk
– Fast communications within VMs
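The rule "don't put two replicas on same host" can be illustrated with a short sketch. This is not Hadoop's block placement code; `pick_replica_vms`, the VM names and the VM-to-host map are all hypothetical, and the greedy first-fit order is just one simple way to satisfy the constraint.

```python
def pick_replica_vms(candidates, vm_to_host, replicas):
    """Pick up to `replicas` VMs so that no two share a physical host.

    Greedy first-fit over `candidates`; returns fewer VMs than requested
    if there are not enough distinct physical hosts.
    """
    chosen = []
    used_hosts = set()
    for vm in candidates:
        host = vm_to_host[vm]
        if host in used_hosts:
            continue  # another replica already lives on this physical host
        chosen.append(vm)
        used_hosts.add(host)
        if len(chosen) == replicas:
            break
    return chosen


# Hypothetical layout: two VMs share host-a, one VM runs on host-b.
vm_to_host = {"vm-a1": "host-a", "vm-a2": "host-a", "vm-b1": "host-b"}
placement = pick_replica_vms(["vm-a1", "vm-a2", "vm-b1"], vm_to_host, replicas=2)
print(placement)
```

Without knowledge of the physical host, `vm-a1` and `vm-a2` would look like independent nodes, and a single host failure could take out both replicas.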
26. Hadoop Total System Availability Architecture
[Diagram: slave nodes of the Hadoop cluster running jobs; apps running outside; on failover, JT goes into safemode; NN and JT run on an HA cluster of servers for master daemons with N+K failover.]
30. NameNode HA – Failover Times
• NameNode failover times with vSphere and LinuxHA
– Failure detection + failover – 0.5 to 2 minutes
– OS bootup needed for vSphere – 1 minute
– NameNode startup (exit safemode)
• Small/medium clusters – 1 to 2 minutes
• Large cluster – 5 to 15 minutes
• NameNode startup time measurements
– 60 nodes, 60K files, 6 million blocks, 300 TB raw storage – 40 sec
– 180 nodes, 200K files, 18 million blocks, 900 TB raw storage – 120 sec
Cold failover is good enough for small/medium clusters
Failure detection and automatic failover dominates
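To see how the phases above combine, the sketch below adds them up for the vSphere case. It assumes the three phases run one after another and that their durations simply add; the slide lists the phases but does not state a total, so the sums are illustrative only.

```python
# (min, max) durations in minutes, taken from the bullets above.
DETECT_AND_FAILOVER = (0.5, 2.0)
VSPHERE_OS_BOOT = (1.0, 1.0)
NAMENODE_STARTUP = {
    "small_medium": (1.0, 2.0),
    "large": (5.0, 15.0),
}


def total_failover_minutes(cluster_size):
    """Return the (min, max) end-to-end failover time in minutes.

    Assumption: the phases are strictly sequential and additive.
    """
    phases = [
        DETECT_AND_FAILOVER,
        VSPHERE_OS_BOOT,
        NAMENODE_STARTUP[cluster_size],
    ]
    low = sum(lo for lo, _ in phases)
    high = sum(hi for _, hi in phases)
    return (low, high)


small_medium_total = total_failover_minutes("small_medium")
large_total = total_failover_minutes("large")
print(small_medium_total, large_total)
```

Under that assumption a small or medium cluster is back in roughly 2.5 to 5 minutes, while on a large cluster the NameNode startup phase takes up most of the total.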
31. Summary
• Advantages of Hadoop on VMs
– Cluster Management
– Cluster consolidation
– Greater Elasticity in mixed environment
– Alternate multi-tenancy to capacity scheduler's offerings
• HA for Hadoop Master Daemons
– vSphere based HA for NN, JT, … in Hadoop 1
– Total System Availability Architecture
34. Hadoop Configuration
• Distribution
– Based on Apache open-source 0.20.2
• Parameters
– dfs.datanode.max.xcievers=4096
– dfs.replication=2
– dfs.block.size=134217728
– io.file.buffer.size=131072
– mapred.child.java.opts="-Xmx2048m -Xmn512m" (native)
– mapred.child.java.opts="-Xmx1900m -Xmn512m" (virtual)
• Network topology
– Hadoop uses info for reliability and performance
– Multiple VMs/host: Each host is a "rack"
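One way to express "each host is a rack" is a rack-awareness topology script, which Hadoop calls with node addresses as arguments and which prints one rack path per address. The sketch below is an assumption about how such a script could look, not the configuration used for these tests: the IP addresses and host names in `VM_TO_HOST` are hypothetical, and the script would be registered through the topology script property of the Hadoop version in use (`topology.script.file.name` in the 0.20/1.x line).

```python
#!/usr/bin/env python3
"""Topology script sketch: report each VM's physical host as its rack."""
import sys

# Hypothetical mapping from VM address to the physical host it runs on.
VM_TO_HOST = {
    "10.0.0.11": "esx-host-1",
    "10.0.0.12": "esx-host-1",
    "10.0.0.21": "esx-host-2",
}
DEFAULT_RACK = "/default-rack"  # Hadoop's conventional fallback rack name


def rack_for(address):
    """Return the rack path for one VM address."""
    host = VM_TO_HOST.get(address)
    return f"/{host}" if host else DEFAULT_RACK


def main(argv):
    # Hadoop passes one or more addresses and reads one rack per address.
    print(" ".join(rack_for(address) for address in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

With this mapping, two VMs on the same physical host report the same rack, so rack-aware replica placement spreads copies across physical hosts.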
35. What about Performance?
• Mellanox 10 GbE switch
• AMAX ClusterMax
– 2X X5650, 96 GB
– 12X SATA 500 GB
– Mellanox 10 GbE adapter
36. Tying it together: Elastic Hadoop
[Diagram: tenants Coke and Pepsi, each with virtual Hadoop clusters fed from a queue; a runtime layer of containers above a data layer; a distributed file system (HDFS, KFS, MAPR, Isilon,…) spanning six hosts.]
37. Resource Shifting using Virtualization
[Diagram: Other VMs and Hadoop VMs on a virtualization platform across three hosts, each host with HDFS.]
While existing apps run during the day to support business operations, Hadoop batch jobs kick off at night to conduct deep analysis of data.
38. The cluster is the machine
[Diagram: vCenter managing HP ProLiant DL380 G6 hosts; an imbalanced cluster (one host under heavy load, one under lighter load) shown next to a balanced cluster.]
39. SAN, NAS or Local Storage?
• Shared Storage: SAN or NAS
– Easy to provision
– Automated cluster rebalancing
• Hybrid Storage
– SAN/NAS for boot images, VMs, other workloads
– Local disk for Hadoop & HDFS
– Scalable Bandwidth, Lower Cost/GB
[Diagram: Other VMs and Hadoop VMs spread across hosts in both layouts.]