1. Hoya: HBase on YARN
Steve Loughran & Devaraj Das
{stevel, ddas} at hortonworks.com
@steveloughran, @ddraj
November 2013
2. Hadoop as Next-Gen Platform
[Diagram]
HADOOP 1.0 (Single Use System: Batch Apps):
  MapReduce (cluster resource management & data processing)
  HDFS (redundant, reliable storage)
HADOOP 2.0 (Multi Purpose Platform: Batch, Interactive, Online, Streaming, …):
  MapReduce & others (data processing)
  YARN (cluster resource management)
  HDFS2 (redundant, reliable storage)
3. YARN: Taking Hadoop Beyond Batch
Store ALL DATA in one place…
Interact with that data in MULTIPLE WAYS, with predictable performance and quality of service.
Applications run natively IN Hadoop:
[Diagram: BATCH (MapReduce), INTERACTIVE (Tez), IN-MEMORY (Spark), STREAMING (Storm, S4, Samza, …), GRAPH (Giraph), HPC MPI (OpenMPI), OTHER (Search, Weave, …), all running on YARN (cluster resource management) over HDFS2 (redundant, reliable storage).]
4. And HBase?
[Diagrams, slides 4 and 5: the same YARN application stack (MapReduce, Tez, Spark, Storm/S4, Giraph, OpenMPI, Search, Weave, …) shown twice: once with HBase placed among the YARN applications, and once with HBase placed beside YARN, directly on HDFS2.]
6. Hoya: On-demand HBase clusters
1. Small HBase cluster in large YARN cluster
2. Dynamic HBase clusters
3. Self-healing HBase Cluster
4. Elastic HBase clusters
5. Transient/intermittent clusters for workflows
6. Custom versions & configurations
7. More efficient utilization/sharing of cluster
7. Goal: No code changes in HBase
• Today: none
But we'd like:
• ZK reporting of web UI ports
• A way to get from failed RS to YARN container
(configurable ID is enough)
8. Hoya – the tool
• Hoya (HBase On YArn)
– Java tool
– Completely CLI driven
• Input: cluster description as JSON
– Specification of cluster: node options, ZK params
– Configuration generated
– Entire state persisted
• Actions: create, freeze/thaw, flex, exists <cluster>
• Can change cluster state later
– Add/remove nodes, started / stopped states
9. YARN manages the cluster
• Servers run YARN Node Managers
• NMs heartbeat to the Resource Manager
• RM schedules work over the cluster
• RM allocates containers to apps
• NMs start containers
• NMs report container health
[Diagram: one YARN Resource Manager and several nodes, each running a YARN Node Manager and HDFS.]
10. Hoya Client creates App Master
[Diagram: the Hoya Client submits the application to the YARN Resource Manager, which starts the Hoya AM in a container on one of the Node Managers; each node runs a Node Manager and HDFS.]
11. AM deploys HBase with YARN
[Diagram: the Hoya AM asks YARN for containers and brings up the HBase Master and the HBase Region Servers on the Node Managers, alongside HDFS on each node.]
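To make this deployment step concrete: a minimal, hypothetical sketch (not Hoya's actual code) of how a YARN application master can ask the ResourceManager for a container sized for one worker role and then have the NodeManager launch an HBase region server in it, using the Hadoop 2.x AMRMClient/NMClient APIs. The memory/vcore values, priority and launch command are illustrative.

import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.NMClient;

public class WorkerContainerSketch {

  // Ask the RM for one container matching the worker role's yarn.memory / yarn.vcores.
  static void requestWorker(AMRMClient<ContainerRequest> amRmClient) {
    Resource capability = Resource.newInstance(256, 1);   // 256 MB, 1 vcore (illustrative)
    Priority priority = Priority.newInstance(1);          // assumed per-role priority
    // null node/rack lists: let the scheduler place the container anywhere
    amRmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));
  }

  // Once the RM has allocated a container, tell its NodeManager what to run in it.
  static void launchRegionServer(NMClient nmClient, Container container) throws Exception {
    List<String> commands = Collections.singletonList("bin/hbase regionserver start");
    ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
        new HashMap<String, LocalResource>(),   // local resources: HBase tarball + patched conf
        new HashMap<String, String>(),          // environment variables for the process
        commands,                               // command the NM runs in the container
        null, null, null);                      // service data, tokens, ACLs (omitted)
    nmClient.startContainer(container, ctx);
  }
}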
12. HBase & clients bind via Zookeeper
[Diagram: an HBase Client finds the HBase Master and Region Servers through ZooKeeper; the Hoya Client, Hoya AM, Node Managers and HDFS are as before.]
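Because the binding goes through ZooKeeper, a client never has to know which node a dynamically placed master or region server landed on. A rough, hypothetical sketch with the HBase 0.96-era client API follows; the quorum host, znode path and table name are illustrative, and whether Hoya's zookeeper.path option maps directly onto zookeeper.znode.parent is an assumption.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTableInterface;

public class HoyaClusterClientSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // ZooKeeper quorum and znode path of the Hoya-created cluster (illustrative values,
    // echoing the example cluster specification later in this deck)
    conf.set("hbase.zookeeper.quorum", "127.0.0.1");
    conf.set("zookeeper.znode.parent", "/yarnapps_hoya_stevel_live2nodes");

    HConnection connection = HConnectionManager.createConnection(conf);
    try {
      HTableInterface table = connection.getTable("mytable");   // hypothetical table name
      // ... normal HBase reads/writes against the dynamically placed region servers ...
      table.close();
    } finally {
      connection.close();
    }
  }
}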
13. YARN notifies AM of failures
[Diagram: a failed HBase Region Server container is reported to the Hoya AM by YARN; the AM asks for a replacement container and starts a new Region Server.]
14. HOYA - cool bits
• Cluster specification stored as JSON in HDFS
• Conf dir cached, dynamically patched before being pushed up as local resources for master & region servers
• HBase .tar file stored in HDFS; clusters can use the same or different HBase versions (see the sketch below)
• Handling of cluster flexing is the same code as
unplanned container loss.
• No Hoya code on region servers
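A hypothetical sketch of the local-resource part of the bullets above: the HBase tarball kept in HDFS is described as a YARN LocalResource of type ARCHIVE, so each NodeManager downloads and unpacks it next to the container it launches. The helper name and path handling are illustrative, not Hoya's actual code.

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class HBaseTarballResourceSketch {

  // Describe the HBase .tar.gz sitting in HDFS as a localizable YARN resource.
  static LocalResource hbaseTarball(FileSystem fs, Path tarballInHdfs) throws Exception {
    FileStatus status = fs.getFileStatus(tarballInHdfs);   // size & timestamp for cache checks
    return LocalResource.newInstance(
        ConverterUtils.getYarnUrlFromPath(tarballInHdfs),
        LocalResourceType.ARCHIVE,                // NM expands the archive for us
        LocalResourceVisibility.APPLICATION,      // visible only to this application's containers
        status.getLen(),
        status.getModificationTime());
  }
}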
15. HOYA - AM RPC API
//change cluster role counts
flexCluster(ClusterSpec)
//get current cluster state
getJSONClusterStatus() : ClusterSpec
listNodeUUIDsByRole(role): UUID[]
getNode(UUID): RoleInfo
getClusterNodes(UUID[]): RoleInfo[]
stopCluster()
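Rendered as a Java interface, the RPC surface above might look roughly like this; the ClusterSpec and RoleInfo types here are empty placeholders standing in for whatever Hoya actually defines.

public interface HoyaClusterProtocol {

  // Placeholder types; the real definitions live in Hoya.
  class ClusterSpec {}
  class RoleInfo {}

  // Change cluster role counts.
  boolean flexCluster(ClusterSpec updated);

  // Get the current cluster state.
  ClusterSpec getJSONClusterStatus();

  // UUIDs of all live instances of the given role.
  String[] listNodeUUIDsByRole(String role);

  // Details of a single role instance.
  RoleInfo getNode(String uuid);

  // Details of several role instances at once.
  RoleInfo[] getClusterNodes(String[] uuids);

  // Ask the AM to shut the whole cluster down.
  void stopCluster();
}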
16. Flexing/failure handling is same code
boolean flexCluster(ClusterDescription updated) {
  providerService.validateClusterSpec(updated);
  appState.updateClusterSpec(updated);
  return reviewRequestAndReleaseNodes();
}

void onContainersCompleted(List<ContainerStatus> completed) {
  for (ContainerStatus status : completed) {
    appState.onCompletedNode(status);
  }
  reviewRequestAndReleaseNodes();
}
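Presumably the point of the shared path is that reviewRequestAndReleaseNodes() compares the desired role counts in the cluster spec against the containers currently held and requests or releases the difference; whether the gap came from a flex request or from a failed container then makes no difference.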
17. Cluster Specification: persistent & wire
{
  "version" : "1.0",
  "name" : "TestLiveTwoNodeRegionService",
  "type" : "hbase",
  "options" : {
    "zookeeper.path" : "/yarnapps_hoya_stevel_live2nodes",
    "cluster.application.image.path" : "hdfs://bin/hbase-0.96.tar.gz",
    "zookeeper.hosts" : "127.0.0.1"
  },
  "roles" : {
    "worker" : {
      "role.instances" : "2"
    },
    "hoya" : {
      "role.instances" : "1"
    },
    "master" : {
      "role.instances" : "1"
    }
  },
  ...
}
18. Role Specifications
"roles" : {
"worker" : {
"yarn.memory" : "256",
"role.instances" : "5",
"jvm.heapsize" : "256",
"yarn.vcores" : "1",
"app.infoport" : "0"
"env.MALLOC_ARENA_MAX": "4"
},
"master" : {
"yarn.memory" : "128",
"role.instances" : "1",
"jvm.heapsize" : "128",
"yarn.vcores" : "1",
"app.infoport" : "8585"
}
}
19. Current status
• HBase clusters on-demand
• Accumulo clusters (5+ roles, different “provider”)
• Cluster freeze, thaw, flex, destroy
• Location of role instances tracked & persisted
– for placement close to data after failure or thaw
• Secure cluster support
20. Ongoing
• Multiple roles: worker, master, monitor
--role worker --roleopts worker yarn.vcores 2
• Multiple Providers: HBase + others
– client side: preflight, configuration patching
– server side: starting roles, liveness
• Liveness probes: HTTP GET, RPC port, RPC op? (see the sketch after this list)
• What do we need in YARN for production?
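The simplest of the liveness-probe options above, an HTTP GET against a role instance's info port, could look roughly like this minimal sketch; the probed URL and timeout are illustrative.

import java.net.HttpURLConnection;
import java.net.URL;

public class HttpLivenessProbeSketch {

  // Return true if the URL answers with a 2xx status within the timeout.
  static boolean isAlive(String url, int timeoutMillis) {
    try {
      HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
      conn.setRequestMethod("GET");
      conn.setConnectTimeout(timeoutMillis);
      conn.setReadTimeout(timeoutMillis);
      int status = conn.getResponseCode();
      conn.disconnect();
      return status >= 200 && status < 300;
    } catch (Exception e) {
      return false;   // refused connection, timeout or bad URL: treat as not alive
    }
  }

  public static void main(String[] args) {
    // e.g. probe the master's app.infoport from the role specification (illustrative address)
    System.out.println(isAlive("http://localhost:8585/", 2000));
  }
}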
21. Ongoing
• Better failure handling, blacklisting
• Liveness probes: HTTP GET, RPC port, RPC op?
• Testing: functional, scale & load
• What do we need in Hoya for production?
• What do we need in YARN for production?
22. Requirements of an App: MUST
• Install from tarball; run as normal user
• Deploy/start without human intervention
• Pre-configurable, static instance config data
• Support dynamic discovery/binding of peers
• Co-existence with other app instances in the cluster/nodes
• Handle co-located role instances
• Persist data to HDFS
• Support 'kill' as a shutdown option
• Handle failed role instances
• Support role instances moving after failure
23. Requirements of an App: SHOULD
• Be configurable by Hadoop XML files
• Publish dynamically assigned web UI & RPC ports
• Support cluster flexing up/down
• Support API to determine role instance status
• Make it possible to determine role instance ID from
app
• Support simple remote liveness probes
24. YARN-896: long-lived services
1. Container reconnect on AM restart
2. YARN Token renewal on long-lived apps
3. Containers: signalling, >1 process sequence
4. AM/RM managed gang scheduling
5. Anti-affinity hint in container requests
6. Service Registry - ZK?
7. Logging
25. SLAs & co-existence with MapReduce
1. Make IO bandwidth/IOPs a resource used in
scheduling & limits
2. Need to monitor what's going on w.r.t. IO & net load from containers, apps, and queues
3. Dynamic adaptation of cgroup HDD, Net, RAM limits
4. Could we throttle MR job File & HDFS IO
bandwidth?
26. Hoya needs a home!
https://github.com/hortonworks/hoya
Editor's notes:
• "hoya" is actually the Hoya AM: it lets us define the memory and requirements of that node in the cluster.
• JMX port binding & publishing of port; web port.
• Rolling restart of NM/RM; AM retry logic.
• Testing: chaos monkey for YARN.
• Logging: running Samza without HDFS - costs of ops & latency. Are only running Samza clusters w/ YARN; YARN puts logs into HDFS by default, so without HDFS you are stuffed. Rollover of stdout & stderr - have the NM implement the rolling. Samza could log from log4j and append to Kafka; need a way to pull that out and view it. Adds 15 min.
• YARN can use http: URLs to pull in local resources.
• Samza can handle outages of a few minutes, but other services need rolling restart/upgrade.
• No classic scheduling; assume full code can run, or buy more hardware. RM to tell AM when a request can't be satisfied.
• Co-existence …