Hadoop YARN: Basic Architecture and Development Trends
- 10. hadoop123.com
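Hadoop YARN Fault Tolerance
ResourceManager
  Is a single point of failure;
  HA based on ZooKeeper is being implemented.
NodeManager
  When it fails, the RM reports the failed tasks to the corresponding AM;
  the AM decides how to handle the failed tasks.
ApplicationMaster
  When it fails, the RM is responsible for restarting it;
  the AM must handle fault tolerance for its own internal tasks;
  MRAppMaster saves the Tasks that have already completed, so they need not be re-run after a restart.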
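The ApplicationMaster recovery behaviour described above can be sketched in a few lines. This is a toy model, not the Hadoop API: `ToyAppMaster`, `run_with_restart`, and the in-memory `completed_log` are hypothetical names, and the log stands in for whatever persistent record of finished tasks the real MRAppMaster keeps.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAppMaster:
    """Hypothetical stand-in for an ApplicationMaster; not the Hadoop API."""
    tasks: list
    completed_log: set = field(default_factory=set)  # assumed to survive AM restarts
    executed: list = field(default_factory=list)     # tasks run by this attempt only

    def run(self, fail_after=None):
        """Run pending tasks; optionally crash once `fail_after` tasks have run."""
        for task in self.tasks:
            if task in self.completed_log:
                continue  # already finished in an earlier attempt, skip it
            if fail_after is not None and len(self.executed) >= fail_after:
                raise RuntimeError("AM crashed")
            self.executed.append(task)
            self.completed_log.add(task)


def run_with_restart(tasks, fail_after, max_attempts=2):
    """Play the RM's role: restart a failed AM and hand it the saved log."""
    log = set()
    attempts = []
    for _ in range(max_attempts):
        am = ToyAppMaster(tasks=tasks, completed_log=log)
        try:
            am.run(fail_after=fail_after)
            attempts.append(am.executed)
            return attempts
        except RuntimeError:
            attempts.append(am.executed)
            fail_after = None  # assumption: the restarted attempt does not crash
    return attempts


# The second attempt only runs what the first one did not finish.
print(run_with_restart(["m1", "m2", "m3", "r1"], fail_after=2))
# [['m1', 'm2'], ['m3', 'r1']]
```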
dongxicheng.org
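- 11.
Hadoop YARN Scheduling Framework
Two-level scheduling framework
  The RM allocates resources to the AMs
  Each AM further allocates its resources to its individual Tasks
Reservation-based scheduling policy
  When resources are insufficient, they are reserved for the Task until enough are available
  This differs from the "all or nothing" policy (Apache Mesos)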
[Figure: MRv1 vs. MRv2. In MRv1, a single JobTracker (job1, job2, job3) hands tasks to the TaskTrackers according to their resources. In MRv2, each job (Job1, Job2, Job3) has its own MR Application Master, which takes free containers from the YARN cluster (resource pool) and decides which containers to launch for its tasks.]
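A minimal sketch of the two-level, reservation-based idea follows. It is a toy simulation under stated assumptions, not YARN's scheduler: `Node`, `ToyResourceManager`, and `ToyAppMaster` are hypothetical classes, memory is the only resource, and a reservation simply withholds a node from other requests until the reserving request fits.

```python
class Node:
    def __init__(self, name, free_mb):
        self.name = name
        self.free_mb = free_mb
        self.reserved_for = None  # id of the request holding a reservation


class ToyResourceManager:
    """First level: grants containers to ApplicationMasters."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, request_id, need_mb):
        """Return a node name if a container is granted, else None."""
        # A node reserved for this request is served first.
        for node in self.nodes:
            if node.reserved_for == request_id and node.free_mb >= need_mb:
                node.free_mb -= need_mb
                node.reserved_for = None
                return node.name
        # Otherwise use any unreserved node with enough free memory.
        for node in self.nodes:
            if node.reserved_for is None and node.free_mb >= need_mb:
                node.free_mb -= need_mb
                return node.name
        # Nothing fits: reserve the unreserved node with the most free memory.
        if not any(n.reserved_for == request_id for n in self.nodes):
            candidates = [n for n in self.nodes if n.reserved_for is None]
            if candidates:
                max(candidates, key=lambda n: n.free_mb).reserved_for = request_id
        return None

    def release(self, node_name, mb):
        """A finished container gives its memory back to the node."""
        for node in self.nodes:
            if node.name == node_name:
                node.free_mb += mb


class ToyAppMaster:
    """Second level: maps granted containers onto its own tasks."""

    def __init__(self, app_id, rm, tasks):
        self.app_id = app_id
        self.rm = rm
        self.pending = dict(tasks)  # task name -> required MB
        self.placement = {}         # task name -> node name

    def schedule(self):
        for task, need in list(self.pending.items()):
            node = self.rm.allocate(f"{self.app_id}/{task}", need)
            if node is not None:
                self.placement[task] = node
                del self.pending[task]


rm = ToyResourceManager([Node("n1", 1024), Node("n2", 512)])
job1 = ToyAppMaster("job1", rm, {"big": 2048})
job2 = ToyAppMaster("job2", rm, {"small": 512})
job1.schedule()         # does not fit anywhere, so n1 is reserved for job1/big
job2.schedule()         # n1 is withheld, so the small task lands on n2
rm.release("n1", 1024)  # a container on n1 finishes; n1 now has 2048 MB free
job1.schedule()         # the reservation is honoured
print(job1.placement, job2.placement)  # {'big': 'n1'} {'small': 'n2'}
```

Without the reservation, small requests could keep consuming the memory freed on n1 and the large task might never start; an "all or nothing" offer model leaves that decision to the framework instead.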
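- 12.
Hadoop YARN Resource Schedulers
Multi-resource scheduling
  Uses the DRF algorithm (paper: "Dominant Resource Fairness: Fair Allocation of Multiple Resource Types")
  Currently supports two resource types: CPU and memory
Multiple resource schedulers are provided
  FIFO
  Fair Scheduler
  Capacity Scheduler
Multi-tenant resource schedulers
  Support proportional allocation of resources
  Support hierarchical queue partitioning
  Support resource preemption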
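To make the DRF idea concrete, here is a small sketch of progressive filling: the next task always goes to the user whose dominant share (the largest fraction of any one resource they hold) is lowest. The function name `drf_allocate` and the dictionary layout are illustrative assumptions, not YARN code. As a simplification, a user whose next task no longer fits is dropped and the others continue. The numbers are the well-known example from the DRF paper (9 CPUs and 18 GB in total).

```python
def drf_allocate(capacity, demands):
    """Allocate whole tasks by Dominant Resource Fairness.

    capacity: dict resource -> total amount in the cluster
    demands:  dict user -> dict resource -> amount needed per task
    Returns dict user -> number of tasks launched.
    """
    used = {r: 0 for r in capacity}
    alloc = {u: {r: 0 for r in capacity} for u in demands}
    tasks = {u: 0 for u in demands}
    active = set(demands)

    def dominant_share(user):
        # Largest fraction of any single resource held by this user.
        return max(alloc[user][r] / capacity[r] for r in capacity)

    while active:
        # Pick the user with the lowest dominant share (ties broken by name).
        user = min(sorted(active), key=dominant_share)
        need = demands[user]
        if all(used[r] + need[r] <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += need[r]
                alloc[user][r] += need[r]
            tasks[user] += 1
        else:
            active.remove(user)  # this user's next task no longer fits
    return tasks


capacity = {"cpu": 9, "mem_gb": 18}
demands = {
    "A": {"cpu": 1, "mem_gb": 4},  # memory-dominant tasks
    "B": {"cpu": 3, "mem_gb": 1},  # CPU-dominant tasks
}
print(drf_allocate(capacity, demands))  # {'A': 3, 'B': 2}
```

Both users end with a dominant share of 2/3: A holds 12 of 18 GB, B holds 6 of 9 CPUs.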
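- 22.
DAG Computing Framework Tez
When multiple jobs have data dependencies on one another, the dependencies form a directed acyclic graph (DAG); computing over that graph is called "DAG computation".
Apache Tez: a DAG computing framework based on YARN
  Runs on top of YARN and takes full advantage of YARN's resource management and fault tolerance;
  provides a rich dataflow API;
  offers an extensible "Input-Processor-Output" runtime model;
  generates the physical dataflow relationships dynamically.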
[Figure: a DAG job with five phases (Phase 1 to Phase 5) built from Map and Reduce stages.]
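The notion of "DAG computation" can be illustrated with a short scheduling sketch: vertices whose dependencies are all satisfied may run together, wave by wave. This is plain Python, not the Tez API; `execution_waves` is a hypothetical helper, and the edges between the five phases are assumed for illustration because the slide's figure does not survive in text form.

```python
def execution_waves(deps):
    """Group DAG vertices into waves that can run in parallel.

    deps: dict vertex -> set of vertices it depends on.
    Raises ValueError if the dependencies contain a cycle.
    """
    vertices = set(deps) | {d for ds in deps.values() for d in ds}
    remaining = {v: set(deps.get(v, ())) for v in vertices}
    waves = []
    while remaining:
        # Every vertex with no unmet dependency is ready to run.
        ready = sorted(v for v, ds in remaining.items() if not ds)
        if not ready:
            raise ValueError("cycle detected; not a DAG")
        waves.append(ready)
        for v in ready:
            del remaining[v]
        for ds in remaining.values():
            ds.difference_update(ready)
    return waves


# Assumed edges: phases 2 and 3 read phase 1, phase 4 joins them, phase 5 follows.
deps = {
    "phase1": set(),
    "phase2": {"phase1"},
    "phase3": {"phase1"},
    "phase4": {"phase2", "phase3"},
    "phase5": {"phase4"},
}
print(execution_waves(deps))
# [['phase1'], ['phase2', 'phase3'], ['phase4'], ['phase5']]
```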
- 28.
Stream Computing Framework Storm
[Figure: Storm architecture. Nimbus and the Supervisors are connected through Zookeeper; each Supervisor hosts Workers; each Worker hosts Executors; each Executor runs the Spout or Bolt tasks of one topology (for example Bolt-A, Bolt-B, and Bolt-C tasks of topology 1, and Bolt-1 and Bolt-2 tasks of topology 2).]
- 35.
Other Frameworks on YARN
Hoya: HBase on YARN
https://github.com/hortonworks/hoya/
LLAMA: Impala on YARN
http://cloudera.github.io/llama/
Kafka on YARN
https://github.com/kkasravi/kafka-yarn
- 37.
Benefits of a Resource Management System
Higher cluster resource utilization
Automated service deployment
bin/hadoop install hbase -version 0.95.0 -slaves 10
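[Figure: automated deployment. Labels: InstallerApp, Master, zookeeper (three instances), YARN (resource management system), HDFS2 (distributed storage system), and a software codebase holding the versions listed below.]
software codebase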
HBase
|__0.92.0
|__0.94.0
|__0.95.0
……
Storm
|__0.7.0
|__0.8.0
MYSQL
|__5.5.25
|__5.6.12
……