YARN
Apache Hadoop Next Generation Compute Platform
Bikas Saha
@bikassaha

© Hortonworks Inc. 2013

Page 1
Apache Hadoop & YARN
• Apache Hadoop
– De facto Big Data open source platform
– Running for about 5 years in production at hundreds of companies
like Yahoo, eBay and Facebook

• Hadoop 2
– Significant improvements in the HDFS distributed storage layer: High
Availability, NFS, Snapshots
– YARN – next generation compute framework for Hadoop designed
from the ground up based on experience gained from Hadoop 1
– YARN running in production at Yahoo for about a year
– YARN awarded Best Paper at SOCC 2013

Page 2
1st Generation Hadoop: Batch Focus
HADOOP 1.0
Built for Web-Scale Batch Apps

[Diagram: multiple Single App stacks (BATCH, INTERACTIVE, ONLINE) running on separate HDFS clusters]

All other usage patterns MUST leverage the same infrastructure
Forces Creation of Silos to Manage Mixed Workloads

Page 3
Hadoop 1 Architecture
JobTracker
– Manages cluster resources & job scheduling

TaskTracker
– Per-node agent
– Manages tasks

Page 4
Hadoop 1 Limitations
Lacks Support for Alternate Paradigms and Services
– Forces everything to look like MapReduce
– Iterative applications in MapReduce are 10x slower

Scalability
– Max cluster size ~5,000 nodes
– Max concurrent tasks ~40,000

Availability
– JobTracker failure kills queued & running jobs

Hard partition of resources into map and reduce slots
– Non-optimal resource utilization

Page 5
Our Vision: Hadoop as Next-Gen Platform

HADOOP 1.0: Single Use System (Batch Apps)
– MapReduce (cluster resource management & data processing)
– HDFS (redundant, reliable storage)

HADOOP 2.0: Multi Purpose Platform (Batch, Interactive, Online, Streaming, …)
– MapReduce and Others (data processing)
– YARN (cluster resource management)
– HDFS2 (redundant, highly-available & reliable storage)

Page 6
Hadoop 2 - YARN Architecture
ResourceManager (RM)
Central agent - Manages and allocates cluster resources

NodeManager (NM)
Per-Node agent - Manages and enforces node resource allocations

ApplicationMaster (AM)
Per-Application – Manages application lifecycle and task scheduling

[Diagram: a Client, the Resource Manager, and several Node Managers hosting an App Mstr and Containers; arrows labeled Job Submission, MapReduce Status, Node Status, and Resource Request]

Page 7
YARN: Taking Hadoop Beyond Batch
Store ALL DATA in one place…
Interact with that data in MULTIPLE WAYS
with Predictable Performance and Quality of Service
Applications Run Natively in Hadoop
BATCH (MapReduce)
INTERACTIVE (Tez)
ONLINE (HBase)
STREAMING (Storm, S4,…)
GRAPH (Giraph)
IN-MEMORY (Spark)
HPC MPI (OpenMPI)
OTHER (Search, Weave…)

YARN (Cluster Resource Management)
HDFS2 (Redundant, Reliable Storage)

Page 8
5 Key Benefits of YARN
1. New Applications & Services
2. Improved cluster utilization
3. Scale
4. Experimental Agility
5. Shared Services

Page 9
Key Improvements in YARN
Framework supporting multiple applications
– Separate generic resource brokering from application logic
– Define protocols/libraries and provide a framework for custom
application development
– Share same Hadoop Cluster across applications

Cluster Utilization
– Generic resource container model replaces fixed Map/Reduce
slots. Container allocations are based on locality and memory (CPU
coming soon)
– Sharing the cluster among multiple applications
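
To illustrate why a generic container model can use a node better than fixed Map/Reduce slots, here is a minimal Python sketch. The node size, slot counts, and task mix are hypothetical numbers chosen for illustration, not figures from this talk, and the model considers memory only (no CPU, no locality).

```python
# Toy comparison of Hadoop 1 typed slots vs. YARN-style generic containers.
# All numbers are hypothetical; only memory is modelled.

def run_with_slots(map_slots, reduce_slots, map_tasks, reduce_tasks):
    """Tasks that can run at once when slots are typed (map vs. reduce)."""
    return min(map_slots, map_tasks) + min(reduce_slots, reduce_tasks)


def run_with_containers(node_memory_mb, requests_mb):
    """Greedily admit requests of any kind until node memory runs out."""
    running, free = 0, node_memory_mb
    for need in requests_mb:
        if need <= free:
            free -= need
            running += 1
    return running


# One 8 GB node carved into 4 map slots + 4 reduce slots of 1 GB each.
# Workload: 8 map tasks are waiting and no reduce tasks exist yet.
slot_running = run_with_slots(map_slots=4, reduce_slots=4,
                              map_tasks=8, reduce_tasks=0)
container_running = run_with_containers(8192, [1024] * 8)

print(slot_running)       # 4: the reduce slots sit idle
print(container_running)  # 8: any waiting task can use the free memory
```

In this toy setup the slot model leaves half the node idle during the map phase, while the container model lets whatever work is waiting claim the free memory.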

Page 10
Key Improvements in YARN
Scalability
– Removed complex application logic from the RM so it can scale further
– Loosely coupled design based on state machines and message passing
– Compact scheduling protocol
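
The state-machine, message-passing style mentioned above can be sketched in a few lines of Python. This is a toy model only: the states and events are a simplified, illustrative subset, and the names `App`, `TRANSITIONS`, and `dispatch` are hypothetical, not YARN classes.

```python
# Toy event-driven state machine for an application's lifecycle.
# States and events are a simplified illustration, not YARN's actual
# transition table.
from collections import deque

TRANSITIONS = {
    ("NEW", "submit"): "SUBMITTED",
    ("SUBMITTED", "accept"): "ACCEPTED",
    ("ACCEPTED", "am_registered"): "RUNNING",
    ("RUNNING", "am_unregistered"): "FINISHED",
    ("SUBMITTED", "kill"): "KILLED",
    ("ACCEPTED", "kill"): "KILLED",
    ("RUNNING", "kill"): "KILLED",
}


class App:
    def __init__(self, app_id):
        self.app_id = app_id
        self.state = "NEW"

    def handle(self, event):
        """Apply one event; reject events that are invalid in this state."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not valid in state {self.state}")
        self.state = TRANSITIONS[key]


def dispatch(apps, queue):
    """Drain a queue of (app_id, event) messages, one at a time."""
    while queue:
        app_id, event = queue.popleft()
        apps[app_id].handle(event)


apps = {"app_1": App("app_1"), "app_2": App("app_2")}
events = deque([
    ("app_1", "submit"), ("app_2", "submit"),
    ("app_1", "accept"), ("app_2", "kill"),
    ("app_1", "am_registered"), ("app_1", "am_unregistered"),
])
dispatch(apps, events)
print(apps["app_1"].state, apps["app_2"].state)  # FINISHED KILLED
```

Because components only exchange events and each object owns its own transitions, the pieces stay loosely coupled and invalid sequences are rejected in one place.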

Application Agility and Innovation
– Using Protocol Buffers for RPC gives wire compatibility
– MapReduce becomes an application in user space, unlocking
safe innovation
– Multiple versions of an app can co-exist leading to
experimentation
– Easier upgrade of framework and application

Page 11
Key Improvements in YARN
Shared Services
– Common services needed to build distributed applications are
included in a pluggable framework
– Distributed file sharing service
– Remote data read service
– Log Aggregation Service

Page 12
YARN: Efficiency with Shared Services

Yahoo! leverages YARN
– 40,000+ nodes running YARN across over 365PB of data
– ~400,000 jobs per day for about 10 million hours of compute time
– Estimated a 60% – 150% improvement on node usage per day using YARN
– Eliminated Colo (~10K nodes) due to increased utilization
For more details check out the YARN SOCC 2013 paper
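
As a rough sanity check on the scale quoted above, the per-node and per-job averages can be worked out directly. This sketch assumes 1 PB = 1000 TB, treats the approximate figures ("40,000+", "~400,000", "about 10 million") as exact, and assumes the compute hours are per day like the job count.

```python
# Back-of-the-envelope averages from the figures quoted on this slide.
# Assumptions: 1 PB = 1000 TB; approximate figures are treated as exact;
# the 10 million compute hours are per day, like the job count.

nodes = 40_000
data_pb = 365
jobs_per_day = 400_000
compute_hours_per_day = 10_000_000

tb_per_node = data_pb * 1000 / nodes
compute_hours_per_job = compute_hours_per_day / jobs_per_day

print(f"{tb_per_node:.3f} TB of data per node")            # 9.125 TB of data per node
print(f"{compute_hours_per_job:.0f} compute-hours per job")  # 25 compute-hours per job
```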

Page 13
YARN as Cluster Operating System
[Diagram: a ResourceManager with its Scheduler above a grid of NodeManagers. Containers from different applications share the nodes: Batch (map1.1, map1.2, reduce1.1), Interactive SQL (vertex1.1.1, vertex1.1.2, vertex1.2.1, vertex1.2.2), and Real-Time (nimbus0, nimbus1, nimbus2)]

Page 14
Multi-Tenancy is Built-in
• Queues
• Economics as queue-capacity
– Hierarchical Queues

• SLAs
– Cooperative Preemption

• Resource Isolation
– Linux: cgroups
– Roadmap: Virtualization (Xen, KVM)

• Administration
– Queue ACLs
– Run-time re-configuration for queues

Default Capacity Scheduler supports all features

[Diagram: ResourceManager with Scheduler, and the Capacity Scheduler's hierarchical queues]
root: Adhoc 10%, DW 70%, Mrkting 20%
Mrkting: Dev 20%, Prod 80%
DW: Dev 10%, Reserved 20%, Prod 70%
DW Prod: P0 70%, P1 30%
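
With hierarchical queues, each percentage is relative to the parent queue, so a leaf queue's share of the whole cluster is the product of the percentages along its path. The Python sketch below works that out. The queue tree is an assumed reading of the slide's diagram, and the `QUEUES` layout and `absolute_capacities` helper are hypothetical names used only for illustration.

```python
# Absolute cluster share of each leaf queue = product of the configured
# percentages along the path from root. The tree below is an assumed
# reading of the slide's diagram, not scheduler configuration.

QUEUES = {
    "root": {"Adhoc": 10, "DW": 70, "Mrkting": 20},
    "root.DW": {"Dev": 10, "Reserved": 20, "Prod": 70},
    "root.DW.Prod": {"P0": 70, "P1": 30},
    "root.Mrkting": {"Dev": 20, "Prod": 80},
}


def absolute_capacities(tree, path="root", share=100.0):
    """Return {leaf queue path: percent of the whole cluster}."""
    children = tree.get(path)
    if not children:
        return {path: share}
    if sum(children.values()) != 100:
        raise ValueError(f"children of {path} must sum to 100%")
    result = {}
    for name, percent in children.items():
        child_path = f"{path}.{name}"
        result.update(absolute_capacities(tree, child_path, share * percent / 100))
    return result


capacities = absolute_capacities(QUEUES)
for queue, pct in capacities.items():
    print(f"{queue}: {pct:.1f}%")
```

Under this reading, P0 is guaranteed 70% of 70% of 70% of the cluster, roughly 34.3%, and the leaf shares add up to 100%.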
Page 15
YARN Eco-system
Applications Powered by YARN
Apache Giraph – Graph Processing
Apache Hama - BSP
Apache Hadoop MapReduce – Batch
Apache Tez – Batch/Interactive
Apache S4 – Stream Processing
Apache Samza – Stream Processing
Apache Storm – Stream Processing
Apache Spark – Iterative applications
Elastic Search – Scalable Search
Cloudera Llama – Impala on YARN
DataTorrent – Data Analysis
HOYA – HBase on YARN

There's an app for that...
YARN App Marketplace!

Frameworks Powered By YARN
Apache Twill
REEF by Microsoft
Spring support for Hadoop 2

Page 16
YARN Application Lifecycle
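[Diagram: the Application Client uses YarnClient (Application Client Protocol) to talk to the Resource Manager, and an App Specific API to talk to the Application Master. The Application Master runs in an App Container on a NodeManager; it uses AMRMClient (Application Master Protocol) to talk to the Resource Manager and NMClient (Container Management Protocol) to talk to NodeManagers]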

Page 17
BYOA – Bring Your Own App
Application Client Protocol: Client to RM interaction
– Library: YarnClient
– Application Lifecycle control
– Access Cluster Information

Application Master Protocol: AM to RM interaction
– Library: AMRMClient / AMRMClientAsync
– Resource negotiation
– Heartbeat to the RM

Container Management Protocol: AM to NM interaction
– Library: NMClient/NMClientAsync
– Launching allocated containers
– Stopping running containers

Use external frameworks like Twill/REEF/Spring
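
The three interactions listed above can be walked through with a toy, in-memory model. This Python sketch is only an illustration of the order of calls: `ToyResourceManager`, `ToyNodeManager`, and their methods are hypothetical stand-ins, not the YarnClient, AMRMClient, or NMClient APIs, and the node sizes are made up.

```python
# Toy walk-through of the client -> RM, AM -> RM, and AM -> NM steps.
# Class and method names are hypothetical, not the YARN client libraries.

class ToyNodeManager:
    def __init__(self, name, memory_mb):
        self.name, self.free_mb, self.running = name, memory_mb, {}

    def start_container(self, container_id, command):   # AM -> NM
        self.running[container_id] = command

    def stop_container(self, container_id):             # AM -> NM
        self.running.pop(container_id, None)


class ToyResourceManager:
    def __init__(self, nodes):
        self.nodes, self.apps, self.next_id = nodes, {}, 0

    def submit_application(self, name):                 # client -> RM
        app_id = f"app_{len(self.apps) + 1}"
        self.apps[app_id] = {"name": name, "state": "ACCEPTED"}
        return app_id

    def allocate(self, app_id, memory_mb, count):       # AM -> RM
        """Grant up to `count` containers from nodes with enough memory."""
        granted = []
        for node in self.nodes:
            while len(granted) < count and node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                self.next_id += 1
                granted.append((f"container_{self.next_id}", node))
        return granted


nodes = [ToyNodeManager("nm1", 4096), ToyNodeManager("nm2", 2048)]
rm = ToyResourceManager(nodes)

app_id = rm.submit_application("word-count")                # 1. client submits
containers = rm.allocate(app_id, memory_mb=2048, count=3)   # 2. AM negotiates
for cid, node in containers:                                # 3. AM launches
    node.start_container(cid, "run-task")
placement = [(cid, node.name) for cid, node in containers]
print(placement)

first_id, first_node = containers[0]
first_node.stop_container(first_id)                         # 4. AM stops one
```

In a real application the ApplicationMaster would itself run in a container and would keep heartbeating to the RM; the sketch leaves that out to keep the call order visible.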

Page 18
YARN Future Work
• ResourceManager High Availability
– Automatic failover
– Work preserving failover

• Scheduler Enhancements
– SLA Driven Scheduling, Low latency allocations
– Multiple resource types – disk/network/GPUs/affinity

• Rolling upgrades
• Generic History Service
• Long running services
– Better support for running services like HBase
– Service Discovery

• More utilities/libraries for Application Developers
– Failover/Checkpointing

Page 19
Key Take-Aways
• YARN is a platform to build/run Multiple Distributed Applications
in Hadoop
• YARN is completely Backwards Compatible for existing
MapReduce apps
• YARN enables Fine Grained Resource Management via Generic
Resource Containers.
• YARN has built-in support for multi-tenancy to share cluster
resources and increase cost efficiency
• YARN provides a cluster operating system like abstraction for a
modern data architecture

Page 20
Apache YARN
The Data Operating System for Hadoop 2.0
Flexible: Enables other purpose-built data processing models beyond MapReduce (batch), such as interactive and streaming
Efficient: Increase processing IN Hadoop on the same hardware while providing predictable performance & quality of service
Shared: Provides a stable, reliable, secure foundation and shared operational services across multiple workloads

Data Processing Engines Run Natively IN Hadoop
BATCH: MapReduce
INTERACTIVE: Tez
ONLINE: HBase
STREAMING: Storm, S4, …
GRAPH: Giraph
MICROSOFT: REEF
SAS: LASR, HPA
OTHERS

YARN: Cluster Resource Management
HDFS2: Redundant, Reliable Storage

Page 21
Thank you!

Download Sandbox: Experience Apache Hadoop
Both 2.0 and 1.x Versions Available!
http://hortonworks.com/products/hortonworks-sandbox/

Questions?

Page 22

Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

Apache YARN: Next-Gen Compute Platform for Hadoop

  • 1. YARN Apache Hadoop Next Generation Compute Platform Bikas Saha @bikassaha © Hortonworks Inc. 2013 Page 1
  • 2. Apache Hadoop & YARN • Apache Hadoop – De facto open source Big Data platform – Running in production for about 5 years at hundreds of companies such as Yahoo, eBay and Facebook • Hadoop 2 – Significant improvements in the HDFS distributed storage layer: High Availability, NFS, Snapshots – YARN: next-generation compute framework for Hadoop, designed from the ground up based on experience gained from Hadoop 1 – YARN running in production at Yahoo for about a year – YARN awarded Best Paper at SOCC 2013
  • 3. 1st Generation Hadoop: Batch Focus • HADOOP 1.0: Built for Web-Scale Batch Apps • All other usage patterns MUST leverage the same infrastructure • Forces creation of silos to manage mixed workloads [Diagram: five single-app silos labeled INTERACTIVE, ONLINE, BATCH, BATCH, BATCH, on top of HDFS]
  • 4. Hadoop 1 Architecture • JobTracker – Manages cluster resources and job scheduling • TaskTracker – Per-node agent – Manages tasks
  • 5. Hadoop 1 Limitations • Lacks support for alternate paradigms and services – Forces everything to look like MapReduce – Iterative applications in MapReduce are 10x slower • Scalability – Max cluster size ~5,000 nodes – Max concurrent tasks ~40,000 • Availability – Failure kills queued and running jobs • Hard partition of resources into map and reduce slots – Non-optimal resource utilization
  • 6. Our Vision: Hadoop as Next-Gen Platform • HADOOP 1.0: Single Use System (Batch Apps) – MapReduce (cluster resource management & data processing) – HDFS (redundant, reliable storage) • HADOOP 2.0: Multi Purpose Platform (Batch, Interactive, Online, Streaming, …) – MapReduce and Others (data processing) – YARN (cluster resource management) – HDFS2 (redundant, highly-available & reliable storage)
  • 7. Hadoop 2 - YARN Architecture • ResourceManager (RM) – Central agent – Manages and allocates cluster resources • NodeManager (NM) – Per-node agent – Manages and enforces node resource allocations • ApplicationMaster (AM) – Per-application – Manages application lifecycle and task scheduling [Diagram: Client, Resource Manager, and Node Managers hosting an App Mstr and Containers; arrows labeled MapReduce Status, Job Submission, Node Status, Resource Request]
  • 8. YARN: Taking Hadoop Beyond Batch • Store ALL DATA in one place… • Interact with that data in MULTIPLE WAYS with predictable performance and quality of service • Applications run natively in Hadoop – BATCH (MapReduce) – INTERACTIVE (Tez) – ONLINE (HBase) – STREAMING (Storm, S4, …) – GRAPH (Giraph) – IN-MEMORY (Spark) – HPC MPI (OpenMPI) – OTHER (Search, Weave, …) • YARN (Cluster Resource Management) • HDFS2 (Redundant, Reliable Storage)
  • 9. 5 Key Benefits of YARN 1. New Applications & Services 2. Improved cluster utilization 3. Scale 4. Experimental Agility 5. Shared Services
  • 10. Key Improvements in YARN • Framework supporting multiple applications – Separate generic resource brokering from application logic – Define protocols/libraries and provide a framework for custom application development – Share the same Hadoop cluster across applications • Cluster Utilization – Generic resource container model replaces fixed Map/Reduce slots; container allocations are based on locality and memory (CPU coming soon) – Sharing the cluster among multiple applications
  • 11. Key Improvements in YARN • Scalability – Removed complex app logic from the RM so it can scale further – Loosely coupled design based on state machines and message passing – Compact scheduling protocol • Application Agility and Innovation – Using Protocol Buffers for RPC gives wire compatibility – MapReduce becomes an application in user space, unlocking safe innovation – Multiple versions of an app can co-exist, which enables experimentation – Easier upgrade of framework and applications
  • 12. Key Improvements in YARN • Shared Services – Common services needed to build distributed applications are included in a pluggable framework – Distributed file sharing service – Remote data read service – Log aggregation service
  • 13. YARN: Efficiency with Shared Services • Yahoo! leverages YARN – 40,000+ nodes running YARN across over 365PB of data – ~400,000 jobs per day for about 10 million hours of compute time – Estimated a 60%–150% improvement in node usage per day using YARN – Eliminated Colo (~10K nodes) due to increased utilization • For more details check out the YARN SOCC 2013 paper
  • 14. YARN as Cluster Operating System [Diagram: a ResourceManager with its Scheduler above a grid of twelve NodeManagers running mixed workloads labeled Batch, Interactive SQL and Real-Time; container labels: map1.1, map1.2, reduce1.1, vertex1.1.1, vertex1.1.2, vertex1.2.1, vertex1.2.2, nimbus0, nimbus1, nimbus2]
  • 15. Multi-Tenancy is Built-in • Queues • Economics as queue-capacity – Hierarchical Queues • SLAs – Cooperative Preemption • Resource Isolation – Linux: cgroups – Roadmap: Virtualization (Xen, KVM) • Administration – Queue ACLs – Run-time re-configuration for queues • Default Capacity Scheduler supports all features [Diagram: ResourceManager with Capacity Scheduler and Hierarchical Queues; labels as extracted: root; Mrkting 20%; Dev 20%; Adhoc 10%; Prod 80%; DW 70%; Dev, Reserved, Prod 10%, 20%, 70%; P0 70%; P1 30%]
  • 16. YARN Eco-system • Applications Powered by YARN – Apache Giraph: Graph Processing – Apache Hama: BSP – Apache Hadoop MapReduce: Batch – Apache Tez: Batch/Interactive – Apache S4: Stream Processing – Apache Samza: Stream Processing – Apache Storm: Stream Processing – Apache Spark: Iterative applications – Elastic Search: Scalable Search – Cloudera Llama: Impala on YARN – DataTorrent: Data Analysis – HOYA: HBase on YARN • Frameworks Powered By YARN – Apache Twill – REEF by Microsoft – Spring support for Hadoop 2 • There's an app for that... YARN App Marketplace!
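  • 17. YARN Application Lifecycle [Diagram: Application Client (YarnClient) talks to the Resource Manager over the Application Client Protocol; Application Master (AMRMClient) talks to the Resource Manager over the Application Master Protocol; Application Master (NMClient) talks to the NodeManager over the Container Management Protocol; Application Client and Application Master communicate through an App Specific API; the NodeManager hosts the App Container]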
  • 18. BYOA – Bring Your Own App • Application Client Protocol: Client to RM interaction – Library: YarnClient – Application lifecycle control – Access cluster information • Application Master Protocol: AM to RM interaction – Library: AMRMClient / AMRMClientAsync – Resource negotiation – Heartbeat to the RM • Container Management Protocol: AM to NM interaction – Library: NMClient / NMClientAsync – Launching allocated containers – Stopping running containers • Use external frameworks like Twill/REEF/Spring
  • 19. YARN Future Work • ResourceManager High Availability – Automatic failover – Work-preserving failover • Scheduler Enhancements – SLA-driven scheduling, low-latency allocations – Multiple resource types: disk/network/GPUs/affinity • Rolling upgrades • Generic History Service • Long-running services – Better support for running services like HBase – Service Discovery • More utilities/libraries for Application Developers – Failover/Checkpointing
  • 20. Key Take-Aways • YARN is a platform to build/run Multiple Distributed Applications in Hadoop • YARN is completely Backwards Compatible for existing MapReduce apps • YARN enables Fine Grained Resource Management via Generic Resource Containers • YARN has built-in support for multi-tenancy to share cluster resources and increase cost efficiency • YARN provides a cluster-operating-system-like abstraction for a modern data architecture
  • 21. Apache YARN: The Data Operating System for Hadoop 2.0 • Flexible – Enables other purpose-built data processing models beyond MapReduce (batch), such as interactive and streaming • Efficient – Increases processing IN Hadoop on the same hardware while providing predictable performance & quality of service • Shared – Provides a stable, reliable, secure foundation and shared operational services across multiple workloads • Data Processing Engines Run Natively IN Hadoop – BATCH: MapReduce – INTERACTIVE: Tez – ONLINE: HBase – STREAMING: Storm, S4, … – GRAPH: Giraph – MICROSOFT: REEF – SAS: LASR, HPA – OTHERS • YARN: Cluster Resource Management • HDFS2: Redundant, Reliable Storage
  • 22. Thank you! Questions? • Download Sandbox: Experience Apache Hadoop. Both 2.0 and 1.x versions available! http://hortonworks.com/products/hortonworks-sandbox/
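
Slide 15 describes queue capacity in terms of hierarchical queues. As a rough illustration, the Python sketch below computes each leaf queue's share of the whole cluster, assuming every percentage is relative to its parent queue. The `Queue` class and `absolute_shares` function are hypothetical names introduced here, not part of YARN. The tree layout is one assumed reading of the slide's diagram, whose flattened labels do not preserve the original structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Queue:
    """One node in a queue hierarchy; `percent` is relative to the parent."""
    name: str
    percent: float
    children: List["Queue"] = field(default_factory=list)


def absolute_shares(queue: Queue, parent_share: float = 100.0,
                    path: str = "") -> Dict[str, float]:
    """Return each leaf queue's share of the whole cluster, in percent."""
    full_path = f"{path}.{queue.name}" if path else queue.name
    share = parent_share * queue.percent / 100.0
    if not queue.children:
        return {full_path: share}
    # Sibling percentages are expected to add up to 100% of the parent.
    total = sum(child.percent for child in queue.children)
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"children of {full_path} sum to {total}%, expected 100%")
    shares: Dict[str, float] = {}
    for child in queue.children:
        shares.update(absolute_shares(child, share, full_path))
    return shares


# Assumed layout (hypothetical): the slide text lists these labels and
# percentages but not which queue is nested under which.
root = Queue("root", 100, [
    Queue("Adhoc", 10),
    Queue("DW", 70, [
        Queue("Dev", 10),
        Queue("Reserved", 20),
        Queue("Prod", 70, [Queue("P0", 70), Queue("P1", 30)]),
    ]),
    Queue("Mrkting", 20, [Queue("Dev", 20), Queue("Prod", 80)]),
])

shares = absolute_shares(root)
for queue_path, share in sorted(shares.items()):
    print(f"{queue_path}: {share:.1f}% of cluster")
```

Under this assumed layout, a leaf such as `root.DW.Prod.P0` ends up with 70% of 70% of 70% of the cluster, which shows how nested percentages translate into an organization's effective share.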