Field Notes: YARN Meetup at LinkedIn
YARN Meet Up Sep 2013
@LinkedIn
By (lots of speakers)
(Editable) Agenda
• Hadoop 2.0 beta – Vinod Kumar Vavilapalli
– YARN APIs stability
– Existing applications
• Application History Server – Mayank Bansal
• RM reliability – Bikas Saha, Jian He, Karthik Kambatla
– RM restartability
– RM fail-over
• Apache Tez – Hitesh Shah, Siddharth Seth
• Apache Samza
• Apache Giraph
• Apache Helix
• gohadoop: go YARN application – Arun
• Llama – Alejandro
Hadoop 2.0 beta
Vinod Kumar Vavilapalli
Hadoop 2.0 beta
• Stable YARN APIs
• MR binary compatibility
• Testing with the whole stack
• Ready for prime-time!
YARN API stability
• YARN-386
• Broke APIs for one last time
• Hopefully
• Exceptions/method-names
• Security: Tokens used irrespective of Kerberos
• Read-only IDs, factories for creating records
• Protocols renamed!
• Client libraries
Compatibility for existing applications
• MAPREDUCE-5108
• Old mapred APIs binary compatible
• New mapreduce APIs source compatible
• Pig, Hive, Oozie, etc. work with the latest versions. No need to rewrite your scripts.
Application History Server
Mayank Bansal
Contributions

Mayank Bansal
Zhijie Shen
Devaraj K
Vinod Kumar Vavilapalli

YARN-321
Why do we need AHS?
• Job History Server is MR specific
• Jobs which are not MR
• RM Restart
• Hard-coded limits for number of jobs
• Longer running jobs
AHS
• Different process or embedded in RM
• Contains generic application data
– Application
– Application Attempts
– Container
• Client Interfaces
– Web UI
– Client Interface
– Web Services
AHS History Store
• Pluggable History Store
• Storage format is PB (Protocol Buffers)
– Backward compatible
– Much easier to evolve the storage
• HDFS implementation
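
The pluggable-store idea can be pictured with a small sketch. Nothing here is the AHS API: `HistoryStore`, `InMemoryHistoryStore`, `encode`, `decode`, and the record fields are hypothetical names chosen for illustration, and JSON bytes stand in for the Protocol Buffers encoding named on the slide.

```python
import json
from abc import ABC, abstractmethod


class HistoryStore(ABC):
    """Hypothetical storage interface; the real AHS interface may differ."""

    @abstractmethod
    def write(self, app_id: str, record: bytes) -> None: ...

    @abstractmethod
    def read(self, app_id: str) -> bytes: ...


class InMemoryHistoryStore(HistoryStore):
    """Stand-in backend; an HDFS-backed class would implement the same methods."""

    def __init__(self) -> None:
        self._records = {}

    def write(self, app_id: str, record: bytes) -> None:
        self._records[app_id] = record

    def read(self, app_id: str) -> bytes:
        return self._records[app_id]


def encode(record: dict) -> bytes:
    # JSON stands in for Protocol Buffers here (an assumption for brevity).
    return json.dumps(record, sort_keys=True).encode("utf-8")


def decode(data: bytes) -> dict:
    return json.loads(data.decode("utf-8"))


store: HistoryStore = InMemoryHistoryStore()
store.write("application_0001", encode({"state": "FINISHED", "attempts": 1}))
restored = decode(store.read("application_0001"))
```

Because callers only see `HistoryStore`, swapping the backend would not change the code that writes or reads history records.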
AHS
[Diagram labels: RM, App Finished, Store Write Interface, Store, Store Reading Interface, AHS, Web App, WS, RPC]
Remaining Work
• Security
• Command Line Interface
Next Phase
• Application-specific data???
• Long running services
DEMO
RM reliability
Bikas Saha
Jian He
Karthik Kambatla
RM reliability
• Restartability
• High availability
RM restartability
Jian He
Bikas Saha
Design and Work Plan
• YARN-128 – RM Restart
– Creates framework to store and load state information.
Forms the basis of failover and HA. Work close to
completion and being actively tested.

• YARN-149 – RM HA
– Adds HA service to RM in order to failover between
instances. Work in active progress.

• YARN-556 – Work preserving RM Restart
– Support loss-less recovery of cluster state when RM
restarts or fails over
– Design proposal up

• All the work is being done in a carefully planned
manner directly on trunk. Code is always stable and
ready.
RM Restart (YARN-128)
• Current state of the implementation
• Internal details
• Impact on applications/frameworks
• How to use
RM Restart
• Supports ZooKeeper, HDFS, and local
FileSystem as the underlying store.
• ClientRMProxy
– all Clients (NM, AM, clients) of RM have the same
retry behavior while RM is down.

• RM restart now works in a secure environment!
Internal details
RMStateStore
• Two types of state info:
– Application-related state info: stored asynchronously
• ApplicationState
– ApplicationSubmissionContext (AM ContainerLaunchContext, Queue, etc.)
• ApplicationAttemptState
– AM container, AMRMToken, ClientTokenMasterKey, etc.
– RMDelegationTokenSecretManager state (not application specific): stored synchronously
• RMDelegationToken
• RMDelegationToken MasterKey
• RMDelegationToken Sequence Number
RM Recovery Workflow
• Save the app on app submission
– User Provided credentials (HDFSDelegationToken)

• Save the attempt on AM attempt launch
– AMRMToken, ClientToken

• RMDelegationTokenSecretManager
– Save the token and sequence number on token
generation
– Save master key when it rolls

• RM crashes….
What happens after RM restarts?
• Instruct the old AM to shut down
• Load the ApplicationSubmissionContext
– Submit the application

• Load the earlier attempts
– Loads the attempt credentials (AMRMToken,
ClientToken)

• Launch a new attempt
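
The save-then-recover flow above can be sketched as a toy simulation. `StateStore` and `ToyRM` are hypothetical stand-ins, not the RM's classes, and the token strings are placeholders; the sketch only shows that whatever is saved at submission and at attempt launch is what a restarted RM has available.

```python
from dataclasses import dataclass, field


@dataclass
class StateStore:
    """Toy stand-in for RMStateStore; survives a simulated RM crash."""
    apps: dict = field(default_factory=dict)      # app_id -> submission context
    attempts: dict = field(default_factory=dict)  # app_id -> saved attempt records


class ToyRM:
    def __init__(self, store: StateStore) -> None:
        self.store = store
        self.running = {}  # in-memory only; lost when the RM crashes

    def submit(self, app_id: str, context: dict) -> None:
        # Save the app on submission, before launching anything.
        self.store.apps[app_id] = context
        self.launch_attempt(app_id)

    def launch_attempt(self, app_id: str) -> int:
        # Save the attempt (with its tokens) when the AM attempt is launched.
        saved = self.store.attempts.setdefault(app_id, [])
        attempt_no = len(saved) + 1
        saved.append({"attempt": attempt_no, "amrm_token": f"token-{app_id}-{attempt_no}"})
        self.running[app_id] = attempt_no
        return attempt_no

    def recover(self) -> None:
        # After restart: reload each saved app and its earlier attempts,
        # then launch a new attempt (this sketch does not preserve work).
        for app_id in self.store.apps:
            self.launch_attempt(app_id)


store = StateStore()
rm = ToyRM(store)
rm.submit("app_1", {"queue": "default"})

rm = ToyRM(store)  # simulate crash + restart: memory is gone, the store is not
rm.recover()
```

After `recover()`, the store holds two attempts for `app_1` and the new RM instance is running the second one.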
Impact on applications/frameworks
Consistency between downstream consumers of AM and YARN
• AM should notify its consumers that the job is
done only after YARN reports it’s done
– FinishApplicationMasterResponse.getIsUnregistered()
– User is expected to retry this API until it becomes
true.
– Similarly, kill-application (fix in progress)
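
The retry contract above can be sketched as a loop. `FakeRM` and its `finish_application_master` method are stand-ins invented for this example, not the YARN client API; only the boolean they return mirrors `getIsUnregistered()`. The retry cap and poll interval are also assumptions.

```python
import time


class FakeRM:
    """Stand-in RM that needs a few calls before the unregister is persisted."""

    def __init__(self, calls_until_done: int) -> None:
        self._remaining = calls_until_done

    def finish_application_master(self) -> bool:
        # Mirrors FinishApplicationMasterResponse.getIsUnregistered().
        self._remaining -= 1
        return self._remaining <= 0


def unregister(rm: FakeRM, poll_seconds: float = 0.0, max_tries: int = 100) -> int:
    """Retry until the RM confirms; return the number of calls made."""
    for tries in range(1, max_tries + 1):
        if rm.finish_application_master():
            return tries
        time.sleep(poll_seconds)
    raise TimeoutError("RM never confirmed the unregister")


notified = []
tries = unregister(FakeRM(calls_until_done=3))
notified.append("job done")  # notify downstream consumers only after confirmation
```

The ordering is the point: the AM tells its consumers the job is done only after the loop returns, so a restarted RM cannot relaunch an application its consumers already consider finished.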
For MR AM
• Races:
– JobClient: AM crashes after JobClient sees
FINISHED but before RM removes the app when
app finishes
• Bugs: relaunch FINISHED application(succeeded, failed,
killed)

– HistoryServer: History files flushed before RM
removes the app when app finishes
How to use?
How to use: 3 steps
• 1. Enable RM restart
– yarn.resourcemanager.recovery.enabled

• 2. Choose the underlying store you want (HDFS, ZooKeeper,
local FileSystem)
– yarn.resourcemanager.store.class
– FileSystemRMStateStore / ZKRMStateStore

• 3. Configure the address of the store
– yarn.resourcemanager.fs.state-store.uri
– hdfs://localhost:9000/rmstore
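
As a rough illustration, the three settings could be rendered into a `yarn-site.xml` fragment like this. The property names and the URI come from the slide; the fully qualified package of `FileSystemRMStateStore` and the helper `render_site_xml` are assumptions of this sketch, so check them against your Hadoop version.

```python
import xml.etree.ElementTree as ET

# Values follow the slide; the store class is given by its fully qualified name.
properties = {
    "yarn.resourcemanager.recovery.enabled": "true",
    "yarn.resourcemanager.store.class":
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore",
    "yarn.resourcemanager.fs.state-store.uri": "hdfs://localhost:9000/rmstore",
}


def render_site_xml(props: dict) -> str:
    """Render properties in the <configuration><property> layout Hadoop uses."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


site_xml = render_site_xml(properties)
print(site_xml)
```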
YARN – Fail over
Karthik Kambatla
RM HA (YARN-149)
● Architecture
● Failover / Admin
● Fencing
● Config changes
● FailoverProxyProvider
Architecture
● Active / Standby
○ Standby is powered up, but doesn’t have any state
● Restructure RM services (YARN-1098)
○ Always On services
○ Active Services (e.g. Client <-> RM, AM <-> RM)
● RMHAService (YARN-1027)
Failover / Admin
● CLI: yarn rmhaadmin
● Manual failover
● Automatic failover (YARN-1177)
○ Use ZKFC
○ Start it as an RM service instead of a separate daemon.
○ Re-visit and strip out unnecessary parts.
Fencing (YARN-1222)
● Implicit fencing through ZK RM StateStore
● ACL-based fencing on store.load() during
transition to active
○ Shared read-write-admin access to the store
○ Claim exclusive create-delete access
○ All store operations create-delete a fencing node
○ The other RM can’t write to the store anymore
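
The fencing steps above can be modelled with a toy store. This is not ZooKeeper code: `ToyStore`, `claim`, and `FencedError` are hypothetical, and the ACL is reduced to a single owner field. It only illustrates why an RM that lost create-delete access can no longer complete a write.

```python
class FencedError(Exception):
    """Raised when an RM that lost create-delete access tries to write."""


class ToyStore:
    def __init__(self) -> None:
        self.data = {}
        self.create_delete_owner = None  # who may create/delete the fencing node

    def claim(self, rm_id: str) -> None:
        # Both RMs share admin access, so either can take over the ACL.
        self.create_delete_owner = rm_id

    def write(self, rm_id: str, key: str, value: str) -> None:
        # Every store operation is bracketed by create + delete of a fencing
        # node; that only succeeds for the RM holding create-delete access.
        if rm_id != self.create_delete_owner:
            raise FencedError(f"{rm_id} is fenced out")
        self.data[key] = value


store = ToyStore()
store.claim("rm1")  # rm1 becomes active
store.write("rm1", "app_1", "SUBMITTED")

store.claim("rm2")  # rm2 transitions to active, fencing rm1
try:
    store.write("rm1", "app_1", "FINISHED")
    rm1_fenced = False
except FencedError:
    rm1_fenced = True
store.write("rm2", "app_2", "SUBMITTED")
```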
Config changes (YARN-1232, YARN-986)
1. <name>yarn.resourcemanager.address</name>
<value>clusterid</value>
2. <name>yarn.resourcemanager.ha.nodes.clusterid</name>
<value>rm1,rm2</value>
3. <name>yarn.resourcemanager.ha.id</name>
<value>rm1</value>
4. <name>yarn.resourcemanager.address.clusterid.rm1</name>
<value>host1:23140</value>
5. <name>yarn.resourcemanager.address.clusterid.rm2</name>
<value>host2:23140</value>
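
To show how these keys fit together, here is a small lookup sketch. The key names are copied from the slide, which describes work in progress, so released Hadoop versions may use different names. The helper `rm_addresses` is hypothetical.

```python
conf = {
    "yarn.resourcemanager.address": "clusterid",
    "yarn.resourcemanager.ha.nodes.clusterid": "rm1,rm2",
    "yarn.resourcemanager.ha.id": "rm1",
    "yarn.resourcemanager.address.clusterid.rm1": "host1:23140",
    "yarn.resourcemanager.address.clusterid.rm2": "host2:23140",
}


def rm_addresses(conf: dict) -> dict:
    """Map each RM id to its host:port using the key layout from the slide."""
    cluster = conf["yarn.resourcemanager.address"]
    rm_ids = conf[f"yarn.resourcemanager.ha.nodes.{cluster}"].split(",")
    return {
        rm_id: conf[f"yarn.resourcemanager.address.{cluster}.{rm_id}"]
        for rm_id in rm_ids
    }


addresses = rm_addresses(conf)
local_address = addresses[conf["yarn.resourcemanager.ha.id"]]
```

The logical cluster id acts as an indirection: clients name the cluster, and the per-RM keys supply the concrete endpoints.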
FailoverProxyProvider
● ConfiguredFailoverProxyProvider (YARN-1028)
○ Use alternate RMs from the config during retry
○ ClientRMProxy
■ addresses client-based RPC addresses
○ ServerRMProxy
■ addresses server-based RPC addresses
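
The "use alternate RMs during retry" behaviour might look roughly like this. `ToyFailoverProxy` and `fake_call` are illustrative names, not Hadoop classes, and the failover limit is an assumed parameter.

```python
class ToyFailoverProxy:
    """Cycles through configured RMs, moving on when a call fails."""

    def __init__(self, rm_ids, call):
        self.rm_ids = list(rm_ids)
        self.call = call  # call(rm_id) -> result; raises ConnectionError if down
        self.current = 0

    def invoke(self, max_failovers: int = 3):
        for _ in range(max_failovers + 1):
            rm_id = self.rm_ids[self.current]
            try:
                return self.call(rm_id)
            except ConnectionError:
                # Fail over to the next RM listed in the configuration.
                self.current = (self.current + 1) % len(self.rm_ids)
        raise ConnectionError("no RM reachable")


def fake_call(rm_id: str) -> str:
    # Hypothetical cluster state: rm1 is down, rm2 is active.
    if rm_id == "rm1":
        raise ConnectionError("rm1 is down")
    return f"answered by {rm_id}"


proxy = ToyFailoverProxy(["rm1", "rm2"], fake_call)
result = proxy.invoke()
```

The proxy remembers the RM that last answered, so later calls go straight to it instead of retrying the failed one first.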
Apache TEZ
Hitesh Shah
What is Tez?
• A data processing framework that can execute a complex DAG of
tasks.
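
To make "a DAG of tasks" concrete, here is a minimal sketch that orders tasks so each runs after its inputs. It uses Python's standard `graphlib` rather than any Tez API, and the task names are made up for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical DAG: each task maps to the set of tasks it depends on.
dag = {
    "scan_orders": set(),
    "scan_users": set(),
    "join": {"scan_orders", "scan_users"},
    "aggregate": {"join"},
}

# static_order() yields every task after all of its dependencies.
order = list(TopologicalSorter(dag).static_order())
```

The two scans have no dependency on each other, so a scheduler is free to run them in parallel; only `join` and `aggregate` have to wait.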

Architecting the Future of Big Data
© Hortonworks Inc. 2011

Tez DAG and Tasks

TEZ as a Yarn Application
• No deployment of TEZ jars required on all nodes in the Cluster
– Everything is pushed from either the client or from HDFS to the Cluster using YARN’s
LocalResource functionality
– Ability to run multiple different versions

• TEZ Sessions
– A single AM can handle multiple DAGs (“jobs”)
– Amortize and hide platform latency

• Exciting new features
– Support for complex DAGs – broadcast joins (Hive map joins)
– Support for lower latency – container reuse and shared objects
– Support for dynamic concurrency control – determine reduce parallelism at runtime

TEZ: Community
• Early adopters and contributors welcome
– Adopters to drive more scenarios. Contributors to make them happen.
– Hive and Pig communities are on-board and making great progress - HIVE-4660 and
PIG-3446 for Hive-on-Tez and Pig-on-Tez

• Meetup group – Please sign up to know more
– http://www.meetup.com/Apache-Tez-User-Group

• Useful Links:
– Website: http://tez.incubator.apache.org/
– Code: http://git-wip-us.apache.org/repos/asf/incubator-tez.git
– JIRA: https://issues.apache.org/jira/browse/TEZ
– Mailing Lists:
– dev-subscribe@tez.incubator.apache.org
– user-subscribe@tez.incubator.apache.org

https://issues.apache.org/jira/browse/TEZ-65

Apache Tez
Apache Samza
Apache Giraph
Apache Helix
YARN usage @LinkedIn
YARN Go demo
https://github.com/hortonworks/gohadoop
Arun C Murthy
Llama
Alejandro Abdelnur
