Hadoop YARN Services

1. © Hortonworks Inc. 2015
Hadoop YARN Services
Xuan Gong
xgong@hortonworks.com
Steve Loughran
stevel@hortonworks.com
5. An OS can do more than run admin-installed apps
7. …which is important
[Diagram: web front ends, phones, devices and feeds send logs into stream processing and a database, which feed analytics]
10. Background: YARN
[Diagram: a YARN Resource Manager ("The RM") plus worker nodes, each running HDFS and a YARN Node Manager]
• Servers run YARN Node Managers (NMs)
• NMs heartbeat to the Resource Manager (RM)
• The RM schedules work over the cluster
• The RM allocates containers to apps
• NMs start containers
• NMs report container health
11. Client creates App Master
[Diagram: a client asks the RM to launch the Application Master in a container on one of the nodes]
12. "AM" requests containers
[Diagram: the Application Master asks the RM for containers, which are started by Node Managers across the cluster]
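The key point of the flow on the slides above is that container allocation is asynchronous: the AM asks the RM for containers, then keeps heartbeating until allocations arrive. A minimal toy sketch of that loop, with a fake in-process "RM"; all names here are illustrative, not the real AMRMClient API:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model of the AM <-> RM allocate loop: requests are queued, and
// allocations arrive over later heartbeats, not in the reply to the request.
public class AllocateLoop {

    /** A fake RM that grants at most one outstanding request per heartbeat. */
    static class FakeRM {
        private final Queue<String> pending = new ArrayDeque<>();
        private int nextId = 1;

        void request(String component) {
            pending.add(component);
        }

        /** One heartbeat: returns the containers granted since the last call. */
        List<String> heartbeat() {
            List<String> granted = new ArrayList<>();
            if (!pending.isEmpty()) {
                granted.add("container_" + (nextId++) + "/" + pending.poll());
            }
            return granted;
        }
    }

    /** An AM that keeps heartbeating until all requested containers arrive. */
    public static List<String> runAm(int wanted) {
        FakeRM rm = new FakeRM();
        for (int i = 0; i < wanted; i++) {
            rm.request("worker");
        }
        List<String> live = new ArrayList<>();
        while (live.size() < wanted) {
            live.addAll(rm.heartbeat());
        }
        return live;
    }
}
```

A real AM does the same thing against the RM's allocate protocol, while also handling completed and lost containers in each heartbeat response.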
13. Short-lived apps have it easy
• failure: clean restart
• logs: collect at end
• placement: by data
• security: Kerberos delegation tokens
• discovery: launcher app can track
14. Long-lived services don't
• failure: stay up
• logs: ongoing collection
• placement: availability, performance
• security: stay secure over time
• discovery: locatable by any client
16. YARN-896:
• Log aggregation
• Service registration & discovery
• Windowed failure tracking
• Anti-affinity placement
• Gang scheduling
• Applications to continue over AM restart
• Container resource flexing
• Container reuse
• Kerberos token renewal
• Container signalling
• Net & Disk resources
• Labelled nodes & queues
• REST
17. …in Hadoop 2.6:
• Log aggregation
• Service registration & discovery
• Windowed failure tracking
• Anti-affinity placement
• Gang scheduling
• Applications to continue over AM restart
• Container resource flexing
• Container reuse
• Kerberos token renewal
• Container signalling
• Net & Disk resources
• Labelled nodes & queues
• (Docker)
• REST
18. Failures
[Diagram: the Application Master and three containers spread across the cluster's Node Managers]
19. Failures
[Diagram: a node is lost, taking container 3 with it; containers 1 and 2 keep running]
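"Windowed failure tracking", from the feature list earlier, is what makes failures like the one above survivable over months: only failures inside a recent time window count against the restart limit, so a long-lived service is not killed for failures that happened days ago. A minimal sketch of the idea; this is a hypothetical helper, not the actual YARN/Slider implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Counts failures inside a sliding time window, so old failures age out
// instead of accumulating over an application's whole lifetime.
public class FailureWindow {
    private final long windowMillis;
    private final Deque<Long> failures = new ArrayDeque<>();

    public FailureWindow(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Record a failure; return how many failures fall inside the window. */
    public int recordFailure(long nowMillis) {
        failures.addLast(nowMillis);
        // Drop failures that have aged out of the window.
        while (!failures.isEmpty()
                && failures.peekFirst() <= nowMillis - windowMillis) {
            failures.removeFirst();
        }
        return failures.size();
    }
}
```

The caller compares the returned count against a threshold to decide whether the component (or the whole application) should be declared failed.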
20. Easy: enabling

// Client: keep containers across AM restarts, and allow several AM attempts
amLauncher.setKeepContainersOverRestarts(true);
amLauncher.setMaxAppAttempts(8);

// Server: on registration, the new AM attempt learns which containers
// from previous attempts are still live
List<Container> liveContainers =
    amRegistrationData.getContainersFromPreviousAttempts();
21. Harder: rebuilding state
[Diagram: the AM's state (Specification, Node Map, Placement History, Component Map, Container Queues, Event History), each piece marked as Persisted, Rebuilt or Transient]
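The component map is the awkward part: the RM only reports which containers from previous attempts are still live, so the restarted AM must work out which component each container was running. A minimal sketch, assuming the previous attempt persisted a container-to-role mapping; the names here are illustrative, not Slider's implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Rebuilds a component -> containers map after an AM restart, by joining the
// RM's list of still-live containers against a persisted container -> role map.
public class ComponentMapRebuild {

    public static Map<String, List<String>> rebuild(
            Map<String, String> persistedRoles,   // containerId -> component
            List<String> liveContainers) {        // from the RM at registration
        Map<String, List<String>> componentMap = new TreeMap<>();
        for (String containerId : liveContainers) {
            String role = persistedRoles.get(containerId);
            // Containers with no persisted role are unknown to this attempt;
            // a real AM would release them rather than adopt them blindly.
            if (role != null) {
                componentMap.computeIfAbsent(role, r -> new ArrayList<>())
                            .add(containerId);
            }
        }
        return componentMap;
    }
}
```

Note that containers which died between attempts (here, "c2") simply never show up in the live list, which is why the placement history also has to be rebuilt before requesting replacements.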
22. © Hortonworks Inc. 2015
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
Log Aggregation
24. Labels

$ yarn rmadmin
...
 -addToClusterNodeLabels [label1,label2,label3]
 -removeFromClusterNodeLabels [label1,label2,label3]
 -replaceLabelsOnNode [node1:port,label1,label2]
 -directlyAccessNodeLabelStore
25. YARN-913: Service Registry

$ slider resolve --path ~/services/org-apache-slider/storm1
{ "type" : "JSONServiceRecord",
  "external" : [ {
      "api" : "http://",
      "addressType" : "uri",
      "protocolType" : "webui",
      "addresses" : [ { "uri" : "http://nn.ex.net:4813" } ]
    }, {
      "api" : "classpath:org.apache.slider.publisher.configurations",
      "addressType" : "uri",
      "protocolType" : "REST",
      "addresses" : [ { "uri" : "http://nn.ex.net:4813/ws/v1/slider/publisher/slider" } ]
    } ]
}
26. Internal and external endpoints

"internal" : [ {
    "api" : "classpath:org.apache.slider.agents.secure",
    "addressType" : "uri",
    "protocolType" : "REST",
    "addresses" : [ { "uri" : "https://nn.ex.net:4813/ws/v1/slider/agents" } ]
  } ]

Internal: for an application's own use.
External: for clients, Web UIs and other apps.
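Once a record is resolved, a client selects the address for the API it speaks by its "api" identifier, from the external or internal list as appropriate. A plain-Java toy model of that lookup; this is illustrative, not the Hadoop YARN registry client API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of a resolved service record's endpoint lists: external endpoints
// are for clients, internal ones for the application's own components. Each
// endpoint is keyed by its "api" identifier, as in the JSON above.
public class ServiceRecordLookup {
    private final Map<String, String> external = new LinkedHashMap<>();
    private final Map<String, String> internal = new LinkedHashMap<>();

    public void addExternal(String api, String uri) {
        external.put(api, uri);
    }

    public void addInternal(String api, String uri) {
        internal.put(api, uri);
    }

    /** Address published for clients under this API, or null if unpublished. */
    public String lookupExternal(String api) {
        return external.get(api);
    }

    /** Address for the application's own components, or null if unpublished. */
    public String lookupInternal(String api) {
        return internal.get(api);
    }
}
```

Keying on an API identifier rather than a fixed hostname:port is what lets containers move between nodes on restart without breaking clients.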
27. Security
• Token expiry is a core Kerberos feature
• Token expiry is inimical to service longevity
• Specifically: delegation tokens
• After 72h (default), YARN updates the RM/AM tokens, but not those for HDFS, ZK, …
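Given that 72-hour default lifetime, a long-lived service must refresh its credentials (e.g. re-login from a keytab) comfortably before the deadline. A minimal sketch of the scheduling arithmetic; the 80% ratio is an illustrative safety margin, not a YARN default:

```java
import java.time.Duration;
import java.time.Instant;

// Computes when a service should refresh credentials: at a safe fraction of
// the token lifetime, so the refresh happens well before tokens expire.
public class TokenRefresh {
    // The default delegation-token lifetime mentioned on the slide above.
    static final Duration TOKEN_LIFETIME = Duration.ofHours(72);

    /** Refresh deadline for credentials issued at {@code issued}: 80% into their lifetime. */
    public static Instant refreshDeadline(Instant issued) {
        long safeMillis = TOKEN_LIFETIME.toMillis() * 8 / 10;
        return issued.plusMillis(safeMillis);
    }
}
```

Which component does the refresh, and how new tokens reach containers, is exactly what the strategies on the next slide differ on.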
28. How do apps cope?
• Do nothing (all apps): runs for up to 72h
• Keytabs (Slider): apps can run forever; keytabs need to be managed (securely)
• Client push (Twill): a running/scheduled client updates the AM; the AM forwards tokens to containers
• AM keytab (Spark, via SPARK-5342): containers ask the AM for new tokens
29. …so you can now:
write long-lived apps
…with failure resilience
…and centralised log viewing
…and labelled/isolated placement
…in secure clusters
30. TODO:
• Log aggregation
• Service registration & discovery
• Windowed failure tracking
• Anti-affinity placement
• Gang scheduling
• Applications to continue over AM restart
• Container resource flexing
• Container reuse
• Kerberos token renewal
• Container signalling
• Net & Disk resources
• Labelled nodes & queues
• REST
31. Questions?
For some code, see
http://slider.incubator.apache.org/
http://hadoop.apache.org