Hortonworks Yarn Code Walk Through January 2014
2. Quick Bio – Joseph Niemiec
• Hadoop user for 2+ years
• 1 of 5 authors of Apache Hadoop YARN (March 2014)
• Originally used Hadoop for location-based services
– Destination Prediction
– Traffic Analysis
– Effects of weather at client locations on call center call types
• Pending Patent in Automotive/Telematics domain
• Defensive Paper on M2M Validation
• Started on analytics to be better at an MMORPG
© Hortonworks Inc. 2013
3. Agenda
• What Is YARN
• YARN Concepts & Architecture
• Code and more Code
• Q&A
4. From Batch To Anything
HADOOP 1.0: Single Use System (Batch Apps)
– MapReduce (cluster resource management & data processing)
– HDFS (redundant, reliable storage)
HADOOP 2.0: Multi Purpose Platform (Batch, Interactive, Online, Streaming, …)
– MapReduce (data processing)
– Others (data processing)
– YARN (cluster resource management)
– HDFS2 (redundant, reliable storage)
5. Concepts
• Application
–Application is a job submitted to the framework
–Examples
– Map Reduce Job
– MoYa Cluster
• Container
–Basic unit of allocation
–Fine-grained resource allocation across multiple resource types (memory, cpu, disk, network, gpu etc.)
– container_0 = 2GB, 1CPU
– container_1 = 1GB, 6 CPU
–Replaces the fixed map/reduce slots
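The container idea above can be sketched in a few lines of plain Java. This is a minimal model, not the YARN API: `ResourceSpec` and `NodeCapacity` are hypothetical names, and the 8 GB / 8 core node size is an assumption. It only shows how containers of different shapes (the 2GB/1CPU and 1GB/6CPU examples from the slide) can share one node, which fixed map/reduce slots could not express.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a container's capability; NOT the YARN Resource class.
final class ResourceSpec {
    final int memoryMb;
    final int vcores;

    ResourceSpec(int memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }

    // A request fits only if every resource type fits.
    boolean fitsIn(ResourceSpec free) {
        return memoryMb <= free.memoryMb && vcores <= free.vcores;
    }

    ResourceSpec minus(ResourceSpec used) {
        return new ResourceSpec(memoryMb - used.memoryMb, vcores - used.vcores);
    }
}

// Hypothetical node that hands out containers of any shape until a resource runs out.
final class NodeCapacity {
    private ResourceSpec free;
    private final List<ResourceSpec> allocated = new ArrayList<ResourceSpec>();

    NodeCapacity(ResourceSpec total) {
        this.free = total;
    }

    boolean allocate(ResourceSpec request) {
        if (!request.fitsIn(free)) {
            return false;
        }
        free = free.minus(request);
        allocated.add(request);
        return true;
    }

    ResourceSpec free() { return free; }
    int containerCount() { return allocated.size(); }

    public static void main(String[] args) {
        NodeCapacity node = new NodeCapacity(new ResourceSpec(8192, 8)); // assumed node size
        node.allocate(new ResourceSpec(2048, 1)); // container_0 = 2GB, 1 CPU
        node.allocate(new ResourceSpec(1024, 6)); // container_1 = 1GB, 6 CPU
        System.out.println(node.containerCount() + " containers, "
                + node.free().memoryMb + " MB and " + node.free().vcores + " cores left");
    }
}
```

A request is rejected as soon as any single resource type is exhausted, even if plenty of the others remain.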
6. Architecture
• Resource Manager
–Global resource scheduler
–Hierarchical queues
• Node Manager
–Per-machine agent
–Manages the container life-cycle
–Container resource monitoring
• Application Master
–Per-application
–Manages application scheduling and task execution
–E.g. MapReduce Application Master
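A toy model of the three roles may help fix the division of labour. Everything here is an assumption made for illustration: the `Toy*` class names, the first-fit placement, and memory as the only resource. In real YARN the ApplicationMaster asks the NodeManager to launch a container after the ResourceManager has allocated it; the sketch merges those two steps for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Per-machine agent: tracks free memory and the containers running on it.
final class ToyNodeManager {
    final String host;
    private int freeMemoryMb;
    private final List<String> running = new ArrayList<String>();

    ToyNodeManager(String host, int totalMemoryMb) {
        this.host = host;
        this.freeMemoryMb = totalMemoryMb;
    }

    int freeMemoryMb() { return freeMemoryMb; }
    int runningCount() { return running.size(); }

    void launch(String containerId, int memoryMb) {
        freeMemoryMb -= memoryMb;
        running.add(containerId);
    }

    // Life-cycle end: release the memory only if the container was running here.
    void complete(String containerId, int memoryMb) {
        if (running.remove(containerId)) {
            freeMemoryMb += memoryMb;
        }
    }
}

// Global scheduler: places each request on a node (first fit is an assumption).
final class ToyResourceManager {
    private final List<ToyNodeManager> nodes = new ArrayList<ToyNodeManager>();
    private int nextId = 0;

    void register(ToyNodeManager nm) { nodes.add(nm); }

    // Returns the chosen host, or null when no node has room.
    String allocate(int memoryMb) {
        for (ToyNodeManager nm : nodes) {
            if (nm.freeMemoryMb() >= memoryMb) {
                nm.launch("container_" + nextId++, memoryMb);
                return nm.host;
            }
        }
        return null;
    }
}

// Per-application: asks the ResourceManager for one container per task.
final class ToyApplicationMaster {
    int requestContainers(ToyResourceManager rm, int tasks, int memoryMbPerTask) {
        int granted = 0;
        for (int i = 0; i < tasks; i++) {
            if (rm.allocate(memoryMbPerTask) != null) {
                granted++;
            }
        }
        return granted;
    }

    public static void main(String[] args) {
        ToyResourceManager rm = new ToyResourceManager();
        rm.register(new ToyNodeManager("host01", 4096));
        rm.register(new ToyNodeManager("host02", 2048));
        int granted = new ToyApplicationMaster().requestContainers(rm, 4, 2048);
        System.out.println("granted " + granted + " of 4 containers");
    }
}
```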
9. YARN - ApplicationMaster
• ApplicationMaster
– ApplicationSubmissionContext is the complete specification of the ApplicationMaster, provided by the Client
– The ResourceManager is responsible for allocating and launching the ApplicationMaster container
ApplicationSubmissionContext
– resourceRequest
– containerLaunchContext
– appName
– queue
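The four fields listed for ApplicationSubmissionContext can be pictured as a plain value object. The class below is a hypothetical stand-in (`SubmissionSketch` is not a YARN class, and the real record lives in `org.apache.hadoop.yarn.api.records`). It flattens resourceRequest into memory and cores and containerLaunchContext into a command list. The `missingFields` check is an illustrative assumption about what a client might verify before submitting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified picture of what a client hands to the ResourceManager.
final class SubmissionSketch {
    final String appName;
    final String queue;
    final int amMemoryMb;          // stands in for resourceRequest
    final int amVcores;            // stands in for resourceRequest
    final List<String> amCommands; // stands in for containerLaunchContext

    SubmissionSketch(String appName, String queue, int amMemoryMb, int amVcores,
                     List<String> amCommands) {
        this.appName = appName;
        this.queue = queue;
        this.amMemoryMb = amMemoryMb;
        this.amVcores = amVcores;
        this.amCommands = amCommands;
    }

    // Names of the slide's fields that are still unset; empty means "complete".
    List<String> missingFields() {
        List<String> missing = new ArrayList<String>();
        if (appName == null || appName.isEmpty()) missing.add("appName");
        if (queue == null || queue.isEmpty()) missing.add("queue");
        if (amMemoryMb <= 0 || amVcores <= 0) missing.add("resourceRequest");
        if (amCommands == null || amCommands.isEmpty()) missing.add("containerLaunchContext");
        return missing;
    }

    public static void main(String[] args) {
        SubmissionSketch ctx = new SubmissionSketch(
                "demo-app", "default", 1024, 1,
                Arrays.asList("java -Xmx512m com.example.DemoAppMaster")); // made-up command
        System.out.println("missing fields: " + ctx.missingFields());
    }
}
```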
10. YARN – Resource Allocation & Usage
• ContainerLaunchContext
– The context provided by the ApplicationMaster to the NodeManager to launch the Container
– Complete specification for a process
– LocalResource is used to specify the container binary and dependencies
– The NodeManager is responsible for downloading them from a shared namespace (typically HDFS)
ContainerLaunchContext
– container
– commands
– environment
– localResources
LocalResource
– uri
– type
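To make "complete specification for a process" concrete, here is a minimal sketch of a launch context as plain data. The class names, the `hdfs:///apps/demo/app.jar` path, the worker command, and the shell-script rendering are all assumptions for illustration. The real NodeManager does considerably more work when it localizes resources and starts a container.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for LocalResource: where to fetch a file and how to treat it.
final class LocalResourceSketch {
    final String uri;
    final String type; // e.g. "FILE" or "ARCHIVE"

    LocalResourceSketch(String uri, String type) {
        this.uri = uri;
        this.type = type;
    }
}

// Simplified stand-in for ContainerLaunchContext; NOT the YARN class.
final class LaunchContextSketch {
    final List<String> commands = new ArrayList<String>();
    final Map<String, String> environment = new LinkedHashMap<String, String>();
    final Map<String, LocalResourceSketch> localResources =
            new LinkedHashMap<String, LocalResourceSketch>();

    // URIs the node agent would have to download before starting the process.
    List<String> urisToLocalize() {
        List<String> uris = new ArrayList<String>();
        for (LocalResourceSketch r : localResources.values()) {
            uris.add(r.uri);
        }
        return uris;
    }

    // Hypothetical rendering: environment exports first, then the commands.
    String toLaunchScript() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : environment.entrySet()) {
            sb.append("export ").append(e.getKey()).append('=')
              .append(e.getValue()).append('\n');
        }
        for (String c : commands) {
            sb.append(c).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LaunchContextSketch ctx = new LaunchContextSketch();
        ctx.localResources.put("app.jar",
                new LocalResourceSketch("hdfs:///apps/demo/app.jar", "FILE")); // made-up path
        ctx.environment.put("CLASSPATH", "./app.jar");
        ctx.commands.add("java -Xmx512m com.example.Worker"); // made-up command
        System.out.println("download: " + ctx.urisToLocalize());
        System.out.print(ctx.toLaunchScript());
    }
}
```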
11. YARN – Resource Allocation & Usage
• ResourceRequest
priority   capability      resourceName   numContainers
0          <2gb, 1 core>   host01         1
0          <2gb, 1 core>   rack0          1
0          <2gb, 1 core>   *              1
1          <4gb, 1 core>   *              1
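One way to read the ResourceRequest rows is as a locality preference: the same priority-0 ask is listed for a host, for its rack, and for the `*` wildcard. The sketch below encodes the slide's four rows (host01, rack0 and `*` at priority 0 with <2gb, 1 core>, and `*` at priority 1 with <4gb, 1 core>) and picks the most specific row matching an offered node. The class names and the matching rule are simplifying assumptions; the real schedulers apply more involved logic.

```java
import java.util.ArrayList;
import java.util.List;

// One row of the request table; a simplified stand-in, NOT the YARN ResourceRequest class.
final class RequestRow {
    final int priority;
    final int memoryGb;
    final int cores;
    final String resourceName; // host name, rack name, or "*"
    final int numContainers;

    RequestRow(int priority, int memoryGb, int cores, String resourceName, int numContainers) {
        this.priority = priority;
        this.memoryGb = memoryGb;
        this.cores = cores;
        this.resourceName = resourceName;
        this.numContainers = numContainers;
    }
}

final class RequestTable {
    private final List<RequestRow> rows = new ArrayList<RequestRow>();

    void add(RequestRow row) { rows.add(row); }

    // Most specific resourceName at this priority that matches the offered node:
    // exact host first, then rack, then the "*" wildcard. Returns null if none match.
    String bestMatch(int priority, String host, String rack) {
        String best = null;
        int bestRank = 0;
        for (RequestRow r : rows) {
            if (r.priority != priority || r.numContainers <= 0) {
                continue;
            }
            int rank = r.resourceName.equals(host) ? 3
                     : r.resourceName.equals(rack) ? 2
                     : r.resourceName.equals("*") ? 1 : 0;
            if (rank > bestRank) {
                bestRank = rank;
                best = r.resourceName;
            }
        }
        return best;
    }

    // The four rows shown on the slide.
    static RequestTable slideExample() {
        RequestTable t = new RequestTable();
        t.add(new RequestRow(0, 2, 1, "host01", 1));
        t.add(new RequestRow(0, 2, 1, "rack0", 1));
        t.add(new RequestRow(0, 2, 1, "*", 1));
        t.add(new RequestRow(1, 4, 1, "*", 1));
        return t;
    }

    public static void main(String[] args) {
        RequestTable t = slideExample();
        System.out.println(t.bestMatch(0, "host01", "rack0"));
        System.out.println(t.bestMatch(0, "host02", "rack0"));
        System.out.println(t.bestMatch(0, "host09", "rack5"));
    }
}
```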
12. YARN – Resource Allocation & Usage
• Container
– The basic unit of allocation in YARN
– The result of the ResourceRequest provided by ResourceManager to the ApplicationMaster
– A specific amount of resources (cpu, memory etc.) on a specific machine
Container
– containerId
– resourceName
– capability
– tokens
Editor's notes
So while Hadoop 1.x had its uses, this is really about turning Hadoop into the next-generation platform. So what does that mean? A platform should be able to do multiple things, ergo more than just batch processing. It needs batch, interactive, online, and streaming capabilities to really turn Hadoop into a next-gen platform.
It scales! Yahoo plans to move into a 10k-node cluster.
Now we have a concept of deploying applications into the Hadoop cluster. These applications run in containers of set resources.
The RM takes the place of the JT and still has scheduling queues and such, like the fair, capacity and hierarchical queues.