Page1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Developing YARN Native Applications
Arun Murthy – Architect / Founder
Bob Page – VP Partner Products
Topics
Hadoop 2 and YARN: Beyond Batch
YARN: The Hadoop Resource Manager
• YARN Concepts and Terminology
• The YARN APIs
• A Simple YARN application
• The Application Timeline Server
Next Steps
Hadoop 2 and YARN: Beyond Batch
Hadoop 2.0: From Batch-only to Multi-Workload
HADOOP 1.0: Single Use System (Batch Apps)
• HDFS (redundant, reliable storage)
• MapReduce (cluster resource management & data processing)
HADOOP 2.0: Multi Purpose Platform (Batch, Interactive, Online, Streaming, …)
• HDFS2 (redundant, reliable storage)
• YARN (cluster resource management)
• MapReduce (data processing)
• Others (data processing)
Key Driver Of Hadoop Adoption: Enterprise Data Lake
Flexible
Enables other purpose-built data processing models beyond MapReduce (batch), such as interactive and streaming
Efficient
Double processing IN Hadoop on the same hardware while providing predictable performance & quality of service
Shared
Provides a stable, reliable, secure foundation and shared operational services across multiple workloads
Data Processing Engines Run Natively IN Hadoop
• BATCH: MapReduce
• INTERACTIVE: Tez
• STREAMING: Storm
• IN-MEMORY: Spark
• GRAPH: Giraph
• ONLINE: HBase, Accumulo
• OTHERS
HDFS: Redundant, Reliable Storage
YARN: Cluster Resource Management
5 Key Benefits of YARN
1. Scale
2. New Programming Models & Services
3. Improved Cluster Utilization
4. Agility
5. Beyond Java
YARN Platform Benefits
Deployment
YARN provides a seamless vehicle to deploy your software to an enterprise Hadoop cluster
Fault Tolerance
YARN handles hardware, OS, and JVM failures (detects them, notifies the application, and provides default actions)
YARN provides plugins for the app to define failure behavior
Scheduling (incorporating Data Locality)
YARN schedules app processing close to where the data lives in HDFS
YARN helps your apps finish within the SLAs your customers expect
A Brief History of YARN
Originally conceived & architected at Yahoo!
Arun Murthy created the original JIRA in 2008 and led the PMC
The team at Hortonworks has been working on YARN for 4 years
90% of code from Hortonworks & Yahoo!
YARN battle-tested at scale with Yahoo!
In production on 32,000+ nodes
YARN released in October 2013 with Apache Hadoop 2
YARN Development Framework
Figure: YARN as the Data Operating System (nodes 1 … N) on top of HDFS (Hadoop Distributed File System). Engines: Batch (MapReduce), Interactive (Tez), Real-Time (Slider). Applications: Scripting (Pig), SQL (Hive), Java/Scala (Cascading), NoSQL (HBase, Accumulo), Stream (Storm), Others (Spark), and ISV apps, including ISV apps running directly on the YARN API.
YARN Concepts
Apps on YARN: Categories
• Framework / Engine: provides platform capabilities to enable data services and applications. Examples: Twill, REEF, Tez, MapReduce, Spark
• Service: an application that runs continuously. Examples: Storm, HBase, Memcached, etc.
• Job: a batch/iterative data processing job that runs on a Service or a Framework. Examples: an XML-parsing MR job; the Mahout K-means algorithm
• YARN App: a temporal job or a service submitted to YARN. Examples: an HBase cluster (service); a MapReduce job
YARN Concepts: Container
Basic unit of allocation
• Fine-grained resource allocation: memory, CPU, disk, network, GPU, etc.
– container_0 = 2 GB, 1 CPU
– container_1 = 1 GB, 6 CPU
• Replaces the fixed map/reduce slots from Hadoop 1
Capability
• Memory, CPU
Container Request
• Capability, Host, Rack, Priority, relaxLocality
Container Launch Context
• LocalResources: resources needed to execute the container application
• Environment variables, e.g. the classpath
• Command to execute
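The toy Java sketch below checks whether the two example containers from this slide fit on one node together. It does not use the Hadoop API. `ContainerFit` and its `Capability` class are hypothetical stand-ins for YARN's memory-plus-vcores resource, and the 8 GB / 8-vcore node size is an assumption. The point is that containers of different shapes can share a node as long as every resource dimension fits. No fixed map or reduce slots are involved.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of fine-grained, YARN-style allocation (hypothetical, not the Hadoop API). */
final class ContainerFit {

    /** Hypothetical container capability: memory in MB plus virtual cores. */
    static final class Capability {
        final int memoryMb;
        final int vcores;

        Capability(int memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /** Returns true if all containers fit on a node of the given size at the same time. */
    static boolean fitsOnNode(Capability node, List<Capability> containers) {
        int usedMem = 0;
        int usedCores = 0;
        for (Capability c : containers) {
            usedMem += c.memoryMb;
            usedCores += c.vcores;
        }
        // Both dimensions must fit; each container can have its own shape.
        return usedMem <= node.memoryMb && usedCores <= node.vcores;
    }

    public static void main(String[] args) {
        Capability node = new Capability(8192, 8);        // assumed node size
        Capability container0 = new Capability(2048, 1);  // container_0 = 2 GB, 1 CPU
        Capability container1 = new Capability(1024, 6);  // container_1 = 1 GB, 6 CPU
        System.out.println(fitsOnNode(node, Arrays.asList(container0, container1)));
    }
}
```

On the assumed node, two copies of the 1 GB / 6-CPU container fail the check on CPU (12 > 8), even though they need only 2 GB of memory.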
YARN Terminology
ResourceManager (RM): central agent
– Allocates & manages cluster resources
– Hierarchical queues
NodeManager (NM): per-node agent
– Manages, monitors, and enforces node resource allocations
– Manages the lifecycle of containers
User Application
• ApplicationMaster (AM): manages the application lifecycle and task scheduling
• Container: executes application logic
• Client: submits the application
Launching the app
1. The Client asks the ResourceManager to launch the ApplicationMaster container
2. The ApplicationMaster obtains containers from the ResourceManager and asks NodeManagers to launch them as application containers
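The "hierarchical queues" the ResourceManager manages can be pictured through one example: the CapacityScheduler. There, each queue is configured as a percentage of its parent, so a leaf queue's share of the whole cluster is the product of the percentages along its path. The sketch below is a self-contained toy of that arithmetic, not scheduler code. The queue names and percentages are made up.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of hierarchical queue capacities (illustrative only, not Hadoop scheduler code). */
final class QueueCapacity {

    /**
     * Computes a queue's absolute share of the cluster from a dotted path such as "root.eng.adhoc".
     * Each map entry is that queue's capacity as a percentage of its parent.
     */
    static double absoluteShare(String queuePath, Map<String, Double> percentOfParent) {
        double share = 1.0;  // "root" owns the whole cluster
        String[] parts = queuePath.split("\\.");
        StringBuilder path = new StringBuilder(parts[0]);
        for (int i = 1; i < parts.length; i++) {
            path.append('.').append(parts[i]);
            Double pct = percentOfParent.get(path.toString());
            if (pct == null) {
                throw new IllegalArgumentException("Unknown queue: " + path);
            }
            share *= pct / 100.0;  // multiply down the hierarchy
        }
        return share;
    }

    public static void main(String[] args) {
        Map<String, Double> queues = new TreeMap<>();
        queues.put("root.eng", 60.0);        // hypothetical: engineering gets 60% of the cluster
        queues.put("root.ops", 40.0);
        queues.put("root.eng.adhoc", 50.0);  // half of engineering's share
        System.out.println(absoluteShare("root.eng.adhoc", queues));  // 30% of the cluster
    }
}
```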
YARN Process Flow - Walkthrough
Figure: a ResourceManager (with its Scheduler) and a grid of NodeManagers. Application 1 runs AM 1 plus Containers 1.1–1.3; application 2, submitted by Client2, runs AM2 plus Containers 2.1–2.4.
The YARN APIs
APIs Needed
Only three protocols
Client to ResourceManager
• Application submission
ApplicationMaster to ResourceManager
• Container allocation
ApplicationMaster to NodeManager
• Container launch
Use client libraries for all 3 actions
Packages org.apache.hadoop.yarn.client.api (synchronous) and org.apache.hadoop.yarn.client.api.async (asynchronous) provide the client libraries
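Figure: the three protocols and their client libraries
• Client → ResourceManager: Application Client Protocol (YarnClient)
• ApplicationMaster → ResourceManager: Application Master Protocol (AMRMClient)
• ApplicationMaster → NodeManager, which runs the App Container: Container Management Protocol (NMClient)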
YARN – Implementation Outline
1. Write a Client to submit the application
2. Write an ApplicationMaster (well, copy & paste)
“DistributedShell is the new WordCount”
3. Get containers, run whatever you want!
YARN – Implementing Applications
What else do I need to know?
Resource Allocation & Usage
• ResourceRequest
• Container
• ContainerLaunchContext & LocalResource
ApplicationMaster
• ApplicationId
• ApplicationAttemptId
• ApplicationSubmissionContext
YARN – Resource Allocation & Usage
ResourceRequest
Fine-grained resource ask to the ResourceManager
Ask for a specific amount of resources (memory, CPU, etc.) on a specific machine or rack
Use the special value * as the resource name to accept any machine
ResourceRequest fields: priority, resourceName, capability, numContainers
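A ResourceRequest names a host, a rack, or `*`. The toy below models how such a name could be matched against a candidate node, with a flag that plays the role of `relaxLocality` from the Container Request fields. `LocalityMatch`, `Node`, the host and rack names, and the "accept any node when relaxed" rule are all simplifying assumptions. The real scheduler's fallback from node to rack to off-switch (for example with delay scheduling) is more involved.

```java
/** Toy matcher for YARN-style resource names (host, rack, or "*"); not the Hadoop scheduler. */
final class LocalityMatch {

    static final String ANY = "*";  // the special "any machine" value from the slide

    /** Hypothetical node description: host name plus rack path. */
    static final class Node {
        final String host;
        final String rack;

        Node(String host, String rack) {
            this.host = host;
            this.rack = rack;
        }
    }

    /**
     * Returns true if a request for resourceName may be satisfied on node.
     * relaxLocality=false: only "*" or an exact host/rack match is accepted.
     * relaxLocality=true: in this simplified model, any node is an acceptable fallback.
     */
    static boolean matches(String resourceName, boolean relaxLocality, Node node) {
        if (ANY.equals(resourceName)) {
            return true;                 // any machine is acceptable
        }
        if (resourceName.equals(node.host) || resourceName.equals(node.rack)) {
            return true;                 // node-local or rack-local match
        }
        return relaxLocality;            // otherwise fall back only if allowed
    }

    public static void main(String[] args) {
        Node n = new Node("host17.example.com", "/rack3");
        System.out.println(matches("host17.example.com", false, n)); // true
        System.out.println(matches("/rack3", false, n));             // true
        System.out.println(matches("host42.example.com", false, n)); // false
        System.out.println(matches("host42.example.com", true, n));  // true
        System.out.println(matches(ANY, false, n));                  // true
    }
}
```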
YARN – Resource Allocation & Usage
Container
The basic unit of allocation in YARN
The result of a ResourceRequest, handed by the ResourceManager to the ApplicationMaster
A specific amount of resources (CPU, memory, etc.) on a specific machine
Container fields: containerId, resourceName, capability, tokens
YARN – Resource Allocation & Usage
ContainerLaunchContext & LocalResource
The context provided by ApplicationMaster to NodeManager to launch the Container
Complete specification for a process
LocalResource is used to specify container binary and dependencies
• NodeManager is responsible for downloading from shared namespace (typically HDFS)
ContainerLaunchContext fields: container, commands, environment, localResources
LocalResource fields: uri, type
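The NodeManager turns the commands of a ContainerLaunchContext into a process. simple-yarn-app and DistributedShell, among others, append stdout/stderr redirection into the container's log directory. They use the `<LOG_DIR>` placeholder (`ApplicationConstants.LOG_DIR_EXPANSION_VAR` in the Hadoop API), which the NodeManager expands at launch time. The plain-Java sketch below models only that string building. `LaunchSpec`, its fields, and the example jar path are hypothetical stand-ins, not the real `ContainerLaunchContext`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a container launch specification (not ContainerLaunchContext). */
final class LaunchSpec {

    static final String LOG_DIR = "<LOG_DIR>";  // placeholder expanded by the NodeManager

    final Map<String, String> environment = new LinkedHashMap<>();
    final Map<String, String> localResources = new LinkedHashMap<>();  // link name -> URI
    final List<String> commands = new ArrayList<>();

    /** Appends the usual stdout/stderr redirection into the container log directory. */
    static String withLogRedirect(String command) {
        return command + " 1>" + LOG_DIR + "/stdout 2>" + LOG_DIR + "/stderr";
    }

    public static void main(String[] args) {
        LaunchSpec spec = new LaunchSpec();
        // Hypothetical dependency the NodeManager would download (e.g. from HDFS) before launch.
        spec.localResources.put("app.jar", "hdfs:///apps/simple/app.jar");
        spec.environment.put("CLASSPATH", "./*");
        spec.commands.add(withLogRedirect("/bin/date"));
        System.out.println(spec.commands.get(0));
    }
}
```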
The ApplicationMaster
The per-application controller aka container_0
The parent for all containers of the application
The ApplicationMaster negotiates its containers with the ResourceManager
The ApplicationMaster container is a child of the ResourceManager
Think init process in Unix
The RM restarts the ApplicationMaster if required; each attempt gets a unique ApplicationAttemptId
The application's code is submitted along with the application itself
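Each AM restart gets a new ApplicationAttemptId under the same ApplicationId. The sketch below reproduces the string form Hadoop 2 prints for these IDs (`application_<clusterTimestamp>_0001`, `appattempt_<clusterTimestamp>_0001_000001`). It also includes a retry loop capped by a maximum attempt count, analogous to the `yarn.resourcemanager.am.max-attempts` setting (default 2). The class, the timestamp, and the simulated failure are hypothetical. This is an illustration, not ResourceManager code.

```java
import java.util.Locale;
import java.util.function.IntPredicate;

/** Illustrative model of ApplicationId / ApplicationAttemptId strings and AM retries. */
final class AttemptIds {

    static String applicationId(long clusterTimestamp, int appId) {
        return String.format(Locale.ROOT, "application_%d_%04d", clusterTimestamp, appId);
    }

    static String attemptId(long clusterTimestamp, int appId, int attempt) {
        return String.format(Locale.ROOT, "appattempt_%d_%04d_%06d", clusterTimestamp, appId, attempt);
    }

    /**
     * Runs attempts 1..maxAttempts until one succeeds and returns its attempt ID,
     * or null if every attempt failed. succeeds simulates the AM's fate for each attempt.
     */
    static String runWithRetries(long ts, int appId, int maxAttempts, IntPredicate succeeds) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String id = attemptId(ts, appId, attempt);
            if (succeeds.test(attempt)) {
                return id;
            }
            System.out.println(id + " failed; a new attempt starts if any remain");
        }
        return null;
    }

    public static void main(String[] args) {
        long ts = 1406160000000L;  // hypothetical cluster timestamp (RM start time)
        System.out.println(applicationId(ts, 1));
        // Simulate a first attempt that dies and a second that succeeds.
        System.out.println(runWithRetries(ts, 1, 2, attempt -> attempt == 2));
    }
}
```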
ApplicationSubmissionContext
ApplicationSubmissionContext is the complete specification of the ApplicationMaster
Provided by the Client
The ResourceManager is responsible for allocating and launching the ApplicationMaster container
ApplicationSubmissionContext fields: resourceRequest, containerLaunchContext, appName, queue
YARN Application API - Overview
hadoop-yarn-client module
YarnClient is the submission client API
Both synchronous & asynchronous APIs for resource allocation and container start/stop
Synchronous: AMRMClient & NMClient
Asynchronous: AMRMClientAsync & NMClientAsync
YARN Application API – YarnClient
createApplication to create application
submitApplication to start application
Application developer provides ApplicationSubmissionContext
APIs to get other information from ResourceManager
getAllQueues
getApplications
getNodeReports
APIs to manipulate submitted application e.g. killApplication
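After `submitApplication`, a typical client keeps polling the application report until the application reaches a terminal state. simple-yarn-app's Client does this by checking `getApplicationReport(...).getYarnApplicationState()` in a loop. The stdlib-only sketch below models just that loop. The enum mirrors the names of Hadoop's `YarnApplicationState` values, while the scripted report source and the poll interval are hypothetical.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.Set;
import java.util.function.Supplier;

/** Toy version of a YARN client's "wait for completion" loop (no Hadoop dependency). */
final class ClientPoll {

    /** Mirrors the names of Hadoop's YarnApplicationState values. */
    enum AppState { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    static final Set<AppState> TERMINAL = EnumSet.of(AppState.FINISHED, AppState.FAILED, AppState.KILLED);

    /** Polls report until a terminal state is seen and returns that state. */
    static AppState waitForCompletion(Supplier<AppState> report, long sleepMillis) {
        AppState state = report.get();
        while (!TERMINAL.contains(state)) {
            try {
                Thread.sleep(sleepMillis);  // back off between reports
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
            state = report.get();
        }
        return state;
    }

    public static void main(String[] args) {
        // Scripted sequence standing in for successive application reports from the RM.
        Iterator<AppState> script = Arrays.asList(
                AppState.SUBMITTED, AppState.ACCEPTED, AppState.RUNNING, AppState.FINISHED).iterator();
        System.out.println(waitForCompletion(script::next, 10));
    }
}
```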
YARN Application API – The Client
Figure: the process-flow diagram from the walkthrough, annotated with the client's two calls to the ResourceManager: (1) New Application Request: YarnClient.createApplication; (2) Submit Application: YarnClient.submitApplication.
AppMaster-ResourceManager API
AMRMClient - Synchronous API
registerApplicationMaster
unregisterApplicationMaster
Resource negotiation
addContainerRequest
removeContainerRequest
releaseAssignedContainer
Main API – allocate
Helper APIs for cluster information
getAvailableResources
getClusterNodeCount
AMRMClientAsync – Asynchronous
Extension of AMRMClient to provide
asynchronous CallbackHandler
Callback interaction model with
ResourceManager
onContainersAllocated
onContainersCompleted
onNodesUpdated
onError
onShutdownRequest
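With AMRMClientAsync, the AM does not drive `allocate` itself. A heartbeat in the client library does, and results arrive through a CallbackHandler. The stdlib-only toy below reproduces that inversion of control. It uses a hypothetical `Handler` with only two of the callback names from this slide. The "heartbeat" is a fake that runs synchronously for clarity. The real handler also has `onNodesUpdated`, `onShutdownRequest`, `onError`, and `getProgress`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stdlib-only model of the callback style used by AMRMClientAsync (not the Hadoop API). */
final class AsyncCallbacks {

    /** Hypothetical handler with a subset of the callback names listed on the slide. */
    interface Handler {
        void onContainersAllocated(List<String> containerIds);
        void onContainersCompleted(List<String> containerIds);
    }

    /** Fake heartbeat: for each batch, reports the containers as allocated, then as completed. */
    static void heartbeat(List<List<String>> batches, Handler handler) {
        for (List<String> batch : batches) {
            handler.onContainersAllocated(batch);
            handler.onContainersCompleted(batch);
        }
    }

    /** Example handler that only records what it has seen. */
    static final class CountingHandler implements Handler {
        final List<String> allocated = new ArrayList<>();
        int completed;

        @Override
        public void onContainersAllocated(List<String> containerIds) {
            // A real AM would launch each container here through its NodeManager client.
            allocated.addAll(containerIds);
        }

        @Override
        public void onContainersCompleted(List<String> containerIds) {
            completed += containerIds.size();
        }
    }

    public static void main(String[] args) {
        CountingHandler h = new CountingHandler();
        heartbeat(Arrays.asList(Arrays.asList("c1", "c2"), Arrays.asList("c3")), h);
        System.out.println(h.allocated + " completed=" + h.completed);
    }
}
```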
AppMaster-ResourceManager flow
Figure: the AM's interaction with the ResourceManager (Scheduler): (1) registerApplicationMaster; (2) AMRMClient.allocate; (3) Containers returned; (4) unregisterApplicationMaster.
AppMaster-NodeManager API
For the AM to launch/stop containers at a NodeManager
NMClient: synchronous API
Simple (trivial) APIs
• startContainer
• stopContainer
• getContainerStatus
NMClientAsync: asynchronous API
Simple (trivial) APIs
• startContainerAsync
• stopContainerAsync
• getContainerStatusAsync
Callback interaction model with NodeManager
• onContainerStarted
• onContainerStopped
• onStartContainerError
• onContainerStatusReceived
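The synchronous NodeManager calls can be pictured as operations on a small per-container state machine. The stdlib-only sketch below uses a hypothetical `FakeNodeManager`, with state names borrowed from Hadoop's `ContainerState` (`RUNNING`, `COMPLETE`). The rules (no double start, stopping a finished container is a no-op, unknown IDs are errors) are simplifying assumptions, not NodeManager behavior.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy NodeManager that tracks container state (hypothetical, not the Hadoop NMClient). */
final class FakeNodeManager {

    enum State { RUNNING, COMPLETE }

    private final Map<String, State> containers = new HashMap<>();

    /** Starts a container; starting the same ID twice is rejected. */
    void startContainer(String containerId) {
        if (containers.containsKey(containerId)) {
            throw new IllegalStateException(containerId + " already started");
        }
        containers.put(containerId, State.RUNNING);
    }

    /** Stops a running container; stopping an already completed one does nothing. */
    void stopContainer(String containerId) {
        if (getContainerStatus(containerId) == State.RUNNING) {
            containers.put(containerId, State.COMPLETE);
        }
    }

    /** Reports the state of a known container; unknown IDs are an error. */
    State getContainerStatus(String containerId) {
        State s = containers.get(containerId);
        if (s == null) {
            throw new IllegalArgumentException("Unknown container: " + containerId);
        }
        return s;
    }

    public static void main(String[] args) {
        FakeNodeManager nm = new FakeNodeManager();
        nm.startContainer("container_01");
        System.out.println(nm.getContainerStatus("container_01")); // RUNNING
        nm.stopContainer("container_01");
        System.out.println(nm.getContainerStatus("container_01")); // COMPLETE
    }
}
```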
YARN Application API - Development
Un-Managed Mode for ApplicationMaster
Run the ApplicationMaster on your development machine rather than in-cluster
• No submission client needed
Use hadoop-yarn-applications-unmanaged-am-launcher
Easier to step through debugger, browse logs etc.
$ bin/hadoop jar hadoop-yarn-applications-unmanaged-am-launcher.jar \
    Client \
    -jar my-application-master.jar \
    -cmd 'java MyApplicationMaster <args>'
A Simple YARN Application
A Simple YARN Application
Simplest example of a YARN application – get n containers, and run a specific Unix command
on each. Minimal error handling, etc.
Control Flow
1. User submits application to the Resource Manager
• Client provides ApplicationSubmissionContext to the Resource Manager
2. App Master negotiates with Resource Manager for n containers
3. App Master launches containers with the user-specified command as
ContainerLaunchContext.commands
Code: https://github.com/hortonworks/simple-yarn-app
Simple YARN Application – Client
• Command to launch the ApplicationMaster process
Simple YARN Application – Client
• Resources required for the ApplicationMaster container
• ApplicationSubmissionContext for the ApplicationMaster
• Submit the application to the ResourceManager
Simple YARN Application – AppMaster
Steps:
1. AMRMClient.registerApplicationMaster
2. Negotiate containers from the ResourceManager by providing a ContainerRequest to AMRMClient.addContainerRequest
3. Take each resulting Container returned by subsequent calls to AMRMClient.allocate, build a ContainerLaunchContext with the Container and commands, then launch it using NMClient.startContainer
– Use LocalResources to specify software/configuration dependencies for each worker container
4. Wait until done: check AllocateResponse.getCompletedContainersStatuses from subsequent calls to AMRMClient.allocate
5. AMRMClient.unregisterApplicationMaster
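The five AppMaster steps can be simulated end to end without a cluster. In the sketch below, `FakeResourceManager` is a hypothetical stand-in. It grants at most a fixed number of containers per `allocate` heartbeat and reports each granted container as completed on the next heartbeat. The method names echo the AMRMClient calls above, but none of this is the Hadoop API. It only shows the shape of the register / request / allocate-loop / unregister control flow.

```java
import java.util.ArrayList;
import java.util.List;

/** End-to-end toy of the simple AM control flow; FakeResourceManager is hypothetical. */
final class SimpleAmLoop {

    static final class AllocateResult {
        final List<Integer> allocated = new ArrayList<>();
        final List<Integer> completed = new ArrayList<>();
    }

    static final class FakeResourceManager {
        private final int perHeartbeat;
        private int pendingRequests;
        private int nextId = 1;
        private List<Integer> running = new ArrayList<>();
        boolean registered;

        FakeResourceManager(int perHeartbeat) {
            if (perHeartbeat < 1) throw new IllegalArgumentException("perHeartbeat must be >= 1");
            this.perHeartbeat = perHeartbeat;
        }

        void registerApplicationMaster() { registered = true; }
        void addContainerRequest() { pendingRequests++; }
        void unregisterApplicationMaster() { registered = false; }

        /** One heartbeat: report last round's containers as done, then grant new ones. */
        AllocateResult allocate() {
            AllocateResult r = new AllocateResult();
            r.completed.addAll(running);
            running = new ArrayList<>();
            int grant = Math.min(perHeartbeat, pendingRequests);
            for (int i = 0; i < grant; i++) {
                r.allocated.add(nextId);
                running.add(nextId++);
            }
            pendingRequests -= grant;
            return r;
        }
    }

    /** Runs n containers and returns how many allocate heartbeats it took. */
    static int run(int n, FakeResourceManager rm) {
        rm.registerApplicationMaster();                        // step 1
        for (int i = 0; i < n; i++) rm.addContainerRequest();  // step 2
        int completed = 0;
        int heartbeats = 0;
        while (completed < n) {                                // steps 3 and 4
            AllocateResult r = rm.allocate();
            heartbeats++;
            // For each id in r.allocated, a real AM would build a launch context and start it.
            completed += r.completed.size();
        }
        rm.unregisterApplicationMaster();                      // step 5
        return heartbeats;
    }

    public static void main(String[] args) {
        System.out.println(run(5, new FakeResourceManager(2)));
    }
}
```

With 5 containers and at most 2 grants per heartbeat, the loop needs 4 heartbeats: three to hand out the containers and one more to see the last one complete.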
Simple YARN Application – AppMaster
• Initialize clients to ResourceManager and NodeManagers
• Register with ResourceManager
Simple YARN Application – AppMaster
• Set up requirements for worker containers
• Make resource requests to ResourceManager
Simple YARN Application – AppMaster
• Get containers from ResourceManager
• Launch containers on NodeManagers
Simple YARN Application – AppMaster
• Wait for containers to complete successfully
• Unregister with ResourceManager
Graduating from simple-yarn-app
DistributedShell: same functionality, but less simple
e.g. error checking, use of the Timeline Server
For a complex YARN app, see Tez
Pre-warmed containers, sessions, etc.
Look at MapReduce for even more excitement
Data locality, fault tolerance, checkpoint to HDFS, security, isolation, etc.
Intra-application priorities (maps vs. reduces) need complex feedback from the ResourceManager
(all at apache.org)
Application Timeline Server
Application Timeline Server
Maintains historical state & provides metrics visibility for YARN apps
Similar to MapReduce Job History Server
Information can be queried via REST APIs
ATS in HDP 2.1 is considered a Tech Preview
Generic information
• queue name
• user information
• information about application attempts
• a list of Containers that were run under
each application attempt
• information about each Container
Per-framework/application info
Developers can publish information to the
Timeline Server via the TimelineClient (from
within a client), the ApplicationMaster, or the
application's Containers.
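Per-framework data is published to the Timeline Server as entities. Each has an entity type and ID, a start time, events, and "primary filters" that later queries can select on. The in-memory toy below is a hypothetical, stdlib-only illustration of that data model and of filtering by type and primary filter. Events are omitted, and each primary filter holds a single value. It is not the TimelineClient API or the server's REST interface, and the entity type and filter names are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory toy of timeline-style entities (not the Hadoop TimelineClient or ATS REST API). */
final class ToyTimeline {

    static final class Entity {
        final String type;
        final String id;
        final long startTime;
        final Map<String, String> primaryFilters = new HashMap<>();  // simplified: one value per key

        Entity(String type, String id, long startTime) {
            this.type = type;
            this.id = id;
            this.startTime = startTime;
        }
    }

    private final List<Entity> entities = new ArrayList<>();

    void put(Entity e) {
        entities.add(e);
    }

    /** Returns IDs of entities of the given type whose primary filter key equals value. */
    List<String> query(String type, String key, String value) {
        List<String> ids = new ArrayList<>();
        for (Entity e : entities) {
            if (e.type.equals(type) && value.equals(e.primaryFilters.get(key))) {
                ids.add(e.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        ToyTimeline store = new ToyTimeline();
        Entity a = new Entity("MY_APP_TASK", "task_1", 1000L);  // hypothetical entity type
        a.primaryFilters.put("user", "alice");
        Entity b = new Entity("MY_APP_TASK", "task_2", 2000L);
        b.primaryFilters.put("user", "bob");
        store.put(a);
        store.put(b);
        System.out.println(store.query("MY_APP_TASK", "user", "alice"));  // [task_1]
    }
}
```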
Application Timeline Server
Figure: the App Timeline Server alongside its consumers and producers: Ambari, custom app monitoring, and a client.
Next Steps
hortonworks.com/get-started/YARN
Setup HDP 2.1 environment
Leverage Sandbox
Review Sample Code & Execute Simple YARN Application
https://github.com/hortonworks/simple-yarn-app
Graduate to more complex code examples
BUILD FLEXIBLE, SCALABLE, RESILIENT & POWERFUL APPLICATIONS TO RUN IN HADOOP
Hortonworks YARN Resources
Hortonworks Web Site
hortonworks.com/hadoop/yarn
Includes links to blog posts
YARN Forum
Community of Hadoop YARN developers – collaboration and Q&A
hortonworks.com/community/forums/forum/yarn
YARN Office Hours
Dial in and chat with YARN experts
Next Office Hour: Thursday August 14 @ 10-11am PDT. Register:
https://hortonworks.webex.com/hortonworks/onstage/g.php?t=a&d=628190636
And from Hortonworks University
Hortonworks Course: Developing Custom YARN Applications
Format: Online
Duration: 2 Days
When: Aug 18th & 19th (Mon & Tues)
Cost: No Charge to Hortonworks Technical Partners
Space: Very Limited
Interested? Please contact lsensmeier@hortonworks.com
Stay in Touch!
Join us for the full series of YARN development webinars:
YARN Native July 24 @ 9am PT (recording link)
Slider August 7 @ 9am PT (registration link)
Tez August 21 @ 9am PT (registration link)
Additional webinar topics are being added – watch the blog or visit
Hortonworks.com/webinars
http://hortonworks.com/hadoop/yarn

 
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele Hakka Labs
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesDataWorks Summit
 
Combine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNCombine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNHortonworks
 
YARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo HadoopYARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo HadoopHortonworks
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopPOSSCON
 
Overview of slider project
Overview of slider projectOverview of slider project
Overview of slider projectSteve Loughran
 
YARN - Presented At Dallas Hadoop User Group
YARN - Presented At Dallas Hadoop User GroupYARN - Presented At Dallas Hadoop User Group
YARN - Presented At Dallas Hadoop User GroupRommel Garcia
 
Writing YARN Applications Hadoop Summit 2012
Writing YARN Applications Hadoop Summit 2012Writing YARN Applications Hadoop Summit 2012
Writing YARN Applications Hadoop Summit 2012hitesh1892
 
Writing Yarn Applications Hadoop Summit 2012
Writing Yarn Applications Hadoop Summit 2012Writing Yarn Applications Hadoop Summit 2012
Writing Yarn Applications Hadoop Summit 2012Hortonworks
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionDataWorks Summit
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionWangda Tan
 
YARN - way to share cluster BEYOND HADOOP
YARN - way to share cluster BEYOND HADOOPYARN - way to share cluster BEYOND HADOOP
YARN - way to share cluster BEYOND HADOOPOmkar Joshi
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionDataWorks Summit
 

Ähnlich wie Developing YARN Applications - Integrating natively to YARN July 24 2014 (20)

Yarn
YarnYarn
Yarn
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute Platform
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoop
 
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practices
 
Combine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNCombine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARN
 
YARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo HadoopYARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo Hadoop
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in Hadoop
 
Overview of slider project
Overview of slider projectOverview of slider project
Overview of slider project
 
YARN - Presented At Dallas Hadoop User Group
YARN - Presented At Dallas Hadoop User GroupYARN - Presented At Dallas Hadoop User Group
YARN - Presented At Dallas Hadoop User Group
 
Apache Slider
Apache SliderApache Slider
Apache Slider
 
Writing YARN Applications Hadoop Summit 2012
Writing YARN Applications Hadoop Summit 2012Writing YARN Applications Hadoop Summit 2012
Writing YARN Applications Hadoop Summit 2012
 
Writing Yarn Applications Hadoop Summit 2012
Writing Yarn Applications Hadoop Summit 2012Writing Yarn Applications Hadoop Summit 2012
Writing Yarn Applications Hadoop Summit 2012
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
 
Running Services on YARN
Running Services on YARNRunning Services on YARN
Running Services on YARN
 
YARN - way to share cluster BEYOND HADOOP
YARN - way to share cluster BEYOND HADOOPYARN - way to share cluster BEYOND HADOOP
YARN - way to share cluster BEYOND HADOOP
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the Union
 

Mehr von Hortonworks

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyHortonworks
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakHortonworks
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsHortonworks
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysHortonworks
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's NewHortonworks
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerHortonworks
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsHortonworks
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeHortonworks
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidHortonworks
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleHortonworks
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATAHortonworks
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Hortonworks
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseHortonworks
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseHortonworks
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationHortonworks
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementHortonworks
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHortonworks
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCHortonworks
 

Mehr von Hortonworks (20)

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Último

EMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarEMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarThousandEyes
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNeo4j
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024Brian Pichman
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Muhammad Tiham Siddiqui
 
Flow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameKapil Thakar
 
3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud DataEric D. Schabell
 
UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4DianaGray10
 
Graphene Quantum Dots-Based Composites for Biomedical Applications
Graphene Quantum Dots-Based Composites for  Biomedical ApplicationsGraphene Quantum Dots-Based Composites for  Biomedical Applications
Graphene Quantum Dots-Based Composites for Biomedical Applicationsnooralam814309
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updateadam112203
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0DanBrown980551
 
Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...DianaGray10
 
Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosErol GIRAUDY
 
My key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIMy key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIVijayananda Mohire
 
Oracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxOracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxSatishbabu Gunukula
 
March Patch Tuesday
March Patch TuesdayMarch Patch Tuesday
March Patch TuesdayIvanti
 
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox
 
How to release an Open Source Dataweave Library
How to release an Open Source Dataweave LibraryHow to release an Open Source Dataweave Library
How to release an Open Source Dataweave Libraryshyamraj55
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxNeo4j
 
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTSIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTxtailishbaloch
 

Último (20)

EMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarEMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? Webinar
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)
 
Flow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First Frame
 
3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data
 
UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4
 
Graphene Quantum Dots-Based Composites for Biomedical Applications
Graphene Quantum Dots-Based Composites for  Biomedical ApplicationsGraphene Quantum Dots-Based Composites for  Biomedical Applications
Graphene Quantum Dots-Based Composites for Biomedical Applications
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 update
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0
 
Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...
 
Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile Brochure
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenarios
 
My key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIMy key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAI
 
Oracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxOracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptx
 
March Patch Tuesday
March Patch TuesdayMarch Patch Tuesday
March Patch Tuesday
 
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
 
How to release an Open Source Dataweave Library
How to release an Open Source Dataweave LibraryHow to release an Open Source Dataweave Library
How to release an Open Source Dataweave Library
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
 
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTSIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
 

Developing YARN Applications - Integrating natively to YARN July 24 2014

  • 1. Page1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Developing YARN Native Applications Arun Murthy – Architect / Founder Bob Page – VP Partner Products
  • 2. Topics
      Hadoop 2 and YARN: Beyond Batch
      YARN: The Hadoop Resource Manager
        • YARN Concepts and Terminology
        • The YARN APIs
        • A Simple YARN application
        • The Application Timeline Server
      Next Steps
  • 3. Hadoop 2 and YARN: Beyond Batch
  • 4. Hadoop 2.0: From Batch-only to Multi-Workload
      HADOOP 1.0 (Single Use System: Batch Apps)
        MapReduce (cluster resource management & data processing)
        HDFS (redundant, reliable storage)
      HADOOP 2.0 (Multi Purpose Platform: Batch, Interactive, Online, Streaming, …)
        MapReduce (data processing) | Others (data processing)
        YARN (cluster resource management)
        HDFS2 (redundant, reliable storage)
  • 5. Key Driver Of Hadoop Adoption: Enterprise Data Lake
      Flexible: Enables other purpose-built data processing models beyond MapReduce (batch), such as interactive and streaming
      Efficient: Doubles processing IN Hadoop on the same hardware while providing predictable performance & quality of service
      Shared: Provides a stable, reliable, secure foundation and shared operational services across multiple workloads
      Data Processing Engines Run Natively IN Hadoop:
        BATCH: MapReduce; INTERACTIVE: Tez; STREAMING: Storm; IN-MEMORY: Spark; GRAPH: Giraph; ONLINE: HBase, Accumulo; OTHERS
      HDFS: Redundant, Reliable Storage
      YARN: Cluster Resource Management
  • 6. 5 Key Benefits of YARN
      1. Scale
      2. New Programming Models & Services
      3. Improved Cluster Utilization
      4. Agility
      5. Beyond Java
  • 7. YARN Platform Benefits
      Deployment
        YARN provides a seamless vehicle to deploy your software to an enterprise Hadoop cluster
      Fault Tolerance
        YARN handles hardware, OS, and JVM failures: it detects them, sends notifications, and provides default actions
        YARN provides plugins for the app to define failure behavior
      Scheduling (incorporating Data Locality)
        YARN utilizes HDFS to schedule app processing where the data lives
        YARN helps ensure that your apps finish within the SLAs your customers expect
  • 8. A Brief History of YARN
      Originally conceived & architected at Yahoo!
      Arun Murthy created the original JIRA in 2008 and led the PMC
      The team at Hortonworks has been working on YARN for 4 years
      90% of code from Hortonworks & Yahoo!
      YARN battle-tested at scale with Yahoo!
      In production on 32,000+ nodes
      YARN released October 2013 with Apache Hadoop 2
  • 9. YARN Development Framework
      [Diagram: YARN, the Data Operating System, runs on HDFS (Hadoop Distributed File System) across cluster nodes 1 … N. Engines and applications on top: Batch (MapReduce), Interactive (Tez engine), Real-Time (Slider), Scripting (Pig), SQL (Hive), Java/Scala (Cascading), NoSQL (HBase, Accumulo), Stream (Storm), Others (Spark), plus ISV apps (direct and via the API)]
  • 10. YARN Concepts
  • 11. Apps on YARN: Categories
      Type               | Definition                                                                   | Examples
      Framework / Engine | Provides platform capabilities to enable data services and applications      | Twill, Reef, Tez, MapReduce, Spark
      Service            | An application that runs continuously                                        | Storm, HBase, Memcached, etc.
      Job                | A batch/iterative data processing job that runs on a Service or a Framework  | XML parsing MR job; Mahout K-means algorithm
      YARN App           | A temporal job or a service submitted to YARN                                | HBase cluster (service); MapReduce job
  • 12. YARN Concepts: Container
      Basic unit of allocation
      Fine-grained resource allocation: memory, CPU, disk, network, GPU, etc.
        • container_0 = 2GB, 1 CPU
        • container_1 = 1GB, 6 CPU
      Replaces the fixed map/reduce slots from Hadoop 1
      Capability: Memory, CPU
      Container Request: Capability, Host, Rack, Priority, relaxLocality
      Container Launch Context:
        • LocalResources: resources needed to execute the container application
        • Environment variables (example: classpath)
        • Command to execute
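To make "fine-grained allocation" concrete, here is a toy Python model, not Hadoop code. A hypothetical `Resource` holding memory (MB) and vcores stands in for the capability (in the Java API the corresponding record is `org.apache.hadoop.yarn.api.records.Resource`), and a greedy single-node packer shows that an ask is granted only when every dimension fits. The 8 GB / 8-core node is an assumption, and real schedulers add queues, minimum-allocation normalization and much more.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Toy stand-in for a container capability: memory in MB plus virtual cores."""
    memory_mb: int
    vcores: int

    def fits_in(self, free: Resource) -> bool:
        # An ask fits only if *every* dimension fits.
        return self.memory_mb <= free.memory_mb and self.vcores <= free.vcores

    def minus(self, other: Resource) -> Resource:
        return Resource(self.memory_mb - other.memory_mb, self.vcores - other.vcores)


def allocate(node_free: Resource, asks: list[Resource]) -> tuple[list[Resource], Resource]:
    """Greedily place asks on one node, in order; return (granted, remaining)."""
    granted = []
    for ask in asks:
        if ask.fits_in(node_free):
            granted.append(ask)
            node_free = node_free.minus(ask)
    return granted, node_free


node = Resource(memory_mb=8192, vcores=8)   # hypothetical node size
container_0 = Resource(2048, 1)             # 2GB, 1 CPU (from the slide)
container_1 = Resource(1024, 6)             # 1GB, 6 CPU (from the slide)
granted, left = allocate(node, [container_0, container_1, container_0, container_0])
```

Here cores, not memory, become the limiting dimension: after three grants the node still has 3 GB free but no vcores left, so the fourth 2GB/1CPU ask is refused.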
  • 13. YARN Terminology
      ResourceManager (RM): central agent
        – Allocates & manages cluster resources
        – Hierarchical queues
      NodeManager (NM): per-node agent
        – Manages, monitors and enforces node resource allocations
        – Manages lifecycle of containers
      User Application
        • ApplicationMaster (AM): manages application lifecycle and task scheduling
        • Container: executes application logic
        • Client: submits the application
      Launching the app
        1. Client requests ResourceManager to launch ApplicationMaster Container
        2. ApplicationMaster obtains containers from the ResourceManager and requests NodeManagers to launch Application Containers
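The two launch steps can be traced with a small in-process simulation. Everything here is hypothetical and simplified: real YARN components talk over RPC, placement is decided by a pluggable scheduler rather than the round-robin used below, and real container IDs have a richer format. The point is only the division of labor: the RM allocates the AM's container and has a NodeManager start it, and the AM then obtains further containers from the RM and has NodeManagers launch them.

```python
class NodeManager:
    """Toy per-node agent: launches containers and remembers what runs on it."""

    def __init__(self, host):
        self.host = host
        self.running = []  # (container_id, command) pairs

    def launch(self, container_id, command):
        self.running.append((container_id, command))
        return container_id


class ResourceManager:
    """Toy central agent: hands out containers round-robin across NodeManagers."""

    def __init__(self, node_managers):
        self._nms = node_managers
        self._next_node = 0
        self._seq = 0

    def allocate(self):
        nm = self._nms[self._next_node % len(self._nms)]
        self._next_node += 1
        self._seq += 1
        return f"container_{self._seq}", nm

    def submit_application(self, am_command):
        # Step 1: the client's request makes the RM allocate the first
        # container and have a NodeManager start the AM in it.
        container_id, nm = self.allocate()
        nm.launch(container_id, am_command)
        return ApplicationMaster(self)


class ApplicationMaster:
    """Toy AM: gets containers from the RM, then asks NodeManagers to launch them."""

    def __init__(self, rm):
        self._rm = rm

    def run_tasks(self, commands):
        launched = []
        for command in commands:
            container_id, nm = self._rm.allocate()  # allocation: AM -> RM
            nm.launch(container_id, command)        # launch: AM -> NM
            launched.append((nm.host, container_id))
        return launched


nms = [NodeManager("node1"), NodeManager("node2")]
rm = ResourceManager(nms)
am = rm.submit_application("run-am")        # client -> RM
tasks = am.run_tasks(["task-a", "task-b"])
```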
  • 14. YARN Process Flow - Walkthrough. Diagram: a ResourceManager (with its Scheduler) and a grid of NodeManagers; AM 1 runs Containers 1.1 to 1.3 and AM2 runs Containers 2.1 to 2.4 on various NodeManagers, and Client2 submits to the ResourceManager.
  • 15. The YARN APIs
  • 16. APIs Needed. Only three protocols: Client to ResourceManager (application submission): YarnClient over ApplicationClientProtocol. ApplicationMaster to ResourceManager (container allocation): AMRMClient over ApplicationMasterProtocol. ApplicationMaster to NodeManager (container launch): NMClient over ContainerManagementProtocol. Use the client libraries for all three actions; package org.apache.hadoop.yarn.client.api (and its async subpackage) provides both synchronous and asynchronous libraries.
  • 17. YARN – Implementation Outline: 1. Write a Client to submit the application. 2. Write an ApplicationMaster (well, copy & paste): "DistributedShell is the new WordCount". 3. Get containers, run whatever you want!
  • 18. YARN – Implementing Applications. What else do I need to know? Resource allocation & usage: ResourceRequest; Container; ContainerLaunchContext & LocalResource. ApplicationMaster: ApplicationId; ApplicationAttemptId; ApplicationSubmissionContext.
  • 19. YARN – Resource Allocation & Usage: ResourceRequest. A fine-grained resource ask to the ResourceManager: ask for a specific amount of resources (memory, CPU, etc.) on a specific machine or rack, or use the special value * as the resource name to accept any machine. Fields: priority, resourceName, capability, numContainers.
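The location rule above can be sketched as a small matching function. `ToyResourceRequest` is a hypothetical class whose fields follow the slide (priority, resourceName, capability split into memory and vcores, numContainers); only the `"*"` wildcard value is taken from Hadoop, where `ResourceRequest.ANY` is `"*"`. The rack names such as `/rack1` are illustrative. A real scheduler weighs host-, rack- and `*`-level requests together (and honours relaxLocality), which this single-request toy does not attempt.

```java
// Hypothetical toy model of a ResourceRequest; field names follow the slide,
// not the Hadoop ResourceRequest class itself.
final class ToyResourceRequest {
    // Wildcard meaning "any machine"; Hadoop's ResourceRequest.ANY uses the same "*" value.
    static final String ANY = "*";

    final int priority;
    final String resourceName; // a host name, a rack name, or ANY
    final int memoryMb;
    final int vcores;
    private int numContainers;

    ToyResourceRequest(int priority, String resourceName, int memoryMb, int vcores, int numContainers) {
        this.priority = priority;
        this.resourceName = resourceName;
        this.memoryMb = memoryMb;
        this.vcores = vcores;
        this.numContainers = numContainers;
    }

    // Does a container on this host/rack satisfy the location part of the request?
    boolean matchesLocation(String host, String rack) {
        return ANY.equals(resourceName) || resourceName.equals(host) || resourceName.equals(rack);
    }

    // Grants one container on the given node if location and free resources allow it.
    boolean tryAssign(String host, String rack, int freeMb, int freeVcores) {
        if (numContainers == 0 || !matchesLocation(host, rack)) {
            return false;
        }
        if (memoryMb > freeMb || vcores > freeVcores) {
            return false;
        }
        numContainers--; // one fewer container still outstanding for this request
        return true;
    }

    int outstanding() {
        return numContainers;
    }
}
```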
  • 20. YARN – Resource Allocation & Usage: Container. The basic unit of allocation in YARN; the result of a ResourceRequest, provided by the ResourceManager to the ApplicationMaster: a specific amount of resources (CPU, memory, etc.) on a specific machine. Fields: containerId, resourceName, capability, tokens.
  • 21. YARN – Resource Allocation & Usage: ContainerLaunchContext & LocalResource. ContainerLaunchContext is the context provided by the ApplicationMaster to the NodeManager to launch the Container: a complete specification for a process. LocalResource specifies the container binary and its dependencies; the NodeManager is responsible for downloading them from a shared namespace (typically HDFS). ContainerLaunchContext fields: container, commands, environment, localResources. LocalResource fields: uri, type.
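A rough way to picture what a ContainerLaunchContext carries is to render it as the launch script a NodeManager ends up running. The classes below (`ToyLaunchContext`, `ToyLocalResource`, `ToyResourceType`) are hypothetical mirrors, not the Hadoop records, and the HDFS path and `MyWorker` class are made up. The map key of each local resource stands for the file name the container sees in its working directory after localization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy mirror of ContainerLaunchContext/LocalResource; not the Hadoop classes.
// Hadoop's LocalResourceType also has FILE and ARCHIVE (plus PATTERN).
enum ToyResourceType { FILE, ARCHIVE }

final class ToyLocalResource {
    final String uri;          // where the NodeManager downloads from, typically an HDFS URI
    final ToyResourceType type;

    ToyLocalResource(String uri, ToyResourceType type) {
        this.uri = uri;
        this.type = type;
    }
}

final class ToyLaunchContext {
    // TreeMaps keep the rendered output in a stable, sorted order.
    final Map<String, String> environment = new TreeMap<>();
    final Map<String, ToyLocalResource> localResources = new TreeMap<>();
    final List<String> commands = new ArrayList<>();

    // Renders roughly what a NodeManager does: localize resources, set env, run commands.
    String renderLaunchScript() {
        StringBuilder sb = new StringBuilder("#!/bin/bash\n");
        for (Map.Entry<String, ToyLocalResource> e : localResources.entrySet()) {
            ToyLocalResource r = e.getValue();
            sb.append("# localized ").append(r.type).append(' ').append(r.uri)
              .append(" -> ./").append(e.getKey()).append('\n');
        }
        for (Map.Entry<String, String> e : environment.entrySet()) {
            sb.append("export ").append(e.getKey()).append("=\"").append(e.getValue()).append("\"\n");
        }
        sb.append(String.join(" && ", commands)).append('\n');
        return sb.toString();
    }
}
```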
  • 22. The ApplicationMaster. The per-application controller, a.k.a. container_0, and the parent for all containers of the application. The ApplicationMaster negotiates its containers from the ResourceManager. The ApplicationMaster container is a child of the ResourceManager; think of the init process in Unix. The RM restarts the ApplicationMaster attempt if required (each attempt has a unique ApplicationAttemptId). The code for the application is submitted along with the application itself.
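The restart rule can be sketched as an attempt counter that hands out a fresh ApplicationAttemptId for each try. `ToyAttemptTracker` is hypothetical; the ID strings are modelled on the `application_<clusterTimestamp>_<appId>` and `appattempt_<clusterTimestamp>_<appId>_<attempt>` formats that Hadoop 2.x prints, as far as I know, and the attempt cap stands in for the `yarn.resourcemanager.am.max-attempts` setting. The timestamp used in the check is arbitrary.

```java
// Hypothetical sketch of how AM attempts get unique IDs when the RM restarts the AM.
// The string formats mirror Hadoop 2.x ApplicationId/ApplicationAttemptId output as far
// as I know; the class itself is a toy, not the Hadoop API.
final class ToyAttemptTracker {
    private final long clusterTimestamp; // RM start time, shared by all IDs from that RM
    private final int appId;             // sequence number of the application
    private final int maxAttempts;       // cf. yarn.resourcemanager.am.max-attempts
    private int attempts = 0;

    ToyAttemptTracker(long clusterTimestamp, int appId, int maxAttempts) {
        this.clusterTimestamp = clusterTimestamp;
        this.appId = appId;
        this.maxAttempts = maxAttempts;
    }

    String applicationId() {
        return String.format("application_%d_%04d", clusterTimestamp, appId);
    }

    // Returns the ID of a fresh attempt, or null once the attempt budget is exhausted
    // (at which point the application as a whole would be marked failed).
    String nextAttempt() {
        if (attempts >= maxAttempts) {
            return null;
        }
        attempts++;
        return String.format("appattempt_%d_%04d_%06d", clusterTimestamp, appId, attempts);
    }
}
```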
  • 23. ApplicationSubmissionContext. ApplicationSubmissionContext is the complete specification of the ApplicationMaster, provided by the Client. The ResourceManager is responsible for allocating and launching the ApplicationMaster container. Fields: resourceRequest, containerLaunchContext, appName, queue.
  • 24. YARN Application API - Overview. The hadoop-yarn-client module: YarnClient is the submission client API. Both synchronous and asynchronous APIs are provided for resource allocation and container start/stop. Synchronous: AMRMClient & NMClient. Asynchronous: AMRMClientAsync & NMClientAsync.
  • 25. YARN Application API – YarnClient. createApplication creates an application; submitApplication starts it, with the application developer providing the ApplicationSubmissionContext. APIs to get other information from the ResourceManager: getAllQueues, getApplications, getNodeReports. APIs to manipulate a submitted application, e.g. killApplication.
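The client-side sequence (create, fill in the submission context, submit, then query or kill) can be sketched without a cluster. `ToyYarnClient` and `ToySubmissionContext` are hypothetical stand-ins that only echo the method names on the slide; the validation rules (known queue, non-empty AM spec) are illustrative assumptions, while the state names ACCEPTED and KILLED are borrowed from YARN's application states.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy of the submission context the Client fills in; not the Hadoop record.
final class ToySubmissionContext {
    final int appId;
    String appName = "";
    String queue = "default";
    int amMemoryMb;
    int amVcores;
    List<String> amCommands = new ArrayList<>();

    ToySubmissionContext(int appId) {
        this.appId = appId;
    }
}

// Hypothetical toy client: createApplication -> submitApplication -> query/kill.
final class ToyYarnClient {
    private final Set<String> queues;
    private final Map<Integer, String> appStates = new LinkedHashMap<>();
    private int nextId = 1;

    ToyYarnClient(Set<String> queues) {
        this.queues = queues;
    }

    // Step 1: obtain a new application ID, returned inside an empty context.
    ToySubmissionContext createApplication() {
        return new ToySubmissionContext(nextId++);
    }

    // Step 2: submit the completed context; the RM would then launch the AM container.
    void submitApplication(ToySubmissionContext ctx) {
        if (!queues.contains(ctx.queue)) {
            throw new IllegalArgumentException("Unknown queue: " + ctx.queue);
        }
        if (ctx.amMemoryMb <= 0 || ctx.amVcores <= 0 || ctx.amCommands.isEmpty()) {
            throw new IllegalArgumentException("Incomplete ApplicationMaster spec");
        }
        appStates.put(ctx.appId, "ACCEPTED");
    }

    void killApplication(int appId) {
        appStates.replace(appId, "KILLED");
    }

    Map<Integer, String> getApplications() {
        return Collections.unmodifiableMap(appStates);
    }

    Set<String> getAllQueues() {
        return Collections.unmodifiableSet(queues);
    }
}
```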
  • 26. YARN Application API – The Client. Diagram: the process-flow cluster from slide 14, with Client2 talking to the ResourceManager (Scheduler) in two steps: 1. New application request: YarnClient.createApplication. 2. Submit application: YarnClient.submitApplication.
  • 27. AppMaster-ResourceManager API. AMRMClient, the synchronous API: registerApplicationMaster and unregisterApplicationMaster; resource negotiation with addContainerRequest, removeContainerRequest and releaseAssignedContainer; the main API, allocate; helper APIs for cluster information, getAvailableResources and getClusterNodeCount. AMRMClientAsync, the asynchronous API: built on AMRMClient to provide an asynchronous CallbackHandler, a callback interaction model with the ResourceManager: onContainersAllocated, onContainersCompleted, onNodesUpdated, onError, onShutdownRequest.
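The callback model can be illustrated with a toy in which a background "heartbeat" thread plays the role of AMRMClientAsync: it calls a (scripted) allocate and pushes the results to a handler, so the AM's own code never polls. All names here are hypothetical, and only two of the real CallbackHandler methods are mirrored; a real handler also implements onNodesUpdated, onError, onShutdownRequest and getProgress. Because callbacks arrive on another thread, the example handler synchronizes its updates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy of the AMRMClientAsync callback model; not the Hadoop interface.
interface ToyRmCallbackHandler {
    void onContainersAllocated(List<String> containerIds);
    void onContainersCompleted(List<String> containerIds);
}

// One scripted allocate() response from a pretend ResourceManager.
final class ToyAllocateResponse {
    final List<String> allocated;
    final List<String> completed;

    ToyAllocateResponse(List<String> allocated, List<String> completed) {
        this.allocated = allocated;
        this.completed = completed;
    }
}

final class ToyAmRmClientAsync {
    private final List<ToyAllocateResponse> script;
    private final ToyRmCallbackHandler handler;
    private Thread heartbeat;

    ToyAmRmClientAsync(List<ToyAllocateResponse> script, ToyRmCallbackHandler handler) {
        this.script = script;
        this.handler = handler;
    }

    // Starts the heartbeat thread; callbacks run on that thread, not on the caller's.
    void start() {
        heartbeat = new Thread(() -> {
            for (ToyAllocateResponse r : script) {
                if (!r.allocated.isEmpty()) handler.onContainersAllocated(r.allocated);
                if (!r.completed.isEmpty()) handler.onContainersCompleted(r.completed);
            }
        }, "toy-amrm-heartbeat");
        heartbeat.start();
    }

    // Waits for the scripted heartbeats to finish (a real client runs until stopped).
    void awaitStop() {
        try {
            heartbeat.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

// Example handler: records allocations and counts completions for the AM's main thread.
final class CountingHandler implements ToyRmCallbackHandler {
    final List<String> allocated = new ArrayList<>();
    int completed = 0;

    @Override
    public synchronized void onContainersAllocated(List<String> ids) {
        allocated.addAll(ids);
    }

    @Override
    public synchronized void onContainersCompleted(List<String> ids) {
        completed += ids.size();
    }
}
```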
  • 28. AppMaster-ResourceManager flow. Diagram: the AM, running on a NodeManager, talks to the ResourceManager (Scheduler): 1. registerApplicationMaster; 2. AMRMClient.allocate; 3. the allocated Container is returned; 4. unregisterApplicationMaster.
  • 29. AppMaster-NodeManager API. Lets the AM launch/stop containers at NodeManagers. NMClient, the synchronous API, has simple (trivial) calls: startContainer, stopContainer, getContainerStatus. NMClientAsync, the asynchronous API, has simple (trivial) calls: startContainerAsync, stopContainerAsync, getContainerStatusAsync; plus a callback interaction model with the NodeManager: onContainerStarted, onContainerStopped, onStartContainerError, onContainerStatusReceived.
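A typical use of these NodeManager callbacks is simple bookkeeping of each container's state, so the AM knows what is running and what failed to start. `ToyContainerTracker` is a hypothetical sketch: its method names echo the slide's callbacks, but the parameters are simplified to plain container-ID strings, and the `requested` hook (called just before the async start) is an assumption of this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bookkeeping an AM might do inside NMClientAsync-style callbacks.
// Method names echo the slide; types are simplified to plain strings.
final class ToyContainerTracker {
    enum State { STARTING, RUNNING, FAILED_TO_START, STOPPED }

    private final Map<String, State> states = new HashMap<>();

    // Called by the AM right before it asks the NodeManager to start the container.
    synchronized void requested(String containerId) {
        states.put(containerId, State.STARTING);
    }

    synchronized void onContainerStarted(String containerId) {
        states.put(containerId, State.RUNNING);
    }

    // A real AM would typically log the error and may request a replacement container.
    synchronized void onStartContainerError(String containerId, Throwable t) {
        states.put(containerId, State.FAILED_TO_START);
    }

    synchronized void onContainerStopped(String containerId) {
        states.put(containerId, State.STOPPED);
    }

    synchronized long count(State s) {
        return states.values().stream().filter(v -> v == s).count();
    }
}
```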
  • 30. YARN Application API - Development. Un-Managed Mode for the ApplicationMaster: run the ApplicationMaster on your development machine rather than in-cluster; no submission client is needed. Use hadoop-yarn-applications-unmanaged-am-launcher; it makes it easier to step through in a debugger, browse logs, etc. $ bin/hadoop jar hadoop-yarn-applications-unmanaged-am-launcher.jar Client -jar my-application-master.jar -cmd 'java MyApplicationMaster <args>'
  • 31. A Simple YARN Application
  • 32. A Simple YARN Application. The simplest example of a YARN application: get n containers and run a specific Unix command on each, with minimal error handling. Control flow: 1. The user submits the application to the ResourceManager; the Client provides the ApplicationSubmissionContext to the ResourceManager. 2. The App Master negotiates with the ResourceManager for n containers. 3. The App Master launches the containers with the user-specified command as ContainerLaunchContext.commands. Code: https://github.com/hortonworks/simple-yarn-app
  • 33. Simple YARN Application – Client: command to launch the ApplicationMaster process
  • 34. Simple YARN Application – Client: resources required for the ApplicationMaster container; ApplicationSubmissionContext for the ApplicationMaster; submit the application to the ResourceManager
  • 35. Simple YARN Application – AppMaster. Steps: 1. AMRMClient.registerApplicationMaster. 2. Negotiate containers from the ResourceManager by providing a ContainerRequest to AMRMClient.addContainerRequest. 3. Take each resulting Container returned by subsequent calls to AMRMClient.allocate, build a ContainerLaunchContext with the commands, then launch it on the Container's NodeManager using NMClient.startContainer; use LocalResources to specify software/configuration dependencies for each worker container. 4. Wait until done, via AllocateResponse.getCompletedContainersStatuses from subsequent calls to AMRMClient.allocate. 5. AMRMClient.unregisterApplicationMaster.
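The five AppMaster steps can be exercised end to end against an in-memory fake, which makes the shape of the allocate loop easy to see. `ToyResourceManager`, `ToyAllocation` and `ToySimpleAppMaster` are hypothetical; the fake grants at most `grantPerCall` containers per heartbeat and, as a simplifying assumption, reports every launched container as completed on the following heartbeat. A real AM would also sleep between allocate calls and check exit statuses.

```java
import java.util.ArrayList;
import java.util.List;

// Result of one toy allocate() heartbeat.
final class ToyAllocation {
    final List<String> allocated;
    final List<String> completed;

    ToyAllocation(List<String> allocated, List<String> completed) {
        this.allocated = allocated;
        this.completed = completed;
    }
}

// Hypothetical in-memory stand-in for the ResourceManager (and NodeManagers) so the
// AppMaster loop can run without a cluster. Not the Hadoop AMRMClient/NMClient API.
final class ToyResourceManager {
    private final int grantPerCall;   // at most this many containers per allocate() call
    private int pending = 0;          // outstanding container requests
    private int nextContainer = 1;
    private List<String> running = new ArrayList<>();
    boolean registered = false;
    boolean unregistered = false;

    ToyResourceManager(int grantPerCall) {
        this.grantPerCall = grantPerCall; // must be > 0 or the AM loop never finishes
    }

    void registerApplicationMaster() { registered = true; }

    void addContainerRequest() { pending++; }

    // Containers launched before this call are reported as completed,
    // then up to grantPerCall new containers are handed out.
    ToyAllocation allocate() {
        List<String> completed = running;
        running = new ArrayList<>();
        List<String> granted = new ArrayList<>();
        while (pending > 0 && granted.size() < grantPerCall) {
            granted.add("container_" + nextContainer++);
            pending--;
        }
        return new ToyAllocation(granted, completed);
    }

    // Stand-in for starting the container on its NodeManager; the command is ignored here.
    void startContainer(String containerId, String command) { running.add(containerId); }

    void unregisterApplicationMaster() { unregistered = true; }
}

final class ToySimpleAppMaster {
    // Steps 1-5: register, request n containers, launch each allocated container,
    // wait for n completions, unregister. Returns the number of allocate() heartbeats.
    static int run(ToyResourceManager rm, int n, String command) {
        rm.registerApplicationMaster();
        for (int i = 0; i < n; i++) {
            rm.addContainerRequest();
        }
        int completed = 0;
        int heartbeats = 0;
        while (completed < n) {
            ToyAllocation response = rm.allocate();
            heartbeats++;
            for (String id : response.allocated) {
                rm.startContainer(id, command);
            }
            completed += response.completed.size();
        }
        rm.unregisterApplicationMaster();
        return heartbeats;
    }
}
```

With n = 5 and two grants per heartbeat, the loop takes four heartbeats: containers are granted 2, 2, 1, 0 and reported complete 0, 2, 2, 1.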
  • 36. Simple YARN Application – AppMaster: initialize clients to the ResourceManager and NodeManagers; register with the ResourceManager
  • 37. Simple YARN Application – AppMaster: set up requirements for worker containers; make resource requests to the ResourceManager
  • 38. Simple YARN Application – AppMaster: get containers from the ResourceManager; launch containers on NodeManagers
  • 39. Simple YARN Application – AppMaster: wait for containers to complete successfully; un-register with the ResourceManager
  • 40. Graduating from simple-yarn-app. DistributedShell: the same functionality but less simple, e.g. error checking and use of the Timeline Server. For a complex YARN app, see Tez: pre-warmed containers, sessions, etc. Look at MapReduce for even more excitement: data locality, fault tolerance, checkpointing to HDFS, security, isolation, etc.; intra-application priorities (maps vs. reduces) need complex feedback from the ResourceManager. (All at apache.org.)
  • 41. Application Timeline Server
  • 42. Application Timeline Server. Maintains historical state and provides metrics visibility for YARN apps, similar to the MapReduce Job History Server. Information can be queried via REST APIs. ATS in HDP 2.1 is considered a Tech Preview. Generic information: queue name; user information; information about application attempts; a list of Containers that were run under each application attempt; information about each Container. Per-framework/application info: developers can publish information to the Timeline Server via the TimelineClient from within a client, the ApplicationMaster, or the application's Containers.
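Querying the Timeline Server over REST mostly comes down to building the right URL. The helper below is a hypothetical sketch: the `/ws/v1/timeline/{entityType}` path, default web port 8188 and the `limit` / `primaryFilter` query parameters reflect the ATS v1 REST API as I understand it and should be checked against the Timeline Server documentation for your Hadoop version; the host name and the `MY_APP_ATTEMPT` entity type are made up.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds an Application Timeline Server (v1) REST query URL.
// Path, port and parameter names are assumptions to verify against your Hadoop docs.
final class ToyTimelineQuery {
    static String entitiesUrl(String host, int port, String entityType, String primaryFilter, int limit) {
        StringBuilder url = new StringBuilder("http://")
            .append(host).append(':').append(port)
            .append("/ws/v1/timeline/").append(encode(entityType))
            .append("?limit=").append(limit);
        if (primaryFilter != null) {
            // primaryFilter takes the form name:value, so the colon must be URL-encoded.
            url.append("&primaryFilter=").append(encode(primaryFilter));
        }
        return url.toString();
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```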
  • 43. Application Timeline Server. Diagram: the App Timeline Server connected to Ambari, custom app monitoring, and the Client.
  • 44. Next Steps
  • 45. hortonworks.com/get-started/YARN: set up an HDP 2.1 environment (leverage the Sandbox); review the sample code and execute the Simple YARN Application (https://github.com/hortonworks/simple-yarn-app); graduate to more complex code examples. BUILD FLEXIBLE, SCALABLE, RESILIENT & POWERFUL APPLICATIONS TO RUN IN HADOOP
  • 46. Hortonworks YARN Resources. Hortonworks web site: hortonworks.com/hadoop/yarn (includes links to blog posts). YARN Forum: a community of Hadoop YARN developers for collaboration and Q&A, hortonworks.com/community/forums/forum/yarn. YARN Office Hours: dial in and chat with YARN experts. Next Office Hour: Thursday August 14 @ 10-11am PDT. Register: https://hortonworks.webex.com/hortonworks/onstage/g.php?t=a&d=628190636
  • 47. And from Hortonworks University: the Hortonworks course "Developing Custom YARN Applications". Format: online. Duration: 2 days. When: Aug 18th & 19th (Mon & Tues). Cost: no charge to Hortonworks Technical Partners. Space: very limited. Interested? Please contact lsensmeier@hortonworks.com
  • 48. Stay in Touch! Join us for the full series of YARN development webinars: YARN Native, July 24 @ 9am PT (recording link); Slider, August 7 @ 9am PT (registration link); Tez, August 21 @ 9am PT (registration link). Additional webinar topics are being added; watch the blog or visit Hortonworks.com/webinars. http://hortonworks.com/hadoop/yarn