June 10 145pm hortonworks_tan & welch_v2
DataWorks Summit
1.
Page1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Enabling diverse workload scheduling in YARN
June, 2015
Wangda Tan, Hortonworks (wangda@apache.com)
Craig Welch, Hortonworks (cwelch@hortonworks.com)
2.
About us
Wangda Tan
• Last 5+ years in the big data field: Hadoop, Open-MPI, etc.
• Past
  – Pivotal (PHD team, brought OpenMPI/GraphLab to YARN)
  – Alibaba (ODPS team, platform for distributed data mining)
• Now
  – Apache Hadoop Committer @Hortonworks, all in YARN
  – Now spending most of his time on resource scheduling enhancements
Craig Welch
• YARN contributor
3.
Hadoop+YARN is the home of big data processing.
4.
Our workloads vary: Service | Batch | Interactive/real-time
5.
They have different CRAZY requirements
• "I wanna be fast!"
• "When the cluster is busy, don't take away MY RESOURCES"
• "A huge job needs to be scheduled at a special time"
6.
We want to make them AS HAPPY AS POSSIBLE to run together in YARN.
7.
Let's start…
8.
Agenda today
• Overview
• Node Label
• Resource Preemption
• Reservation system
• Pluggable behavior for Scheduler
• Docker support
• Resource scheduling beyond memory
9.
Overview
10.
Background
• Resources are managed by a hierarchy of queues.
• One queue can have multiple applications.
• A container is the result of resource scheduling: a bundle of resources that can run one or more processes.
11.
How to manage your workload by queues
• By organization:
  – Marketing/Finance queue
• By workload:
  – Interactive/Batch queue
• Hybrid:
  – Finance-batch/Marketing-realtime queue
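To make the queue hierarchy concrete, here is a minimal Python sketch of how a hybrid layout could translate into cluster-wide shares. The queue names and percentages are hypothetical, and the rule that a child's capacity is a percentage of its parent is a simplifying assumption for illustration, not a description of the scheduler's code.

```python
# Hypothetical queue tree; each capacity is a percentage of the parent queue.
QUEUES = {
    "root": {"capacity": 100.0, "children": ["marketing", "finance"]},
    "marketing": {"capacity": 40.0,
                  "children": ["marketing.realtime", "marketing.batch"]},
    "finance": {"capacity": 60.0, "children": ["finance.batch"]},
    "marketing.realtime": {"capacity": 70.0, "children": []},
    "marketing.batch": {"capacity": 30.0, "children": []},
    "finance.batch": {"capacity": 100.0, "children": []},
}


def absolute_capacities(queues, name="root", parent_share=100.0):
    """Return each queue's share of the whole cluster, in percent."""
    share = parent_share * queues[name]["capacity"] / 100.0
    shares = {name: share}
    for child in queues[name]["children"]:
        shares.update(absolute_capacities(queues, child, share))
    return shares


if __name__ == "__main__":
    for queue, share in absolute_capacities(QUEUES).items():
        print(f"{queue}: {share:.0f}% of cluster")
```

In this sketch the leaf queues always add up to the whole cluster, which is why organizing by organization first and by workload second still leaves every team with a predictable share.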
12.
Node Label
13.
Node Label – Overview
• Types of node labels
  – Node partition (since 2.6)
  – Node constraints (WIP)
• Node partition (today's focus)
  – One node belongs to only one partition
  – Related to resource planning
• Node constraints
  – One node can be assigned multiple constraints
  – Not related to resource planning
14.
Node partition – Resource planning
• Nodes belong to the "default partition" if no partition is specified.
• Queues can be given different capacities on different partitions.
  – For example, the sales queue can have different shares of the GPU partition and the default partition.
• A partition can be restricted so that only some queues can use it (ACL for partition).
  – For example, only the sales queue can access the "large memory partition".
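The planning rules on this slide can be sketched in a few lines of Python. Everything here is hypothetical (node names, partition names, percentages, and the use of an empty string for the default partition); it only illustrates the three ideas of default membership, per-partition capacities, and partition access control.

```python
# Assumption: the empty string stands for the default partition.
DEFAULT_PARTITION = ""

# Hypothetical node-to-partition mapping; unlisted nodes are in the default partition.
NODE_PARTITION = {"node4": "gpu", "node5": "gpu", "node6": "large-memory"}

# Hypothetical per-partition capacities (percent of that partition) for each queue.
# A partition missing from a queue's entry means the queue may not use it.
QUEUE_CAPACITY = {
    "sales": {DEFAULT_PARTITION: 30, "gpu": 80, "large-memory": 100},
    "engineering": {DEFAULT_PARTITION: 70, "gpu": 20},
}


def partition_of(node):
    """A node belongs to exactly one partition, the default one if unspecified."""
    return NODE_PARTITION.get(node, DEFAULT_PARTITION)


def can_use(queue, node):
    """ACL check: a queue may run on a node only if it can access its partition."""
    return partition_of(node) in QUEUE_CAPACITY.get(queue, {})


def guaranteed_share(queue, partition):
    """Percent of the given partition configured for the queue (0 if no access)."""
    return QUEUE_CAPACITY.get(queue, {}).get(partition, 0)


if __name__ == "__main__":
    print(can_use("sales", "node6"))        # sales can reach large-memory nodes
    print(can_use("engineering", "node6"))  # engineering cannot
```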
15.
Node partition – Exclusive vs. Non-exclusive
[Figure labels: Snake partition; Bear partition; Default partition; Exclusive partition; Non-exclusive partition ("Use it when they're not at home"); Resource Request]
16.
Node Partition – Use cases & best practice
• Dedicate nodes to run important services:
  – E.g. running HBase region servers using Apache Slider
• Nodes with special hardware in the cluster are used by organizations.
  – E.g. you may want a queue dedicated to the marketing department to use 80% of these memory-heavy nodes.
• Use non-exclusive node partitions to get better resource utilization.
• Be careful about user limits, capacity, etc. to make sure jobs can be launched.
I will cover more details about implementation & usage in Thursday morning's session "YARN Node Labels" with Mayank Bansal from eBay.
17.
Resource Preemption
18.
Resource Preemption – Overview
• Each queue has a configured minimum resource.
• Because of this minimum, the preemption policy (which performs the actual preemption of resources) is used to ensure that:
  – When a queue is under its "minimum resource" and the cluster has no available resources, the preemption policy can take resources from other queues that are using more than their minimum resource.
[Figure: queues A, B, and C at 20%, 30%, and 50%]
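The rule on this slide can be illustrated with a deliberately simplified Python sketch. It is not the algorithm used by YARN's preemption policy; it assumes a single resource measured in percent of the cluster and splits the amount to reclaim among over-minimum queues in proportion to how far each is over. The function name and the example numbers are hypothetical.

```python
def plan_preemption(guaranteed, used, pending, total=100):
    """Return {queue: amount to reclaim} under a simplified proportional rule.

    guaranteed, used, pending: dicts of queue -> percent of the cluster.
    """
    free = total - sum(used.values())

    # How much the under-served queues are entitled to and actually asking for.
    needed = 0
    for queue, minimum in guaranteed.items():
        shortfall = max(0, minimum - used[queue])
        needed += min(pending.get(queue, 0), shortfall)

    # Free capacity is handed out first; only the remainder requires preemption.
    needed = max(0, needed - free)

    # Only queues above their minimum are candidates for preemption.
    over = {q: used[q] - guaranteed[q] for q in guaranteed if used[q] > guaranteed[q]}
    total_over = sum(over.values())
    if needed == 0 or total_over == 0:
        return {}

    take = min(needed, total_over)
    return {q: take * amount / total_over for q, amount in over.items()}


if __name__ == "__main__":
    guaranteed = {"A": 20, "B": 30, "C": 50}
    used = {"A": 0, "B": 50, "C": 50}   # B borrowed A's idle share
    pending = {"A": 20}                 # A now wants its minimum back
    print(plan_preemption(guaranteed, used, pending))
```

In this example only queue B is above its minimum, so the whole 20% that A is entitled to would be reclaimed from B, while C, which sits exactly at its minimum, is left alone.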
19.
Resource Preemption – Example
• When preemption is not enabled
• When preemption is enabled
20.
Resource Preemption – best practice
• Configurations to control the pace of preemption:
  – yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill
  – yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round
  – yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor
• Configurations to control when or if preemption happens:
  – yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity (deadzone)
  – yarn.scheduler.capacity.<queue-path>.disable_preemption
21.
Reservation System
22.
Reservation System – Overview
• Reserving resources ahead of time
  – Just like booking a table in a restaurant
  – "I need a table for X people at Y time"
  – "Wait a moment … Reservation confirmed, sir"
  – (After some time) "Your table is ready"
• What the Reservation System does:
  – The client sends a reservation request
  – The RM checks the time table
  – The RM sends back a reservation confirmation ID
  – The client is notified when the reservation is ready
• Enables more predictable start and run times for time-critical / resource-intensive applications
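The request, time-table check, and confirmation steps can be sketched as a toy admission check in Python. This is an illustration only: the class, its method names, the hourly time slots, and the ID format are all hypothetical and do not mirror YARN's reservation API.

```python
class ReservationPlan:
    """Toy time table: tracks containers already promised in each time slot."""

    def __init__(self, capacity, slots):
        self.capacity = capacity        # containers the cluster can run at once
        self.committed = [0] * slots    # containers promised per slot
        self.reservations = {}          # reservation id -> (start, end, containers)
        self._next_id = 1

    def reserve(self, start, duration, containers):
        """Return a confirmation ID, or None if the time table has no room."""
        end = start + duration
        if start < 0 or end > len(self.committed):
            return None
        if any(self.committed[t] + containers > self.capacity
               for t in range(start, end)):
            return None
        for t in range(start, end):
            self.committed[t] += containers
        reservation_id = f"r-{self._next_id}"  # hypothetical ID format
        self._next_id += 1
        self.reservations[reservation_id] = (start, end, containers)
        return reservation_id

    def is_ready(self, reservation_id, now):
        """True while the reserved window is active."""
        start, end, _ = self.reservations[reservation_id]
        return start <= now < end


if __name__ == "__main__":
    plan = ReservationPlan(capacity=100, slots=24)
    first = plan.reserve(start=1, duration=2, containers=80)
    print(first, plan.reserve(start=2, duration=2, containers=30))
```

Rejecting a request up front, rather than letting the job start and wait, is what gives accepted reservations a predictable start time in this sketch.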
23.
Reservation System – Use cases
• Gang scheduling
  – Currently, YARN can do gang scheduling from the application side (holding resources until its requirements are met)
  – Resources could be wasted and there is a risk of deadlocks
  – RS lays the foundation for gang scheduling
• Workflow support
  – I want to run jobs in stages
  – Stage 1 at 1 AM tomorrow, needs 10k containers
  – Stage 2 after stage 1, needs 5k containers
  – Stage 3 after stage 2, needs 2k containers
  – You can submit such requests to RS!
24.
Reservation System – Result & References
• Before & after Reservation System (reports from MSR)
  – It increased cluster utilization a lot!
• References
  – Design / Discussion / Report: YARN-1051
  – More detail about the example: YARN-2609
25.
Pluggable scheduler behavior
26.
Why
• Problem
  – It's difficult to share functionality between schedulers
  – Users cannot achieve the same behavior with all schedulers
  – Fixes and enhancements tend to end up in one scheduler, not all, leading to fragmentation
  – No simple mechanism exists to mix behaviors for a given feature in a single cluster
• Solution
  – Move to sharable, pluggable scheduler behavior
27.
How
• The goal
  – Recast scheduler behavior as policies. Candidates include:
    – Resource limits for apps, users...
    – Ordering for allocation and preemption
• With this, we can:
  – Maximize feature availability and reduce fragmentation
  – Configure different queues for different workloads in a single cluster
Flexible scheduler configuration, as simple as building with Legos!
28.
Ordering Policy of Capacity Scheduler
• Pluggable ordering policies for LeafQueues in Capacity Scheduler
  – Enables the implementation of different policies for ordering assignment and preemption of containers for applications
  – Initial implementations include FIFO (Capacity Scheduler original behavior) and Fair
  – User limits and queue capacity limits are still respected
• Fair scheduling inside Capacity Scheduler
  – Based on the fair sharing logic in FairScheduler
  – Assigns containers to applications in order of least to greatest resource usage
  – Allows many applications to make progress concurrently
  – Lets short jobs finish in reasonable time while not starving long-running jobs
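The difference between the two ordering policies comes down to the sort key used when picking the next application to receive a container. The Python sketch below illustrates that idea with hypothetical applications; the `App` type and its fields are assumptions, and the real policies also take user limits and queue capacity into account, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class App:
    app_id: str
    submit_order: int   # lower means submitted earlier
    used_mb: int        # memory currently held by the application


def fifo_order(apps):
    """FIFO: the earliest submitted application is served first."""
    return sorted(apps, key=lambda a: a.submit_order)


def fair_order(apps):
    """Fair: least resource usage first; submission order breaks ties."""
    return sorted(apps, key=lambda a: (a.used_mb, a.submit_order))


if __name__ == "__main__":
    apps = [
        App("long-batch", submit_order=1, used_mb=40960),
        App("ad-hoc-query", submit_order=2, used_mb=2048),
        App("new-report", submit_order=3, used_mb=0),
    ]
    print([a.app_id for a in fifo_order(apps)])
    print([a.app_id for a in fair_order(apps)])
```

Under the fair key, the newly submitted application with no resources is offered a container before the long-running batch job, which is how short jobs get to finish in reasonable time.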
29.
Configuration and tuning
• Rough guidelines for when to use Fair and FIFO ordering policies

  Workloads                            Policy
  On-demand/interactive/exploratory    Fair
  Predictable/recurring batch          FIFO
  Mix of the above two                 Fair

• Configuration
  – yarn.scheduler.capacity.<queue>.ordering-policy ("fifo" or "fair", default "fifo")
  – yarn.scheduler.capacity.<queue>.ordering-policy.fair.enable-size-based-weight (true or false)
• Tuning
  – Use max-am-resource-percent to avoid "peanut buttering" from having too many apps running at once
  – Sometimes it's necessary to separate large and small apps into different queues, or to use size-based-weight, to avoid large app starvation
30.
Docker container support
31.
Docker container support – Overview
• Containers for the cluster
  – Brings the sandboxing and dependency isolation of container technology to Hadoop
  – Containers make it simple to use Hadoop resources for a wider range of applications
32.
Docker container support – Status
• Done
  – (V1) Initial implementation translating Kubernetes to an Application Master launching Docker containers from the cluster met with success.
  – (V2) A custom container launcher for Docker containers. This brought the capability more fully under the management of YARN, but a single cluster could not support both traditional YARN applications (MapReduce, etc.) and Docker concurrently.
• Next phase
  – (V3) WIP: adding support for running Docker and traditional YARN applications side by side in a single cluster
33.
It's not all about memory
34.
It's not all about Memory – CPU
• What's in a CPU
  – Some workloads are CPU intensive. Without accounting for this, nodes may end up CPU bound, or CPU may be underutilized cluster-wide.
  – CPU awareness at the scheduler level is enabled by selecting the DominantResourceCalculator.
  – Dominant? "Dominant" stands for the "dominant factor", or the "bottleneck". In simplified terms, the resource type that is the most constrained becomes the dominant factor for any given comparison or calculation.
  – For example, if there is enough memory but not enough CPU for a resource request, the CPU component is dominant (and the answer is "No").
  – See https://www.cs.berkeley.edu/~alig/papers/drf.pdf for more detail
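The "dominant factor" idea can be shown with a small Python sketch. It is a simplified illustration, not the DominantResourceCalculator itself: the function names, resource keys, and numbers are assumptions made for the example.

```python
def blocking_resources(request, available):
    """Resource types for which the request exceeds what is available."""
    return [name for name in request if request[name] > available.get(name, 0)]


def dominant_resource(usage, cluster_total):
    """Return (resource name, share) for the most constrained resource type.

    The share of each type is usage divided by the cluster total; the largest
    share is the dominant one.
    """
    shares = {name: usage.get(name, 0) / cluster_total[name]
              for name in cluster_total}
    name = max(shares, key=shares.get)
    return name, shares[name]


if __name__ == "__main__":
    # Enough memory but not enough CPU: the request cannot be satisfied.
    request = {"memory_mb": 2048, "vcores": 4}
    available = {"memory_mb": 8192, "vcores": 2}
    print(blocking_resources(request, available))

    # An application using 10% of cluster memory but 50% of cluster vcores
    # is CPU dominant.
    usage = {"memory_mb": 10240, "vcores": 20}
    cluster = {"memory_mb": 102400, "vcores": 40}
    print(dominant_resource(usage, cluster))
```

A memory-only comparison would treat the second application as a light user of the cluster; comparing by dominant share shows it already holds half of the CPU.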
35.
It's not all about Memory – CPU – Vcores
• What's in a CPU
  – The unit used to abstract CPU capability in YARN is the vcore.
  – Vcore counts are configured per node in yarn-site.xml, typically one vcore per physical CPU.
  – If some nodes' CPUs outclass other nodes', the number of vcores per physical CPU can be adjusted upward to compensate.
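As a worked example of the adjustment described above, the following Python sketch computes a per-node vcore count from a physical core count and a relative speed factor. The helper and the speed factors are hypothetical; the resulting number is what one would put in the node's yarn-site.xml (the `yarn.nodemanager.resource.cpu-vcores` property in stock Hadoop, as far as I am aware).

```python
def vcores_for_node(physical_cores, relative_speed=1.0):
    """Vcores to advertise for a node.

    relative_speed is a hypothetical factor comparing this node's CPUs with
    the cluster's baseline CPUs (1.0 means one vcore per physical core).
    """
    return max(1, round(physical_cores * relative_speed))


if __name__ == "__main__":
    print(vcores_for_node(16))        # baseline node
    print(vcores_for_node(16, 1.5))   # node with faster CPUs
```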
36.
Q & A