Coexistence and Migration of Vendor HPC-based Infrastructure to Hadoop Ecosystem/YARN Solution
S&P Capital IQ
Friday 22nd May, 2015
Agenda
The HPC Inheritance
The Need
Integrating the Hadoop Ecosystem
Integrating the Vendor-Based HPC System with the Hadoop Ecosystem via a YARN AM
Advantages and Potential Drawbacks
Closing & Questions
The HPC Inheritance
Preexisting
HPC distributed computing infrastructure established between 2003 and 2007
Usually 500 to 5,000 cores, with some instances reaching 100K, not managed by a single (HA) RM
Vendor products, (usually) closed source
No separate resource schedulers, with one notable exception: EGO (Platform Computing)
The HPC Inheritance
Preexisting
The HPC applications
HPC systems built with MPICH, OpenMPI, ACE-TAO, or sockets
A few applications consume 80% of the computational resources (the 80/20 Pareto principle)
Designed for computation-heavy applications with low I/O and demand concentrated in peaks on the order of hours
Low latency and high throughput, with some variance
Built against a particular vendor API implementation, using callbacks
Continuous optimization cycles at both the algorithmic and the infrastructure level
The Need
Engineer a new system for distributed computing and data at reasonable cost
Reuse the infrastructure
Reuse the internal knowledge already built up
Current HPC applications should not experience a noticeable slowdown
Growing awareness that compute-heavy and data-oriented applications need to be built in a distributed fashion, sharing resources
Efficient resource utilization
Integrating the Hadoop Ecosystem
Apache Hadoop on the existing HPC infrastructure: hardware coexistence with one-to-one resource mapping
.bashrc user account profile to set up the environment for both systems
Using YARN as the resource scheduler for both systems
May need OS tuning because of the additional I/O load
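The one-to-one resource mapping can be made concrete with a small sketch that turns an HPC node's core and memory counts into the standard YARN NodeManager properties `yarn.nodemanager.resource.cpu-vcores` and `yarn.nodemanager.resource.memory-mb`. The `HpcNode` type, the helper names, and the 4096 MB reserve for the OS and daemons are assumptions for illustration, not values from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HpcNode:
    """Hypothetical description of one HPC compute node."""
    hostname: str
    cores: int
    memory_mb: int


def nodemanager_resources(node: HpcNode, os_reserved_mb: int = 4096) -> dict[str, str]:
    """Map an HPC node's cores and memory one-to-one onto YARN NodeManager
    resource properties, keeping an (assumed) memory reserve for the OS."""
    usable_mb = node.memory_mb - os_reserved_mb
    if usable_mb <= 0:
        raise ValueError(f"{node.hostname}: not enough memory after OS reserve")
    return {
        # One HPC core is exposed as one YARN vcore (one-to-one mapping).
        "yarn.nodemanager.resource.cpu-vcores": str(node.cores),
        # Memory left after the reserve is handed to YARN containers.
        "yarn.nodemanager.resource.memory-mb": str(usable_mb),
    }


def to_yarn_site_xml(props: dict[str, str]) -> str:
    """Render the properties as a yarn-site.xml <configuration> fragment."""
    lines = ["<configuration>"]
    for name, value in sorted(props.items()):
        lines.append(f"  <property><name>{name}</name><value>{value}</value></property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    node = HpcNode("hpc-node-01", cores=16, memory_mb=65536)
    print(to_yarn_site_xml(nodemanager_resources(node)))
```

Because the HPC processes and YARN containers share the same nodes, the reserve is the main knob for leaving headroom for the OS tuning mentioned above.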
Integrating the Vendor-Based HPC System with the Hadoop Ecosystem via a YARN AM
Building a YARN AM as a valve for the computation flow to HPC
Building the AM with the vendor API to control the HPC computational processes, allocating on demand while taking HPC specifics into account
AMRMClientAsync handles the AM's communication with the RM and requires a CallbackHandler implementation
Depending on the HPC API, queue polling or event notification
up/down: memory-efficient, but slow because of process start-up
open/close: fast, at the cost of a memory footprint
The HPC YARN AM uses both YARN API calls and the HPC management API; a variety of resource allocation/release combinations are possible:
fixed, fixed + incremental, incremental only
scheduling based on job patterns, including predictive scheduling (more of an art)
combinations of the above
Handling YARN's callbacks for resource management
Recoverable after an AM crash: simple state rebuilt from configuration parameters and the HPC scheduler's queue state
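The allocation policies listed above (fixed, fixed + incremental, incremental only) can be illustrated with a YARN-agnostic sketch. It assumes the AM can read the number of pending tasks from the HPC scheduler queue and that each container runs a fixed number of HPC tasks; `Policy`, `target_containers`, `plan`, `tasks_per_container`, `step`, and `maximum` are hypothetical names, not part of any vendor or Hadoop API. In a real AM, a "request" or "release" decision would be translated into `AMRMClientAsync` calls such as `addContainerRequest` and `releaseAssignedContainer`.

```python
import math
from enum import Enum


class Policy(Enum):
    FIXED = "fixed"
    FIXED_PLUS_INCREMENTAL = "fixed+incremental"
    INCREMENTAL_ONLY = "incremental"


def target_containers(policy: Policy, pending_tasks: int, tasks_per_container: int,
                      base: int, step: int, maximum: int) -> int:
    """Number of YARN containers the HPC AM should hold for the current
    HPC queue depth under the given allocation policy (hypothetical helper)."""
    if tasks_per_container <= 0 or step <= 0:
        raise ValueError("tasks_per_container and step must be positive")
    if policy is Policy.FIXED:
        # Fixed: a constant pool, independent of demand.
        return min(base, maximum)
    # Containers needed to run every pending HPC task at once.
    needed = math.ceil(pending_tasks / tasks_per_container)
    # Fixed + incremental keeps `base` as a floor; incremental only starts at zero.
    floor = base if policy is Policy.FIXED_PLUS_INCREMENTAL else 0
    extra = max(0, needed - floor)
    # Grow in whole increments of `step` to avoid churning one container at a time.
    extra = math.ceil(extra / step) * step
    return min(maximum, floor + extra)


def plan(held: int, target: int) -> tuple[str, int]:
    """Turn a target into the action the AM would send to the RM."""
    if target > held:
        return ("request", target - held)
    if target < held:
        return ("release", held - target)
    return ("hold", 0)


if __name__ == "__main__":
    held = 8
    for pending in (0, 50, 200):
        target = target_containers(Policy.FIXED_PLUS_INCREMENTAL, pending,
                                   tasks_per_container=4, base=8, step=4, maximum=32)
        print(pending, target, plan(held, target))
        held = target
```

Job-pattern or predictive scheduling would replace `pending_tasks` with a forecast of demand; the request/release translation stays the same.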
[Figure, four frames (t, t+1, t+2, t+n): the hpc AM as a valve for the computation flow between YARN and the HPC Scheduler. Components shown: HPC Scheduler, YARN RM1 and RM2, HA NameNodes (NNs), ZooKeeper (ZK) nodes, the hpc AM, resources R1, R2, R4, R5, and HDFS.]
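The recovery item on the AM slide (simple state rebuilt from configuration parameters and the HPC scheduler's queue state) suggests an AM that persists nothing of its own. A minimal sketch of that idea follows; the types, field names, and the rule for the new target are hypothetical. It also assumes that HPC workers survive the AM restart, which in YARN corresponds to submitting the application with `ApplicationSubmissionContext.setKeepContainersAcrossApplicationAttempts(true)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AmConfig:
    """Hypothetical AM configuration parameters (e.g. read from a config file)."""
    base: int                 # containers held permanently (fixed part)
    maximum: int              # upper bound on containers
    tasks_per_container: int  # HPC tasks one container can run


@dataclass(frozen=True)
class HpcQueueState:
    """Hypothetical snapshot queried from the HPC scheduler after the restart."""
    pending_tasks: int
    running_workers: frozenset  # hosts whose HPC workers are still alive


@dataclass
class AmState:
    held_hosts: set
    target: int


def recover(config: AmConfig, queue: HpcQueueState) -> AmState:
    """Rebuild AM state from configuration and HPC queue state only;
    nothing is read from AM-local storage."""
    # Workers that survived the AM restart are treated as containers still held.
    held = set(queue.running_workers)
    # Extra workers needed for tasks still waiting in the HPC queues (ceiling division).
    extra = -(-queue.pending_tasks // config.tasks_per_container)
    # Clamp between the fixed floor and the configured maximum.
    target = min(config.maximum, max(config.base, len(held) + extra))
    return AmState(held_hosts=held, target=target)


if __name__ == "__main__":
    cfg = AmConfig(base=4, maximum=20, tasks_per_container=4)
    snapshot = HpcQueueState(pending_tasks=10, running_workers=frozenset({"n1", "n2"}))
    state = recover(cfg, snapshot)
    print(sorted(state.held_hosts), state.target)
```

Keeping the state this small is what makes recovery cheap: a restarted AM only needs to re-read its configuration and query the HPC scheduler before resuming normal allocation.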
Advantages and Potential Drawbacks
Advantages
Sharing resources: Apache Hadoop coexisting with HPC increases resource utilization
changes in usage patterns are visible at the node compute resources, in contrast to the network, which can have quite complex topology and behavior
allows new infrastructure to grow out of the existing one
Potential Drawbacks
Sharing resources: the HPC AM logic adds complexity, which in some cases may be considerable
The work is somewhat slower: changes are implemented gradually while observing the system's behavior under the job patterns
May impose additional data block transfers on the network
Closing & Questions
Integrating the HPC RM/schedulers with the Hadoop ecosystem via a custom AM valve is an optimal way to make the HPC system aware of YARN
Slower hardware expansion and efficient resource utilization
Q&A

Weitere ähnliche Inhalte

Was ist angesagt?

The Future of Hadoop Security
The Future of Hadoop SecurityThe Future of Hadoop Security
The Future of Hadoop Security
DataWorks Summit
 
Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014
Hortonworks
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
DataWorks Summit
 

Was ist angesagt? (20)

What's new in Ambari
What's new in AmbariWhat's new in Ambari
What's new in Ambari
 
YARN webinar series: Using Scalding to write applications to Hadoop and YARN
YARN webinar series: Using Scalding to write applications to Hadoop and YARNYARN webinar series: Using Scalding to write applications to Hadoop and YARN
YARN webinar series: Using Scalding to write applications to Hadoop and YARN
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 
Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2
 
Mutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable WorldMutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable World
 
Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARN
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
 
Big Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNBig Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeN
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with spark
 
Apache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data ProcessingApache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data Processing
 
Protecting your Critical Hadoop Clusters Against Disasters
Protecting your Critical Hadoop Clusters Against DisastersProtecting your Critical Hadoop Clusters Against Disasters
Protecting your Critical Hadoop Clusters Against Disasters
 
The Future of Hadoop Security
The Future of Hadoop SecurityThe Future of Hadoop Security
The Future of Hadoop Security
 
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4j
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready Program
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
 
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
 
Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 

Andere mochten auch

Carpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP HavenCarpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP Haven
DataWorks Summit
 
The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...
The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...
The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...
DataWorks Summit
 
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
DataWorks Summit
 
Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?
DataWorks Summit
 
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared ClustersMercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
DataWorks Summit
 
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
DataWorks Summit
 

Andere mochten auch (20)

Hadoop in Validated Environment - Data Governance Initiative
Hadoop in Validated Environment - Data Governance InitiativeHadoop in Validated Environment - Data Governance Initiative
Hadoop in Validated Environment - Data Governance Initiative
 
HBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQLHBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQL
 
Inspiring Travel at Airbnb [WIP]
Inspiring Travel at Airbnb [WIP]Inspiring Travel at Airbnb [WIP]
Inspiring Travel at Airbnb [WIP]
 
Realistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure DevelopmentRealistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure Development
 
Karta an ETL Framework to process high volume datasets
Karta an ETL Framework to process high volume datasets Karta an ETL Framework to process high volume datasets
Karta an ETL Framework to process high volume datasets
 
Carpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP HavenCarpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP Haven
 
50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
 
Practical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on HadoopPractical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on Hadoop
 
Hadoop for Genomics__HadoopSummit2010
Hadoop for Genomics__HadoopSummit2010Hadoop for Genomics__HadoopSummit2010
Hadoop for Genomics__HadoopSummit2010
 
The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...
The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...
The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)
 
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
 
Running Spark and MapReduce together in Production
Running Spark and MapReduce together in ProductionRunning Spark and MapReduce together in Production
Running Spark and MapReduce together in Production
 
Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?
 
Spark Application Development Made Easy
Spark Application Development Made EasySpark Application Development Made Easy
Spark Application Development Made Easy
 
NoSQL Needs SomeSQL
NoSQL Needs SomeSQLNoSQL Needs SomeSQL
NoSQL Needs SomeSQL
 
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared ClustersMercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
 
Big Data Challenges in the Energy Sector
Big Data Challenges in the Energy SectorBig Data Challenges in the Energy Sector
Big Data Challenges in the Energy Sector
 
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
 
Online Approximate OLAP in SparkSQL
Online Approximate OLAP in SparkSQLOnline Approximate OLAP in SparkSQL
Online Approximate OLAP in SparkSQL
 

Ähnlich wie Coexistence and Migration of Vendor HPC based infrastructure to Hadoop Ecosystem/YARN solution

1-s2.0-S1877050915011874-main
1-s2.0-S1877050915011874-main1-s2.0-S1877050915011874-main
1-s2.0-S1877050915011874-main
Hamza Zafar
 
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC TechnologiesAccelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
inside-BigData.com
 
Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...
Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...
Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...
inside-BigData.com
 

Ähnlich wie Coexistence and Migration of Vendor HPC based infrastructure to Hadoop Ecosystem/YARN solution (20)

1-s2.0-S1877050915011874-main
1-s2.0-S1877050915011874-main1-s2.0-S1877050915011874-main
1-s2.0-S1877050915011874-main
 
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...
 
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
 
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC TechnologiesAccelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
 
Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...
Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...
Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix
 
Hackathon bonn
Hackathon bonnHackathon bonn
Hackathon bonn
 
DUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution PlansDUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution Plans
 
Hadoop big data
Hadoop   big dataHadoop   big data
Hadoop big data
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem
 
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache Beam
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
 
Brief Introduction about Hadoop and Core Services.
Brief Introduction about Hadoop and Core Services.Brief Introduction about Hadoop and Core Services.
Brief Introduction about Hadoop and Core Services.
 
Keynote – From MapReduce to Spark: An Ecosystem Evolves by Doug Cutting, Chie...
Keynote – From MapReduce to Spark: An Ecosystem Evolves by Doug Cutting, Chie...Keynote – From MapReduce to Spark: An Ecosystem Evolves by Doug Cutting, Chie...
Keynote – From MapReduce to Spark: An Ecosystem Evolves by Doug Cutting, Chie...
 
LEG Keynote: Linda Knippers - HP
LEG Keynote: Linda Knippers - HPLEG Keynote: Linda Knippers - HP
LEG Keynote: Linda Knippers - HP
 
An Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, FutureAn Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, Future
 
SQL On Hadoop
SQL On HadoopSQL On Hadoop
SQL On Hadoop
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
 

Mehr von DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

Mehr von DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Kürzlich hochgeladen

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Kürzlich hochgeladen (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke

Integrating the Hadoop Ecosystem
Apache Hadoop on the existing HPC infrastructure: hardware coexistence, one-to-one resource mapping
A .bashrc user account profile sets up the environment for both systems
Using YARN as the resource scheduler for both systems
May need OS optimization due to the I/O load

Integration of the HPC vendor-based infrastructure and the Hadoop ecosystem via a YARN AM
Building a YARN AM as a valve for the computation flow to the HPC
Building the AM with the vendor API to control the HPC computational processes; allocation on demand, taking the HPC specifics into account
AMRMClientAsync handles the AM's communication with the RM and needs a CallbackHandler implementation
Depending on the HPC API: queue polling or event notification
up/down process: memory-utilization efficient, but slow process start
open/close: fast, but with a memory footprint
The HPC YARN AM uses YARN API calls and the HPC management API; many combinations of resource allocation/release are possible:
fixed
fixed + incremental
incremental only
scheduling based on job patterns
prediction scheduling (art)
combinations of the above
Handling YARN's callbacks for resource management
Recoverable on AM crash: simple state based on config parameters and the HPC scheduler queue state

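The allocation/release strategies and the crash-recovery point above can be illustrated with a small, dependency-free Java sketch (Java 17 or later). It assumes the AM reads a fixed pool size, a number of HPC tasks per container, and a container cap from its configuration, and that it can query the number of pending tasks from the HPC scheduler queues. All class, method, and parameter names are hypothetical, and the real YARN and vendor calls are left out. Because the target is a pure function of configuration and queue state, a restarted AM can recompute it, which is one possible reading of the slide's "simple state" recovery.

```java
import java.util.Objects;

/**
 * Minimal sketch of allocation/release strategies for an HPC YARN AM.
 * HpcAllocation, AllocationPolicy, HpcAmConfig and targetContainers are
 * hypothetical names; they are not part of the YARN API or any vendor HPC API.
 */
final class HpcAllocation {

    /** Strategies from the slide; pattern- and prediction-based scheduling are omitted. */
    enum AllocationPolicy { FIXED, FIXED_PLUS_INCREMENTAL, INCREMENTAL_ONLY }

    /** Static settings, assumed to come from the AM's configuration files. */
    record HpcAmConfig(AllocationPolicy policy, int fixedContainers,
                       int tasksPerContainer, int maxContainers) {
        HpcAmConfig {
            Objects.requireNonNull(policy, "policy");
            if (fixedContainers < 0 || tasksPerContainer <= 0 || maxContainers < fixedContainers) {
                throw new IllegalArgumentException("invalid HPC AM configuration");
            }
        }
    }

    /**
     * Containers the AM should hold, computed only from the configuration and the
     * number of tasks pending in the HPC scheduler queues. Because no other state
     * is involved, a restarted AM can recompute the same target after a crash.
     */
    static int targetContainers(HpcAmConfig cfg, int pendingHpcTasks) {
        if (pendingHpcTasks < 0) {
            throw new IllegalArgumentException("pendingHpcTasks must be >= 0");
        }
        // Containers needed to run every pending task, rounded up.
        int demand = (pendingHpcTasks + cfg.tasksPerContainer() - 1) / cfg.tasksPerContainer();
        int target = switch (cfg.policy()) {
            case FIXED -> cfg.fixedContainers();                                    // constant pool
            case FIXED_PLUS_INCREMENTAL -> Math.max(cfg.fixedContainers(), demand); // baseline, grows with demand
            case INCREMENTAL_ONLY -> demand;                                        // nothing held when idle
        };
        return Math.min(target, cfg.maxContainers()); // never exceed the configured cap
    }

    public static void main(String[] args) {
        HpcAmConfig cfg = new HpcAmConfig(AllocationPolicy.FIXED_PLUS_INCREMENTAL, 4, 8, 20);
        for (int pending : new int[] {0, 40, 400}) {
            System.out.println(pending + " pending tasks -> " + targetContainers(cfg, pending) + " containers");
        }
    }
}
```

With the FIXED_PLUS_INCREMENTAL settings in main, the AM keeps 4 containers when the queues are empty, grows to 5 for 40 pending tasks, and stops at the cap of 20 for 400 pending tasks.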
Integration of the HPC vendor-based infrastructure and the Hadoop ecosystem via a YARN AM
Building a YARN AM as a valve for the computation flow to the HPC
[Diagram, shown at time steps t, t+1, t+2 and t+n: HPC Scheduler; YARN RM1 and RM2; NameNodes (HA); ZooKeeper nodes; nodes R1, R2, R4, R5 and the hpc AM; HDFS]

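The valve idea behind the diagram can be sketched as a reconciliation step repeated at each time step: the AM compares the containers it holds with a target and decides how many to request from, or release to, the YARN RM. This is a minimal, dependency-free sketch under stated assumptions: HpcValve, reconcile, and the sample targets are hypothetical, and the simulation assumes every request is granted before the next step. In a real AM the requests would go through AMRMClientAsync.addContainerRequest and the releases through AMRMClientAsync.releaseAssignedContainer.

```java
import java.util.List;

/**
 * Minimal sketch of the AM acting as a valve between YARN and the HPC scheduler.
 * HpcValve and reconcile are hypothetical names, not part of the YARN API.
 */
final class HpcValve {

    /**
     * Returns a positive number of containers to request, a negative number to
     * release, or 0 to hold steady. Outstanding (not yet allocated) requests are
     * counted so the AM does not ask the RM twice for the same capacity.
     */
    static int reconcile(int held, int outstandingRequests, int target) {
        if (held < 0 || outstandingRequests < 0 || target < 0) {
            throw new IllegalArgumentException("counts must be >= 0");
        }
        int inFlight = held + outstandingRequests;
        if (inFlight < target) {
            return target - inFlight; // open the valve: ask for more containers
        }
        if (held > target) {
            return target - held;     // close the valve: give containers back (negative)
        }
        return 0;                     // surplus outstanding requests are left as-is in this sketch
    }

    public static void main(String[] args) {
        // Hypothetical targets for steps t, t+1, t+2, t+3 (e.g. derived from HPC queue depth).
        List<Integer> targets = List.of(2, 6, 6, 3);
        int held = 0;
        for (int step = 0; step < targets.size(); step++) {
            int delta = reconcile(held, 0, targets.get(step));
            held += delta; // simplification: every request is granted before the next step
            System.out.printf("t+%d: target=%d delta=%+d held=%d%n", step, targets.get(step), delta, held);
        }
    }
}
```

Running main prints one line per step: the valve opens by 2 and then by 4 containers, holds steady at 6, and releases 3 when the target drops to 3.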
Advantages and Potential Drawbacks
Advantages
Sharing resources: Apache Hadoop coexisting with HPC, increasing resource utilization
Pattern changes are visible at the node compute resources, in contrast to the network, which can have quite a complex topology and behavior
Allowing new infrastructure to grow out of the existing one
Potential Drawbacks
Sharing resources: the HPC AM logic adds complexity, which in some cases may be considerable
The work is somewhat slower: changes are implemented gradually while the system behavior is observed against the job patterns
May impose additional data block transfers on the network

Closing & Questions
Integrating the HPC RM/schedulers with the Hadoop ecosystem via a custom AM valve: an optimal way to make the HPC aware of YARN
Slowing hardware expansion & efficient resource utilization
Q&A