Rev Up Your HPC Engine
Fritz Ferstl, CTO Univa Corp, fferstl@univa.com
Who is Univa?
Copyright © 2014 Univa Corporation. All Rights Reserved.
• Profile
• Based in Chicago, global reach
• >500 customers in 3 yrs (mostly Fortune 500)
• Products / Technologies:
• Univa Grid Engine
• UniSight
• Univa License Orchestrator
• UniCloud
Data Center Automation Experts
Do more with less in Big Compute and Big Data
Help organizations play a better game of Tetris
Challenges for Workload and Resource Management Systems
Scalability
• Node counts stay flat or go down, sockets stay flat, cores explode
• With the core explosion, the number of jobs also explodes
• Ever shorter run-times, more applications, more use cases
• Large commercial sites approach or go beyond 100K
• Throughput clusters process >150 million jobs / month
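To put the >150 million jobs per month figure in perspective, it can be converted to a sustained dispatch rate. The sketch below assumes a 30-day month and an even spread of submissions, neither of which the slide states; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_jobs_per_second(jobs_per_month, days_per_month=30):
    """Average rate at which jobs must complete to clear a monthly total."""
    return jobs_per_month / (days_per_month * SECONDS_PER_DAY)


rate = sustained_jobs_per_second(150_000_000)
print(f"{rate:.1f} jobs/s sustained, around the clock")
```

Under these assumptions the scheduler has to dispatch and reap close to 58 jobs every second, before accounting for any peaks.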
Heterogeneity
• Hardware
• Multi-sockets, multi-cores
• Partial cluster upgrades
• Evolving memory, network and storage architectures
• Accelerators: GPUs, Phi
• Job Profiles
• Throughput
• Array Jobs
• Large Parallel
• Interactive
• Sessions
• Reservations
• Transactional
• Hybrid
• Dependencies, Workflows
Policy Variety
• Automated → Transparency?
• Manual overrides
• Preferential access
• Priorities
• Reservations
• Resource Urgencies
• Quotas
• Deadlines
• Conflict Resolution
• E.g. don't starve large parallel jobs while maintaining high utilization
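One well-known way to reconcile "don't starve large parallel jobs" with "keep utilization high" is backfilling with a reservation for the blocked job. The slides do not say which algorithm any Univa product uses; the following is a minimal EASY-style backfill sketch, assuming runtime estimates are known, with all names (`Job`, `pick_jobs_to_start`) hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int
    runtime: int  # user-supplied estimate, in seconds


def pick_jobs_to_start(total_cores, running, queue, now=0):
    """Return names of queued jobs that may start at `now`.

    running: list of (end_time, cores) for jobs already executing
    queue:   pending jobs in priority order
    """
    running = list(running)
    pending = list(queue)
    free = total_cores - sum(cores for _, cores in running)
    started = []

    # 1. Start jobs strictly in priority order for as long as they fit.
    while pending and pending[0].cores <= free:
        job = pending.pop(0)
        free -= job.cores
        running.append((now + job.runtime, job.cores))
        started.append(job.name)
    if not pending:
        return started

    # 2. Reserve a start time for the blocked head job: the earliest
    #    moment at which enough running jobs have finished.
    head = pending[0]
    if head.cores > total_cores:
        raise ValueError(f"{head.name} needs more cores than the cluster has")
    available = free
    for end_time, cores in sorted(running):
        available += cores
        if available >= head.cores:
            reserved_start = end_time
            break
    spare_at_start = available - head.cores

    # 3. Backfill: a lower-priority job may start now only if it cannot
    #    delay the reservation, i.e. it ends before the reserved start
    #    or uses cores the head job will not need.
    for job in pending[1:]:
        if job.cores > free:
            continue
        if now + job.runtime <= reserved_start:
            pass
        elif job.cores <= spare_at_start:
            spare_at_start -= job.cores
        else:
            continue
        free -= job.cores
        started.append(job.name)
    return started


queue = [Job("big", 8, 20), Job("long", 2, 50), Job("short", 2, 5)]
print(pick_jobs_to_start(8, [(10, 6)], queue))
```

In this example the 8-core job is reserved for t=10; the short job fills the two idle cores until then, while the long job is held back because it would push the large job's start out.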
Use Case Variety
• Classical HPC (simulation) → Large parallel / many mid-size parallel
• Verification / Test → Throughput
• From single simulation to parameter study → array jobs
• Ultra-short jobs
• Big Data / Data Mining
• Exclusive usage of nodes vs shared usage
Geographical Distribution / Clouds
• Resource sharing: servers, licenses, data, other
• Data access latencies
• Security
• File system dependencies
• Pre-/Post-Staging
• Data locality:
• Bring the job to the data
• Or bring the data to the job
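The data-locality choice above can be framed as a simple time estimate. The sketch below is a rough illustration, not a method from the slides: it assumes staging overlaps with queue wait at the remote site, that link bandwidth is fully usable, and that queue-wait estimates are available; all names and numbers are hypothetical.

```python
def choose_placement(data_gb, link_gbit_s, wait_at_data_site_s, wait_at_remote_site_s):
    """Return (strategy, estimated seconds until the job can start)."""
    transfer_s = data_gb * 8 / link_gbit_s  # GB -> Gbit, divided by link speed
    # Staging is assumed to overlap with queueing at the remote site.
    remote_start_s = max(wait_at_remote_site_s, transfer_s)
    if wait_at_data_site_s <= remote_start_s:
        return "job_to_data", wait_at_data_site_s
    return "data_to_job", remote_start_s


# 500 GB over a 1 Gbit/s link, short queue where the data lives.
print(choose_placement(500, 1, 600, 0))
# Same data over a 10 Gbit/s link, two-hour queue where the data lives.
print(choose_placement(500, 10, 7200, 0))
```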
Solutions
Approaches
Best Practices
Evolve
• Architecture Evolution
• more cores / nodes / jobs → make it faster
• Integration with GPUs, Phi, etc.
• New Scheduling Algorithms
• Efficient handling of job mixes: parallel / array / sequential jobs
• Scheduling of ultra-short jobs
• More Monitoring, Better Error Tracking
• Reporting, Accounting & Analytics
Be Street-Smart
• Simplify where possible!
• A be-all solution can be the most expensive
• Effort
• Poor utilization → slow ROI
• Focus on most important goals
Think Different
• Examples:
• Less HA in exchange for more throughput, via fast SSD RAID with regular backups
• Use array jobs wherever possible
• More smaller jobs vs fewer bigger jobs
• All things considered, preemption may be a good option
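As an illustration of the array-job advice, a parameter study can be written so that each array task picks its own input from the task ID. Grid Engine exposes that ID to each task as the `SGE_TASK_ID` environment variable; the parameter grid and function names below are hypothetical.

```python
import itertools
import os

# Hypothetical parameter grid, for illustration only.
TEMPERATURES_K = [280, 300, 320]
PRESSURES_BAR = [1.0, 2.0]
GRID = list(itertools.product(TEMPERATURES_K, PRESSURES_BAR))


def params_for_task(task_id):
    """Map a 1-based array task ID to one (temperature, pressure) pair."""
    if not 1 <= task_id <= len(GRID):
        raise ValueError(f"task ID {task_id} outside 1..{len(GRID)}")
    return GRID[task_id - 1]


def current_task_id(default=1):
    """Read SGE_TASK_ID; fall back to `default` outside an array job."""
    raw = os.environ.get("SGE_TASK_ID", "")
    return int(raw) if raw.isdigit() else default


temperature, pressure = params_for_task(current_task_id())
print(f"simulating T={temperature} K, p={pressure} bar")
```

The whole study then becomes a single submission such as `qsub -t 1-6 run_study.sh` (the wrapper script name is an assumption), so the scheduler tracks one job with six tasks instead of six unrelated jobs.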
Accept Difference
• Simple: temporarily designate parts of cluster
• Advanced: Cloud-share
• Share resources across separate workload management system instances
• Dynamically re-assign resources (servers) based on demand
• Provides autonomy while maintaining high utilization
• But avoid meta-scheduling where you can!
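The idea of re-assigning servers by demand while preserving autonomy can be sketched as a proportional split with a guaranteed floor per cluster. This is an illustrative policy only, not a description of how UniCloud or any other product decides; the function name, the demand metric and the cluster names are assumptions.

```python
def share_servers(total, demand, minimum=1):
    """Split `total` servers across clusters in proportion to demand.

    demand:  dict of cluster name -> pending work (any non-negative weight)
    minimum: servers every cluster keeps regardless of demand (autonomy)
    """
    if total < len(demand) * minimum:
        raise ValueError("not enough servers to honor the per-cluster minimum")
    alloc = {name: minimum for name in demand}
    spare = total - len(demand) * minimum
    total_demand = sum(demand.values())
    if total_demand > 0:
        exact = {name: spare * d / total_demand for name, d in demand.items()}
        for name in alloc:
            alloc[name] += int(exact[name])
        # Hand out servers lost to rounding, largest fractional part first.
        leftover = total - sum(alloc.values())
        by_remainder = sorted(
            demand, key=lambda name: exact[name] - int(exact[name]), reverse=True
        )
        for name in by_remainder[:leftover]:
            alloc[name] += 1
    return alloc


print(share_servers(10, {"eda": 300, "cfd": 100, "bio": 0}))
```

Each workload manager instance still schedules its own jobs on whatever servers it currently holds, which is what keeps this from turning into meta-scheduling.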
Tailored Solutions
• Tailoring & add-ons can make all the difference
• Tailoring such as
• Job Classes
• Customized reports
• Add-ons such as
• Submission portals and wrappers
Conclusions
• Workload & Resource Management Systems are needed more than ever
• Specifically in the “new” era of Cloud and Big Data
• Allows you to benefit from 20+ years of experience in HPC workload orchestration and to move beyond
• Clear-cut set of challenges → non-trivial solutions
• Build on best-in-class products, architectures and development teams
• Being “street-smart” about architecting and configuring a cluster has a big impact
Thank You
http://www.univa.com
fferstl@univa.com
