This talk was inspired by the cloud native community and by CNCF research end users such as CERN, the University of Michigan, and many others. With our small contribution, Nora Alwadah and I extended the bridge to the Saudi HPC community.
Key takeaway: Follow and join the new Kubernetes Batch Working Group. Help it grow and evolve.
5. Microservices
Non-Business Use
An architectural design that breaks an application into independent, loosely coupled, individually deployable services.
• Portability was a challenge.
Stack diagram: Orchestration, Containers, Microservices
6. Containers
The bundling of an application and all its dependencies into a package that can be deployed regardless of environment.
7. Orchestration
Automation of the operational effort required to run the lifecycle of a container, its workloads, and its services.
• Provisioning, deployment, scaling (up and down), networking, load balancing, and more.
• Enables DevOps and CI/CD.
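The scaling part of orchestration described on this slide can be illustrated with a small reconciliation loop: the operator declares a desired number of replicas, and the orchestrator starts or stops containers until the running state matches. The sketch below is a toy model only. The `Service` class, its fields, and the `reconcile` function are hypothetical names invented for illustration; they are not the Kubernetes API or any real orchestrator's code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Service:
    """Hypothetical stand-in for a containerized service under orchestration."""
    name: str
    desired_replicas: int
    running: List[str] = field(default_factory=list)
    _next_id: int = 0

    def start_container(self) -> str:
        # Simulate provisioning one more container for this service.
        container = f"{self.name}-{self._next_id}"
        self._next_id += 1
        self.running.append(container)
        return container

    def stop_container(self) -> str:
        # Simulate removing the most recently started container.
        return self.running.pop()


def reconcile(service: Service) -> List[str]:
    """Move the running state toward the desired state; return the actions taken."""
    actions = []
    while len(service.running) < service.desired_replicas:
        actions.append(f"start {service.start_container()}")
    while len(service.running) > service.desired_replicas:
        actions.append(f"stop {service.stop_container()}")
    return actions


if __name__ == "__main__":
    web = Service(name="web", desired_replicas=3)
    print(reconcile(web))       # scale up from 0 to 3 replicas
    web.desired_replicas = 1
    print(reconcile(web))       # scale down from 3 to 1 replica
    print(reconcile(web))       # already converged, no actions
```

A real orchestrator runs this kind of comparison continuously and also handles networking, load balancing, and failures; the sketch covers only the scale up and scale down step.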
10. CNCF Landscape
• Google & Linux Foundation project
• Founded in 2015
• Goal: advance container technology
Landscape categories:
• App Definition & Development: Database, Streaming & Messaging, App Definition & Image Build, CI/CD
• Orchestration & Management: Scheduling & Orchestration, Coordination & Service Discovery, Remote Procedure Call, Service Proxy, API Gateway, Service Mesh
• Runtime: Cloud Native Storage, Container Runtime, Cloud Native Network
• Provisioning: Automation & Configuration, Container Registry, Security & Compliance, Key Management
• Special: Kubernetes Certified Service Provider, Kubernetes Training Partner
• Platform: Certified Kubernetes (Distribution, Hosted, Installer)
• Observability & Analysis: Monitoring, Logging, Tracing, Chaos Engineering, Continuous Optimization
• Serverless
11. CNCF Landscape and High Performance Computing
The same landscape as on slide 10, annotated with the areas called out for High Performance Computing:
• Scheduling
• Observability
• Storage
• Network
• UX
12. Cloud Native Evolution Timeline
Chart labels: Cloud Native, Distributed Cloud, Distributed Cloud Native
Years on the axis: 2011, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022
• Cycle Computing (2011): running cloud HPC across 8 regions
• Kubernetes: CNCF launched v1.0 GA
• Huawei Cloud Container Engine (CCE)
• Google Kubernetes Engine (GKE)
• CERN: 1000-node POC
• Kubeflow: machine learning framework for operations, pipelines, training, and deployment
• KubeEdge: CNCF's first intelligent edge computing project
• Volcano: CNCF's first batch scheduling project
• Slurmnetes: failed batch scheduling attempts
• MindSpore: deep learning framework for mobile, edge, and cloud scenarios
• Karmada: CNCF's first multi-cloud container orchestration project
• Kueue: Kubernetes-native job queueing
Expanded upon chart from https://bit.ly/FrontiersCloudNative
13. HPC Cloud Adoption Challenges
Challenges: Special Hardware, Data Gravity, Paradigm Shift
Special Hardware
• Network latency, as with specialized InfiniBand (IB) interconnects
• GPUs, accelerators, NUMA, etc.
• CPU architecture and topology
TOP500
14. HPC Cloud Adoption Challenges
Challenges: Special Hardware, Data Gravity, Paradigm Shift
Data Gravity
• Data governance
• Data residency
• Egress cost
• The higher the availability, the higher the cost
Diagram labels: Services, Data, Apps, Throughput, Latency
15. HPC Cloud Adoption Challenges
Challenges: Special Hardware, Data Gravity, Paradigm Shift
Paradigm Shift
• Both learning and adoption
• Distributing workloads as images (registry)
Diagram: Kubernetes control plane, three K8s kubelets, image registry, persistent storage
16. Research End User: CERN
https://bit.ly/HPCSAUDI-cern-org
CERN is the European Organization for Nuclear Research.
• Kubernetes use case: particle physics
• Experimented with virtualization early to ease management and automation.
• 2017: first Kubernetes POC
• 1000 worker nodes
• Data: 330 PB
• Hybrid on-demand infrastructure: from 3 hours down to 15 minutes
17. Public Cloud Use Cases
“Focus on your application and results”
• Dynamically provision resources
• Plans, schedules, and executes
• Fully managed “Serverless”
• Free
• Integration with AWS services
2020 Statistics
• Largest cluster: 1,243,000 vCPUs
• Largest container image: 30 GB
• Number of simultaneous jobs: 500,000
• Customers: thousands
18. The CNCF Community
“It's very hard right now to justify developing a new product in-house. There is
really no real reason to keep doing that. It's much easier for us to try it out,
and if we see it's a good solution, we try to reach out to the community and
start working with that community.”
19. Where to next?
• Kubernetes Batch + HPC Day North America 2022
• SC22: Containers and New Orchestration Paradigms for Isolated Environments in HPC
• CNCF Research User Group
• CNCF Technical Advisory Group for Runtime
• Kubernetes Community: Batch Working Group
• CNCF Batch System Initiative Working Group