Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and Bring-Your-Own-Resources
1. Open Infrastructure for an Open Society:
OSG, Commercial Clouds, and
Bring-Your-Own-Resources
4NRP
February 9th, 2023
2. • James Deaton
• Executive Director, Great Plains Network
• Derek Weitzel
• Research Assistant Professor, University of Nebraska-Lincoln,
OSG, PATh, PNRP
• Jeremy Evert
• Associate Professor, Computer Science, Southwestern
Oklahoma State University
• Igor Sfiligoi
• Lead Scientific Software Developer and Researcher, San Diego
Supercomputer Center
3. Open Infrastructure
Derek Weitzel – University of Nebraska-Lincoln
(Strictly Derek’s Opinions)
This project is supported by the National Science Foundation under Cooperative
Agreements OAC-2112167. Any opinions, findings, conclusions or
recommendations expressed in this material are those of the authors and do not
necessarily reflect the views of the National Science Foundation.
5. How is NRP “Open Infrastructure”
• All components are Open Source
 • Kubernetes and containers
• Anyone can contribute resources
• Anyone can use the resources
• Documented Interfaces
• Resources were “seeded” through various grants
• But grew with contributions from users
7. How is OSG “Open Infrastructure”
• All components are Open Source
 • HTCondor and various tools
• Anyone can contribute resources
• Anyone can utilize the resources
• Interfaces are documented: osg-htc.org/docs
• Resources are “seeded” by organizations such as LHC, and now CC*.
• But have grown through contributions of users
9. How is OSDF “Open Infrastructure”
• All components are Open Source
• Anyone can contribute resources
• Interfaces are documented
• Resources were “seeded” by various grants and Internet2
• But have grown by contributions from users, and soon CC*
10. Leveraging NRP on a smaller
campus
Jeremy Evert
Associate Professor, Southwestern Oklahoma State University
February 9th, 2023
11. About Southwestern Oklahoma State University
● 10th in the state in enrollment behind 2 community colleges
● 5,000 students across two campuses
○ Formerly a teaching college
○ Formerly a tribal serving institution
● Non-PhD Granting
● Serves a portion of the minorities in the area
● Around 200 full time faculty and about 60% hold a terminal degree
12. Bringing Our Own Resource
● SWOSU had: 200 Sq. Ft. Server closet, 5 ton A/C, 42U rack
○ NSF CC* switch
● Dell Server, 96 AMD cores, some memory, spinning disk, small gpu
● San Diego team guided SWOSU through NVMe storage upgrade
● Faculty installed Ubuntu for a base OS
● OneNet (State ISP) helped troubleshoot network
● San Diego deployed Nautilus node
● James Deaton enabled user authentication through
github.com/SWOSU
● OneNet (state ISP) and SWOSU central IT provided an alias for
jupyter.swosu.edu
13. Engage and empower every SWOSU student
● SWOSU Computer Science Discrete Structures assignment: join
GitHub.com/swosu
● Students are pointed to our server as soon as they start running codes
that heat up their laptop
● Promoted on every syllabus I have
14. Engage and empower every elementary and high
school student and researcher
● SWOSU invites area technology teachers for a weeklong camp
○ Esports, graphic design, Microsoft, and programming
● Full day on teaching programming
● Teachers run jobs on jupyter.swosu.edu
15. Supporting SWOSU for the next 10 years
● Enable more science drivers
○ Physics, Math, Biology, and other Computer Science faculty
● Partner with SWOSU Education Department to integrate more of the
Campus Champions / Carpentries type trainings into new primary
education curriculum
● Leverage mentors from NRP / Great Plains Network / OneNet /
OneOklahoma Cyber Infrastructure Initiative to keep growing
○ Look to NSF CC* or small school MRI to expand current platform
16. Please consider a weekly statewide call
● Set up an email list
● Encourage key players to join
● Allow staff to show up and make connections
● Look for ways to add value to the individuals and larger community
● Connection to a larger community enables faculty at smaller schools
17. Open Infrastructure for an Open Society:
Commercial Clouds
Igor Sfiligoi
University of California San Diego
San Diego Supercomputer Center
Fourth National Research Platform (4NRP) – Feb 9th, 2023
18. Who cares about Commercial Clouds?
• Seems like everyone in industry is moving there!
• Not really, but it does look like it
• The big players have huge compute capacity
• Personally verified I can access 50k GPUs
• Others demonstrated access to several million CPU cores
• They have a large variety of compute resources
• Many x86 variants and several ARM CPUs
• Many GPU variants
• AI accelerators and FPGAs
• Great networking setups (both WAN and HPC-class LAN/Infiniband)
21. Often have new HW available before you can buy it
Also, Cloud-exclusive HW variants
• CPUs
 • Intel Sapphire Rapids available on Google Cloud now
 • AMD EPYC Milan-X available on Azure now
 • AMD EPYC Genoa in preview
• NVIDIA GPUs
 • A10s were available in AWS in 2021
• ARM CPUs
 • AWS has its own ARM CPU
 • Azure and Google offer a regular one
• AI Accelerators
 • AWS has Inferentia
 • Google has TPUs
 • AWS also offers Habana Gaudi
• FPGAs
 • AWS has had FPGAs since forever
24. Pros and cons of Commercial Clouds
• Pros:
• See previous slide
• No need to go through allocation processes… all you need is money
• Cons:
• You need money
• And lots of it
• “Regular”, on-demand Cloud computing is expensive
• Anywhere between 3x and 10x what you would pay on-prem on a 24/7 basis
• Spot pricing is almost comparable to on-prem, but only useful for preemptible work
• Easy to get in, hard to get out
• Pricing is optimized to let data in cheaply, but moving it out is expensive
• No automatic price caps, easy to overspend
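The 3x to 10x figure above can be turned into a rough break-even estimate. The sketch below is only an illustration under stated assumptions: an on-prem machine is paid for 24/7 whether it is busy or not, cloud on-demand is billed only for the hours actually used, and the cloud hourly rate is the on-prem rate times a multiplier. The function names, the normalized rate of 1.0, and the 730 hours per month are hypothetical choices, not numbers from the slides.

```python
def break_even_utilization(cloud_multiplier: float) -> float:
    """Fraction of time a machine must be busy before owning beats renting.

    Assumes on-prem is paid 24/7 and cloud is paid only while in use.
    """
    if cloud_multiplier <= 0:
        raise ValueError("cloud_multiplier must be positive")
    return 1.0 / cloud_multiplier


def monthly_costs(on_prem_hourly: float, cloud_multiplier: float,
                  busy_hours: float, hours_in_month: float = 730.0):
    """Return (on_prem_cost, cloud_cost) for one month of use."""
    on_prem = on_prem_hourly * hours_in_month  # paid whether used or idle
    cloud = on_prem_hourly * cloud_multiplier * busy_hours  # pay per use
    return on_prem, cloud


if __name__ == "__main__":
    # Multipliers taken from the 3x-10x range quoted on the slide.
    for multiplier in (3.0, 10.0):
        share = break_even_utilization(multiplier)
        print(f"{multiplier:.0f}x: cloud is cheaper below {share:.0%} utilization")
    # Hypothetical normalized rate of 1.0 per on-prem hour, 100 busy hours.
    print(monthly_costs(1.0, 3.0, 100.0))
```

Under these assumptions, resources that sit busy around the clock favor on-prem, while lightly used ones can favor on-demand cloud even at the higher hourly rate.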
26. Private jet vs commercial airline (ticket bought 2 months in advance, in economy class, through your travel department)
Both will get you from A to B
Which one would you pick?
27. Who should consider Commercial Cloud?
• Flexible/urgent computing
• Hard to beat the scalability of the clouds
• Costs acceptable for short spikes
• Prototyping, R&D
• The variety of HW available in the clouds is hard to match
• Instant access with no contention drastically raises productivity
• Ultra-High-Availability services
• Hard to beat the breadth of Cloud deployments
• Many large datacenters, proven track record
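The urgent-computing argument can be illustrated with simple arithmetic. The sketch below assumes a perfectly parallel workload measured in core-hours and a flat per-core-hour price. Both are simplifications, and the workload size, core counts, and price are hypothetical. Under those assumptions the bill depends only on the total core-hours, while the wall-clock time shrinks with the number of cores you can get at once.

```python
def wall_time_hours(core_hours: float, cores: int) -> float:
    """Wall-clock time for a perfectly parallel workload on `cores` cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return core_hours / cores


def burst_cost(core_hours: float, price_per_core_hour: float) -> float:
    """Total cost under flat pricing; independent of how wide the job runs."""
    return core_hours * price_per_core_hour


if __name__ == "__main__":
    workload = 100_000.0  # core-hours, hypothetical
    price = 0.05          # currency units per core-hour, hypothetical
    for cores in (500, 50_000):
        hours = wall_time_hours(workload, cores)
        cost = burst_cost(workload, price)
        print(f"{cores} cores: {hours:g} h wall time, cost {cost:.2f}")
```

Real workloads rarely scale perfectly and cloud prices vary by instance type and region, so this is a way to frame the trade-off rather than a prediction.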
28. Is Commercial Cloud easy to use?
• Yes and no
• Clouds provide enormous flexibility
• You can do virtually everything you could do with your personal server
• But that can be daunting for non-IT users
• Lots of support services
• No need to reinvent the wheel, just pick one
• Finding what you need can be a challenge, lots of competing options
• Cloud providers invest a lot in the user interfaces
• More intuitive than anything you will find on-prem
• But each provider has its own flavor
• How do you mix on-prem and Cloud resources?
29. Facilitating Cloud access for science users
• CloudBank
• Account management and monitoring (I love their spend/budget tracking!)
• Extensive documentation/training
• Integrate with OSG/PATh/HTCondor ecosystem
• IT-savvy support staff can easily add cloud resources to an HTCondor pool
• Users see only HTCondor, cloud HW no different than on-prem HW
• Kubernetes (k8s) to the rescue
• All Cloud Providers expose a Kubernetes interface, too
• Cloud k8s feels like on-prem k8s (at least for compute)
• Kubernetes federation can make it completely transparent, e.g. from Nautilus
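To make the "cloud k8s feels like on-prem k8s" point concrete, here is a minimal sketch that builds a standard Kubernetes batch/v1 Job manifest as a plain Python dictionary and prints it as JSON. Nothing in the manifest names a provider, so the same document could in principle be submitted to an on-prem or a cloud-hosted cluster. The job name, container image, command, and resource sizes are hypothetical placeholders, and site-specific details such as namespaces, GPUs, or storage are deliberately left out.

```python
import json


def make_job_manifest(name, image, command, cpus=1, memory="2Gi"):
    """Build a provider-neutral Kubernetes Job manifest as a dict."""
    resources = {"cpu": str(cpus), "memory": memory}
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed pod
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "main",
                            "image": image,
                            "command": command,
                            "resources": {
                                "requests": dict(resources),
                                "limits": dict(resources),
                            },
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical job: name, image, and command are placeholders.
    manifest = make_job_manifest(
        name="demo-job",
        image="python:3.11-slim",
        command=["python", "-c", "print('hello from k8s')"],
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and handed to whichever cluster the user has access to, for example with `kubectl apply -f`; only the cluster credentials change, not the job description.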