Accelerating Time to Science: Transforming Research in the Cloud
1. AWS Government, Education, and Nonprofit Symposium
Washington, DC | June 25-26, 2015
Accelerating Time to Science:
Transforming Research in the Cloud
Jamie Kinney - @jamiekinney
Director of Scientific Computing, a.k.a. "SciCo" - Amazon Web Services
Dr. Michael Ernst - @brookhavenlab
Director, RHIC and ATLAS Computing Facility - Brookhaven National Laboratory
©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved.
2. Agenda
• An introduction to scientific computing on AWS
• How are researchers using AWS today?
• Case study: How the ATLAS experiment is using AWS
• Q & A
3. What do we mean by scientific computing?
Scientific computing is the application of simulation, mathematical
modeling, and quantitative analysis to solve scientific problems.
4. How is AWS used for scientific computing?
• High-performance computing (HPC) for engineering and simulation
• High-throughput computing (HTC) for data-intensive analytics
• Hybrid supercomputing centers
• Collaborative research environments
• Citizen science
• Science-as-a-Service
5. Why do researchers love using AWS?
• Time to science: access research infrastructure in minutes
• Low cost: pay-as-you-go pricing
• Globally accessible: easily collaborate with researchers around the world
• Secure: a collection of tools to protect data and privacy
• Scalable: access to effectively limitless capacity
• Elastic: easily add or remove capacity
6. Why does AWS care about scientific computing?
• We want to improve our world by accelerating the pace of scientific discovery
• It is a great application of AWS with a broad customer base
• The scientific community helps us innovate on behalf of all customers
  - Streaming data processing and analytics
  - Exabyte-scale data management solutions and exaflop-scale compute
  - Collaborative research tools and techniques
  - New AWS regions
  - Significant advances in low-power compute, storage, and data centers
  - Efficiencies that will lower our costs and therefore pricing for all customers
7. Research grants
AWS provides free usage credits to help researchers:
• Teach advanced courses
• Explore new projects
• Create resources for the scientific community
aws.amazon.com/grants
8. Peering with all global research networks
Image courtesy John Hover - Brookhaven National Lab
9. Restricted-access genomics on AWS
aws.amazon.com/genomics
11. High-throughput computing at scale
The Large Hadron Collider experiments at CERN involve thousands of
researchers from over 40 countries and produce tens of PB of data each year.
The ATLAS and CMS experiments are using AWS for Monte Carlo
simulations and analysis of LHC data.
12. Data-intensive computing
The Square Kilometre Array (SKA) will link 250,000 radio
telescopes together, creating the world's most sensitive
telescope. The SKA will generate zettabytes of raw data,
publishing exabytes annually over 30-40 years.
Researchers are using AWS to develop and test:
• Data processing pipelines
• Image visualization tools
• Exabyte-scale research data management
• Collaborative research environments
aws.amazon.com/solutions/case-studies/icrar/
13. High Performance Computing
Simulations in the automotive sector
• Crash and materials simulations
• Fluid and thermal dynamics simulations
• Car body aerodynamics
• Electronics and electromagnetic simulations
Honda materials science simulations on AWS:
• Deploying scalable HPC clusters on AWS Spot Instances - up to 1,000 C3 instances
• Running more simulations than before, for more accurate results
"Cloud offers us an opportunity, as we can innovate faster than before."
- Ayumi Tada, IT System Administrator, Honda R&D
14. Schrödinger and Cycle Computing:
Computational chemistry for better solar power
Simulation by Mark Thompson of the University of Southern California to
determine which of 205,000 organic compounds could be used as
photovoltaic material in solar panels.
Estimated computation time of 264 years, completed in 18 hours.
• 156,314-core cluster across 8 regions
• 1.21 petaFLOPS (Rpeak)
• $33,000, or 16¢ per molecule
Workload type: loosely coupled
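The headline figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative, and it assumes the "264 years" estimate refers to a single serial run measured in 365-day years, which the slide does not state.

```python
# Figures quoted on the slide; variable names are illustrative only.
compounds = 205_000
total_cost_usd = 33_000
cores = 156_314
wall_clock_hours = 18
estimated_serial_years = 264  # assumption: a serial estimate in 365-day years

HOURS_PER_YEAR = 365 * 24

# Cost per screened compound.
cost_per_molecule = total_cost_usd / compounds

# Core-hours actually consumed by the cluster run.
core_hours_used = cores * wall_clock_hours

# Ratio of the estimated serial time to the observed wall-clock time.
serial_hours = estimated_serial_years * HOURS_PER_YEAR
speedup = serial_hours / wall_clock_hours

print(f"cost per molecule: ${cost_per_molecule:.2f}")
print(f"core-hours used:   {core_hours_used:,}")
print(f"wall-clock speedup: {speedup:,.0f}x")
```

The per-molecule cost rounds to the 16¢ quoted on the slide. The speedup figure depends entirely on the stated assumption about what "264 years" measures.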
15. Science-as-a-Service
Globus Genomics, DNAnexus, and Seven Bridges Genomics offer inexpensive, easy-to-use, and
secure platforms for processing and analyzing genomic data.
The Weather Company pushes four gigabytes of data to AWS each
second in order to deliver 15 billion forecasts each day to their
customers around the world.
aws.amazon.com/solutions/case-studies/the-weather-company/
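To put the Weather Company figures on a common time base, the following sketch converts the two quoted rates. It assumes decimal units (1 TB = 1,000 GB) and a constant rate over a full day, neither of which the slide specifies.

```python
SECONDS_PER_DAY = 24 * 60 * 60

gb_per_second = 4                   # quoted on the slide
forecasts_per_day = 15_000_000_000  # quoted on the slide

# Daily ingest volume, assuming the 4 GB/s rate is sustained all day.
gb_per_day = gb_per_second * SECONDS_PER_DAY
tb_per_day = gb_per_day / 1_000     # assumption: decimal terabytes

# Average forecast delivery rate.
forecasts_per_second = forecasts_per_day / SECONDS_PER_DAY

print(f"ingest: {tb_per_day:.1f} TB/day")
print(f"forecasts: {forecasts_per_second:,.0f} per second")
```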
16. Case Study: Brookhaven National Laboratory
ATLAS: Accelerating Scientific Discovery with AWS
17. Accelerating Scientific Discovery in the Cloud
Michael Ernst
Brookhaven National Laboratory
June 25, 2015
AWS Government, Education, and Nonprofit Symposium
Washington, D.C.
36. Leveraging the AWS Spot market for compute-hungry HEP
• Cloud resources are very valuable to high-energy physics (HEP) experimental computing, and HEP generally is a big user
• In the past, experimental HEP has made little use of commercial cloud resources; we want to change that
• We are compute-limited in our science; cloud resources can enrich the science
• Clouds have (cost-efficient) room for us if our workload is fine-grained and flexible, even when the resource occupancy is high
• Just as there's room for sand in a full jar of rocks, there's room for us
• Joint project with the AWS Scientific Computing team and ESnet
• Scoped out a pilot centered on representative HEP/ATLAS workflows
• AWS contributes precious technical expertise and credits for trial runs
• ESnet contributes expertise and network gear at the AWS/ESnet peering points
• ESnet participation is central to AWS waiving the egress fee (conditions apply)
• Which brings us to our new fine-grained data processing system
37.
• We've leveraged new developments in our Workload Management System (PanDA), our parallel software framework, powerful networking, and efficient I/O and storage to implement a new approach to event processing: a fine-grained event service
• An extension to PanDA that allows it to manage event-level workloads (instead of file-level workloads where hundreds of events are clustered)
• Object stores (e.g. S3) provide highly scalable storage for many small event-scale outputs
✓ Applicable to any workflow (not just HEP) able to support fine-grained partitioning of the processing and its output
✓ Data-intensive, network-centric, platform-agnostic computing
• An increasingly important paradigm in the scientific computing community
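The idea of event-level partitioning with many small outputs can be illustrated with a minimal sketch. This is not PanDA's implementation: the function names, the key layout, and the in-memory dictionary standing in for an object store such as S3 are all hypothetical.

```python
from typing import Dict, Iterator, Tuple


def event_ranges(first: int, last: int, size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) event ranges holding at most `size` events."""
    start = first
    while start <= last:
        end = min(start + size - 1, last)
        yield start, end
        start = end + 1


def process_range(start: int, end: int) -> bytes:
    """Hypothetical placeholder for the real per-event processing."""
    return f"processed events {start}-{end}".encode()


# Stand-in for an object store: one small object per completed event range.
object_store: Dict[str, bytes] = {}

for start, end in event_ranges(0, 999, 100):
    # Hypothetical key layout; each range is stored independently of the others.
    key = f"run-0001/events-{start:06d}-{end:06d}.out"
    object_store[key] = process_range(start, end)

print(len(object_store), "objects written")
```

Because every range produces its own object, a worker that disappears mid-run loses only the range it was working on, which is the property the slide attributes to fine-grained partitioning.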
39. ATLAS simulated event production is currently running on EC2 using the event service
• PanDA "Site" at BNL sends jobs to EC2 Spot Market VMs
• Exercising scaling to >50k concurrent jobs, entering production soon
• Event Service maximizes return on short-lived job slots (~1h)
• Leverages capability from the BNL Tier 1 to elastically and transparently expand workloads into cloud resources: after dedicated resources are fully utilized, jobs overflow into the cloud to accommodate peak demands
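Why fine granularity helps on short-lived slots can be shown with a toy model. The sketch assumes that output is only kept for fully completed work units and that a slot ends abruptly; the per-event time and unit sizes are made-up illustration values, with only the roughly one-hour slot length taken from the slide.

```python
def events_saved(slot_minutes: float, minutes_per_event: float,
                 events_per_unit: int) -> int:
    """Events whose output survives a slot of the given length.

    Toy model: a unit's output is kept only if the whole unit finished
    before the slot ended; partially processed units are lost.
    """
    unit_minutes = minutes_per_event * events_per_unit
    completed_units = int(slot_minutes // unit_minutes)
    return completed_units * events_per_unit


slot = 60.0        # ~1 h slot, as on the slide
per_event = 0.5    # hypothetical processing time per event, in minutes

# File-level unit: hundreds of events clustered together (here 500, hypothetical).
print("file-level:", events_saved(slot, per_event, 500))
# Event-level unit: a handful of events (here 5, hypothetical).
print("event-level:", events_saved(slot, per_event, 5))
```

Under these assumptions the file-level job never finishes inside the slot, so nothing is saved, while the event-level units retain almost all of the work done.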
40. Using cloud resources effectively: A policy-based cloud scheduler
Policy
• Fully transparent to the Workload Management System (e.g. PanDA)
• Elastically expands the pool of compute resources according to user-defined policy
• Demand-driven, policy-based programmatic instantiation and contraction of cloud resources
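A demand-driven, policy-based sizing rule of the kind described here might look like the sketch below. It is not the scheduler used at BNL: the `Policy` fields, function names, and numbers are hypothetical, and a real scheduler would also account for startup delay, pricing, and draining.

```python
import math
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical user-defined policy for cloud expansion."""
    max_cloud_instances: int   # hard cap on instances to run
    slots_per_instance: int    # job slots provided by each instance


def target_cloud_instances(demand_jobs: int, local_slots: int,
                           policy: Policy) -> int:
    """Instances needed for the jobs that overflow the dedicated resources."""
    overflow = max(0, demand_jobs - local_slots)
    needed = math.ceil(overflow / policy.slots_per_instance)
    return min(needed, policy.max_cloud_instances)


def scaling_action(current_instances: int, target_instances: int) -> int:
    """Positive: instances to start. Negative: instances to terminate."""
    return target_instances - current_instances


policy = Policy(max_cloud_instances=100, slots_per_instance=8)
target = target_cloud_instances(demand_jobs=1_100, local_slots=1_000, policy=policy)
print("target:", target, "action:", scaling_action(current_instances=20,
                                                    target_instances=target))
```

Keeping the sizing rule separate from the workload management system is what makes the expansion transparent to it: the scheduler only needs the current demand and the local capacity as inputs.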
41. Elastic cluster: "Flexible and nimble" provisioning
Programmatically instantiates compute resources in the cloud
Designed to serve:
- Peak demands
- Users without dedicated resources
- Dynamic creation of specific resource types (e.g. DB, storage, DTNs)
Goal: setup time <5% of total compute time
[Diagram label: HTCondor]
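The setup-time goal translates into a simple budget per instance. The sketch below assumes "total compute time" means the full lifetime of an instance including setup; the slide does not define the term, and the function names are illustrative.

```python
SETUP_GOAL = 0.05  # setup should stay below 5% of total time (from the slide)


def max_setup_minutes(total_minutes: float) -> float:
    """Largest setup time that stays at the 5% boundary for a given lifetime."""
    return SETUP_GOAL * total_minutes


def meets_goal(setup_minutes: float, total_minutes: float) -> bool:
    """True if setup takes strictly less than 5% of the total lifetime."""
    return setup_minutes / total_minutes < SETUP_GOAL


# For a slot of roughly one hour, the setup budget is about three minutes.
print(max_setup_minutes(60))
print(meets_goal(setup_minutes=2, total_minutes=60))
print(meets_goal(setup_minutes=5, total_minutes=60))
```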
42. Architectural overview from the facility perspective
43. Connecting AWS Facilities to the Research Community
[Diagram labels: 100G R&E Exchange; Direct Connect, ESnet Pilot 2x10G; AWS Planned 100G to PNWG Seattle; Direct Connect, ESnet Pilot 1x10G]
44. Image Authoring and Runtime Configuration
Design goals:
• Useful for ATLAS, but usable by other VOs.
• Eliminate runtime RPM installation. Fatal with O(1000) startups.
• Images deterministically reproducible. No snapshotting.
• Provide the ability for other users to do it themselves (make toolset public).
• Flexibility between build-time and runtime customization. Both options OK.
• Open source only. Only use functions/services for which open source equivalents exist (EC2, S3).
• Off-the-shelf, non-cloud (Puppet, Hiera, Condor, Yum) wherever possible. Off-the-shelf cloud (cloud-init, Imagefactory/Oz) only where needed.
• Keep custom parts small, simple, and/or optional.
• 10,000 ft summary:
• Imagefactory 1.1.7 generates VMs from merged hierarchical templates.
• Masterless Puppet consumes a single Hiera file (injected via cloud-init write_files) at boot.
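The last step of the summary, injecting one Hiera file through cloud-init and applying Puppet without a master, could be wired up roughly as follows. This is a sketch, not the toolset the slide refers to: the file paths, the manifest name, and the Hiera content are hypothetical, and it relies only on the cloud-init `write_files` and `runcmd` modules and the `puppet apply` command.

```python
import base64


def build_user_data(hiera_yaml: str,
                    hiera_path: str = "/etc/puppet/hieradata/common.yaml",
                    manifest_path: str = "/etc/puppet/manifests/site.pp") -> str:
    """Build a cloud-config document that writes one Hiera file and runs Puppet.

    Both default paths are hypothetical; adjust them to the image's layout.
    """
    # Base64-encode the payload so YAML indentation and quoting are not an issue.
    encoded = base64.b64encode(hiera_yaml.encode("utf-8")).decode("ascii")
    lines = [
        "#cloud-config",
        "write_files:",
        f"  - path: {hiera_path}",
        "    encoding: b64",
        f"    content: {encoded}",
        "    permissions: '0644'",
        "runcmd:",
        # Masterless run: apply a local manifest instead of contacting a server.
        f"  - [puppet, apply, {manifest_path}]",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical Hiera content describing the node's role.
user_data = build_user_data("role: worker\n")
print(user_data)
```

The resulting string would be passed as instance user data at launch, so the image itself stays generic and all per-instance configuration travels in that single file.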
46. Final remarks
• ATLAS has met the challenge of data-intensive computing at a scale not seen before
• Resource virtualization - integration of storage, compute, and network - in a seamless manner, including cloud and local resources
• A rather complete and still growing set of AWS services to instantiate VMs and to allocate storage and network dynamically
• New innovations like the Event Service allow ATLAS to efficiently harvest EC2 Spot market resources to meet its computing growth needs
• The joint project with the AWS Scientific Computing team and ESnet has been crucial to the successful implementation
47. Additional resources
• aws.amazon.com/hpc
• aws.amazon.com/big-data
• aws.amazon.com/grants
• aws.amazon.com/genomics
• aws.amazon.com/compliance
• aws.amazon.com/security
48. Thank You.
This presentation will be loaded to SlideShare the week following the Symposium.
http://www.slideshare.net/AmazonWebServices
Editor's notes
Four main reasons why Amazon EMR
Let's be more specific for a moment and talk about Automotive…
http://news.cnet.com/8301-1001_3-57611919-92/supercomputing-simulation-employs-156000-amazon-processor-cores
http://blog.cyclecomputing.com/2013/11/back-to-the-future-121-petaflopsrpeak-156000-core-cyclecloud-hpc-runs-264-years-of-materials-science.html
Computational compound analysis. Solar panel material. Estimated computation time 264 years