In this slidecast, Jason Stowe from Cycle Computing describes the company's recent record-breaking Petascale CycleCloud HPC production run.
"For this big workload, a 156,314-core CycleCloud behemoth spanning 8 AWS regions, totaling 1.21 petaFLOPS (RPeak, not RMax) of aggregate compute power, to simulate 205,000 materials, crunched 264 compute years in only 18 hours. Thanks to Cycle's software and Amazon's Spot Instances, a supercomputing environment worth $68M if you had bought it, ran 2.3 Million hours of material science, approximately 264 compute-years, of simulation in only 18 hours, cost only $33,000, or $0.16 per molecule."
Learn more: http://blog.cyclecomputing.com/2013/11/back-to-the-future-121-petaflopsrpeak-156000-core-cyclecloud-hpc-runs-264-years-of-materials-science.html
Watch the video presentation: http://wp.me/p3RLHQ-aO9
Cycle Computing Record-breaking Petascale HPC Run
1. Record-breaking Petascale CycleCloud HPC Production Run
156,000-core Cluster (1.21 PetaFLOPS) Accelerates Schrödinger Materials Science and Green Energy
November 2013
Cycle Computing
3. Records broken, Science done
On November 3rd, we ran a “MegaRun” cluster that:
• Had 156,314 cores and 1.21 PetaFLOPS of theoretical peak compute power
• Ran 2.3 million compute hours, totaling 264 years of computing, in 18 hours
• Executed worldwide, across all 8 public AWS Regions (5 continents)
• Cost just $33K on CycleCloud with Spot Instances, compared to $68 million to purchase
THE SCIENCE
• Finding Organic Photovoltaic Compounds that are more efficient and easier to manufacture, to help remove the US’s reliance on fossil fuels.
• Designing, synthesizing, and experimenting with a new material can take 1 year of a scientist’s time and require hundreds of thousands of dollars in equipment, chemicals, etc.
• Alternatively, with Schrödinger Materials Science’s tools on Cycle and AWS Spot Instances, it cost $0.16 per molecule
• The run analyzed 205,000 compounds in total
• This is the exact kind of science being outlined in the Materials Genome initiative from the White House
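The headline numbers on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the rounded figures quoted above; the variable names are illustrative, and the derived per-core-hour and per-core values are back-of-the-envelope estimates rather than numbers reported by Cycle Computing.

```python
# Figures quoted on the slide (rounded as presented there).
total_cost_usd = 33_000
molecules = 205_000
core_hours = 2_300_000        # "2.3 Million hours"
cores = 156_314
peak_flops = 1.21e15          # 1.21 PetaFLOPS, theoretical peak (RPeak)
purchase_cost_usd = 68_000_000

# Derived quantities (estimates, not reported values).
cost_per_molecule = total_cost_usd / molecules
cost_per_core_hour = total_cost_usd / core_hours
peak_gflops_per_core = peak_flops / cores / 1e9
purchase_to_run_ratio = purchase_cost_usd / total_cost_usd

print(f"Cost per molecule:       ${cost_per_molecule:.2f}")
print(f"Cost per core-hour:      ${cost_per_core_hour:.4f}")
print(f"Peak GFLOPS per core:    {peak_gflops_per_core:.2f}")
print(f"Purchase cost / run cost: {purchase_to_run_ratio:.0f}x")
```

The cost per molecule rounds to the $0.16 quoted on the slide, and the implied price works out to roughly 1.4 cents per core-hour.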
4. Challenge of Materials Science
Traditional Materials Design
• Design, Synthesis, Analysis are challenging for an arbitrary material
• Low hit rate for viable materials
• Total Molecule Cost:
  • Time: A year for a grad student
  • $100,000s in equipment, chemicals, etc.
With Schrödinger Computational Chemistry & Cycle
• Schrödinger Materials Science tools simulate accurate properties in hours
• Simulation guides the researcher’s intuition
• Focus physical analysis on promising materials
• Total cost:
  • Time to enumerate molecules: Minutes/hours
  • $0.16 per molecule in infrastructure using AWS Spot Instances
5. Designing Solar Materials
The Challenge is efficiency
• Need to efficiently turn photons from the sun into electricity
The number of possible materials is limitless
• Need to separate the right compounds from the useless ones
• If the 20th century was the century of silicon, the 21st will be all organic
How do we find the right material, without spending the entire 21st century looking for it?
6. The Challenge for the Scientist
Dr. Mark Thompson
Professor of Chemistry, USC
“Solar energy has the potential to replace some of our dependence on fossil fuels, but only if the solar panels can be made very inexpensively and have reasonable to high efficiencies. Organic solar cells have this potential.”
Challenge: run a virtual screen of 205,000 molecules in continuing analysis of possible materials for organic solar cells
7. The right needle in the right haystack
Before: Trade-off between compute time vs. sampling
• Coarse screen, small samples
Now: Better analysis, more materials → Better results
• Higher quality analysis, more materials
8. Solution: Utility HPC
On-demand compute power is transformative for users, but hard to put into production
— Big Opportunity to help Manufacturing, Life Science, Energy, Financial
companies:
— Rise of BigData, compute, Monte Carlo problems that power modern business and
science
— Applications, like Schrödinger Materials Science tools, offer a compelling alternative
to physically testing products
— Amazon Web Services makes infrastructure easily accessible
— AWS Spot instances decrease the cost of compute
— Science & engineering face faster time-to-market, increased agility requirements
— Capital efficiency (OpEx replacing CapEx) are organizational goals
9. Why isn’t everyone doing this?
Because it is really complicated, and really hard, to orchestrate technical applications securely and at scale
We’re the first and only ones doing this, including the well-publicized 2000, 4000, 10000, 30000, and 50000 core clusters in 2010-2013
Clients including: Johnson & Johnson, Schrödinger, Pfizer, Novartis, Genentech, HGST, Pacific Life Insurance, Hartford Insurance Group …
10. Cycle Computing Makes Utility HPC a Reality
Easily orchestrates complex workloads and data access to local and Cloud HPC
— Scales from 100-1,000,000 cores
— Handles errors, reliability
— Schedules data movement
— Secures, encrypts and audits
— Provides reporting and chargeback
— Automates spot bidding
— Supports Enterprise operations
12. Solution: “MegaRun” Cluster
New record: MegaRun is the largest dedicated Cloud
HPC Cluster to date on Public Cloud
Tool | Description
Schrödinger Materials Science tools | Set of automated workflows that enable organic semiconductor materials to be simulated accurately
CycleCloud | HPC clusters at small to massive scale: application deployment, job/data aware routing, error-handling
Jupiter | Cycle’s massively scalable, resilient cloud scheduler
Chef | Automated configuration at scale
Multi-Region AWS Spot Instances | Massive server resource capacity across all public regions of AWS
17. Jupiter Scheduler
— Make large cloud regions work together
— Spans many regions/datacenters to resiliently route
work with minimal scheduling overhead
— Batch/MPI Schedulers get 10k cores doing 100k jobs
— Jupiter seeks to get Millions of cores doing 10Ms tasks
— Currently 100k’s cores doing 1M tasks on large runs
— Can survive machine, availability zone, and region
failure while still executing the full workload
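To make the idea of surviving a region failure concrete, here is a minimal toy sketch of multi-region task routing with requeue-on-failure. It is not Jupiter's actual design or API, which the slides do not describe; the class name, its methods, the slot counts, and the greedy "most free slots" placement rule are all assumptions made for illustration.

```python
from collections import deque


class ToyMultiRegionScheduler:
    """Hypothetical toy scheduler; NOT Jupiter's real implementation."""

    def __init__(self, region_slots):
        # region_slots: mapping of region name -> number of task slots.
        self.free = dict(region_slots)
        self.running = {region: set() for region in region_slots}
        self.pending = deque()
        self.done = set()

    def submit(self, task_ids):
        self.pending.extend(task_ids)
        self._dispatch()

    def _dispatch(self):
        # Greedy placement: send each pending task to the live region
        # that currently has the most free slots.
        while self.pending:
            candidates = [r for r, n in self.free.items() if n > 0]
            if not candidates:
                break
            region = max(candidates, key=lambda r: self.free[r])
            self.running[region].add(self.pending.popleft())
            self.free[region] -= 1

    def complete(self, region, task):
        self.running[region].remove(task)
        self.free[region] += 1
        self.done.add(task)
        self._dispatch()

    def fail_region(self, region):
        # Drop the region and requeue whatever was running there.
        lost = self.running.pop(region)
        del self.free[region]
        self.pending.extend(sorted(lost))
        self._dispatch()


# Usage: three regions (real AWS region names, arbitrary slot counts).
sched = ToyMultiRegionScheduler(
    {"us-east-1": 2, "eu-west-1": 2, "ap-southeast-2": 1}
)
sched.submit(range(8))
sched.fail_region("us-east-1")  # simulate losing a whole region

# Drain the remaining work one task at a time.
while any(sched.running.values()):
    region = next(r for r, tasks in sched.running.items() if tasks)
    sched.complete(region, next(iter(sched.running[region])))

print(sorted(sched.done))
```

Even after one region is lost mid-run, every submitted task still completes on the surviving regions. A production scheduler would also need to handle partial results, data locality, and spot-instance preemption, none of which this sketch attempts.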
19. MegaRun – Facts and Figures
Metric | Count
Compute Hours of Work | 2,312,959 hours
Compute Days of Work | 96,373 days
Compute Years of Work | 264 years
Molecule Count | 205,000 materials
Run Time | < 18 hours
Max Scale (cores) | 156,314 cores across 8 regions
Max Scale (instances) | 16,788 instances
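The unit conversions behind these figures are easy to reproduce. The sketch below assumes 24-hour days and 365-day years, which matches the slide's rounding; the average cores per instance is a derived estimate, since the slides do not list the instance types used.

```python
compute_hours = 2_312_959
max_cores = 156_314
max_instances = 16_788

# Assumption: 24-hour days and 365-day years.
compute_days = compute_hours / 24
compute_years = compute_days / 365

# Derived estimate only; the actual instance mix is not given.
avg_cores_per_instance = max_cores / max_instances

print(f"Compute days:  {compute_days:,.0f}")
print(f"Compute years: {compute_years:.0f}")
print(f"Average cores per instance: {avg_cores_per_instance:.1f}")
```

The results agree with the 96,373 days and 264 years listed above, and imply a little over 9 cores per instance on average at peak scale.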
20. Accelerated Time to Result
Cluster Scale | Cost | Run-time
156,000 core CycleCloud | $33,000 | ~ 18 hours
300-core Internal cluster (stopping all other work) | $132,000 | ~ 10.5 months
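The run-time comparison follows from dividing the total work by the number of cores. The sketch below is a rough estimate that assumes perfect scaling, full utilization, and the 2,312,959 compute hours reported for the run; real clusters ramp up and down, so the ideal cloud figure is a lower bound rather than a measured value.

```python
compute_hours = 2_312_959      # total work reported for the run
cloud_cores = 156_314          # peak scale of the CycleCloud cluster
internal_cores = 300           # the internal cluster in the comparison

HOURS_PER_DAY = 24
DAYS_PER_MONTH = 365 / 12      # assumption: average month length

# Ideal wall-clock time if every core were busy for the whole run.
internal_hours = compute_hours / internal_cores
internal_months = internal_hours / HOURS_PER_DAY / DAYS_PER_MONTH
cloud_ideal_hours = compute_hours / cloud_cores

# Costs as listed in the comparison.
cost_ratio = 132_000 / 33_000

print(f"300-core cluster: {internal_months:.1f} months")
print(f"Cloud, ideal:     {cloud_ideal_hours:.1f} hours")
print(f"Internal cost / cloud cost: {cost_ratio:.0f}x")
```

The internal-cluster estimate lands at about 10.5 months, consistent with the comparison above. The ideal cloud time of just under 15 hours sits below the reported run of roughly 18 hours, which is plausible given that the peak core count was not held for the entire run.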