This document discusses using Amazon Web Services for scientific computing. It describes how AWS provides scalable, cost-effective, and reliable cloud infrastructure that researchers can use on-demand. Specific AWS services that can benefit scientific applications are mentioned, including Amazon EC2 for flexible computing power, S3 for data storage, and Elastic MapReduce for hosted Hadoop services. Examples are given of scientific projects that have successfully used AWS, such as genome analysis and astrophysics simulations. The document argues that AWS enables new approaches to doing science by providing access to vast, on-demand computing resources and platforms for distributed applications and data sharing.
121. Crossbow: Rapid whole genome SNP analysis
Preprocessed reads
Map: Bowtie
Sort: Bin and partition
Reduce: SoapSNP
Langmead B, Trapnell C, Pop M, Salzberg SL. Genome Biol 2009, 10(3):R25.
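The three-stage pipeline above can be sketched as a toy MapReduce flow. This is a hypothetical, heavily simplified illustration: the real Crossbow runs Bowtie (map) and SoapSNP (reduce) on a Hadoop cluster, and the naive string alignment and SNP call below are stand-ins, not the actual algorithms.

```python
from collections import defaultdict

# Toy reference genome and reads (hypothetical data for illustration only).
REFERENCE = "ACGTACGTACGT"

def map_align(read):
    """Map stage (Bowtie in Crossbow): align a read, emit (position, sequence).
    A naive substring search stands in for real short-read alignment."""
    pos = REFERENCE.find(read[:4])
    return (pos, read)

def sort_partition(alignments, bin_size=4):
    """Sort stage: bin alignments by genomic region so that each reducer
    receives all reads overlapping its partition of the genome."""
    bins = defaultdict(list)
    for pos, seq in alignments:
        if pos >= 0:  # drop unaligned reads
            bins[pos // bin_size].append((pos, seq))
    return bins

def reduce_call_snps(bins):
    """Reduce stage (SoapSNP in Crossbow): per bin, compare aligned bases
    against the reference and report mismatches as candidate SNPs."""
    snps = []
    for _, aligned in sorted(bins.items()):
        for pos, seq in aligned:
            for offset, base in enumerate(seq):
                ref_i = pos + offset
                if ref_i < len(REFERENCE) and base != REFERENCE[ref_i]:
                    snps.append((ref_i, REFERENCE[ref_i], base))
    return snps

reads = ["ACGTAGGT", "TACGTACG"]
alignments = [map_align(r) for r in reads]
snps = reduce_call_snps(sort_partition(alignments))
print(snps)  # each tuple: (position, reference base, observed base)
```

On Hadoop, the "sort" stage is the framework's shuffle: keying each alignment by genomic bin is what routes all reads for a region to the same reducer.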
122. Crossbow condenses over 1,000 hours of resequencing computation into a few hours, without requiring the user to own or operate a computer cluster
125. BLAT @ U. PENN
Map 100 million 100-base paired-end reads
A quad-core machine with 5 GB of RAM would take 16 days
30 high-memory instances; 32 hours; $195
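The slide's numbers can be sanity-checked with simple arithmetic. All inputs come from the slide; the per-instance-hour rate is derived from them, not a quoted AWS price.

```python
# Back-of-the-envelope check of the BLAT figures above.
sequential_hours = 16 * 24  # one quad-core machine: 16 days
cloud_hours = 32            # wall-clock time on the cloud
instances = 30              # high-memory instances used
total_cost = 195            # total bill in USD

# Wall-clock speedup from running 30 instances in parallel.
speedup = sequential_hours / cloud_hours

# Effective cost per instance-hour implied by the total bill.
cost_per_instance_hour = total_cost / (instances * cloud_hours)

print(f"wall-clock speedup: {speedup:.0f}x")
print(f"effective rate: ${cost_per_instance_hour:.2f}/instance-hour")
```

The implied rate of roughly $0.20 per instance-hour is the kind of figure worth checking against current pricing, since on-demand rates change over time.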
126. GALAXY MAPPING
Goal: Create an astrometric catalog of a billion stars with micro-arcsecond precision
Gaia satellite launched 2011; observations through 2017; catalog ready 2019
Problem: A single pass through the data for image processing would take 30 years (on one CPU)
Solution: Use AWS
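The appeal of the cloud here is a simple scaling argument: a 30-CPU-year job shrinks to days when spread across many instances. The core counts below are illustrative assumptions, not figures from the Gaia project, and the calculation assumes near-perfect parallelism.

```python
# Rough scaling of a 30-CPU-year, single-pass workload across N cores.
cpu_years = 30  # from the slide: one pass on one CPU takes 30 years

for cores in (100, 1000, 10000):  # hypothetical cluster sizes
    days = cpu_years * 365 / cores  # assumes the pass parallelizes cleanly
    print(f"{cores:>6} cores: ~{days:.1f} days")
```

Embarrassingly parallel image processing (each image handled independently) is close to this ideal, which is why a single data pass maps so well onto on-demand instances.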
127. [Figure: two plots of capacity, resources, and demand over time. A static data center provisions fixed capacity above peak demand, leaving unused resources; a data center in the cloud scales capacity to track demand.]
128. HEAVY-ION COLLISIONS
Problem: Quark Matter physics conference imminent but no compute resources handy
Solution: The Nimbus context broker allowed researchers to provision 300 nodes and get the simulations done