Accelerating Time to Science:
Transforming Research in the Cloud
Jamie Kinney - @jamiekinney
Director of Scientific Computing, a.k.a. “SciCo” – Amazon Web Services
Michael Franklin - @amplab
Director, AMPLab - UC Berkeley
Agenda
• An introduction to scientific computing on AWS
• How are researchers using AWS today?
• Case study: The UC Berkeley AMP Lab
• Q & A
What do we mean by Scientific Computing?
Scientific Computing refers to the application of simulation,
mathematical modeling and quantitative analysis to analyze and
solve scientific problems.
How is AWS Used for Scientific Computing?
• High Performance Computing (HPC) for Engineering and Simulation
• High Throughput Computing (HTC) for Data-Intensive Analytics
• Hybrid Supercomputing centers
• Collaborative Research Environments
• Citizen Science
• Science-as-a-Service
Why do researchers love using AWS?
Time to Science
Access research
infrastructure in minutes
Low Cost
Pay-as-you-go pricing
Elastic
Easily add or remove capacity
Globally Accessible
Easily Collaborate with
researchers around the world
Secure
A collection of tools to
protect data and privacy
Scalable
Access to effectively
limitless capacity
Why does AWS care about Scientific Computing?
• We want to improve our world by accelerating the pace of scientific
discovery
• It is a great application of AWS with a broad customer base
• The scientific community helps us innovate on behalf of all customers
– Streaming data processing & analytics
– Exabyte scale data management solutions and exaflop scale compute
– Collaborative research tools and techniques
– New AWS regions
– Significant advances in low-power compute, storage and data centers
– Efficiencies which will lower our costs and therefore pricing for all customers
Research Grants
AWS provides free usage
credits to help researchers:
• Teach advanced courses
• Explore new projects
• Create resources for the
scientific community
aws.amazon.com/grants
Peering with all global research networks
Image courtesy John Hover - Brookhaven National Lab
Breaking news! Restricted-access genomics on
AWS
aws.amazon.com/genomics
How are researchers using AWS today?
High Throughput Computing at Scale
The Large Hadron Collider
@ CERN includes 6,000+
researchers from over 40
countries and produces
approximately 25PB of data
each year.
The ATLAS and CMS
experiments are using AWS
for Monte Carlo simulations
and analysis of LHC data.
Data-Intensive Computing
The Square Kilometer Array will link 250,000 radio
telescopes together, creating the world’s most
sensitive telescope. The SKA will generate zettabytes
of raw data, publishing exabytes annually over 30-40
years.
Researchers are using AWS to develop and test:
• Data processing pipelines
• Image visualization tools
• Exabyte-scale research data management
• Collaborative research environments
aws.amazon.com/solutions/case-studies/icrar/
High Performance Computing
Simulations in the Automotive Sector
• Crash and materials simulations
• Fluid and thermal dynamics simulations
• Car body aerodynamics
• Electronics and electromagnetic simulations
Honda materials science simulations on AWS:
• Deploying scalable HPC clusters on AWS Spot – up to 1000 C3 instances
• Running more simulations than before, for more accurate results
“Cloud offers us an opportunity, as we can innovate faster than before.”
- Ayumi Tada, IT System Administrator, Honda R&D
Schrodinger & Cycle Computing:
Computational Chemistry for Better Solar Power
Simulation by Mark Thompson of the
University of Southern California to see
which of 205,000 organic compounds
could be used for photovoltaic cells for
solar panel material.
Estimated computation time 264 years
completed in 18 hours.
• 156,314 core cluster, 8 regions
• 1.21 petaflops (Rpeak)
• $33,000 or 16¢ per molecule
Loosely
Coupled
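The headline figures for the Schrodinger run can be checked with a few lines of arithmetic. This is only a sanity check on the numbers quoted above; the hours-per-year constant is an assumption (non-leap year), and the variable names are illustrative.

```python
# Figures quoted on the slide.
compounds = 205_000
total_cost_usd = 33_000
serial_years = 264
wall_clock_hours = 18

HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year

# Cost of screening one compound, and how much faster the cluster
# finished than the estimated serial computation.
cost_per_molecule = total_cost_usd / compounds
speedup = serial_years * HOURS_PER_YEAR / wall_clock_hours

print(f"cost per molecule: ${cost_per_molecule:.2f}")
print(f"speedup over serial estimate: {speedup:,.0f}x")
```

The per-molecule cost rounds to the 16¢ shown on the slide.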
Science-as-a-Service
Globus Genomics, DNAnexus, and SevenBridges Genomics offer inexpensive,
easy-to-use, and secure platforms for processing and analyzing genomic data.
The Weather Company pushes four gigabytes of data to AWS
each second in order to deliver 15 billion forecasts each day
to their customers around the world.
aws.amazon.com/solutions/case-studies/the-weather-company/
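To put the Weather Company rates on a common footing, the sketch below converts them to a daily ingest volume and a per-second forecast rate. It assumes decimal units (1 GB = 10^9 bytes) and a constant rate over the day, neither of which the slide states.

```python
# Rates quoted on the slide; decimal units are an assumption.
bytes_per_second = 4 * 10**9
forecasts_per_day = 15 * 10**9

SECONDS_PER_DAY = 24 * 60 * 60

# Daily ingest in terabytes and forecasts served per second.
ingest_tb_per_day = bytes_per_second * SECONDS_PER_DAY / 10**12
forecasts_per_second = forecasts_per_day / SECONDS_PER_DAY

print(f"ingest: {ingest_tb_per_day:.1f} TB/day")
print(f"forecasts: {forecasts_per_second:,.0f} per second")
```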
Citizen Science
The Asteroid Data Hunters competition used AWS to develop better mechanisms for
finding near-Earth asteroids. The top algorithm is 18% better at finding asteroids!
Case Study: The UC Berkeley AMP Lab
Scalable Data-Driven
Science at the AMPLab
UC BERKELEY
Michael Franklin
April 9, 2015
AWS Summit SF
AMPLab Overview
• 80+ Students, Postdocs, Faculty and Staff from:
Databases, Machine Learning, Systems, Security, and Networking
• 28 Industry Sponsors +
White House Big Data Program:
NSF CISE Expeditions in Computing and DARPA XData
• Founding Sponsors: Amazon Web Services, Google, and SAP
“… Berkeley’s AMPLab has already left an indelible mark on the world of
information technology, and even the web. But we haven’t yet experienced
the full impact of the group … Not even close.”
– Derrick Harris, GigaOM, Aug 2, 2014
Franklin, Jordan, Stoica, Patterson, Shenker, Recht, Katz, Joseph, Goldberg, Culler
AMPLab: Integrating 3
Resources
Algorithms
• Machine Learning, Statistical Methods
• Prediction, Business Intelligence
Machines
• Clusters and Clouds
• Warehouse Scale Computing
People
• Crowdsourcing, Human Computation
• Data Scientists, Analysts
Berkeley Data Analytics
Stack
(Apache and BSD open source)
Resource
Virtualization
Storage
Processing
Engine
Access and
Interfaces
In-house
Apps
Open Source Community Building
MeetUp on MLbase @Twitter (Aug 6, 2013)
Spark Summit SF (June 30, 2014)
Apps: Genomics Patterson et al.
Using BDAS, SNAP (Scalable Nucleotide
Alignment) aligns in minutes vs. days
Why Speed Matters: A real-world use case
ADAM – Data formats and Processing
Patterns for Genomics on Big Data Platforms
(e.g., Spark)
Collaborations with: UCSF, UCSC, OHSU,
Microsoft Research, Mt. Sinai
M. Wilson, …, and C. Chiu, “Actionable Diagnosis of Neuroleptospirosis by Next-Generation Sequencing”,
June 4, 2014, New England Journal of Medicine.
Carat Collaborative Battery App
750,000+
downloads
Big Data Ecosystem
Evolution
General batch processing: MapReduce
Specialized systems (iterative, interactive and streaming apps): Pregel, Dremel, GraphLab, Storm, Giraph, Drill, Tez, Impala, S4, …
AMPLab Unification Philosophy
Don’t specialize MapReduce – generalize it!
Two additions to Hadoop MR can enable all the models shown earlier!
1. General Task DAGs
2. Data Sharing
For Users:
Fewer Systems to Use
Less Data Movement
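The two additions named above, general task DAGs and data sharing, can be sketched in plain Python. This is a toy under stated assumptions, not Spark's scheduler: `run_dag`, the task names, and the in-memory `results` dictionary are all hypothetical. Tasks run in dependency order, and each result is kept in memory so that several downstream tasks can reuse it.

```python
from graphlib import TopologicalSorter

def run_dag(tasks, deps):
    """Run tasks in dependency order, keeping each result in memory for reuse.

    tasks: name -> function taking a dict of already-computed inputs
    deps:  name -> set of task names it depends on
    """
    results = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = {d: results[d] for d in deps.get(name, ())}
        results[name] = tasks[name](inputs)
    return results

load_calls = []  # records how many times the (expensive) load step runs

def load(_):
    load_calls.append(1)
    return [1, 2, 3, 4, 5, 6]

tasks = {
    "load": load,
    "evens": lambda i: [x for x in i["load"] if x % 2 == 0],
    "odds": lambda i: [x for x in i["load"] if x % 2 == 1],
    "report": lambda i: {"even_sum": sum(i["evens"]), "odd_sum": sum(i["odds"])},
}
deps = {
    "load": set(),
    "evens": {"load"},
    "odds": {"load"},
    "report": {"evens", "odds"},
}

results = run_dag(tasks, deps)
print(results["report"], "loads:", len(load_calls))
```

Both `evens` and `odds` read the shared `load` result, so the input is loaded once rather than once per consumer.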
Spark
Streaming
GraphX
…
SparkSQL
MLbase
In-Memory
Dataflow
System
M. Zaharia, M. Chowdhury, M. Franklin, I. Stoica, S. Shenker, “Spark: Cluster Computing with Working Sets”, USENIX HotCloud, 2010.
“It’s only September but it’s already clear that 2014 will
be the year of Apache Spark”
-- Datanami, 9/15/14
• Developed in AMPLab and its predecessor the RADLab
• Alternative to Hadoop MapReduce
• 10-100x speedup for ML and interactive queries
• Central component of the BDAS Stack
• “Graduated” to Apache Foundation -> Apache Spark
Apache Spark Contributors:
[Chart: Apache Spark contributors over time, 2011 to 2014]
400+ contributors to current release
Apache Spark:
Compared to Other Projects
[Charts: commits and lines of code changed for MapReduce, YARN, HDFS, Storm, and Spark]
Activity in past 6 months
2-3x more activity than: Hadoop, Storm, MongoDB, NumPy, D3,
Julia, …
Iteration in MapReduce
[Diagram: starting from an initial model w(0), successive Map and Reduce passes over the training data produce w(1), w(2), w(3) and finally the learned model]
Cost of Iteration in MapReduce
[Diagram: Map and Reduce iterations from the initial model w(0) through w(1), w(2), w(3) to the learned model, with the second read of the training data highlighted]
Repeatedly load same data
Cost of Iteration in MapReduce
[Diagram: Map and Reduce iterations from the initial model w(0) through w(1), w(2), w(3) to the learned model, reading the training data]
Redundantly save output between stages
Dataflow View
[Diagram: training data (HDFS) feeding three successive Map and Reduce stages]
Memory Opt. Dataflow
[Diagram: training data (HDFS) loaded once and cached, feeding three successive Map and Reduce stages]
Memory Opt. Dataflow View
[Diagram: training data (HDFS) feeding three successive Map and Reduce stages]
Efficiently move data between stages
Spark: 10-100× faster than Hadoop MapReduce
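The cost described on the preceding slides, re-reading the same training data on every iteration, can be made concrete with a small sketch. It uses plain Python rather than Spark or Hadoop; `CountingSource` is a hypothetical stand-in for slow storage such as HDFS, and the model is a one-parameter least-squares fit chosen only to keep the loop short.

```python
class CountingSource:
    """Stands in for slow storage such as HDFS; counts how often it is read."""
    def __init__(self, rows):
        self._rows = rows
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._rows)

def step(w, data, lr=0.1):
    """One gradient-descent step for fitting y = w * x by least squares."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def train_reloading(source, iterations):
    """MapReduce-style loop: the input is re-read on every pass."""
    w = 0.0
    for _ in range(iterations):
        w = step(w, source.load())
    return w

def train_cached(source, iterations):
    """Memory-optimized loop: read once, then iterate over the cached copy."""
    data = source.load()
    w = 0.0
    for _ in range(iterations):
        w = step(w, data)
    return w

rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
a, b = CountingSource(rows), CountingSource(rows)
w_reload = train_reloading(a, 20)
w_cached = train_cached(b, 20)
print("reads:", a.reads, "vs", b.reads)
```

Both loops arrive at the same model; only the number of reads from storage differs.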
Resilient Distributed Datasets (RDDs)
API: coarse-grained transformations (map, group-by, join, sort, filter,
sample,…) on immutable collections
Resilient Distributed Datasets (RDDs)
» Collections of objects that can be stored in memory or disk across a cluster
» Built via parallel transformations (map, filter, …)
» Automatically rebuilt on failure
Rich enough to capture many models:
» Data flow models: MapReduce, Dryad, SQL, …
» Specialized models: Pregel, Hama, …
M. Zaharia, et al, Resilient Distributed Datasets: A fault-tolerant abstraction for in-memory cluster computing, NSDI 2012.
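The idea of a collection that is "automatically rebuilt on failure" can be illustrated with a toy class that records its lineage (the ordered list of transformations) instead of relying on stored rows. This is a single-process sketch, not Spark's RDD implementation: `ToyRDD` and `lose_cache` are hypothetical names, and real RDDs are partitioned across a cluster.

```python
class ToyRDD:
    """Immutable dataset described by its lineage rather than by stored rows."""
    def __init__(self, source, ops=()):
        self._source = source    # callable that re-creates the base rows
        self._ops = tuple(ops)   # lineage: ordered coarse-grained transformations
        self._cache = None

    def map(self, f):
        return ToyRDD(self._source, self._ops + (("map", f),))

    def filter(self, p):
        return ToyRDD(self._source, self._ops + (("filter", p),))

    def _compute(self):
        rows = list(self._source())
        for kind, fn in self._ops:
            if kind == "map":
                rows = [fn(r) for r in rows]
            else:
                rows = [r for r in rows if fn(r)]
        return rows

    def collect(self):
        if self._cache is None:            # never built, or lost
            self._cache = self._compute()  # rebuild from lineage
        return list(self._cache)

    def lose_cache(self):
        """Simulate a node failure that drops the in-memory copy."""
        self._cache = None

base = ToyRDD(lambda: range(10))
squares_of_evens = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
first = squares_of_evens.collect()
squares_of_evens.lose_cache()
rebuilt = squares_of_evens.collect()
print(first, rebuilt == first)
```

Because each transformation returns a new object and never mutates its parent, replaying the lineage always yields the same rows.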
Abstraction: Dataflow Operators
map
filter
groupBy
sort
union
join
leftOuterJoin
rightOuterJoin
reduce
count
fold
reduceByKey
groupByKey
cogroup
cross
zip
sample
take
first
partitionBy
mapWith
pipe
save
...
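For readers unfamiliar with the key-based operators in the list above, the sketch below gives plain-Python equivalents of groupByKey, reduceByKey and join over lists of (key, value) pairs. The function names and sample data are illustrative assumptions; Spark's versions run in parallel over partitions and make no ordering guarantees.

```python
from collections import defaultdict
from operator import add

def group_by_key(pairs):
    """Collect all values that share a key."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return dict(groups)

def reduce_by_key(pairs, f):
    """Combine values that share a key with a binary function."""
    out = {}
    for k, v in pairs:
        out[k] = f(out[k], v) if k in out else v
    return out

def join(left, right):
    """Inner join: pair each left value with every right value of the same key."""
    right_groups = group_by_key(right)
    return [(k, (lv, rv)) for k, lv in left for rv in right_groups.get(k, [])]

reads = [("chr1", 3), ("chr2", 5), ("chr1", 4)]   # hypothetical (contig, count) pairs
lengths = [("chr1", 248), ("chr2", 242)]          # hypothetical (contig, length) pairs

grouped = group_by_key(reads)
totals = reduce_by_key(reads, add)
joined = join(reads, lengths)
print(grouped, totals, joined)
```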
Apache Spark v1.3 (3/15)
Includes
» Spark (core)
» Spark Streaming
» GraphX
» MLlib
» Spark SQL – Query Processing
Wide range of interfaces:
» Enhanced DataFrames API
» Python / interactive ipython
» Scala / interactive scala shell
» R / interactive R-shell
» Java
Now included in all major Hadoop distributions
Data Intensive Genomics
New population-scale experiments will sequence 10-100k
samples
• 100k samples @ 60x WGS will generate ~20PB of read data and
~300TB of genotype data
End-to-end pipeline latency is important to clinical work
We want to jointly analyze samples to uncover low
frequency variations
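The storage estimate above implies a per-sample footprint that is easy to work out. The sketch assumes decimal units (1 PB = 10^15 bytes) and treats the "~" figures as exact, so the results are rough.

```python
# Totals quoted on the slide for 100k samples at 60x WGS.
samples = 100_000
read_data_bytes = 20 * 10**15   # ~20 PB, decimal units assumed
genotype_bytes = 300 * 10**12   # ~300 TB

# Average footprint per sample, in gigabytes.
reads_per_sample_gb = read_data_bytes / samples / 10**9
genotypes_per_sample_gb = genotype_bytes / samples / 10**9

print(f"reads: ~{reads_per_sample_gb:.0f} GB per sample")
print(f"genotypes: ~{genotypes_per_sample_gb:.0f} GB per sample")
```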
How can we improve analysis
productivity?
Flat file formats sacrifice interoperability but do not improve
performance
Common sort order invariants imposed by tools compromise
correctness
Genomics APIs tend to be at a lower level of abstraction, which
compromises productivity
ADAM
An open source, high performance, distributed platform for genomic
analysis
ADAM defines a:
1. Data schema and layout on disk*
2. Programming interface for distributed processing of genomic
data**
3. Command line interface
* Via Parquet and Avro
** Work on Python integration is underway
Data Model is the "Narrow Waist"
Data Format
Schema can be updated without
breaking backwards compatibility
Normalize metadata fields into schema
for O(1) metadata access
Models are “dumb”; enhance as
necessary with rich objects
record AlignmentRecord {
union { null, Contig } contig = null;
union { null, long } start = null;
union { null, long } end = null;
union { null, int } mapq = null;
union { null, string } readName = null;
union { null, string } sequence = null;
union { null, string } mateReference = null;
union { null, long } mateAlignmentStart = null;
union { null, string } cigar = null;
union { null, string } qual = null;
union { null, string } recordGroupName = null;
union { int, null } basesTrimmedFromStart = 0;
union { int, null } basesTrimmedFromEnd = 0;
union { boolean, null } readPaired = false;
union { boolean, null } properPair = false;
union { boolean, null } readMapped = false;
union { boolean, null } mateMapped = false;
union { boolean, null } firstOfPair = false;
union { boolean, null } secondOfPair = false;
union { boolean, null } failedVendorQualityChecks = false;
union { boolean, null } duplicateRead = false;
union { boolean, null } readNegativeStrand = false;
union { boolean, null } mateNegativeStrand = false;
union { boolean, null } primaryAlignment = false;
union { boolean, null } secondaryAlignment = false;
union { boolean, null } supplementaryAlignment = false;
union { null, string } mismatchingPositions = null;
union { null, string } origQual = null;
union { null, string } attributes = null;
union { null, string } recordGroupSequencingCenter = null;
union { null, string } recordGroupDescription = null;
union { null, long } recordGroupRunDateEpoch = null;
union { null, string } recordGroupFlowOrder = null;
union { null, string } recordGroupKeySequence = null;
union { null, string } recordGroupLibrary = null;
union { null, int } recordGroupPredictedMedianInsertSize = null;
union { null, string } recordGroupPlatform = null;
union { null, string } recordGroupPlatformUnit = null;
union { null, string } recordGroupSample = null;
union { null, Contig } mateContig = null;
}
Schemas at https://www.github.com/bigdatagenomics/bdg-formats
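Giving every field a default, as the schema above does, is what lets the schema change without breaking old data. The toy below shows the idea with plain dictionaries. It is not the Avro library and ignores most of Avro's schema-resolution rules; `SCHEMA_V1`, `SCHEMA_V2` and `read_with_schema` are hypothetical, and the fields are a small subset of AlignmentRecord.

```python
# Hypothetical, trimmed-down schema versions: field name -> default value.
SCHEMA_V1 = {"readName": None, "start": None, "readMapped": False}
SCHEMA_V2 = {**SCHEMA_V1, "duplicateRead": False}  # v2 adds one field

def read_with_schema(stored, reader_schema):
    """Project a stored record onto the reader's schema.

    Fields the writer did not know about take the reader's default;
    fields the reader does not know about are dropped.
    """
    return {name: stored.get(name, default) for name, default in reader_schema.items()}

old_record = {"readName": "r1", "start": 100, "readMapped": True}  # written with v1
new_record = {**old_record, "duplicateRead": True}                 # written with v2

upgraded = read_with_schema(old_record, SCHEMA_V2)
downgraded = read_with_schema(new_record, SCHEMA_V1)
print(upgraded)
print(downgraded)
```

A new reader can open old files and an old reader can open new files, which is the compatibility property the slide describes.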
Parquet: A Modern Big Data Storage
Format
ASF Incubator project, based on Google
Dremel
High performance columnar store with
support for projections and push-down
predicates
Short read data stored in Parquet achieves a
25% improvement in size over compressed
BAM
Enables scale-out using modern Big Data
technology (e.g., Spark)
Image from Parquet format definition: https://www.github.com/apache/incubator-parquet-format
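Projections and push-down predicates are easier to picture with a toy columnar layout. The sketch below is an assumption-laden illustration, not Parquet's format or API: `RowGroup` and `scan` are hypothetical, and the only statistic kept per column is a min/max pair. A scan skips any row group whose statistics rule out a match and returns only the requested columns.

```python
class RowGroup:
    """A chunk of rows stored column by column, with min/max stats per column."""
    def __init__(self, rows):
        self.n = len(rows)
        self.columns = {name: [r[name] for r in rows] for name in rows[0]}
        self.stats = {name: (min(col), max(col)) for name, col in self.columns.items()}

def scan(row_groups, project, column, lo, hi):
    """Return projected columns for rows with lo <= row[column] <= hi."""
    out, groups_read = [], 0
    for g in row_groups:
        gmin, gmax = g.stats[column]
        if gmax < lo or gmin > hi:
            continue  # predicate push-down: skip the whole group
        groups_read += 1
        for i in range(g.n):
            if lo <= g.columns[column][i] <= hi:
                out.append({c: g.columns[c][i] for c in project})  # projection
    return out, groups_read

# Hypothetical reads with a name, a start position and a mapping quality.
reads = [{"readName": f"r{i}", "start": i * 100, "mapq": 60 - i} for i in range(8)]
groups = [RowGroup(reads[0:4]), RowGroup(reads[4:8])]

rows, groups_read = scan(groups, project=["readName"], column="start", lo=500, hi=650)
print(rows, "row groups read:", groups_read)
```

Only one of the two row groups is touched, and the `mapq` column is never materialized for the result.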
ADAM’s API
ADAM is built on top of Apache Spark, which provides the RDD
abstraction -> distributed arrays
Common primitives include:
• Aggregates: BQSR, Indel Realignment
• Bucketing: Duplicate Marking, Concordance
• Region Joins: Variant Calling and Filtration
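Of the primitives listed above, the region join is the least familiar outside genomics: it pairs records whose genomic intervals overlap. The sketch below is a naive single-machine version for illustration only. It is not ADAM's API or algorithm; the tuple layout (contig, start, end) with half-open coordinates is an assumption.

```python
def overlaps(a, b):
    """True if two half-open intervals (contig, start, end) overlap on the same contig."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def region_join(reads, regions):
    """Pair every read with every region it overlaps (naive nested loop)."""
    return [(read, region) for read in reads for region in regions if overlaps(read, region)]

# Hypothetical reads and target regions.
reads = [("chr1", 100, 150), ("chr1", 400, 450), ("chr2", 100, 150)]
regions = [("chr1", 120, 130), ("chr2", 500, 600)]

pairs = region_join(reads, regions)
print(pairs)
```

A distributed implementation would avoid comparing every read with every region, for example by partitioning both sides by genomic position first.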
ADAM Performance Bottom Line
F. Nothaft, et al., “Rethinking Data-Intensive
Science Using Scalable Analytics Systems”,
ACM SIGMOD Conf., June 2015, to appear.
$214.39
$78.92
ADAM Performance Update
Analysis run using Amazon EC2, single node was hs1.8xlarge, cluster was m2.4xlarge
Scripts available at https://www.github.com/fnothaft/bdg-recipes.git, "sigmod" branch
Achieve linear scalability out to
128 nodes for most tasks
2-4x improvement over {GATK,
samtools,Picard} on single node
Scalable Analytics for Science
Data Model is the “narrow waist” of the architecture
Modern “NoSQL” models support evolution and heterogeneity with high
performance.
BDAS Declarative Analytics: Specify What not How
MLbase chooses:
• Algorithms/Operators
• Ordering and Physical Placement
• Parameter and Hyperparameter Settings
• Featurization
Leverages BDAS (Spark, GraphX, Tachyon) and Hadoop File System
for Speed and Scale
To find out more or get
involved:
amplab.berkeley.edu
franklin@berkeley.edu
UC BERKELEY
Thanks to NSF CISE Expeditions in Computing, DARPA XData,
Founding Sponsors: Amazon Web Services, Google, and SAP,
the Thomas and Stacy Siebel Foundation,
all our industrial sponsors and partners, and all the members of the AMPLab Team.
Additional resources…
• aws.amazon.com/hpc
• aws.amazon.com/big-data
• aws.amazon.com/grants
• aws.amazon.com/genomics
• aws.amazon.com/compliance
• aws.amazon.com/security
Thank you!
Jamie Kinney
jkinney@amazon.com
@jamiekinney
SAN FRANCISCO
©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved

Weitere ähnliche Inhalte

Was ist angesagt?

Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013Ian Foster
 
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...Geoffrey Fox
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesGeoffrey Fox
 
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Robert Grossman
 
Cloud com foster december 2010
Cloud com foster december 2010Cloud com foster december 2010
Cloud com foster december 2010Ian Foster
 
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier TordoirShare and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier TordoirSpark Summit
 
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC Geoffrey Fox
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!Ian Foster
 
Genomic Scale Big Data Pipelines
Genomic Scale Big Data PipelinesGenomic Scale Big Data Pipelines
Genomic Scale Big Data PipelinesLynn Langit
 
Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Robert Grossman
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? Robert Grossman
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduriRavi Madduri
 
Big Data Modeling Challenges and Machine Learning with No Code
Big Data Modeling Challenges and Machine Learning with No CodeBig Data Modeling Challenges and Machine Learning with No Code
Big Data Modeling Challenges and Machine Learning with No CodeLiana Ye
 
Empowering Transformational Science
Empowering Transformational ScienceEmpowering Transformational Science
Empowering Transformational ScienceChelle Gentemann
 
Digital Science: Reproducibility and Visibility in Astronomy
Digital Science: Reproducibility and Visibility in AstronomyDigital Science: Reproducibility and Visibility in Astronomy
Digital Science: Reproducibility and Visibility in AstronomyJose Enrique Ruiz
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesGeoffrey Fox
 
Accelerating Discovery via Science Services
Accelerating Discovery via Science ServicesAccelerating Discovery via Science Services
Accelerating Discovery via Science ServicesIan Foster
 
Transitioning Geoscience Research to the Cloud: Opportunities and Challenges
Transitioning Geoscience Research to the Cloud: Opportunities and ChallengesTransitioning Geoscience Research to the Cloud: Opportunities and Challenges
Transitioning Geoscience Research to the Cloud: Opportunities and ChallengesAmazon Web Services
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light SourcesIan Foster
 
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...Frederic Desprez
 

Was ist angesagt? (20)

Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013
 
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
 
Cloud com foster december 2010
Cloud com foster december 2010Cloud com foster december 2010
Cloud com foster december 2010
 
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier TordoirShare and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
 
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!
 
Genomic Scale Big Data Pipelines
Genomic Scale Big Data PipelinesGenomic Scale Big Data Pipelines
Genomic Scale Big Data Pipelines
 
Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduri
 
Big Data Modeling Challenges and Machine Learning with No Code
Big Data Modeling Challenges and Machine Learning with No CodeBig Data Modeling Challenges and Machine Learning with No Code
Big Data Modeling Challenges and Machine Learning with No Code
 
Empowering Transformational Science
Empowering Transformational ScienceEmpowering Transformational Science
Empowering Transformational Science
 
Digital Science: Reproducibility and Visibility in Astronomy
Digital Science: Reproducibility and Visibility in AstronomyDigital Science: Reproducibility and Visibility in Astronomy
Digital Science: Reproducibility and Visibility in Astronomy
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 
Accelerating Discovery via Science Services
Accelerating Discovery via Science ServicesAccelerating Discovery via Science Services
Accelerating Discovery via Science Services
 
Transitioning Geoscience Research to the Cloud: Opportunities and Challenges
Transitioning Geoscience Research to the Cloud: Opportunities and ChallengesTransitioning Geoscience Research to the Cloud: Opportunities and Challenges
Transitioning Geoscience Research to the Cloud: Opportunities and Challenges
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light Sources
 
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
 

Andere mochten auch

Accelerating Time to Science: Transforming Research in the Cloud
Accelerating Time to Science: Transforming Research in the CloudAccelerating Time to Science: Transforming Research in the Cloud
Accelerating Time to Science: Transforming Research in the CloudJamie Kinney
 
Next-Generation Firewall Services VPC Integration
Next-Generation Firewall Services VPC IntegrationNext-Generation Firewall Services VPC Integration
Next-Generation Firewall Services VPC IntegrationAmazon Web Services
 
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
 Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and RecoveryAmazon Web Services
 
AWS Summit Auckland- Developing Applications for IoT
AWS Summit Auckland-  Developing Applications for IoTAWS Summit Auckland-  Developing Applications for IoT
AWS Summit Auckland- Developing Applications for IoTAmazon Web Services
 
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and RecoveryGetting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and RecoveryAmazon Web Services
 
Getting started with amazon aurora - Toronto
Getting started with amazon aurora - TorontoGetting started with amazon aurora - Toronto
Getting started with amazon aurora - TorontoAmazon Web Services
 
Deep Dive: Developing, Deploying & Operating Mobile Apps with AWS
Deep Dive: Developing, Deploying & Operating Mobile Apps with AWS Deep Dive: Developing, Deploying & Operating Mobile Apps with AWS
Deep Dive: Developing, Deploying & Operating Mobile Apps with AWS Amazon Web Services
 
Hack-Proof Your Cloud: Responding to 2016 Threats
Hack-Proof Your Cloud: Responding to 2016 ThreatsHack-Proof Your Cloud: Responding to 2016 Threats
Hack-Proof Your Cloud: Responding to 2016 ThreatsAmazon Web Services
 
AWS Summit Auckland Sponsor Presentation - Vocus
AWS Summit Auckland Sponsor Presentation - VocusAWS Summit Auckland Sponsor Presentation - Vocus
AWS Summit Auckland Sponsor Presentation - VocusAmazon Web Services
 
Sony DAD NMS & Our Migration to the AWS Cloud
Sony DAD NMS & Our Migration to the AWS CloudSony DAD NMS & Our Migration to the AWS Cloud
Sony DAD NMS & Our Migration to the AWS CloudAmazon Web Services
 
AWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWSAWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWSAmazon Web Services
 
Expanding Your Data Center with Hybrid Cloud Infrastructure
Expanding Your Data Center with Hybrid Cloud InfrastructureExpanding Your Data Center with Hybrid Cloud Infrastructure
Expanding Your Data Center with Hybrid Cloud InfrastructureAmazon Web Services
 
Creating Your Virtual Data Center: VPC Fundamentals and Connectivity Options
 Creating Your Virtual Data Center: VPC Fundamentals and Connectivity Options Creating Your Virtual Data Center: VPC Fundamentals and Connectivity Options
Creating Your Virtual Data Center: VPC Fundamentals and Connectivity OptionsAmazon Web Services
 
Session Sponsored by Trend Micro: 3 Secrets to Becoming a Cloud Security Supe...
Session Sponsored by Trend Micro: 3 Secrets to Becoming a Cloud Security Supe...Session Sponsored by Trend Micro: 3 Secrets to Becoming a Cloud Security Supe...
Session Sponsored by Trend Micro: 3 Secrets to Becoming a Cloud Security Supe...Amazon Web Services
 
Another Day, Another Billion Packets
Another Day, Another Billion PacketsAnother Day, Another Billion Packets
Another Day, Another Billion PacketsAmazon Web Services
 
Grow Your SMB Infrastructure on the AWS Cloud
Grow Your SMB Infrastructure on the AWS CloudGrow Your SMB Infrastructure on the AWS Cloud
Grow Your SMB Infrastructure on the AWS CloudAmazon Web Services
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWSAmazon Web Services
 
AWS Webcast - Explore the AWS Cloud
AWS Webcast - Explore the AWS CloudAWS Webcast - Explore the AWS Cloud
AWS Webcast - Explore the AWS CloudAmazon Web Services
 

Andere mochten auch (20)

Accelerating Time to Science: Transforming Research in the Cloud
Accelerating Time to Science: Transforming Research in the CloudAccelerating Time to Science: Transforming Research in the Cloud
Accelerating Time to Science: Transforming Research in the Cloud
 
Next-Generation Firewall Services VPC Integration
Next-Generation Firewall Services VPC IntegrationNext-Generation Firewall Services VPC Integration
Next-Generation Firewall Services VPC Integration
 
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
 Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
 
AWS Summit Auckland- Developing Applications for IoT
AWS Summit Auckland-  Developing Applications for IoTAWS Summit Auckland-  Developing Applications for IoT
AWS Summit Auckland- Developing Applications for IoT
 
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and RecoveryGetting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
 
Getting started with amazon aurora - Toronto
Getting started with amazon aurora - TorontoGetting started with amazon aurora - Toronto
Getting started with amazon aurora - Toronto
 
Deep Dive: Developing, Deploying & Operating Mobile Apps with AWS
Deep Dive: Developing, Deploying & Operating Mobile Apps with AWS Deep Dive: Developing, Deploying & Operating Mobile Apps with AWS
Deep Dive: Developing, Deploying & Operating Mobile Apps with AWS
 
Deep Dive on Amazon S3
Deep Dive on Amazon S3Deep Dive on Amazon S3
Time to Science/Time to Results: Transforming Research in the Cloud

  • 1. Accelerating Time to Science: Transforming Research in the Cloud Jamie Kinney - @jamiekinney Director of Scientific Computing, a.k.a. “SciCo” – Amazon Web Services Michael Franklin - @amplab Director, AMPLab - UC Berkeley
  • 2. Agenda • An introduction to scientific computing on AWS • How are researchers using AWS today? • Case study: The UC Berkeley AMP Lab • Q & A
  • 3. What do we mean by Scientific Computing? Scientific Computing refers to the application of simulation, mathematical modeling and quantitative analysis to analyze and solve scientific problems.
  • 4. How is AWS Used for Scientific Computing? • High Performance Computing (HPC) for Engineering and Simulation • High Throughput Computing (HTC) for Data-Intensive Analytics • Hybrid Supercomputing centers • Collaborative Research Environments • Citizen Science • Science-as-a-Service
  • 5. Why do researchers love using AWS? Time to Science Access research infrastructure in minutes Low Cost Pay-as-you-go pricing Elastic Easily add or remove capacity Globally Accessible Easily Collaborate with researchers around the world Secure A collection of tools to protect data and privacy Scalable Access to effectively limitless capacity
  • 6. Why does AWS care about Scientific Computing? • We want to improve our world by accelerating the pace of scientific discovery • It is a great application of AWS with a broad customer base • The scientific community helps us innovate on behalf of all customers – Streaming data processing & analytics – Exabyte scale data management solutions and exaflop scale compute – Collaborative research tools and techniques – New AWS regions – Significant advances in low-power compute, storage and data centers – Efficiencies which will lower our costs and therefore pricing for all customers
  • 7. Research Grants AWS provides free usage credits to help researchers: • Teach advanced courses • Explore new projects • Create resources for the scientific community aws.amazon.com/grants
  • 8. Peering with all global research networks Image courtesy John Hover - Brookhaven National Lab
  • 9. Breaking news! Restricted-access genomics on AWS aws.amazon.com/genomics
  • 10. How are researchers using AWS today?
  • 11. High Throughput Computing at Scale The Large Hadron Collider @ CERN includes 6,000+ researchers from over 40 countries and produces approximately 25PB of data each year. The ATLAS and CMS experiments are using AWS for Monte Carlo simulations and analysis of LHC data.
  • 12. Data-Intensive Computing The Square Kilometer Array will link 250,000 radio telescopes together, creating the world’s most sensitive telescope. The SKA will generate zettabytes of raw data, publishing exabytes annually over 30-40 years. Researchers are using AWS to develop and test: • Data processing pipelines • Image visualization tools • Exabyte-scale research data management • Collaborative research environments aws.amazon.com/solutions/case-studies/icrar/
  • 13. High Performance Computing Simulations in the Automotive Sector • Crash and materials simulations • Fluid and thermal dynamics simulations • Car body aerodynamics • Electronics and electromagnetic simulations Honda materials science simulations on AWS: • Deploying scalable HPC clusters on AWS Spot – up to 1000 C3 instances • Running more simulations than before, for more accurate results “Cloud offers us an opportunity, as we can innovate faster than before.” - Ayumi Tada, IT System Administrator, Honda R&D
  • 14. Schrodinger & Cycle Computing: Computational Chemistry for Better Solar Power. Simulation by Mark Thompson of the University of Southern California to see which of 205,000 organic compounds could be used as photovoltaic material for solar panels. An estimated 264 years of computation completed in 18 hours. • 156,314 core cluster, 8 regions • 1.21 petaflops (Rpeak) • $33,000 or 16¢ per molecule. Loosely Coupled
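The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check, not part of the original talk: it assumes 365-day years and that the 264-year estimate refers to single-core compute time, neither of which the slide states.

```python
# Figures quoted on the slide.
molecules = 205_000
total_cost_usd = 33_000
cores = 156_314
wall_clock_hours = 18
estimated_serial_years = 264

# Cost per molecule: $33,000 spread over 205,000 compounds.
cost_per_molecule = total_cost_usd / molecules

# Assumption: 365-day years, and the 264-year estimate is single-core time.
serial_hours = estimated_serial_years * 365 * 24
core_hours_used = cores * wall_clock_hours

# How much faster the run finished than the serial estimate.
speedup = serial_hours / wall_clock_hours
# Fraction of the provisioned core-hours that the serial estimate accounts for.
implied_efficiency = serial_hours / core_hours_used

print(f"cost per molecule: ${cost_per_molecule:.2f}")
print(f"wall-clock speedup: {speedup:,.0f}x")
print(f"implied parallel efficiency: {implied_efficiency:.0%}")
```

Under those assumptions the cost works out to about 16¢ per molecule, matching the slide, and most of the provisioned core-hours map onto the serial estimate, which is what one would hope for from a loosely coupled workload.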
  • 15. Science-as-a-Service Globus Genomics, DNAnexus, and SevenBridges Genomics offer inexpensive, easy-to-use, and secure platforms for processing and analyzing genomic data. The Weather Company pushes four gigabytes of data to AWS each second in order to deliver 15 billion forecasts each day to their customers around the world. aws.amazon.com/solutions/case-studies/the-weather-company/
  • 16. Citizen Science The Asteroid Data Hunters competition used AWS to develop better mechanisms for finding near-Earth asteroids. The top algorithm is 18% better at finding asteroids!
  • 17. Case Study: The UC Berkeley AMP Lab
  • 18. Scalable Data-Driven Science at the AMPLab UC BERKELEY Michael Franklin April 9, 2015 AWS Summit SF
  • 19. AMPLab Overview • 80+ Students, Postdocs, Faculty and Staff from: Databases, Machine Learning, Systems, Security, and Networking • 28 Industry Sponsors + White House Big Data Program: NSF CISE Expeditions in Computing and DARPA XData • Founding Sponsors: Amazon Web Services, Google, and SAP. “… Berkeley’s AMPLab has already left an indelible mark on the world of information technology, and even the web. But we haven’t yet experienced the full impact of the group … Not even close.” – Derrick Harris, GigaOM, Aug 2, 2014. Pictured: Franklin, Jordan, Stoica, Patterson, Shenker, Recht, Katz, Joseph, Goldberg, Culler
  • 20. AMPLab: Integrating 3 Resources Algorithms • Machine Learning, Statistical Methods • Prediction, Business Intelligence Machines • Clusters and Clouds • Warehouse Scale Computing People • Crowdsourcing, Human Computation • Data Scientists, Analysts
  • 21. Berkeley Data Analytics Stack (Apache and BSD open source). Layers: Resource Virtualization, Storage, Processing Engine, Access and Interfaces, In-house Apps
  • 22. Open Source Community Building MeetUp on MLbase @Twitter (Aug 6, 2013) Spark Summit SF (June 30, 2014)
  • 23. Apps: Genomics (Patterson et al.). Using BDAS, SNAP (Scalable Nucleotide Alignment) aligns in minutes vs. days. Why Speed Matters: A real-world use case. ADAM: Data formats and Processing Patterns for Genomics on Big Data Platforms (e.g., Spark). Collaborations with: UCSF, UCSC, OHSU, Microsoft Research, Mt. Sinai. M. Wilson, …, and C. Chiu, “Actionable Diagnosis of Neuroleptospirosis by Next-Generation Sequencing”, June 4, 2014, New England Journal of Medicine.
  • 24. Carat Collaborative Battery App: 750,000+ downloads
  • 25. Big Data Ecosystem Evolution. General batch processing: MapReduce. Specialized systems (iterative, interactive and streaming apps): Pregel, Dremel, GraphLab, Storm, Giraph, Drill, Tez, Impala, S4, …
  • 26. AMPLab Unification Philosophy. Don’t specialize MapReduce – generalize it! Two additions to Hadoop MR can enable all the models shown earlier! 1. General Task DAGs 2. Data Sharing. For Users: Fewer Systems to Use, Less Data Movement. Spark Streaming, GraphX, …, SparkSQL, MLbase
  • 27. In-Memory Dataflow System M. Zaharia, M. Chowdhury, M. Franklin, I. Stoica, S. Shenker, “Spark: Cluster Computing with Working Sets”, USENIX HotCloud, 2010. “It’s only September but it’s already clear that 2014 will be the year of Apache Spark” -- Datanami, 9/15/14 • Developed in AMPLab and its predecessor the RADLab • Alternative to Hadoop MapReduce • 10-100x speedup for ML and interactive queries • Central component of the BDAS Stack • “Graduated” to Apache Foundation -> Apache Spark
  • 28. Apache Spark Contributors (chart: contributors by year, 2011 to 2014). 400+ contributors to current release
  • 29. Apache Spark: Compared to Other Projects (charts: commits and lines of code changed for MapReduce, YARN, HDFS, Storm and Spark; activity in past 6 months). 2-3x more activity than: Hadoop, Storm, MongoDB, NumPy, D3, Julia, …
  • 30. Iteration in MapReduce (diagram: from the initial model w(0), repeated Map and Reduce passes over the training data produce w(1), w(2), w(3) and finally the learned model)
  • 31. Cost of Iteration in MapReduce (same diagram): repeatedly load same data
  • 32. Cost of Iteration in MapReduce (same diagram): redundantly save output between stages
  • 35. Memory Opt. Dataflow View (diagram: Training Data (HDFS) feeding a chain of Map and Reduce stages). Efficiently move data between stages. Spark: 10-100× faster than Hadoop MapReduce
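The cost described on the iteration slides can be illustrated with a toy example. The sketch below is plain Python, not Spark or Hadoop code: `Storage` is a hypothetical stand-in for HDFS that simply counts reads, and the model is a one-parameter least-squares fit chosen only to have something to iterate on.

```python
class Storage:
    """Stands in for HDFS; counts how often the training data is read."""

    def __init__(self, rows):
        self._rows = rows
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._rows)


def gradient_step(w, rows, lr=0.05):
    # One pass of least-squares gradient descent for the model y = w * x.
    grad = sum(2 * (w * x - y) * x for x, y in rows) / len(rows)
    return w - lr * grad


def train_mapreduce_style(storage, iterations):
    w = 0.0
    for _ in range(iterations):
        rows = storage.load()  # every iteration re-reads the same input
        w = gradient_step(w, rows)
    return w


def train_in_memory(storage, iterations):
    rows = storage.load()  # read once, keep the working set in memory
    w = 0.0
    for _ in range(iterations):
        w = gradient_step(w, rows)
    return w


data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0)]
disk_a, disk_b = Storage(data), Storage(data)
w_a = train_mapreduce_style(disk_a, iterations=20)
w_b = train_in_memory(disk_b, iterations=20)
print("reads per strategy:", disk_a.reads, disk_b.reads)
print("learned weights:", w_a, w_b)
```

Both strategies learn the same weight; the only difference is how many times the input is fetched, which is the overhead the in-memory dataflow view is meant to remove.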
  • 36. Resilient Distributed Datasets (RDDs). API: coarse-grained transformations (map, group-by, join, sort, filter, sample, …) on immutable collections. RDDs are: » Collections of objects that can be stored in memory or disk across a cluster » Built via parallel transformations (map, filter, …) » Automatically rebuilt on failure. Rich enough to capture many models: » Data flow models: MapReduce, Dryad, SQL, … » Specialized models: Pregel, Hama, … M. Zaharia et al., “Resilient Distributed Datasets: A fault-tolerant abstraction for in-memory cluster computing”, NSDI 2012.
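The "automatically rebuilt on failure" property can be sketched in a few lines. The class below is a hypothetical, single-process toy named `MiniRDD`; it is not Spark's API and ignores partitioning and distribution. It only shows the idea that each dataset remembers the coarse-grained transformation and parent it was built from, so a lost in-memory copy can be recomputed.

```python
class MiniRDD:
    """Toy stand-in for an RDD: immutable, lazy, rebuilt from lineage."""

    def __init__(self, source, parent=None, transform=None):
        self._source = source        # only set on the root dataset
        self._parent = parent        # lineage: the dataset this one derives from
        self._transform = transform  # coarse-grained op applied to the parent
        self._cache = None

    def map(self, fn):
        return MiniRDD(None, self, lambda rows: [fn(r) for r in rows])

    def filter(self, pred):
        return MiniRDD(None, self, lambda rows: [r for r in rows if pred(r)])

    def cache(self):
        # Materialize the result and keep it in memory.
        self._cache = self.collect()
        return self

    def lose_cache(self):
        # Simulates losing the in-memory copy, e.g. when a node fails.
        self._cache = None

    def collect(self):
        if self._cache is not None:
            return list(self._cache)
        if self._parent is None:
            return list(self._source)
        # No cached copy: recompute from the parent using the recorded op.
        return self._transform(self._parent.collect())


base = MiniRDD(range(10))
evens_squared = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x).cache()
before = evens_squared.collect()
evens_squared.lose_cache()
after = evens_squared.collect()  # recomputed from lineage, not from the cache
print(before, after)
```

Because transformations are coarse-grained and the inputs are immutable, recording the lineage is cheap compared with replicating the data itself.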
  • 38. Apache Spark v1.3 (3/15) Includes » Spark (core) » Spark Streaming » GraphX » MLlib » Spark SQL – Query Processing Wide range of interfaces: » Enhanced Dataframes API » Python / interactive ipython » Scala / interactive scala shell » R / interactive R-shell » Java Now included in all major Hadoop distributions
  • 39. Data Intensive Genomics. New population-scale experiments will sequence 10-100k samples. • 100k samples @ 60x WGS will generate ~20PB of read data and ~300TB of genotype data. End-to-end pipeline latency is important to clinical work. We want to jointly analyze samples to uncover low frequency variations.
  • 40. How can we improve analysis productivity? Flat file formats sacrifice interoperability but do not improve performance. Common sort order invariants imposed by tools compromise correctness. Genomics APIs tend to be at a lower level of abstraction, which compromises productivity.
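To put the population-scale totals in per-sample terms, the quoted figures can be divided through. This is a rough back-of-the-envelope sketch that assumes decimal units (1 PB = 1,000,000 GB), which the slide does not specify.

```python
samples = 100_000
read_data_pb = 20        # ~20 PB of read data (slide figure)
genotype_data_tb = 300   # ~300 TB of genotype data (slide figure)

# Assumption: decimal units, 1 PB = 1,000,000 GB and 1 TB = 1,000 GB.
read_gb_per_sample = read_data_pb * 1_000_000 / samples
genotype_gb_per_sample = genotype_data_tb * 1_000 / samples

print(f"read data per sample: ~{read_gb_per_sample:.0f} GB")
print(f"genotype data per sample: ~{genotype_gb_per_sample:.0f} GB")
```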
  • 41. ADAM An open source, high performance, distributed platform for genomic analysis ADAM defines a: 1. Data schema and layout on disk* 2. Programming interface for distributed processing of genomic data** 3. Command line interface * Via Parquet and Avro ** Work on Python integration is underway
  • 42. Data Model is the "Narrow Waist"
  • 43. Data Format. Schema can be updated without breaking backwards compatibility. Normalize metadata fields into schema for O(1) metadata access. Models are “dumb”; enhance as necessary with rich objects.
      record AlignmentRecord {
        union { null, Contig } contig = null;
        union { null, long } start = null;
        union { null, long } end = null;
        union { null, int } mapq = null;
        union { null, string } readName = null;
        union { null, string } sequence = null;
        union { null, string } mateReference = null;
        union { null, long } mateAlignmentStart = null;
        union { null, string } cigar = null;
        union { null, string } qual = null;
        union { null, string } recordGroupName = null;
        union { int, null } basesTrimmedFromStart = 0;
        union { int, null } basesTrimmedFromEnd = 0;
        union { boolean, null } readPaired = false;
        union { boolean, null } properPair = false;
        union { boolean, null } readMapped = false;
        union { boolean, null } mateMapped = false;
        union { boolean, null } firstOfPair = false;
        union { boolean, null } secondOfPair = false;
        union { boolean, null } failedVendorQualityChecks = false;
        union { boolean, null } duplicateRead = false;
        union { boolean, null } readNegativeStrand = false;
        union { boolean, null } mateNegativeStrand = false;
        union { boolean, null } primaryAlignment = false;
        union { boolean, null } secondaryAlignment = false;
        union { boolean, null } supplementaryAlignment = false;
        union { null, string } mismatchingPositions = null;
        union { null, string } origQual = null;
        union { null, string } attributes = null;
        union { null, string } recordGroupSequencingCenter = null;
        union { null, string } recordGroupDescription = null;
        union { null, long } recordGroupRunDateEpoch = null;
        union { null, string } recordGroupFlowOrder = null;
        union { null, string } recordGroupKeySequence = null;
        union { null, string } recordGroupLibrary = null;
        union { null, int } recordGroupPredictedMedianInsertSize = null;
        union { null, string } recordGroupPlatform = null;
        union { null, string } recordGroupPlatformUnit = null;
        union { null, string } recordGroupSample = null;
        union { null, Contig } mateContig = null;
      }
    Schemas at https://www.github.com/bigdatagenomics/bdg-formats
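The claim that the schema can be updated without breaking backwards compatibility rests on every field carrying a default. The sketch below illustrates that idea in plain Python; it does not use the Avro library, `read_record` is a hypothetical helper, and treating `basesTrimmedFromStart` as a field added in a later version is purely an assumption for illustration.

```python
# Field name -> default, mirroring a few AlignmentRecord fields from the slide.
SCHEMA_V1 = {"readName": None, "start": None, "readMapped": False}

# Hypothetical later version that adds one field, again with a default.
SCHEMA_V2 = {**SCHEMA_V1, "basesTrimmedFromStart": 0}


def read_record(stored, reader_schema):
    """Resolve a stored record against the reader's schema.

    Fields missing from the stored record take the reader's default;
    fields the reader does not know about are dropped.
    """
    return {name: stored.get(name, default)
            for name, default in reader_schema.items()}


old_record = {"readName": "read1", "start": 100, "readMapped": True}
new_record = {**old_record, "basesTrimmedFromStart": 5}

# A newer reader can still consume data written with the older schema...
upgraded = read_record(old_record, SCHEMA_V2)
# ...and an older reader can consume newer data by ignoring the extra field.
downgraded = read_record(new_record, SCHEMA_V1)
print(upgraded)
print(downgraded)
```

Defaults on every field are what let readers and writers evolve independently in this toy model; the real resolution rules are defined by the Avro specification.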
  • 44. Parquet: A Modern Big Data Storage Format. ASF Incubator project, based on Google Dremel. High performance columnar store with support for projections and push-down predicates. Short read data stored in Parquet achieves a 25% improvement in size over compressed BAM. Enables scale-out using modern Big Data technology (e.g., Spark). Image from Parquet format definition: https://www.github.com/apache/incubator-parquet-format
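Projections and push-down predicates are easier to picture with a small column-oriented table. The sketch below is a simplified in-memory illustration, not the Parquet format or any Parquet library: `ColumnStore` and its `scan` method are hypothetical names, it assumes a non-empty table whose rows all share the same fields, and real Parquet readers additionally skip whole row groups using stored statistics.

```python
class ColumnStore:
    """Toy column-oriented table: one Python list per column."""

    def __init__(self, rows):
        self.columns = {}
        for row in rows:
            for name, value in row.items():
                self.columns.setdefault(name, []).append(value)
        self.columns_read = set()  # tracks which columns a scan touched

    def scan(self, project, where=None):
        """Return only the `project` columns for rows matching `where`.

        `where` is a (column, predicate) pair evaluated on that column alone,
        so unrelated columns are never read.
        """
        n_rows = len(next(iter(self.columns.values())))
        if where is None:
            keep = range(n_rows)
        else:
            column, predicate = where
            self.columns_read.add(column)
            keep = [i for i, v in enumerate(self.columns[column]) if predicate(v)]
        self.columns_read.update(project)
        return [{name: self.columns[name][i] for name in project} for i in keep]


reads = [
    {"readName": "r1", "start": 100, "mapq": 60, "sequence": "ACGT"},
    {"readName": "r2", "start": 250, "mapq": 10, "sequence": "TTGA"},
    {"readName": "r3", "start": 300, "mapq": 45, "sequence": "GGCA"},
]
store = ColumnStore(reads)
hits = store.scan(project=["readName", "start"],
                  where=("mapq", lambda q: q >= 30))
print(hits)
print(sorted(store.columns_read))
```

The query never touches the `sequence` column, which in read data is typically one of the bulkiest fields; that is the benefit a columnar layout offers over row-oriented flat files.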
  • 45. ADAM’s API. ADAM is built on top of Apache Spark, which provides the RDD abstraction -> distributed arrays. Common primitives include: • Aggregates: BQSR, Indel Realignment • Bucketing: Duplicate Marking, Concordance • Region Joins: Variant Calling and Filtration
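Of the primitives listed, the region join is perhaps the least familiar. The sketch below shows the basic idea on a single machine in plain Python; it is not ADAM's implementation or API. The function names, the dictionary record layout, and the half-open `[start, end)` coordinate convention are all assumptions made for illustration.

```python
def overlaps(a_start, a_end, b_start, b_end):
    # Half-open intervals [start, end) overlap when each starts before the other ends.
    return a_start < b_end and b_start < a_end


def region_join(reads, regions):
    """Pair every read with every region on the same contig that it overlaps."""
    # Bucket regions by contig so each read is only compared within its contig.
    by_contig = {}
    for region in regions:
        by_contig.setdefault(region["contig"], []).append(region)

    pairs = []
    for read in reads:
        for region in by_contig.get(read["contig"], []):
            if overlaps(read["start"], read["end"],
                        region["start"], region["end"]):
                pairs.append((read["name"], region["name"]))
    return pairs


reads = [
    {"name": "r1", "contig": "chr1", "start": 100, "end": 150},
    {"name": "r2", "contig": "chr1", "start": 400, "end": 450},
    {"name": "r3", "contig": "chr2", "start": 120, "end": 170},
]
regions = [
    {"name": "geneA", "contig": "chr1", "start": 140, "end": 300},
    {"name": "geneB", "contig": "chr2", "start": 100, "end": 130},
]
joined = region_join(reads, regions)
print(joined)
```

A distributed version would need to partition both sides by genomic position so that overlapping records end up on the same worker; this sketch sidesteps that by bucketing on contig alone.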
  • 46. ADAM Performance Bottom Line (chart values: $214.39, $78.92). F. Nothaft et al., “Rethinking Data-Intensive Science Using Scalable Analytics Systems”, ACM SIGMOD Conf., June 2015, to appear.
  • 47. ADAM Performance Update. Analysis run using Amazon EC2; single node was hs1.8xlarge, cluster was m2.4xlarge. Scripts available at https://www.github.com/fnothaft/bdg-recipes.git, “sigmod” branch. Achieve linear scalability out to 128 nodes for most tasks. 2-4x improvement over {GATK, samtools, Picard} on single node.
  • 48. Scalable Analytics for Science. Data Model is the “narrow waist” of the architecture. Modern “NoSQL” models support evolution and heterogeneity with high performance. BDAS Declarative Analytics: Specify What, not How. MLbase chooses: • Algorithms/Operators • Ordering and Physical Placement • Parameter and Hyperparameter Settings • Featurization. Leverages BDAS (Spark, GraphX, Tachyon) and Hadoop File System for Speed and Scale.
  • 49. To find out more or get involved: amplab.berkeley.edu franklin@berkeley.edu UC BERKELEY Thanks to NSF CISE Expeditions in Computing, DARPA XData, Founding Sponsors: Amazon Web Services, Google, and SAP, the Thomas and Stacy Siebel Foundation, all our industrial sponsors and partners, and all the members of the AMPLab Team.
  • 50. Additional resources… • aws.amazon.com/hpc • aws.amazon.com/big-data • aws.amazon.com/grants • aws.amazon.com/genomics • aws.amazon.com/compliance • aws.amazon.com/security
  • 52. SAN FRANCISCO ©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved