Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013
Science as a Service
Ian Foster, The University of Chicago and Argonne National Laboratory
November 14, 2013
A time of disruptive change
Most labs have limited resources
Figure (Heidorn): NSF grants in 2007, with award amounts on a log scale from $1,000 to $1,000,000. Awards under $350,000 make up 80% of awards but only 50% of grant dollars.
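The two percentages above imply a gap in average award size that the chart leaves implicit. A short worked calculation, using only the 80% and 50% figures (N, the number of awards, and D, the total dollars, are illustrative symbols introduced here):

```latex
\begin{aligned}
\bar{a}_{\text{small}} &= \frac{0.5\,D}{0.8\,N} = 0.625\,\frac{D}{N}
  && \text{(awards under \$350{,}000)} \\
\bar{a}_{\text{large}} &= \frac{0.5\,D}{0.2\,N} = 2.5\,\frac{D}{N}
  && \text{(awards of \$350{,}000 or more)} \\
\frac{\bar{a}_{\text{large}}}{\bar{a}_{\text{small}}} &= \frac{2.5}{0.625} = 4
\end{aligned}
```

So the average award in the larger group is four times the average in the smaller group, and four out of five awards fall in the smaller one.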
Automation is required to apply more sophisticated methods to far more data
Outsourcing is needed to achieve economies of scale in the use of automated methods
Building a discovery cloud
• Identify time-consuming activities amenable to automation and outsourcing
• Implement as high-quality, low-touch SaaS
• Leverage IaaS for reliability, economies of scale
• Extract common elements as research automation platform
Figure: Software as a service / Platform as a service / Infrastructure as a service
Bonus question: Sustainability
We aspire (initially) to create a great user experience for research data management
What would a “dropbox for science” look like?
• Collect
• Move
• Sync
• Share
• Analyze

• Annotate
• Publish
• Search
• Backup
• Archive

BIG DATA
It should be trivial to Collect, Move, Sync, Share, Analyze,
Annotate, Publish, Search, Backup, & Archive BIG DATA

… but in reality it’s often very challenging
Figure: data moving among a Staging Store, Ingest Store, Registry, Community Store, Analysis Store, Archive, and Mirror, with errors along the way: “Expired credentials”, “Permission denied”, “Quota exceeded”, “Network failed. Retry.”
Collect, Move, Sync, Share, Analyze, Annotate, Publish, Search, Backup, Archive BIG DATA
Move, Sync, and Share: capabilities delivered using the Software-as-a-Service (SaaS) model
Figure: moving and syncing files with Globus Online
1. User initiates transfer request
2. Globus Online moves/syncs files from the Data Source to the Data Destination
3. Globus Online notifies user
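Syncing means copying only what differs between source and destination. The slides do not say how Globus Online decides this; the sketch below assumes a simple rule (copy a file if it is missing at the destination or its size or checksum differs), and the names `FileInfo` and `files_to_sync` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical metadata record for one file at an endpoint."""
    size: int
    checksum: str


def files_to_sync(source: dict[str, FileInfo],
                  destination: dict[str, FileInfo]) -> list[str]:
    """Return the paths that must be copied so the destination matches the source.

    A file is copied if it is absent at the destination or if its size or
    checksum differs. Files that exist only at the destination are left alone.
    """
    return sorted(
        path
        for path, info in source.items()
        if destination.get(path) != info  # None (missing) also differs
    )


if __name__ == "__main__":
    src = {
        "run1/reads.fastq": FileInfo(1_000, "a1"),
        "run1/ref.fa": FileInfo(500, "b2"),
        "run2/reads.fastq": FileInfo(2_000, "c3"),
    }
    dst = {
        "run1/reads.fastq": FileInfo(1_000, "a1"),  # unchanged: skipped
        "run1/ref.fa": FileInfo(500, "zz"),         # checksum differs: copied
    }
    print(files_to_sync(src, dst))  # prints: ['run1/ref.fa', 'run2/reads.fastq']
```

Comparing checksums as well as sizes catches files that changed without changing length, at the cost of computing a checksum at both ends.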

Figure: sharing files with Globus Online (files stay at the Data Source)
1. User A selects file(s) to share, selects a user/group, and sets share permissions
2. Globus Online tracks shared files; no need to move files to cloud storage!
3. User B logs in to Globus Online and accesses the shared file
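In the sharing flow the files stay where they are, and only the record of who may access them changes. A minimal sketch of that bookkeeping, assuming a simple model where User A grants read (`"r"`) or read-write (`"rw"`) access on a path to a user or a group. The `ShareRegistry` class and its methods are hypothetical and are not the Globus API.

```python
from dataclasses import dataclass, field


@dataclass
class ShareRegistry:
    """Hypothetical share bookkeeping: grants are recorded, files never move."""

    # Group name -> member user names.
    groups: dict[str, set[str]] = field(default_factory=dict)
    # Shared path -> {principal: permission}; a principal is a user name
    # or "group:<name>", and a permission is "r" or "rw".
    grants: dict[str, dict[str, str]] = field(default_factory=dict)

    def share(self, path: str, principal: str, permission: str = "r") -> None:
        """User A grants access on `path` to a user or a group."""
        if permission not in ("r", "rw"):
            raise ValueError("permission must be 'r' or 'rw'")
        self.grants.setdefault(path, {})[principal] = permission

    def can_read(self, user: str, path: str) -> bool:
        """True if `user` holds any grant on `path`, directly or via a group."""
        acl = self.grants.get(path, {})
        if user in acl:
            return True
        prefix = "group:"
        return any(
            principal.startswith(prefix)
            and user in self.groups.get(principal[len(prefix):], set())
            for principal in acl
        )


if __name__ == "__main__":
    registry = ShareRegistry(groups={"collaborators": {"userB"}})
    registry.share("/data/run1/", "group:collaborators", "r")
    print(registry.can_read("userB", "/data/run1/"))  # prints: True
    print(registry.can_read("userC", "/data/run1/"))  # prints: False
```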
Extreme ease of use
• InCommon, OAuth, OpenID, X.509, …
• Credential management
• Group definition and management
• Transfer management and optimization
• Reliability via transfer retries
• Web interface, REST API, command line
• One-click “Globus Connect” install
• 5-minute Globus Connect Multi User install
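The list above credits reliability to transfer retries. The slides do not say how retries are scheduled. A common pattern is retry with exponential backoff, sketched here with a hypothetical `transfer_with_retries` helper and a simulated flaky transfer. The attempt limit and delays are assumptions.

```python
import time
from typing import Callable


class TransferError(Exception):
    """A failed attempt that may succeed if retried (e.g. a network drop)."""


def transfer_with_retries(attempt: Callable[[], None],
                          max_attempts: int = 5,
                          base_delay: float = 1.0,
                          sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `attempt` until it succeeds, doubling the wait after each failure.

    Returns the number of attempts used. If every attempt fails, the last
    TransferError is re-raised to the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for n in range(1, max_attempts + 1):
        try:
            attempt()
            return n
        except TransferError:
            if n == max_attempts:
                raise
            # With the default base_delay this waits 1 s, 2 s, 4 s, ...
            sleep(base_delay * 2 ** (n - 1))
    raise RuntimeError("unreachable: the loop always returns or raises")


if __name__ == "__main__":
    outcomes = iter([True, True, False])  # fail twice, then succeed

    def flaky_transfer() -> None:
        if next(outcomes):
            raise TransferError("Network failed. Retry.")

    waits: list[float] = []
    used = transfer_with_retries(flaky_transfer, sleep=waits.append)
    print(used, waits)  # prints: 3 [1.0, 2.0]
```

Injecting `sleep` keeps the helper testable without real waiting. Doubling the delay gives a congested network or a restarting endpoint time to recover instead of being hammered with immediate retries.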
Early adoption is encouraging

>12,000 registered users; >150 daily
>27 PB moved; >1B files
10x (or better) performance vs. scp
99.9% availability
Entirely hosted on Amazon
Amazon web services used
• Amazon EC2 for hosting Globus services
• Elastic Load Balancing to use multiple
Availability Zones for reliability and uptime
• Amazon S3 to store historical state
• Amazon RDS PostgreSQL for active state
K. Heitmann (Argonne) moves 22 TB of cosmology data from LANL to ANL at 5 Gb/s
B. Winjum (UCLA) moves 900K-file plasma physics datasets from UCLA to NERSC
Dan Kozak (Caltech) replicates 1 PB of LIGO astronomy data for resilience
Erin Miller (PNNL) collects data at the Advanced Photon Source, renders at PNNL, and views at ANL
Credit: Kerstin Kleese-van Dam
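To put the first example in perspective, here is a back-of-the-envelope calculation. It assumes decimal units (1 TB = 10^12 bytes, 1 Gb/s = 10^9 bits/s) and a rate held at 5 Gb/s for the whole transfer. The function name `transfer_hours` is illustrative.

```python
def transfer_hours(terabytes: float, gigabits_per_second: float) -> float:
    """Hours to move `terabytes` of data at a sustained rate (decimal units)."""
    bits = terabytes * 1e12 * 8                        # bytes -> bits
    seconds = bits / (gigabits_per_second * 1e9)
    return seconds / 3600


if __name__ == "__main__":
    print(f"{transfer_hours(22, 5):.1f} hours")  # prints: 9.8 hours
```

At 5 Gb/s, 22 TB takes 35,200 seconds, a little under ten hours.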
Globus Online already does a lot
Figure: Globus Online components: Sharing Service, Transfer Service, Globus Nexus (Identity, Group, Profile), Globus Toolkit, Globus Connect, Globus Online APIs
The identity challenge in science
• Research communities often need to
– Assign identities to their users
– Manage user profiles
– Organize users into groups for authorization

• Obstacles to high-quality implementations
– Complexity of associated security protocols
– Creation of identity silos
– Multiple credentials for users
– Reliability, availability, scalability, security
Nexus provides four key capabilities
• Identity provisioning
– Create, manage Globus identities
• Identity hub
– Link with other identities; use to authenticate to services
• Group hub
– User-managed groups; groups can be used for authorization
• Profile management
– User-managed attributes; can use in group admission
Key points:
1) Outsource identity, group, profile management
2) REST API for flexible integration
3) Intuitive, customizable Web interfaces
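Profile attributes can feed group admission, and groups can then gate authorization. The following is only an illustration: the rule format, `Profile`, `Group`, and `authorized` are hypothetical and are not Nexus's actual policy model. In this sketch, a group requires certain attribute values before it admits a member.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical user profile holding user-managed attributes."""
    username: str
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Group:
    """Hypothetical user-managed group with an attribute-based admission rule."""
    name: str
    # Attribute values a profile must have to be admitted.
    required: dict[str, str] = field(default_factory=dict)
    members: set[str] = field(default_factory=set)

    def admit(self, profile: Profile) -> bool:
        """Add the user if every required attribute matches; report success."""
        ok = all(profile.attributes.get(k) == v for k, v in self.required.items())
        if ok:
            self.members.add(profile.username)
        return ok


def authorized(user: str, group: Group) -> bool:
    """Group membership used as the authorization check."""
    return user in group.members


if __name__ == "__main__":
    lab = Group("genomics-lab", required={"organization": "University of Chicago"})
    alice = Profile("alice", {"organization": "University of Chicago"})
    bob = Profile("bob", {"organization": "Elsewhere"})
    print(lab.admit(alice), lab.admit(bob))  # prints: True False
    print(authorized("alice", lab))          # prints: True
```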
Branded sites
XSEDE

Open Science Grid

University of Chicago

DOE kBase

Indiana University

University of Exeter

Globus Online

NERSC

NIH BIRN
A platform for integration
Data management SaaS (Globus) +
Next-gen sequence analysis pipelines (Galaxy) +
Cloud IaaS (Amazon) =
Flexible, scalable, easy-to-use genomics analysis for all biologists
(Globus Genomics)
We are adding capabilities
Figure: Globus Online components, now including Dataset Services: Dataset Services, Sharing Service, Transfer Service, Globus Nexus (Identity, Group, Profile), Globus Toolkit, Globus Connect, Globus Online APIs
• Ingest and publication
– Imagine a Dropbox that not only replicates, but also extracts metadata, catalogs, and converts

• Cataloging
– Virtual views of data based on user-defined and/or automatically
extracted metadata

• Computation
– Associate computational procedures, orchestrate application,
catalog results, record provenance
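The ingest and cataloging ideas above can be illustrated with a small in-memory catalog. This sketch rests on assumptions: the `Catalog` class, the filename-based metadata extractor, and the query form are hypothetical stand-ins for whatever Dataset Services would actually provide.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Optional


def extract_metadata(path: str) -> dict[str, str]:
    """Hypothetical automatic extractor: derive metadata from the file name."""
    p = PurePosixPath(path)
    return {"name": p.name, "format": p.suffix.lstrip(".") or "unknown"}


@dataclass
class Catalog:
    # Path -> metadata (automatically extracted plus user-defined).
    entries: dict[str, dict[str, str]] = field(default_factory=dict)

    def ingest(self, path: str, user_metadata: Optional[dict[str, str]] = None) -> None:
        """Record a file with extracted metadata plus any user-supplied tags."""
        meta = extract_metadata(path)
        meta.update(user_metadata or {})
        self.entries[path] = meta

    def view(self, **criteria: str) -> list[str]:
        """A 'virtual view': paths whose metadata match all the criteria."""
        return sorted(
            path for path, meta in self.entries.items()
            if all(meta.get(k) == v for k, v in criteria.items())
        )


if __name__ == "__main__":
    cat = Catalog()
    cat.ingest("/aps/scan1.h5", {"instrument": "APS", "sample": "A"})
    cat.ingest("/aps/scan2.h5", {"instrument": "APS", "sample": "B"})
    cat.ingest("/seq/run1.fastq", {"sample": "A"})
    print(cat.view(sample="A"))   # prints: ['/aps/scan1.h5', '/seq/run1.fastq']
    print(cat.view(format="h5"))  # prints: ['/aps/scan1.h5', '/aps/scan2.h5']
```

A view is just a query over metadata, so the same files can appear in many views without being copied or moved.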
Next Gen Sequencing Analysis for Everyone – No IT Required
Ravi K Madduri, The University of Chicago and Argonne National Laboratory

November 14, 2013
One slide to get your attention
Outline
• Globus Vision
• Challenges in Sequencing Analysis
– Big Data Management
– Analysis at Scale
– Reproducibility

• Proposed Approach Using Globus Genomics
• Example Collaborations
• Q&A
Globus Vision
Goal: Accelerate discovery and innovation worldwide
by providing research IT as a service
Leverage software-as-a-service to:
– provide millions of researchers with unprecedented access to
powerful tools for managing Big Data
– reduce research IT costs dramatically via economies of scale

“Civilization advances by extending the number of important operations which we can perform without thinking about them”
(Alfred North Whitehead, 1911)
Challenges in Sequencing Analysis
Data Movement and Access Challenges
• Data is distributed in different locations
• Research labs need access to the data for analysis
• Be able to share data with other researchers/collaborators
• Inefficient ways of data movement
• Data needs to be available on the local and distributed compute resources
• Local clusters, cloud, grid
Manual Data Analysis
• Manually move the data to the compute node
• Install all the tools required for the analysis: BWA, Picard, GATK, filtering scripts, etc.
• Shell scripts to sequentially execute the tools
• Manually modify the scripts for any change
• Error prone, difficult to keep track, messy
• Difficult to maintain and transfer the knowledge
Figure: sequence data (Fastq files, reference genome) from Public Data, Sequencing Centers, and Storage must reach the Research Lab and a Local Cluster/Cloud, where the researcher must install the tools, modify the script, and (re)run it through Alignment, Picard, GATK, and Variant Calling. How do we analyze this sequence data?
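The manual pipeline above runs BWA alignment, Picard, and GATK variant calling in a fixed sequence from a shell script. A workflow system such as Galaxy instead describes the analysis as steps with declared inputs. The sketch below shows only that idea, using Python's standard `graphlib`. The step names, the Picard duplicate-marking step, and the exact dependency graph are illustrative assumptions, and no tools are actually invoked.

```python
from graphlib import TopologicalSorter

# Hypothetical variant-calling workflow: each step maps to the steps it needs.
WORKFLOW: dict[str, set[str]] = {
    "align_bwa": {"fetch_fastq", "fetch_ref_genome"},
    "mark_duplicates_picard": {"align_bwa"},
    "call_variants_gatk": {"mark_duplicates_picard", "fetch_ref_genome"},
    "filter_variants": {"call_variants_gatk"},
}


def run_order(workflow: dict[str, set[str]]) -> list[str]:
    """Return an execution order in which every step follows all of its inputs."""
    return list(TopologicalSorter(workflow).static_order())


if __name__ == "__main__":
    # The two fetch steps are independent, so their relative order may vary.
    for step in run_order(WORKFLOW):
        print(step)
```

Changing the analysis, for example adding a new filtering tool, then means adding a node to the graph rather than rewriting a script by hand.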
Globus Genomics
Data Management
• Globus provides a high-performance, fault-tolerant, secure file transfer service between all data endpoints
Data Analysis
• Galaxy-based workflow management system
• Globus integrated within Galaxy
• Web-based UI
• Drag-and-drop workflow creation
• Easily modify workflows with new tools
• Analytical tools are automatically run on the scalable compute resources when possible
Figure: Public Data, Sequencing Centers, and Research Lab/Seq Center storage connect through Galaxy Data Libraries to a local cluster/cloud and to Globus Genomics on Amazon EC2.
Globus Genomics Architecture

Figure 2: Globus Genomics Architecture
Globus Genomics Usage
Globus Genomics
• Computational profiles for
various analysis tools
• Resources can be provisioned
on-demand with Amazon Web
Services cloud based
infrastructure
• GlusterFS as a shared file system between head nodes and compute nodes
• Provisioned IOPS on Amazon EBS
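Computational profiles pair each analysis tool with the resources it should run on, so that the right node can be provisioned on demand. A minimal sketch of that lookup follows. The tool names come from the slides, but every instance label, core count, and memory size, as well as the fallback profile, is a hypothetical placeholder and not Globus Genomics' actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeProfile:
    """Hypothetical resource request for one analysis tool."""
    instance_label: str  # placeholder name, not a real EC2 instance type
    cores: int
    memory_gb: int


# Illustrative values only; real profiles would be tuned per tool and dataset.
PROFILES: dict[str, ComputeProfile] = {
    "bwa": ComputeProfile("compute-large", cores=16, memory_gb=32),
    "gatk": ComputeProfile("memory-large", cores=8, memory_gb=64),
    "picard": ComputeProfile("general-medium", cores=4, memory_gb=16),
}
DEFAULT_PROFILE = ComputeProfile("general-small", cores=2, memory_gb=8)


def profile_for(tool: str) -> ComputeProfile:
    """Pick the profile for a tool (case-insensitive), else a small default."""
    return PROFILES.get(tool.lower(), DEFAULT_PROFILE)


if __name__ == "__main__":
    print(profile_for("GATK"))
    print(profile_for("filtering-script"))
```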
Coming soon!
• Integration with Globus Catalog
– Better data discovery and metadata management

• Integration with Globus Sharing
– Easy and secure method to share large datasets with collaborators

• Integration with Amazon Glacier for data archiving
• Support for high throughput computational
modalities through Apache Mesos
– MapReduce and MPI clusters

• Dynamic Storage Strategies using Amazon S3 or
LVM-based shared file system
Our vision for a 21st century discovery infrastructure
Provide more capability for more people at lower cost by building a “Discovery Cloud”
Delivering “Science as a service”
Thank you to our sponsors
For more information
• More information on Globus Genomics and to
sign up: www.globus.org/genomics
• More information on Globus:
www.globusonline.org
• Follow us on Twitter: @ianfoster, @madduri,
@globusgenomics, @globusonline
Thank you!
Please give us your feedback on this presentation (BDT 310)
As a thank you, we will select prize
winners daily for completed surveys!

 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013

  • 1. Science as a Service Ian Foster, The University of Chicago and Argonne National Laboratory November 14, 2013
  • 2. A time of disruptive change
  • 3. A time of disruptive change
  • 4. Most labs have limited resources [Figure: Heidorn, NSF grants in 2007. Awards under $350,000: 80% of awards, 50% of grant $$.]
  • 5. Automation is required to apply more sophisticated methods to far more data
  • 6. Automation is required to apply more sophisticated methods to far more data Outsourcing is needed to achieve economies of scale in the use of automated methods
  • 7. Building a discovery cloud • Identify time-consuming activities amenable to automation and outsourcing • Implement as high-quality, low-touch SaaS • Leverage IaaS for reliability, economies of scale • Extract common elements as research automation platform [Figure: Software as a service / Platform as a service / Infrastructure as a service] Bonus question: Sustainability
  • 8. We aspire (initially) to create a great user experience for research data management What would a “dropbox for science” look like?
  • 9. • Collect • Move • Sync • Share • Analyze • Annotate • Publish • Search • Backup • Archive BIG DATA
  • 10. It should be trivial to Collect, Move, Sync, Share, Analyze, Annotate, Publish, Search, Backup, & Archive BIG DATA … but in reality it’s often very challenging [Figure: data passing among Staging Store, Ingest Store, Registry, Community Store, Analysis Store, Archive, and Mirror, with failures flagged along the way: “Expired credentials”, “Permission denied”, “Quota exceeded”, “Network failed. Retry.”]
  • 11. • Collect • Move • Sync • Share • Analyze • Annotate • Publish • Search • Backup • Archive BIG DATA
  • 12. • Collect • Move • Sync • Share • Analyze • Annotate • Publish • Search • Backup • Archive BIG DATA. Move, Sync, Share: capabilities delivered using Software-as-a-Service (SaaS) model
  • 14. [Diagram: Data Source] (1) User A selects file(s) to share; selects user/group, sets share permissions (2) Globus Online tracks shared files; no need to move files to cloud storage! (3) User B logs in to Globus Online and accesses shared file
  • 15. Extreme ease of use • InCommon, OAuth, OpenID, X.509, … • Credential management • Group definition and management • Transfer management and optimization • Reliability via transfer retries • Web interface, REST API, command line • One-click “Globus Connect” install • 5-minute Globus Connect Multi User install
  • 16. Early adoption is encouraging
  • 17. Early adoption is encouraging • >12,000 registered users; >150 daily • >27 PB moved; >1B files • 10x (or better) performance vs. scp • 99.9% availability • Entirely hosted on Amazon
  • 18. Amazon Web Services used • Amazon EC2 for hosting Globus services • Elastic Load Balancing to use multiple Availability Zones for reliability and uptime • Amazon S3 to store historical state • Amazon RDS PostgreSQL for active state
  • 19. K. Heitmann (Argonne) moves 22 TB of cosmology data LANL → ANL at 5 Gb/s
  • 20. B. Winjum (UCLA) moves 900K-file plasma physics datasets UCLA → NERSC
  • 21. Dan Kozak (Caltech) replicates 1 PB LIGO astronomy data for resilience
  • 22. Erin Miller (PNNL) collects data at Advanced Photon Source, renders at PNNL, and views at ANL Credit: Kerstin Kleese-van Dam
  • 23. • Collect • Move • Sync • Share • Analyze • Annotate • Publish • Search • Backup • Archive BIG DATA. Move, Sync, Share: capabilities delivered using Software-as-a-Service (SaaS) model
  • 24. • Collect • Move • Sync • Share • Analyze • Annotate • Publish • Search • Backup • Archive BIG DATA
  • 25. Globus Online already does a lot [Diagram: Sharing Service, Transfer Service, Globus Nexus (Identity, Group, Profile), Globus Toolkit, Globus Connect, Globus Online APIs]
  • 26. The identity challenge in science • Research communities often need to – Assign identities to their users – Manage user profiles – Organize users into groups for authorization • Obstacles to high-quality implementations – Complexity of associated security protocols – Creation of identity silos – Multiple credentials for users – Reliability, availability, scalability, security
  • 27. Nexus provides four key capabilities • Identity provisioning – Create, manage Globus identities • Identity hub – Link with other identities; use to authenticate to services • Group hub – User-managed groups; groups can be used for authorization • Profile management – User-managed attributes; can use in group admission Key points: 1) Outsource identity, group, profile management 2) REST API for flexible integration 3) Intuitive, customizable Web interfaces
  • 28. Branded sites: XSEDE, Open Science Grid, University of Chicago, DOE kBase, Indiana University, University of Exeter, Globus Online, NERSC, NIH BIRN
  • 29. A platform for integration
  • 30. A platform for integration
  • 31. A platform for integration
  • 32. Data management SaaS (Globus) + Next-gen sequence analysis pipelines (Galaxy) + Cloud IaaS (Amazon) = Flexible, scalable, easy-to-use genomics analysis for all biologists globus genomics
  • 33. We are adding capabilities [Diagram: Sharing Service, Transfer Service, Globus Nexus (Identity, Group, Profile), Globus Toolkit, Globus Connect, Globus Online APIs]
  • 34. We are adding capabilities [Diagram: Dataset Services, Sharing Service, Transfer Service, Globus Nexus (Identity, Group, Profile), Globus Toolkit, Globus Connect, Globus Online APIs]
  • 35. We are adding capabilities • Ingest and publication – Imagine a Dropbox that not only replicates, but also extracts metadata, catalogs, converts • Cataloging – Virtual views of data based on user-defined and/or automatically extracted metadata • Computation – Associate computational procedures, orchestrate application, catalog results, record provenance
  • 36. Next Gen Sequencing Analysis for Everyone – No IT Required Ravi K Madduri, The University of Chicago and Argonne National Laboratory November 14, 2013
  • 37. One slide to get your attention
  • 38. Outline • Globus Vision • Challenges in Sequencing Analysis – Big Data Management – Analysis at Scale – Reproducibility • Proposed Approach Using Globus Genomics • Example Collaborations • Q&A
  • 39. Globus Vision Goal: Accelerate discovery and innovation worldwide by providing research IT as a service Leverage software-as-a-service to: – provide millions of researchers with unprecedented access to powerful tools for managing Big Data – reduce research IT costs dramatically via economies of scale “Civilization advances by extending the number of important operations which we can perform without thinking of them” – Alfred North Whitehead, 1911
  • 40. Challenges in Sequencing Analysis. Data Movement and Access Challenges • Data is distributed in different locations • Research labs need access to the data for analysis • Be able to share data with other researchers/collaborators • Inefficient ways of data movement • Data needs to be available on the local and distributed compute resources (local clusters, cloud, grid) Manual Data Analysis • Shell scripts to sequentially execute the tools • Manually modify the scripts for any change • Error prone, difficult to keep track, messy • Manually move the data to the compute node • Install all the tools required for the analysis (BWA, Picard, GATK, filtering scripts, etc.) • Difficult to maintain and transfer the knowledge [Diagram: Public Data, Sequencing Centers, Research Lab, and Seq Center storage (Fastq, Ref Genome) feed a Local Cluster/Cloud where tools are installed and scripts are modified and (re)run for Alignment, Picard, GATK, and Variant Calling; caption: “How do we analyze this Sequence Data?”]
  • 41. Globus Genomics. Data Management: Globus provides a • High-performance • Fault-tolerant • Secure file transfer service between all data endpoints (Public Data, Sequencing Centers, Research Lab, Seq Center Storage, Local Cluster/Cloud, Galaxy Data Libraries). Data Analysis: Galaxy-based workflow management system • Globus integrated within Galaxy • Web-based UI • Drag-and-drop workflow creation • Easily modify workflows with new tools • Analytical tools are automatically run on the scalable compute resources when possible [Diagram: Globus Genomics on Amazon EC2]
  • 42. Globus Genomics Architecture Figure 2: Globus Genomics Architecture
  • 44.
  • 45. Globus Genomics • Computational profiles for various analysis tools • Resources can be provisioned on-demand with Amazon Web Services cloud-based infrastructure • GlusterFS as a shared file system between head nodes and compute nodes • Provisioned I/O on Amazon EBS
  • 46. Coming soon! • Integration with Globus Catalog – Better data discovery and metadata management • Integration with Globus Sharing – Easy and secure method to share large datasets with collaborators • Integration with Amazon Glacier for data archiving • Support for high throughput computational modalities through Apache Mesos – MapReduce and MPI clusters • Dynamic Storage Strategies using Amazon S3 or LVM-based shared file system
  • 47.
  • 48. Our vision for a 21st century discovery infrastructure Provide more capability for more people at lower cost by building a “Discovery Cloud” Delivering “Science as a service”
  • 49. Thank you to our sponsors
  • 50. For more information • More information on Globus Genomics and to sign up: www.globus.org/genomics • More information on Globus: www.globusonline.org • Follow us on Twitter: @ianfoster, @madduri, @globusgenomics, @globusonline
  • 52. Please give us your feedback on this presentation BDT 310 As a thank you, we will select prize winners daily for completed surveys!
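Slide 15 lists “Reliability via transfer retries” among Globus Online’s ease-of-use features. The deck does not say how retries are scheduled. What follows is a minimal Python sketch of one common pattern, capped exponential backoff with jitter, and it is not Globus’s actual algorithm. `with_retries` and `TransientError` are hypothetical names, and the retried operation stands in for transferring one chunk of a file.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying, e.g. a dropped connection."""


def with_retries(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op, retrying TransientError with capped exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))  # 1, 2, 4, 8, ... s
            sleep(delay * random.uniform(0.5, 1.0))  # jitter de-synchronizes retries
    raise AssertionError("unreachable")


# Usage: a stand-in chunk transfer that fails twice, then succeeds.
calls = {"n": 0}


def flaky_chunk() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("Network failed. Retry.")
    return "chunk ok"


result = with_retries(flaky_chunk, sleep=lambda s: None)  # skip real waiting in the demo
print(result, "after", calls["n"], "attempts")  # chunk ok after 3 attempts
```

The `sleep` parameter is injectable so the demo and tests run instantly. Only errors classified as transient are retried, so a “Permission denied” failure would not be retried.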
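Slide 19 cites 22 TB of cosmology data moved from LANL to ANL at 5 Gb/s. A back-of-the-envelope check rests on two assumptions: decimal units (1 TB = 10^12 bytes) and a 5 Gb/s rate sustained for the whole transfer with no overhead. Under those assumptions the move takes (22 × 10^12 bytes × 8 bits/byte) / (5 × 10^9 bits/s) = 35,200 s, just under 10 hours. The hypothetical helper below performs the same arithmetic.

```python
def transfer_hours(size_tb: float, rate_gbps: float) -> float:
    """Hours to move size_tb terabytes at a sustained rate_gbps gigabits per second.

    Assumes decimal units (1 TB = 10**12 bytes, 1 Gb/s = 10**9 bits/s)
    and a constant rate with no protocol or restart overhead.
    """
    bits = size_tb * 10**12 * 8
    seconds = bits / (rate_gbps * 10**9)
    return seconds / 3600


print(f"{transfer_hours(22, 5):.2f} h")  # 9.78 h
```

By the same arithmetic, the same 22 TB at a sustained 1 Gb/s would take roughly 49 hours.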
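Slide 27 makes three points about Nexus: groups can be used for authorization, user-managed profile attributes can be used in group admission, and linked identities can be used to authenticate. The plain-Python sketch below models those three ideas. It is not the Nexus REST API. `User`, `Group`, `resolve`, and `authorized` are hypothetical names, and admission is simplified to exact matches on profile attributes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class User:
    username: str
    linked_ids: set[str] = field(default_factory=set)  # identity hub: other logins for this user
    profile: dict[str, str] = field(default_factory=dict)  # user-managed attributes


@dataclass
class Group:
    name: str
    required: dict[str, str] = field(default_factory=dict)  # attributes needed to join
    members: set[str] = field(default_factory=set)

    def admit(self, user: User) -> bool:
        """Add user only if their profile matches every required attribute."""
        if all(user.profile.get(k) == v for k, v in self.required.items()):
            self.members.add(user.username)
            return True
        return False


def resolve(login_id: str, users: list[User]) -> Optional[User]:
    """Map a primary username or any linked identity to its user."""
    for u in users:
        if login_id == u.username or login_id in u.linked_ids:
            return u
    return None


def authorized(user: User, acl: set[str], groups: dict[str, Group]) -> bool:
    """Grant access if any group named on the resource's ACL contains the user."""
    return any(g in groups and user.username in groups[g].members for g in acl)


# Usage with made-up identities and attributes.
alice = User("alice", {"alice@uchicago.edu"}, {"institution": "UChicago"})
bob = User("bob", {"bob@example.org"}, {"institution": "Elsewhere"})
lab = Group("uchicago-lab", required={"institution": "UChicago"})
groups = {lab.name: lab}

print(lab.admit(alice), lab.admit(bob))  # True False
who = resolve("alice@uchicago.edu", [alice, bob])
print(who is not None and authorized(who, {"uchicago-lab"}, groups))  # True
```

Access control is tied to groups, not to individual users or credentials. A service therefore needs to check group membership only, and users manage who is in each group.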
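Slide 35 describes ingest that “not only replicates, but also extracts metadata, catalogs” and cataloging that offers “virtual views of data based on user-defined and/or automatically extracted metadata.” The sketch below shows the idea under heavy simplification. The catalog is a hypothetical in-memory list, and the only metadata extracted automatically are file name, extension-based format, size, and a SHA-256 checksum. `ingest` and `view` are illustrative names, not a Globus API.

```python
import hashlib
import tempfile
from pathlib import Path

catalog: list[dict[str, object]] = []  # hypothetical in-memory catalog


def ingest(path: Path, user_tags: dict[str, str]) -> dict[str, object]:
    """Extract simple metadata from a file and record it in the catalog."""
    data = path.read_bytes()
    entry: dict[str, object] = {
        **user_tags,  # user-defined metadata (automatic fields below take precedence)
        "name": path.name,
        "format": path.suffix.lstrip(".").lower() or "unknown",
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),  # supports later integrity checks
    }
    catalog.append(entry)
    return entry


def view(**criteria: object) -> list[str]:
    """A 'virtual view': names of cataloged files whose metadata match all criteria."""
    return [str(e["name"]) for e in catalog if all(e.get(k) == v for k, v in criteria.items())]


# Usage: ingest two small files tagged with a made-up project name.
with tempfile.TemporaryDirectory() as d:
    for name, text in [("run1.fastq", "@r1\nACGT\n+\nIIII\n"), ("notes.txt", "pilot run")]:
        p = Path(d) / name
        p.write_text(text)
        ingest(p, {"project": "pilot"})

print(view(project="pilot", format="fastq"))  # ['run1.fastq']
```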
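Slides 40 and 41 contrast shell scripts that “sequentially execute the tools” and must be manually modified for any change with Galaxy-based workflows that are easy to modify with new tools. The sketch below illustrates the workflow side. Each step is declared once along with its dependencies, and a runner executes the steps in dependency order using Python’s standard `graphlib.TopologicalSorter`. The steps are hypothetical stand-ins that only compute output file names. They do not call BWA, Picard, or GATK, and this is not how Galaxy is implemented.

```python
from graphlib import TopologicalSorter
from typing import Callable

# A step receives all earlier outputs and returns the name of the file it produced.
Step = Callable[[dict[str, str]], str]


def run_workflow(steps: dict[str, Step], deps: dict[str, set[str]]) -> dict[str, str]:
    """Run every step after the steps it depends on, passing earlier outputs along."""
    outputs: dict[str, str] = {}
    for name in TopologicalSorter(deps).static_order():
        outputs[name] = steps[name](outputs)
    return outputs


# Hypothetical stand-ins for an alignment, Picard, GATK, and filtering pipeline.
steps: dict[str, Step] = {
    "align": lambda o: "sample.bam",  # e.g. alignment of Fastq against the Ref Genome
    "dedup": lambda o: o["align"].replace(".bam", ".dedup.bam"),  # e.g. a Picard step
    "call": lambda o: o["dedup"].replace(".dedup.bam", ".vcf"),  # e.g. GATK variant calling
    "filter": lambda o: o["call"].replace(".vcf", ".filtered.vcf"),
}
deps = {"align": set(), "dedup": {"align"}, "call": {"dedup"}, "filter": {"call"}}

print(run_workflow(steps, deps)["filter"])  # sample.filtered.vcf
```

Swapping in a different variant caller means replacing the single `"call"` entry. The runner and the other steps stay unchanged, which is the maintainability gain the slides attribute to workflows over hand-edited scripts.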