computationinstitute.org
Big process for big data
Ian Foster
foster@anl.gov
Thanks to great colleagues
and collaborators
• Steve Tuecke, Rachana Ananthakrishnan, Kyle
Chard, Raj Kettimuthu, Ravi Madduri, Tanu
Malik, and many others at Argonne & UChicago
• Carl Kesselman, Karl Czajkowski, Rob Schuler,
and others at USC/ISI
• Francesco de Carlo, Chris Jacobsen, and others
at Argonne
• Kerstin Kleese-Van Dam, Carina Lansing, and
others at PNNL
The Computation Institute
= UChicago + Argonne
= Cross-disciplinary nexus
= Home of the Discovery Cloud
x10 in 6 years
x10^5 in 6 years
Will data kill genomics?
Kahn, Science, 331 (6018): 728-729
18 orders
of magnitude
in 5 decades!
12 orders
of magnitude
in 6 decades!
Moore’s Law for X-Ray Sources
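The two growth figures on this slide can be turned into doubling times with a little arithmetic. The sketch below assumes smooth exponential growth over the whole period, which the slide does not claim; the function name is illustrative only.

```python
import math


def doubling_time_years(orders_of_magnitude: float, years: float) -> float:
    """Years per doubling for a quantity that grows by 10**orders_of_magnitude over `years`.

    Assumes steady exponential growth, which is a simplification.
    """
    doublings = orders_of_magnitude * math.log2(10)
    return years / doublings


fast = doubling_time_years(18, 50)  # 18 orders of magnitude in 5 decades
slow = doubling_time_years(12, 60)  # 12 orders of magnitude in 6 decades
print(f"18 orders in 5 decades: doubles about every {fast * 12:.1f} months")
print(f"12 orders in 6 decades: doubles about every {slow * 12:.1f} months")
```

Under that assumption the first curve doubles roughly every 10 months and the second roughly every 18 months.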
Large Hadron Collider
Higgs discovery “only possible because
of the extraordinary achievements of …
grid computing”—Rolf Heuer, CERN DG
1.2 PB of climate data
Delivered to 23,000 users
We have exceptional
infrastructure for the 1%
What about the 99%?
Big science. Small labs.
Need: A new way to deliver
research cyberinfrastructure
Frictionless
Affordable
Sustainable
We asked ourselves:
What if the research workflow
could be managed as easily as…
…our pictures
…home entertainment
…our e-mail
What makes these services great?
Great User Experience
+
High performance
(but invisible) infrastructure
We aspire (initially) to create a
great user experience for
research data management
What would a “dropbox for
science” look like?
• Collect
• Move
• Sync
• Share
• Analyze
• Annotate
• Publish
• Search
• Backup
• Archive
BIG DATA
[Diagram: data moving among Registry, Staging Store, Ingest Store, Analysis Store, Community Store, Archive, and Mirror]
Error messages shown:
• Quota exceeded!
• Expired credentials!
• Network failed. Retry.
• Permission denied!
It should be trivial to Collect, Move, Sync, Share, Analyze,
Annotate, Publish, Search, Backup, & Archive BIG DATA
… but in reality it’s often very challenging
Automation is required
to apply more
sophisticated methods to
far more data
Outsourcing is needed
to achieve economies of
scale in the use of
automated methods
Automation and outsourcing are key
Building a discovery cloud:
Research strategy
• Identify time-consuming activity that appears
amenable to automation and outsourcing
• Implement activity as a high-quality, low-touch
SaaS solution, leveraging commercial IaaS for
high reliability, economies of scale
• Evaluate
• Extract common elements as a
research automation platform
• Repeat
Bonus question: Identify methods for
delivering SaaS solutions sustainably
Software as a service
Platform as a service
Infrastructure as a service
• Collect
• Move
• Sync
• Share
Capabilities delivered using
Software-as-a-Service (SaaS) model
Data Source → Data Destination
1. User initiates transfer request
2. Globus Online moves/syncs files
3. Globus Online notifies user
Data Source
1. User A selects file(s) to share; selects user/group, sets share permissions
2. Globus Online tracks shared files; no need to move files to cloud storage!
3. User B logs in to Globus Online and accesses shared file
Extreme ease of use
• InCommon, OAuth, OpenID, X.509, …
• Credential management
• Group definition and management
• Transfer management and optimization
• Reliability via transfer retries
• Web interface, REST API, command line
• One-click “Globus Connect” install
• 5-minute Globus Connect Multi User install
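"Reliability via transfer retries" can be illustrated with a generic retry loop. The sketch below uses exponential backoff, which is an assumption: the slides do not say how Globus Online schedules its retries. `TransientTransferError` and `transfer_with_retries` are hypothetical names, not part of any Globus interface.

```python
import time
from typing import Callable


class TransientTransferError(Exception):
    """Hypothetical error standing in for a recoverable failure such as a dropped connection."""


def transfer_with_retries(attempt: Callable[[], None],
                          max_attempts: int = 5,
                          base_delay: float = 1.0,
                          sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `attempt` until it succeeds and return the number of attempts used.

    Re-raises the last error once `max_attempts` is exhausted.
    """
    for n in range(1, max_attempts + 1):
        try:
            attempt()
            return n
        except TransientTransferError:
            if n == max_attempts:
                raise
            sleep(base_delay * 2 ** (n - 1))  # wait 1s, 2s, 4s, ... between attempts
    raise ValueError("max_attempts must be at least 1")


# Usage: a copy that fails twice with a network error, then succeeds.
calls = {"count": 0}


def flaky_copy() -> None:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientTransferError("network failed")


delays: list[float] = []
attempts_used = transfer_with_retries(flaky_copy, sleep=delays.append)
print(attempts_used, delays)
```

Passing `sleep=delays.append` records the waits instead of pausing, which keeps the example fast to run.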
Early adoption is encouraging
8,000 registered users; >100 daily
~16 PB moved; ~1B files
10x (or better) performance vs. scp
99.9% availability
Entirely hosted on Amazon
[Figure: transfer durations, 2011-2012; log-scale duration axis marked at 1 second, 1 minute, 1 hour, 1 day, 1 week]
We benefit greatly from
ESnet’s “Science DMZ”
Three key components, all required:
• “Friction free” network path
– Highly capable network devices (wire-speed, deep queues)
– Virtual circuit connectivity option
– Security policy and enforcement specific to science workflows
– Located at or near site perimeter if possible
• Dedicated, high-performance Data Transfer Nodes (DTNs)
– Hardware, operating system, libraries optimized for transfer
– Optimized data transfer tools: Globus Online, GridFTP
• Performance measurement/test node
– perfSONAR
Details at http://fasterdata.es.net/science-dmz/
K. Heitmann (Argonne)
moves 22 TB of cosmology
data LANL → ANL at 5 Gb/s
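For a sense of scale, the figures on this slide imply a transfer time that is easy to work out. The sketch assumes decimal units (1 TB = 10^12 bytes) and a sustained 5 Gb/s for the whole transfer; neither assumption is stated on the slide.

```python
def transfer_hours(size_tb: float, rate_gbps: float) -> float:
    """Hours needed to move `size_tb` terabytes at a sustained `rate_gbps` gigabits per second.

    Assumes decimal units: 1 TB = 10**12 bytes, 1 Gb = 10**9 bits.
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (rate_gbps * 1e9)
    return seconds / 3600


hours = transfer_hours(22, 5)
print(f"22 TB at 5 Gb/s: about {hours:.1f} hours")
```

Under those assumptions the 22 TB move takes a little under 10 hours.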
B. Winjum (UCLA) moves
900K-file plasma physics
datasets UCLA NERSC
Dan Kozak (Caltech)
replicates 1 PB LIGO
astronomy data for resilience
Credit: Kerstin Kleese-van Dam
Erin Miller (PNNL)
collects data at
Advanced Photon
Source, renders at
PNNL, and views at
ANL
Globus Online already does a lot
Globus Toolkit
Sharing Service
Transfer Service
Globus Nexus
(Identity, Group, Profile)
Globus Online APIs
Globus Connect
Data management SaaS (Globus) +
Next-gen sequence analysis pipelines (Galaxy) +
Cloud IaaS (Amazon) =
Flexible, scalable, easy-to-use genomics
analysis for all biologists
globus
genomics
A platform for integration
We are also adding capabilities
Globus Toolkit
Sharing Service
Transfer Service
Globus Nexus
(Identity, Group, Profile)
Globus Online APIs
Globus Connect
More capabilities underway …
Globus Toolkit
Sharing Service
Transfer Service
Dataset Services
Globus Nexus
(Identity, Group, Profile)
Globus Online APIs
Globus Connect
Expanding Globus Online services
• Ingest and publication
– Imagine a Dropbox that not only replicates, but
also extracts metadata, catalogs, converts
• Cataloging
– Virtual views of data based on user-defined
and/or automatically extracted metadata
• Computation
– Associate computational procedures,
orchestrate application, catalog results, record
provenance
Looking deeply at how
researchers use data
• A single research question often requires the
integration of many data elements that are:
– In different locations
– In different formats (Excel, text, CDF, HDF, …)
– Described in different ways
• Best grouping can vary during investigation
– Longitudinal, vertical, cross-cutting
• But always needs to be operated on as a unit
– Share, annotate, process, copy, archive, …
How do we manage data today?
• Often, a curious mix of ad hoc methods
– Organize in directories using file and directory
naming conventions
– Capture status in README files, spreadsheets,
notebooks
• Time-consuming, complex, error prone
Why can’t we manage our data like
we manage our pictures and music?
Introducing the dataset
• Group data based on use, not location
– Logical grouping to organize, reorganize, search, and
describe usage
• Tag with characteristics that reflect content …
– Capture as much existing information as we can
• …or to reflect current status in investigation
– Stage of processing, provenance, validation, …
• Share data sets for collaboration
– Control access to data and metadata
• Operate on datasets as units
– Copy, export, analyze, tag, archive, …
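The dataset idea above (group by use rather than location, tag, then operate on the group as a unit) can be sketched as a small data structure. This is an illustration only, not the Globus dataset service: the `Dataset` and `Member` classes and the location names are hypothetical. The tag values reuse the `mydata42` example that appears later in the deck.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Member:
    """One file in a dataset, wherever it happens to live."""
    location: str  # hypothetical site or endpoint name
    path: str


@dataclass
class Dataset:
    """A logical grouping of files, described by tags rather than by directory layout."""
    name: str
    members: list[Member] = field(default_factory=list)
    tags: dict[str, str] = field(default_factory=dict)

    def add(self, location: str, path: str) -> None:
        self.members.append(Member(location, path))

    def locations(self) -> set[str]:
        """All places the dataset's files currently live."""
        return {m.location for m in self.members}

    def copy_plan(self, destination: str) -> list[tuple[Member, str]]:
        """Treat the dataset as a unit: one (source, destination) pair per file not already there."""
        return [(m, destination) for m in self.members if m.location != destination]


ds = Dataset("mydata42", tags={"owner": "Francesco", "type": "3dtomo",
                               "format": "HDF5", "beamline": "2BM"})
ds.add("aps", "/raw/scan_001.h5")         # raw data at one site (hypothetical path)
ds.add("pnnl", "/derived/recon_001.h5")   # derived product at another site
ds.tags["stage"] = "reconstructed"        # a tag reflecting current status in the investigation

plan = ds.copy_plan("anl")
print(sorted(ds.locations()), len(plan))
```

Because membership is explicit, a copy, export, or archive step can iterate over the members without relying on file and directory naming conventions.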
Builds on catalog as a service
Approach
• Hosted user-defined
catalogs
• Based on tag model
<subject, name, value>
• Optional schema
constraints
• Integrated with other
Globus services
Three REST APIs
/query/
• Retrieve subjects
/tags/
• Create, delete, retrieve
tags
/tagdef/
• Create, delete, retrieve
tag definitions
Builds on USC Tagfiler project (C. Kesselman et al.)
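The <subject, name, value> tag model can be tried out with a small in-memory stand-in. The slides name the three REST APIs (/query/, /tags/, /tagdef/) but give no request or response formats, so the sketch below does not attempt any HTTP calls; the `TagCatalog` class, its method names, and the `mydata43` subject are hypothetical.

```python
from collections import defaultdict


class TagCatalog:
    """In-memory stand-in for a catalog built on <subject, name, value> tags."""

    def __init__(self) -> None:
        self.tagdefs: dict[str, type] = {}  # tag name -> required value type
        self.tags: dict[str, dict[str, object]] = defaultdict(dict)  # subject -> {name: value}

    def define_tag(self, name: str, value_type: type = str) -> None:
        """Rough analogue of a tag definition: an optional schema constraint on a tag name."""
        self.tagdefs[name] = value_type

    def set_tag(self, subject: str, name: str, value: object) -> None:
        """Rough analogue of creating a tag; rejects values that violate a declared definition."""
        expected = self.tagdefs.get(name)
        if expected is not None and not isinstance(value, expected):
            raise TypeError(f"tag {name!r} expects {expected.__name__}")
        self.tags[subject][name] = value

    def query(self, **criteria: object) -> list[str]:
        """Rough analogue of retrieving subjects: those whose tags match every criterion."""
        return sorted(subject for subject, tags in self.tags.items()
                      if all(tags.get(k) == v for k, v in criteria.items()))


catalog = TagCatalog()
catalog.define_tag("beamline")                  # constrained tag: must be a string
catalog.set_tag("mydata42", "beamline", "2BM")
catalog.set_tag("mydata42", "format", "HDF5")   # unconstrained tag: no definition needed
catalog.set_tag("mydata43", "beamline", "32ID")

matches = catalog.query(beamline="2BM")
print(matches)
```

Tags without a definition are accepted as-is, which mirrors the "optional schema constraints" bullet: structure can be added later without re-ingesting anything.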
Multi-scale
imaging at
APS
Storage
Image processing
(noise removal, etc.)
Tomographic
reconstruction
Visual inspection
Selection
Beamline 2-BM-B
~1.5 µm resolution
Beamline 32-ID-C
20-50 nm resolution
Image processing
(noise removal, etc.)
Tomographic
reconstruction
Visual inspection
Selection
Selection
Multi-scale
image fusion
Visual inspection
Up to 100 fps
2K x 2K, 16 bits
11 GB raw data
1,500 fps
2K x 2K, 16 bits
1 min readout
11 GB raw data
mydata42
owner: Francesco
type: 3dtomo
format: HDF5
beamline: 2BM
Define dataset
Infer type
Extract metadata
Populate catalog(s)
Locate datasets
Access files
analyze
Catalog derived
products
transfer/schedule
Orchestration
Organization
Record
provenance
Annotate, share
browse, search
Our challenge:
Sustainability
We are a non-profit service
provider to the non-profit
research community
Globus Online Provider Plans
Support ongoing operations
Offer value-added capabilities
Engage more closely with users
Starting at $20k per year
• Provider endpoints with sharing
• Multiple GridFTP servers per endpoint
• Branded web sites
• Alternate identity provider
• Usage reporting
• MSS optimizations
• Operations monitoring and management
• Input into and access to product roadmap
Provider Plans offer…
To provide more capability for
more people at substantially
lower cost by creatively
aggregating (“cloud”) and
federating (“grid”) resources
“Science as a service”
Our vision for a 21st century
discovery infrastructure
It’s a time of great opportunity … to
develop and apply Science aaS
Globus Nexus
(Identity, Group, Profile)
…
Sharing Service
Transfer Service
Dataset Services
Globus Toolkit
Globus Online APIs
Globus Connect
Thank you to our sponsors!

Weitere ähnliche Inhalte

Was ist angesagt?

Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for ScienceIan Foster
 
Time to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the CloudTime to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the CloudAmazon Web Services
 
President Election of Korea in 2017
President Election of Korea in 2017President Election of Korea in 2017
President Election of Korea in 2017Jongwook Woo
 
Virtual Science in the Cloud
Virtual Science in the CloudVirtual Science in the Cloud
Virtual Science in the Cloudthetfoot
 
Big Data Modeling Challenges and Machine Learning with No Code
Big Data Modeling Challenges and Machine Learning with No CodeBig Data Modeling Challenges and Machine Learning with No Code
Big Data Modeling Challenges and Machine Learning with No CodeLiana Ye
 
RuleML 2015: When Processes Rule Events
RuleML 2015: When Processes Rule EventsRuleML 2015: When Processes Rule Events
RuleML 2015: When Processes Rule EventsRuleML
 
Workflows to access and massage VOData
Workflows to access and massage VODataWorkflows to access and massage VOData
Workflows to access and massage VODataJose Enrique Ruiz
 
Rpi talk foster september 2011
Rpi talk foster september 2011Rpi talk foster september 2011
Rpi talk foster september 2011Ian Foster
 
Genomic Scale Big Data Pipelines
Genomic Scale Big Data PipelinesGenomic Scale Big Data Pipelines
Genomic Scale Big Data PipelinesLynn Langit
 
Working with Instrument Data (GlobusWorld Tour - UMich)
Working with Instrument Data (GlobusWorld Tour - UMich)Working with Instrument Data (GlobusWorld Tour - UMich)
Working with Instrument Data (GlobusWorld Tour - UMich)Globus
 
Big data visualization frameworks and applications at Kitware
Big data visualization frameworks and applications at KitwareBig data visualization frameworks and applications at Kitware
Big data visualization frameworks and applications at Kitwarebigdataviz_bay
 
Implementing a VO archive for datacubes of galaxies
Implementing a VO archive for datacubes of galaxiesImplementing a VO archive for datacubes of galaxies
Implementing a VO archive for datacubes of galaxiesJose Enrique Ruiz
 
So Long Computer Overlords
So Long Computer OverlordsSo Long Computer Overlords
So Long Computer OverlordsIan Foster
 
Astromat Update on Developments 2021-01-29
Astromat Update on Developments 2021-01-29Astromat Update on Developments 2021-01-29
Astromat Update on Developments 2021-01-29Kerstin Lehnert
 
Empowering Transformational Science
Empowering Transformational ScienceEmpowering Transformational Science
Empowering Transformational ScienceChelle Gentemann
 
Digital Science: Reproducibility and Visibility in Astronomy
Digital Science: Reproducibility and Visibility in AstronomyDigital Science: Reproducibility and Visibility in Astronomy
Digital Science: Reproducibility and Visibility in AstronomyJose Enrique Ruiz
 
Research Automation for Data-Driven Discovery
Research Automationfor Data-Driven DiscoveryResearch Automationfor Data-Driven Discovery
Research Automation for Data-Driven DiscoveryGlobus
 
WorDS of Data Science in the Presence of Heterogenous Computing Architectures
WorDS of Data Science in the Presence of Heterogenous Computing ArchitecturesWorDS of Data Science in the Presence of Heterogenous Computing Architectures
WorDS of Data Science in the Presence of Heterogenous Computing ArchitecturesIlkay Altintas, Ph.D.
 
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier TordoirShare and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier TordoirSpark Summit
 
H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupSri Ambati
 

Was ist angesagt? (20)

Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
Time to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the CloudTime to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the Cloud
 
President Election of Korea in 2017
President Election of Korea in 2017President Election of Korea in 2017
President Election of Korea in 2017
 
Virtual Science in the Cloud
Virtual Science in the CloudVirtual Science in the Cloud
Virtual Science in the Cloud
 
Big Data Modeling Challenges and Machine Learning with No Code
Big Data Modeling Challenges and Machine Learning with No CodeBig Data Modeling Challenges and Machine Learning with No Code
Big Data Modeling Challenges and Machine Learning with No Code
 
RuleML 2015: When Processes Rule Events
RuleML 2015: When Processes Rule EventsRuleML 2015: When Processes Rule Events
RuleML 2015: When Processes Rule Events
 
Workflows to access and massage VOData
Workflows to access and massage VODataWorkflows to access and massage VOData
Workflows to access and massage VOData
 
Rpi talk foster september 2011
Rpi talk foster september 2011Rpi talk foster september 2011
Rpi talk foster september 2011
 
Genomic Scale Big Data Pipelines
Genomic Scale Big Data PipelinesGenomic Scale Big Data Pipelines
Genomic Scale Big Data Pipelines
 
Working with Instrument Data (GlobusWorld Tour - UMich)
Working with Instrument Data (GlobusWorld Tour - UMich)Working with Instrument Data (GlobusWorld Tour - UMich)
Working with Instrument Data (GlobusWorld Tour - UMich)
 
Big data visualization frameworks and applications at Kitware
Big data visualization frameworks and applications at KitwareBig data visualization frameworks and applications at Kitware
Big data visualization frameworks and applications at Kitware
 
Implementing a VO archive for datacubes of galaxies
Implementing a VO archive for datacubes of galaxiesImplementing a VO archive for datacubes of galaxies
Implementing a VO archive for datacubes of galaxies
 
So Long Computer Overlords
So Long Computer OverlordsSo Long Computer Overlords
So Long Computer Overlords
 
Astromat Update on Developments 2021-01-29
Astromat Update on Developments 2021-01-29Astromat Update on Developments 2021-01-29
Astromat Update on Developments 2021-01-29
 
Empowering Transformational Science
Empowering Transformational ScienceEmpowering Transformational Science
Empowering Transformational Science
 
Digital Science: Reproducibility and Visibility in Astronomy
Digital Science: Reproducibility and Visibility in AstronomyDigital Science: Reproducibility and Visibility in Astronomy
Digital Science: Reproducibility and Visibility in Astronomy
 
Research Automation for Data-Driven Discovery
Research Automationfor Data-Driven DiscoveryResearch Automationfor Data-Driven Discovery
Research Automation for Data-Driven Discovery
 
WorDS of Data Science in the Presence of Heterogenous Computing Architectures
WorDS of Data Science in the Presence of Heterogenous Computing ArchitecturesWorDS of Data Science in the Presence of Heterogenous Computing Architectures
WorDS of Data Science in the Presence of Heterogenous Computing Architectures
 
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier TordoirShare and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
 
H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User Group
 

Andere mochten auch

Services for Science
Services for ScienceServices for Science
Services for ScienceIan Foster
 
Sociology Of The Grid May 2009
Sociology Of The Grid May 2009Sociology Of The Grid May 2009
Sociology Of The Grid May 2009Ian Foster
 
Big Process for Big Data @ NASA
Big Process for Big Data @ NASABig Process for Big Data @ NASA
Big Process for Big Data @ NASAIan Foster
 
Extreme Scripting July 2009
Extreme Scripting July 2009Extreme Scripting July 2009
Extreme Scripting July 2009Ian Foster
 
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesDiscovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesIan Foster
 
GlobusWorld 2012 Foster Keynote
GlobusWorld 2012 Foster KeynoteGlobusWorld 2012 Foster Keynote
GlobusWorld 2012 Foster KeynoteIan Foster
 
Services For Science April 2009
Services For Science April 2009Services For Science April 2009
Services For Science April 2009Ian Foster
 
Big Process for Big Data
Big Process for Big DataBig Process for Big Data
Big Process for Big DataIan Foster
 
AAPM Foster July 2009
AAPM Foster July 2009AAPM Foster July 2009
AAPM Foster July 2009Ian Foster
 
Delivering a Campus Research Data Service with Globus
Delivering a Campus Research Data Service with GlobusDelivering a Campus Research Data Service with Globus
Delivering a Campus Research Data Service with GlobusIan Foster
 
Grid Projects In The US July 2008
Grid Projects In The US July 2008Grid Projects In The US July 2008
Grid Projects In The US July 2008Ian Foster
 
building global software/earthcube->sciencecloud
building global software/earthcube->sciencecloudbuilding global software/earthcube->sciencecloud
building global software/earthcube->sciencecloudIan Foster
 
Aaas Data Intensive Science And Grid
Aaas Data Intensive Science And GridAaas Data Intensive Science And Grid
Aaas Data Intensive Science And GridIan Foster
 
E science foster december 2010
E science foster december 2010E science foster december 2010
E science foster december 2010Ian Foster
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Ian Foster
 
Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersIan Foster
 
Services for Science v2 (APAN26)
Services for Science v2 (APAN26)Services for Science v2 (APAN26)
Services for Science v2 (APAN26)Ian Foster
 
Accelerating Data-driven Discovery in Energy Science
Accelerating Data-driven Discovery in Energy ScienceAccelerating Data-driven Discovery in Energy Science
Accelerating Data-driven Discovery in Energy ScienceIan Foster
 

Andere mochten auch (18)

Services for Science
Services for ScienceServices for Science
Services for Science
 
Sociology Of The Grid May 2009
Sociology Of The Grid May 2009Sociology Of The Grid May 2009
Sociology Of The Grid May 2009
 
Big Process for Big Data @ NASA
Big Process for Big Data @ NASABig Process for Big Data @ NASA
Big Process for Big Data @ NASA
 
Extreme Scripting July 2009
Extreme Scripting July 2009Extreme Scripting July 2009
Extreme Scripting July 2009
 
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesDiscovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
 
GlobusWorld 2012 Foster Keynote
GlobusWorld 2012 Foster KeynoteGlobusWorld 2012 Foster Keynote
GlobusWorld 2012 Foster Keynote
 
Services For Science April 2009
Services For Science April 2009Services For Science April 2009
Services For Science April 2009
 
Big Process for Big Data
Big Process for Big DataBig Process for Big Data
Big Process for Big Data
 
AAPM Foster July 2009
AAPM Foster July 2009AAPM Foster July 2009
AAPM Foster July 2009
 
Delivering a Campus Research Data Service with Globus
Delivering a Campus Research Data Service with GlobusDelivering a Campus Research Data Service with Globus
Delivering a Campus Research Data Service with Globus
 
Grid Projects In The US July 2008
Grid Projects In The US July 2008Grid Projects In The US July 2008
Grid Projects In The US July 2008
 
building global software/earthcube->sciencecloud
building global software/earthcube->sciencecloudbuilding global software/earthcube->sciencecloud
building global software/earthcube->sciencecloud
 
Aaas Data Intensive Science And Grid
Aaas Data Intensive Science And GridAaas Data Intensive Science And Grid
Aaas Data Intensive Science And Grid
 
E science foster december 2010
E science foster december 2010E science foster december 2010
E science foster december 2010
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
 
Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and Supercomputers
 
Services for Science v2 (APAN26)
Services for Science v2 (APAN26)Services for Science v2 (APAN26)
Services for Science v2 (APAN26)
 
Accelerating Data-driven Discovery in Energy Science
Accelerating Data-driven Discovery in Energy ScienceAccelerating Data-driven Discovery in Energy Science
Accelerating Data-driven Discovery in Energy Science
 

Ähnlich wie Big Process for Big Data @ PNNL, May 2013

Science cloud foster june 2013
Science cloud foster june 2013Science cloud foster june 2013
Science cloud foster june 2013Kirill Osipov
 
Science as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryScience as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryIan Foster
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationIan Foster
 
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSExperiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSEd Dodds
 
A Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceA Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceGlobus
 
Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.KGMGROUP
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...Ilkay Altintas, Ph.D.
 
Accelerating Time to Science: Transforming Research in the Cloud
Accelerating Time to Science: Transforming Research in the CloudAccelerating Time to Science: Transforming Research in the Cloud
Accelerating Time to Science: Transforming Research in the CloudJamie Kinney
 
Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack Srinath Perera
 
Research Data Management as a Service
Research Data Management as a ServiceResearch Data Management as a Service
Research Data Management as a ServiceGlobus
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodDuncan Hull
 
Simplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus PlatformSimplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus PlatformGlobus
 
Bridging Big Data and Data Science Using Scalable Workflows
Bridging Big Data and Data Science Using Scalable WorkflowsBridging Big Data and Data Science Using Scalable Workflows
Bridging Big Data and Data Science Using Scalable WorkflowsIlkay Altintas, Ph.D.
 
GlobusWorld 2019 Opening Keynote
GlobusWorld 2019 Opening KeynoteGlobusWorld 2019 Opening Keynote
GlobusWorld 2019 Opening KeynoteGlobus
 
Linking Scientific Instruments and Computation
Linking Scientific Instruments and ComputationLinking Scientific Instruments and Computation
Linking Scientific Instruments and ComputationIan Foster
 

Ähnlich wie Big Process for Big Data @ PNNL, May 2013 (20)

Science cloud foster june 2013
Science cloud foster june 2013Science cloud foster june 2013
Science cloud foster june 2013
 
Science as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryScience as a Service: How On-Demand Computing can Accelerate Discovery
Big Process for Big Data @ PNNL, May 2013

Editor's notes

  1. The Computation Institute (or CI): a joint initiative between UChicago and Argonne National Lab. A place where researchers from multiple disciplines come together and engage in research that is fundamentally enabled by computation. More recently, we've been talking about it as the home of the research cloud, and I'll describe what we mean by that throughout this talk.
  2. Here are some of the areas where we have active projects. Focus on areas of particular interest to I2/ESnet, namely HEP, climate change, and genomics (up and coming).
  3. And the reason is pretty obvious. This chart and others like it are becoming a cliché in next-gen sequencing and big data presentations, but the point is that while Moore's law translates to roughly a 10x increase in processor power, data volumes are growing many orders of magnitude faster. And meanwhile, other necessary resources (money, people) are staying pretty flat. So we have a crisis, and we hear that the magic bullet of “the cloud” is going to solve it. Well, as far as cost goes, clouds are helping, but many issues remain.
  4. 173 TB/day
  5. Another example is the Earth System Grid, which provides data and tools to over 20,000 climate scientists around the world. So what's notable about these examples? It's the combination of the amount of data being managed and the number of people that need access to that data. We heard Martin Leach tell us that the Broad Institute hit 10 PB of spinning disk last year, and that it's not a big deal. To a select few, these numbers are routine. And for the projects I just talked about, the IT infrastructure is in place. They have robust production solutions, built by substantial teams at great expense: sustained, multi-year efforts and application-specific solutions, built mostly on common/homogeneous technology platforms.
  6. The point is, the 1% of projects are in good shape.
  7. But what about the 99% set? There are hundreds of thousands of small and medium labs around the world that are faced with similar data management challenges. They don't have the resources to deal with these challenges. So their research suffers, and over time many may become irrelevant. So at the CI we asked ourselves a question (many questions, actually) about how we can help avert this crisis. And one question that kind of sums up a lot of our thinking is…
  8. There are hundreds of thousands of small and medium labs around the world that are faced with similar data management challenges. They don't have the resources to deal with these challenges. So their research suffers, and over time many may become irrelevant. So at the CI we asked ourselves a question (many questions, actually) about how we can help avert this crisis. And one question that kind of sums up a lot of our thinking is…
  9. Can't just expect to throw more people and $$$ at the problem; we're already seeing the limits.
  10. Many in this room are probably users of Dropbox or similar services for keeping their files synced across multiple machines. Well, the scientific research equivalent is a little different.
  11. We figured it needs to allow a group of collaborating researchers to do many or all of these things with their data, and not just the 2 GB of PowerPoints or the 100 GB of family photos and videos, but the petabytes and exabytes of data that will soon be the norm for many.
  12. So how would such a dropbox for science be used? Let's look at a very typical scientific data workflow. Data is generated by some instrument (a sequencer at JGI or a light source like APS/ALS). Since these instruments are in high demand, users have to get their data off the instrument to make way for the next user. So the data is typically moved from a staging area to some type of ingest store. Etcetera for analysis, sharing of results with collaborators, annotation with metadata for future search, backup/sync/archival, …
  13. We figured it needs to allow a group of collaborating researchers to do many or all of these things with their data, and not just the 2 GB of PowerPoints or the 100 GB of family photos and videos, but the petabytes and exabytes of data that will soon be the norm for many.
  14. Started with the seemingly simple/mundane task of transferring files … etc.
  15. And when we spoke with IT folks at various research communities, they insisted that some things were not up for negotiation.
  16. And when we spoke with IT folks at various research communities, they insisted that some things were not up for negotiation.
  17. And when we spoke with IT folks at various research communities, they insisted that some things were not up for negotiation.
  18. This image shows a 3D rendering of a Shewanella biofilm grown on a flat plastic substrate in a Constant Depth bioFilm Fermenter (CDFF). The image was generated using x-ray microtomography at the Advanced Photon Source, Argonne National Laboratory.
  19. We figured it needs to allow a group of collaborating researchers to do many or all of these things with their data, and not just the 2 GB of PowerPoints or the 100 GB of family photos and videos, but the petabytes and exabytes of data that will soon be the norm for many.
  20. We figured it needs to allow a group of collaborating researchers to do many or all of these things with their data, and not just the 2 GB of PowerPoints or the 100 GB of family photos and videos, but the petabytes and exabytes of data that will soon be the norm for many.
  21. http://datasets.globus.org/carl-catalog/query/propertyA=value1
  22. http://www.blyberg.net/card-generator/ http://www.sciencemag.org/content/332/6025/88/F1.large.jpg
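
The gap described in note 3 can be made concrete with a little arithmetic. The sketch below is only an illustration: it takes the two growth figures shown on the slide (x10 in 6 years for processors, x10^5 in 6 years for sequencing output), assumes steady exponential growth, and converts each to a doubling time. The function name `doubling_time_years` is hypothetical and not part of any tool mentioned in the talk.

```python
import math


def doubling_time_years(growth_factor: float, period_years: float) -> float:
    """Years needed to double, assuming steady exponential growth.

    growth_factor: total multiplicative growth over the period (e.g. 10 for x10).
    period_years:  length of the period in years.
    """
    return period_years * math.log(2) / math.log(growth_factor)


# Figures taken from the slide: x10 and x10^5, both over 6 years.
processor_doubling = doubling_time_years(10, 6)
sequencing_doubling = doubling_time_years(1e5, 6)

print(f"Processor power doubles roughly every {processor_doubling:.1f} years")
print(f"Sequencing output doubles roughly every {sequencing_doubling * 12:.1f} months")
```

Under these assumptions the two curves differ by a factor of five in doubling rate, which is one way to read the claim that data volumes are outrunning compute.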