Opportunities for X-ray science in future computing architecture Ian Foster Computation Institute University of Chicago & Argonne National Laboratory
Abstract The world of computing continues to evolve rapidly. In just the past 10 years, we have seen the emergence of petascale supercomputing, cloud computing that provides on-demand computing and storage with considerable economies of scale, software-as-a-service methods that permit outsourcing of complex processes, and grid computing that enables federation of resources across institutional boundaries. These trends show no signs of slowing down: the next 10 years will surely see exascale computing, new cloud offerings, and terabit networks. In this talk I review several of these developments and discuss their potential implications for X-ray science and X-ray facilities.
Figure: Fastest supercomputer (floating point ops/sec). Peak speed (flops), 1E+2 to 1E+17, versus year introduced, 1940 to 2010, from ENIAC (vacuum tubes) through IBM 7090 (transistors), CDC 6600 (ICs), CDC STAR-100 (vectors), X-MP2 (parallel vectors), and i860 (MPPs) to Blue Gene/L, petaflop, and multi-petaflop systems. Doubling time = 1.5 yr. Markers: Argonne, My laptop.
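The 1.5-year doubling time quoted on the chart can be turned into a growth factor: over t years, peak speed grows by 2^(t/1.5). A minimal sketch, assuming the doubling time stays constant over the whole period (the function name is illustrative and not from the talk):

```python
def growth_factor(years: float, doubling_time_years: float = 1.5) -> float:
    """Multiplicative speedup after `years` of steady doubling."""
    return 2.0 ** (years / doubling_time_years)


# Roughly a hundredfold gain per decade under this assumption.
ten_year_gain = growth_factor(10)

# Thirty years is exactly twenty doublings, about a million-fold.
thirty_year_gain = growth_factor(30)
```

Under this assumption, each decade on the chart corresponds to about two orders of magnitude in peak speed.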
Brahe 30 years ? years
Brahe 30 years ? years 10 years 6 years 2 years Kepler
Computers at Harvard, 1890
Sloan Digital Sky Survey
Aggregate SkyServer monthly traffic from 2001 to 2006. (Singh et al., 2006) Sloan Digital Sky Survey publication statistics, Chen et al., 2009.
Three discontinuities: 1) Massive parallelism 2) Large data 3) Economics of aggregation
Intel x86 processor trends
Gordon Bell prize winners
Complexity dimensions (Dan Katz): algorithms; coupled (& non-linear) equations; timescale; optimization; error analysis; parameters or ensemble members; resolution; time. Scales shown on the slide, from simple to complex: 1 / 2 / 3; 1 / many; short / long / multiscale; few / many; no / yes; coarse / fine / adaptive; no / yes.
Rational design of catalytic materials (Curtis, Greely, Zapol, Kumaran)
Create: Synthesis and processing methods informed by computation; generate data
Design: Materials with desired properties based on computation and data
Understand: Relationship between materials properties and structure
Identifying optimal candidates
High-throughput screening on BG/P [SC08] “Towards Loosely-Coupled Programming on Petascale Systems”
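Loosely coupled screening means running many independent tasks, one per candidate, and ranking the results afterwards. The sketch below illustrates that pattern on a local thread pool. It is not the system described in the SC08 paper, and `score_candidate` is a hypothetical stand-in for a real simulation or docking run.

```python
from concurrent.futures import ThreadPoolExecutor


def score_candidate(candidate_id: int) -> float:
    """Hypothetical stand-in for one independent screening task."""
    return (candidate_id * 37 % 101) / 100.0


def screen(candidate_ids, workers: int = 4):
    """Score every candidate independently; return (id, score) pairs, best first."""
    ids = list(candidate_ids)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so scores line up with ids.
        scores = list(pool.map(score_candidate, ids))
    return sorted(zip(ids, scores), key=lambda pair: pair[1], reverse=True)


top_three = screen(range(20))[:3]
```

Because the tasks share no state, the same structure scales from a few threads to many nodes by swapping the executor for a cluster-level task dispatcher.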
Three discontinuities: 1) Massive parallelism 2) Large data 3) Economics of aggregation
PC disk drive capacity
Data generation and analysis costs outpace Moore’s Law $900,000 Wilkening et al, IEEE Cluster09
Data complexity also increasing
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE    116    116       BINDS PEP (BY SIMILARITY).
FT   CONFLICT    374    374       S -> A (IN REF. 3).
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
[source: GlaxoSmithKline]
Volume Complexity Analysis demands
Bob Grossman
“light sources alone are not enough
Enormous data sets of diffracted signals in reciprocal space and across wide energy ranges must be collected and analyzed in real time so that they can guide the ongoing experiments.”
Source: Liz Lyon
Pattern recognition in x-ray spectromicroscopy Kevin Boyce, U. Chicago: study of the evolution of tree types, including now-extinct species that dominated in the “coal age” (Carboniferous). Acetate peel of fossilized wood. Shows how well we can separately map cellulose-derived material from lignin-derived material in plant cell walls, with implications for cellulosic ethanol production from biomass. Lignin-derived and cellulose-derived regions in 400 million year old chert: Boyce et al., Proc. Nat. Acad. Sci. 101, 17555 (2004), with subsequent pattern recognition analysis by Lerotic, Jacobsen, Schäfer, and Vogt, Ultramicroscopy 100, 35 (2004).
LDRD: “Next Generation Data Exploration - Intelligence in Data Analysis, Visualization, & Mining”
“Here’s a cell in this tissue. How much zinc does it have? In the rest of the tissue, how many cells are there like this, and what is their distribution of zinc content?”
Fluorescence and absorption spectral imaging
Databases to combine results of multiple experiments and instruments
Multivariate statistical analysis and pattern recognition
People: APS: Stefan Vogt (PI), Lydia Finney, Chris Jacobsen, Chris Roerhig, Claude Saunders, Jesse Ward; Mathematics and Computer Science, ANL: Sven Leyffer, Stefan Wild, Mark Hereld; Northwestern: Rachel Mak
“Lambdas” Wavelength Division Multiplexing
Rapid evolution of 10GbE port prices makes campus-scale 10 Gbps affordable. Price points shown on the chart, 2005 to 2010: $80K/port, Chiaro (60 max); $5K, Force 10 (40 max); ~$1000 (300+ max); $500, Arista, 48 ports; $400, Arista, 48 ports. Source: Philip Papadopoulos, SDSC, UCSD
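To put a 10 Gbps campus link in perspective, here is a back-of-the-envelope transfer-time estimate. It assumes the link runs at full line rate with no protocol overhead, which is an idealization, and the 1 TB dataset size is illustrative rather than taken from the talk.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Idealized time to move `size_bytes` over a link of the given speed."""
    return size_bytes * 8 / link_bits_per_second


TB = 10 ** 12  # decimal terabyte, in bytes

# One terabyte over 10 Gbps: 800 seconds, a bit over 13 minutes.
one_tb_at_10g = transfer_seconds(1 * TB, 10e9)

# The same terabyte over 1 Gbps takes ten times longer.
one_tb_at_1g = transfer_seconds(1 * TB, 1e9)
```

Real transfers usually fall short of line rate, so these figures are lower bounds on transfer time.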
Three discontinuities: 1) Massive parallelism 2) Large data 3) Economics of aggregation
Software-as-a-Service (SaaS) Platform-as-a-Service (PaaS) Infrastructure-as-a-Service (IaaS)
Economies of scale in operations
Time-consuming tasks in business  Web presence Email (hosted Exchange) Calendar  Telephony (hosted VOIP)  Human resources and payroll  Accounting  Customer relationship mgmt  Data analytics  Content distribution  
 SaaS
Time-consuming tasks in business  Web presence Email (hosted Exchange) Calendar  Telephony (hosted VOIP)  Human resources and payroll  Accounting  Customer relationship mgmt Data analytics  Content distribution  
 SaaS IaaS
Time-consuming tasks in science
Run experiments
Collect data
Manage data
Move data
Acquire computers
Analyze data
Run simulations
Compare experiment with simulation
Search the literature
Publish papers
Find, configure, install relevant software
Find, access, analyze relevant data
Order supplies
Write proposals
Write reports

Globus Toolkit: Build the Grid. Components for building custom grid solutions. globustoolkit.org
Globus Online: Use the Grid. Cloud-hosted file transfer service. globusonline.org
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
NIH Data Commons Architecture Ideas
NIH Data Commons Architecture IdeasNIH Data Commons Architecture Ideas
NIH Data Commons Architecture Ideas
 
Going Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCFGoing Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCF
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
Software Infrastructure for a National Research Platform
Software Infrastructure for a National Research PlatformSoftware Infrastructure for a National Research Platform
Software Infrastructure for a National Research Platform
 
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
 

KĂŒrzlich hochgeladen

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Christopher Logan Kennedy
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 

KĂŒrzlich hochgeladen (20)

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 

Opportunities for X-Ray science in future computing architectures

  • 1. Opportunities for X-ray science in future computing architecture Ian Foster Computation Institute University of Chicago & Argonne National Laboratory
  • 2. Abstract: The world of computing continues to evolve rapidly. In just the past 10 years, we have seen the emergence of petascale supercomputing, cloud computing that provides on-demand computing and storage with considerable economies of scale, software-as-a-service methods that permit outsourcing of complex processes, and grid computing that enables federation of resources across institutional boundaries. These trends show no signs of slowing down: the next 10 years will surely see exascale, new cloud offerings, and terabit networks. In this talk I review several of these developments and discuss their potential implications for X-ray science and X-ray facilities.
  • 3. [Chart] Fastest supercomputer (floating point ops/sec): peak speed (flops), 1E+2 to 1E+17, versus year introduced, 1940 to 2010, from ENIAC (vacuum tubes) through transistors, ICs, vectors, parallel vectors, and MPPs to Blue Gene/L and petaflop and multi-petaflop systems. Doubling time = 1.5 yr. Markers: Argonne, My laptop.
  • 4. Brahe 30 years ? years
  • 5. Brahe 30 years ? years 10 years 6 years 2 years Kepler
  • 6. Brahe 30 years ? years 10 years 6 years 2 years Kepler
  • 10. Aggregate SkyServer monthly traffic from 2001 to 2006. (Singh et al., 2006) Sloan Digital Sky Survey publication statistics, Chen et al., 2009.
  • 11. Three discontinuities: 1) Massive parallelism 2) Large data 3) Economics of aggregation
  • 13. Gordon Bell prize winners
  • 14. Complexity Dimensions Algorithms Coupled (& non-linear) equations Timescale Optimization Error analysis Parameters or ensemble members Resolution Time Simple Complex 1 2 3 1 Many Short Long Multiscale Few Many No Yes Coarse Fine Adaptive No Yes Dan Katz
  • 15. Rational design of catalytic materials (Curtis, Greely, Zapol, Kumaran). Create: synthesis and processing methods informed by computation; generate data. Design: materials with desired properties based on computation and data. Understand: relationship between materials properties and structure.
  • 17. High-throughput screening on BG/P [SC08] “Towards Loosely-Coupled Programming on Petascale Systems”
  • 18. Three discontinuities: 1) Massive parallelism 2) Large data 3) Economics of aggregation
  • 19. PC disk drive capacity
  • 20. Data generation and analysis costs outpace Moore’s Law $900,000 Wilkening et al, IEEE Cluster09
  • 21. Data complexity also increasing: ID MURA_BACSU STANDARD; PRT; 429 AA. DE PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE DE (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE DE ENOLPYRUVYL TRANSFERASE) (EPT). GN MURA OR MURZ. OS BACILLUS SUBTILIS. OC BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE; OC BACILLUS. KW PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE. FT ACT_SITE 116 116 BINDS PEP (BY SIMILARITY). FT CONFLICT 374 374 S -> A (IN REF. 3). SQ SEQUENCE 429 AA; 46016 MW; 02018C5C CRC32; MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI [source: GlaxoSmithKline]
  • 26. “light sources alone are not enough … Enormous data sets of diffracted signals in reciprocal space and across wide energy ranges must be collected and analyzed in real time so that they can guide the ongoing experiments.”
  • 28. Pattern recognition in x-ray spectromicroscopy. Kevin Boyce, U. Chicago: study of the evolution of tree types, including now-extinct species that dominated in the “coal age” (Carboniferous). Acetate peel of fossilized wood. Shows how well we can separately map cellulose-derived material from lignin-derived material in plant cell walls, with implications for cellulosic ethanol production from biomass. Lignin-derived and cellulose-derived regions in 400 million year old chert: Boyce et al., Proc. Nat. Acad. Sci. 101, 17555 (2004), with subsequent pattern recognition analysis by Lerotic, Jacobsen, Schäfer, and Vogt, Ultramicroscopy 100, 35 (2004).
  • 29. LDRD: “Next Generation Data Exploration - Intelligence in Data Analysis, Visualization, & Mining” “Here’s a cell in this tissue. How much zinc does it have? In the rest of the tissue, how many cells are there like this, and what is their distribution of zinc content?” Fluorescence and absorption spectral imaging Databases to combine results of multiple experiments and instruments Multivariate statistical analysis and pattern recognition People: APS: Stefan Vogt (PI), Lydia Finney, Chris Jacobsen, Chris Roerhig, Claude Saunders, Jesse Ward; Mathematics and Computer Science, ANL: Sven Leyffer, Stefan Wild, Mark Hereld; Northwestern: Rachel Mak
  • 31. Rapid evolution of 10GbE port prices makes campus-scale 10 Gbps affordable. 2005: $80K/port, Chiaro (60 max). 2007: $5K, Force 10 (40 max). 2009: ~$1000 (300+ max); $500, Arista, 48 ports. 2010: $400, Arista, 48 ports. Source: Philip Papadopoulos, SDSC, UCSD
  • 34. Three discontinuities: 1) Massive parallelism 2) Large data 3) Economics of aggregation
  • 35. Software-as-a-Service (SaaS) Platform-as-a-Service (PaaS) Infrastructure-as-a-Service (IaaS)
  • 36. Economies of scale in operations
  • 37. Time-consuming tasks in business: Web presence; Email (hosted Exchange); Calendar; Telephony (hosted VOIP); Human resources and payroll; Accounting; Customer relationship mgmt; Data analytics; Content distribution; … SaaS
  • 38. Time-consuming tasks in business: Web presence; Email (hosted Exchange); Calendar; Telephony (hosted VOIP); Human resources and payroll; Accounting; Customer relationship mgmt; Data analytics; Content distribution; … SaaS, IaaS
  • 41. Find, configure, install relevant software
  • 42. Find, access, analyze relevant data
  • 47. Globus Toolkit: Build the Grid. Components for building custom grid solutions. globustoolkit.org. Globus Online: Use the Grid. Cloud-hosted file transfer service. globusonline.org
  • 50. Find, configure, install relevant software
  • 51. Find, access, analyze relevant data
  • 57. Find, configure, install relevant software
  • 58. Find, access, analyze relevant data
  • 65. ALCF-NERSC task summary:
      Task ID: bc6d776c-2af4-11e0-9a1d-12313916526c
      Task Type: TRANSFER
      Parent Task ID: n/a
      Status: SUCCEEDED
      Request Time: 2011-01-28 15:39:04Z
      Deadline: 2011-01-29 15:39:04Z
      Completion Time: 2011-01-28 16:17:12Z
      Total Tasks: 500
      Tasks Successful: 500
      Tasks Expired: 0
      Tasks Canceled: 0
      Tasks Failed: 0
      Tasks Pending: 0
      Tasks Retrying: 0
      Command: transfer (+500 input lines)
      Files: 500
      Directories: 0
      Bytes Transferred: 1073741824000
      MBits/sec: 3754.342
  • 66. 11 x 125 files, 200 MB each; 11 users; 12 sites
  • 67. Keith Cheng’s phenome project. Gordon Kindlmann. 3000 zebra fish mutants
  • 68. Penn State University Phenome Project Coordination Argonne / U Chicago Grid Supercomputing Facility Argonne National Lab Advanced Photon Source Graphics Workstations Tomographic Reconstruction, Deringing, Segmentation, Morphometrics & Visualization DAS APS Beamline Data Acquisition Pattern Recognition Segmentation & Visualization Software Develop. NAS GridFTP Server GridFTP Server SAN GridFTP Server HPC Cluster 1 Gbps Network link 10 Gbps Network link Regular Internet link Beamline data flow Globus Online - hosted service for high-speed, reliable, secure data movement Users
  • 69. Penn State University Phenome Project Coordination Argonne / U Chicago Grid Supercomputing Facility Argonne National Lab Advanced Photon Source Graphics Workstations Tomographic Reconstruction, Deringing, Segmentation, Morphometrics & Visualization DAS APS Beamline Data Acquisition Pattern Recognition Segmentation & Visualization Software Develop. NAS GridFTP Server GridFTP Server SAN GridFTP Server HPC Cluster 1 Gbps Network link 10 Gbps Network link Regular Internet link Beamline data flow Globus Online - hosted service for high-speed, reliable, secure data movement Users
  • 70. Four theses: (1) Ultrascale computing enables new problem-solving methods. (2) Research data management is an essential service like electricity and networking. (3) Economies of scale motivate highly aggregated computing and storage. (4) Automation of science processes accelerates discovery and yields competitive advantage.
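As a sanity check on the ALCF-NERSC task summary (slide 65), the reported rate can be recomputed from the byte count and the two timestamps. This is a minimal sketch: the function name is hypothetical, and it assumes the transfer ran from the request time to the completion time, which the summary does not state explicitly.

```python
from datetime import datetime

# Timestamp layout used in the task summary, e.g. "2011-01-28 15:39:04Z".
TIME_FORMAT = "%Y-%m-%d %H:%M:%SZ"


def throughput_mbits_per_sec(bytes_transferred: int, start: str, end: str) -> float:
    """Average rate in megabits per second (1 Mbit = 10**6 bits).

    Hypothetical helper for illustration; it is not part of any Globus tool.
    """
    elapsed = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return bytes_transferred * 8 / elapsed.total_seconds() / 1e6


# Values copied from the slide 65 task summary.
rate = throughput_mbits_per_sec(
    1073741824000,            # Bytes Transferred
    "2011-01-28 15:39:04Z",   # Request Time (assumed to be the start)
    "2011-01-28 16:17:12Z",   # Completion Time
)
print(f"{rate:.3f} Mbit/s")   # prints "3754.342 Mbit/s"
```

The elapsed time is 38 minutes 8 seconds (2288 s), so 500 files totalling about 1 TB moved at roughly 3.75 Gbit/s, which agrees with the MBits/sec figure on the slide.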

Editor's notes

  1. Trends: computers, storage, detectors, … It’s the ratios that matter: cores/CPU, CPUs/computer, data/scientist. Experiment and simulation.
  2. To show what I mean, let’s look at the example of astronomy again. Tycho Brahe spent 30 years cataloging the position of 777 stars and the known planets with great accuracy. His assistant Kepler then took the data, and from it derived his laws of planetary motion, which say that bodies sweep out equal areas in equal time. A precursor to Newton’s law of gravitation.
  3. To show what this means, let’s look at the example of astronomy once again. Tycho Brahe spent 30 years cataloging the position of 777 stars and the known planets with great accuracy. His assistant Kepler then took the data, and from it derived his laws of planetary motion, which say that bodies sweep out equal areas in equal time. A precursor to Newton’s law of gravitation.
  4. Some allege that Kepler took unusual steps to acquire his data. Hopefully not so common.
  5. Photographic plates. We need computers! Here are some early computers in Harvard Observatory, around 1890. Computing the consequences of equations became a profession. 1 multiplication per 2 seconds, maybe, x 8 people, 4 multiplications per second. However, unreliable and hard to get to work more than 8 hours per day.
  6. By the late 1990s, in 5 years, imaged 230 million celestial objects, measuring the spectra of more than 1 million of them
  7. “Slices through the SDSS 3-dimensional map of the distribution of galaxies. Earth is at the center, and each point represents a galaxy, typically containing about 100 billion stars. Galaxies are colored according to the ages of their stars, with the redder, more strongly clustered points showing galaxies that are made of older stars. The outer circle is at a distance of two billion light years. The region between the wedges was not mapped by the SDSS because dust in our own Galaxy obscures the view of the distant universe in these directions. Both slices contain all galaxies within -1.25 and 1.25 degrees declination.”
  8. http://xrds.acm.org/article.cfm?aid=1836552
  9. Sequencing volumes doubling every 4-6 months. Note the log scale! Bioinformatics cost is purely BLAST; values are in Amazon EC2. Lessons: 1) Need computer scientists; 2) Need more hardware; 3) Need more collaboration on analysis.
  10. In contrast, see SDSS, and also Google. Volume. Diversity and complexity. Speed of analysis.
  11. Research data management in 2011
  12. Photon science recognizes the importance of computing. However, if we perform some simple textual analysis, we see that ~1% of the report talks about computing and data: 670 out of 50,676 words (1.3%).
  13. Liz Lyon, U. Bath, Associate Director, UK Digital Curation Center. Generic Data Acquisition (GDA) software developed at Daresbury initially, now at Diamond Light Source.
  14. Chris Jacobsen
  15. What about networking? Difficult to price, but many experts estimate a doubling time of 9 months for network capacity thanks to WDM and optical doping. 10 Gbps per user ~ 100-1000x shared Internet throughput.
  16. Port pricing is falling. Density is rising, dramatically. Cost of 10GbE approaching cluster HPC interconnects.
  17. Chicago is an international networking hub
  18. Chicago railroads, 1950 (http://www.encyclopedia.chicagohistory.org/pages/1774.html)
  19. Motivated by enormous parallelism, massive data, complexity. Enabled by networks.
  20. What’s this got to do with that cloud thing? Recall that “cloud” is a term used to mean a few different things.
  21. Next question: Where does computing happen? Massive parallelism in computing and storage. Operations costs go up. Google data center in Oregon. Note also variation in cost of power: factor of 5.
  22. Interestingly, if we look at the situation in business, things are quite different. There is a similarly long list of time-consuming tasks. There is a large and growing SaaS industry that addresses many of them. If I start a business today, I can do it from a coffee shop; there is no need to acquire and run any IT at all. I can outsource …
  23. Of course, people also make effective use of IaaS, but only for more specialized tasks.
  24. So let’s look at that list again. My colleagues and I started an effort a little while ago aimed at applying SaaS to one of these tasks …
  25. The result of this work is something called Globus Online. This is something new. Not just more of the same Globus Toolkit stuff. Globus Toolkit: hasn’t changed. Been around 15 years. Still a toolkit for building custom Grids such as LHC, TeraGrid, ESG, BIRN, LIGO, etc. Globus Online: Focused on outsourcing the time-consuming activities associated with data transfer. Register, transfer, monitor, and customize endpoints. Globus Online is a full Web 2.0-based solution. That means a few different things. First, it is architected using REST principles: important elements are exposed as resources, on which operations can be performed using HTTP operations. These operations can be used directly, or via powerful AJAX Web GUIs.
  26. The deceptively simple task of moving data from one place to another. You might ask: What could be simpler? I simply stick it in the mail, right? But we’re talking about data that is too large to email. Maybe I need to move 100,000 files totaling 10 terabytes from a federal laboratory where they were generated to my home institution. That sort of thing can be very difficult. Hai Ah Nam, a nuclear physicist from Oak Ridge, spoke at GlobusWorld March 2010 about her struggles with moving data. Initially transferring 1.6 TB (86 large files) from Oak Ridge to NERSC. Changed from using SCP to GridFTP to reduce transfer from days to hours. Reduced transferring 137 TB from months to days. But, it was not easy...
  27. The deceptively simple task of moving data from one place to another. You might ask: What could be simpler? I simply stick it in the mail, right? But we’re talking about data that is too large to email. Maybe I need to move 100,000 files totaling 10 terabytes from a federal laboratory where they were generated to my home institution. That sort of thing can be very difficult. Hai Ah Nam, a nuclear physicist from Oak Ridge, spoke at GlobusWorld March 2010 about her struggles with moving data. Initially transferring 1.6 TB (86 large files) from Oak Ridge to NERSC. Changed from using SCP to GridFTP to reduce transfer from days to hours. Reduced transferring 137 TB from months to days. But, it was not easy...
  28. Under the covers: built as a scale-out web application. Hosted on Amazon Web Services. Replicate state data over multiple storage servers. Dynamically scale number of VMs.
  29. Explain attempts; a cornerstone of our failure mitigation strategy. Through repeated attempts GO was able to overcome transient errors at OLCF and ranger. The expired host certs on bigred were not updated until after the run had completed.
  30. 3000 zebra fish mutants
  31. Collect, move, store, index, analyze, share, update, iterate; millions of files; 1000s of experiments
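Note 29 describes repeated attempts as a cornerstone of failure mitigation: transient errors are retried until the task succeeds or gives up. The sketch below illustrates that general pattern only. It is not Globus Online's implementation; every name in it (`TransientError`, `run_with_attempts`, `make_flaky`) is hypothetical, and the exponential backoff between attempts is an assumption, not something the notes mention.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a dropped connection)."""


def run_with_attempts(operation, max_attempts=5, base_delay_s=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is exhausted.

    Returns (result, attempts_used). Only TransientError is retried; any
    other exception, such as an expired certificate, propagates at once.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(), attempt
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Assumed policy: wait twice as long after each failed attempt.
            sleep(base_delay_s * 2 ** (attempt - 1))


def make_flaky(failures):
    """Build a stand-in transfer that fails `failures` times, then succeeds."""
    remaining = {"n": failures}

    def operation():
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise TransientError("simulated transient endpoint error")
        return "SUCCEEDED"

    return operation


# Two simulated transient errors are absorbed by the third attempt.
status, attempts = run_with_attempts(make_flaky(2), sleep=lambda s: None)
print(status, attempts)
```

Separating retryable from non-retryable errors mirrors the distinction in the note: transient errors can be overcome by trying again, while a problem like an expired host certificate needs someone to fix it.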