The world of computing continues to evolve rapidly. In just the past 10 years, we have seen the emergence of petascale supercomputing, cloud computing that provides on-demand computing and storage with considerable economies of scale, software-as-a-service methods that permit outsourcing of complex processes, and grid computing that enables federation of resources across institutional boundaries. These trends show no signs of slowing down: the next 10 years will surely see exascale, new cloud offerings, and terabit networks. In this talk I review several of these developments and discuss their potential implications for X-ray science and X-ray facilities.
Opportunities for X-Ray science in future computing architectures
1. Opportunities for X-ray science in future computing architectures. Ian Foster, Computation Institute, University of Chicago & Argonne National Laboratory
2. Abstract. The world of computing continues to evolve rapidly. In just the past 10 years, we have seen the emergence of petascale supercomputing, cloud computing that provides on-demand computing and storage with considerable economies of scale, software-as-a-service methods that permit outsourcing of complex processes, and grid computing that enables federation of resources across institutional boundaries. These trends show no signs of slowing down: the next 10 years will surely see exascale, new cloud offerings, and terabit networks. In this talk I review several of these developments and discuss their potential implications for X-ray science and X-ray facilities.
3. Fastest supercomputer (floating-point ops/sec). [Chart: peak speed (flops, 1E+2 to 1E+17) vs. year introduced (1940-2010), tracing ENIAC and UNIVAC (vacuum tubes); IBM 701, 704, 7090, Stretch (transistors); CDC 6600, 7600, ILLIAC IV (ICs); CDC STAR-100, CRAY-1, Cyber 205 (vectors); X-MP, Y-MP, CRAY-2, S-810/20, SX-2, VP2600/10 (parallel vectors); and i860-era MPPs (CM-5, Paragon, Delta, T3D, T3E, SX-3/44, SX-4, SX-5, NWT, CP-PACS, ASCI Red, Blue Pacific, ASCI White, ASCI Q, Red Storm, Thunder, Earth, Blue Gene/L) up to petaflop and multi-petaflop systems. Doubling time = 1.5 yr. Markers: Argonne; my laptop.]
14. Complexity dimensions (Dan Katz). Each dimension ranges from simple to complex: Algorithms (simple → complex); Coupled (& non-linear) equations (1 → 2 → 3); Timescale (short → long → multiscale); Optimization (no → yes); Error analysis (no → yes); Parameters or ensemble members (few → many); Resolution (coarse → fine → adaptive); Time (1 → many).
15. Rational design of catalytic materials (Curtis, Greely, Zapol, Kumaran). Create: synthesis and processing methods informed by computation; generate data. Design: materials with desired properties based on computation and data. Understand: relationship between materials properties and structure.
26. "… light sources alone are not enough … Enormous data sets of diffracted signals in reciprocal space and across wide energy ranges must be collected and analyzed in real time so that they can guide the ongoing experiments."
28. Pattern recognition in x-ray spectromicroscopy. Kevin Boyce, U. Chicago: study of the evolution of tree types, including now-extinct species that dominated in the "coal age" (Carboniferous). Acetate peel of fossilized wood. Shows how well we can separately map cellulose-derived material from lignin-derived material in plant cell walls, with implications for cellulosic ethanol production from biomass. Lignin-derived and cellulose-derived regions in 400-million-year-old chert: Boyce et al., Proc. Nat. Acad. Sci. 101, 17555 (2004), with subsequent pattern recognition analysis by Lerotic, Jacobsen, Schäfer, and Vogt, Ultramicroscopy 100, 35 (2004).
29. LDRD: "Next Generation Data Exploration - Intelligence in Data Analysis, Visualization, & Mining". "Here's a cell in this tissue. How much zinc does it have? In the rest of the tissue, how many cells are there like this, and what is their distribution of zinc content?" Fluorescence and absorption spectral imaging. Databases to combine results of multiple experiments and instruments. Multivariate statistical analysis and pattern recognition. People: APS: Stefan Vogt (PI), Lydia Finney, Chris Jacobsen, Chris Roerhig, Claude Saunders, Jesse Ward; Mathematics and Computer Science, ANL: Sven Leyffer, Stefan Wild, Mark Hereld; Northwestern: Rachel Mak.
37. Time-consuming tasks in business: Web presence; Email (hosted Exchange); Calendar; Telephony (hosted VOIP); Human resources and payroll; Accounting; Customer relationship mgmt; Data analytics; Content distribution; … SaaS
38. Time-consuming tasks in business: Web presence; Email (hosted Exchange); Calendar; Telephony (hosted VOIP); Human resources and payroll; Accounting; Customer relationship mgmt; Data analytics; Content distribution; … SaaS + IaaS
47. Globus Toolkit vs. Globus Online. Globus Toolkit: build the grid; components for building custom grid solutions; globustoolkit.org. Globus Online: use the grid; cloud-hosted file transfer service; globusonline.org.
68. [Diagram] Phenome Project data flow. Penn State University: Phenome Project coordination; graphics workstations for tomographic reconstruction, deringing, segmentation, morphometrics & visualization; NAS; GridFTP server. Argonne National Lab, Advanced Photon Source: APS beamline data acquisition; DAS; GridFTP server. Argonne / U. Chicago grid supercomputing facility: pattern recognition, segmentation & visualization software development; SAN; GridFTP server; HPC cluster. Links: 1 Gbps and 10 Gbps network links, regular Internet links. Beamline data flow handled by Globus Online, a hosted service for high-speed, reliable, secure data movement, serving users.
70. Four theses: (1) Ultrascale computing enables new problem-solving methods. (2) Research data management is an essential service, like electricity and networking. (3) Economies of scale motivate highly aggregated computing and storage. (4) Automation of science processes accelerates discovery and yields competitive advantage.
Trends: computers, storage, detectors, … It's the ratios that matter: cores/CPU, CPUs/computer, data/scientist. Experiment and simulation.
To show what this means, let's look at the example of astronomy again. Tycho Brahe spent 30 years cataloging the positions of 777 stars and the known planets with great accuracy. His assistant Kepler then took the data, and from it derived his laws of planetary motion, which say that bodies sweep out equal areas in equal times. A precursor to Newton's law of gravitation.
Some allege that Kepler took unusual steps to acquire his data. Hopefully not so common.
Photographic plates → we need computers! Here are some early computers in Harvard Observatory, around 1890. Computing the consequences of equations became a profession. 1 multiplication per 2 seconds, maybe, × 8 people = 4 multiplications per second. However, unreliable and hard to get to work more than 8 hours per day.
By the late 1990s: in 5 years, the Sloan Digital Sky Survey imaged 230 million celestial objects, measuring the spectra of more than 1 million of them.
"Slices through the SDSS 3-dimensional map of the distribution of galaxies. Earth is at the center, and each point represents a galaxy, typically containing about 100 billion stars. Galaxies are colored according to the ages of their stars, with the redder, more strongly clustered points showing galaxies that are made of older stars. The outer circle is at a distance of two billion light years. The region between the wedges was not mapped by the SDSS because dust in our own Galaxy obscures the view of the distant universe in these directions. Both slices contain all galaxies within -1.25 and 1.25 degrees declination."
http://xrds.acm.org/article.cfm?aid=1836552
Sequencing volumes doubling every 4-6 months. Note the log scale! Bioinformatics cost is purely BLAST; values are Amazon EC2 costs. Lessons: (1) need computer scientists; (2) need more hardware; (3) need more collaboration on analysis.
In contrast, see SDSS, and also Google. Volume. Diversity and complexity. Speed of analysis.
Research data management in 2011
Photon science recognizes the importance of computing. However, if we perform some simple textual analysis, we see that only ~1.3% of the report talks about computing and data: 670 out of 50,676 words.
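The kind of "simple textual analysis" meant here can be sketched in a few lines; the keyword set below is an illustrative assumption, not the list actually used for the report.

```python
import re

def keyword_fraction(text, keywords):
    """Fraction of the words in `text` that belong to `keywords`.
    A crude proxy for how much a document discusses a topic."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = sum(1 for w in words if w in keywords)
    return hits / len(words) if words else 0.0

# Hypothetical keyword set for "computing and data" content.
COMPUTING_TERMS = {"data", "computing", "software", "analysis", "storage"}

sample = "Detectors stream data to computing clusters; data analysis follows."
frac = keyword_fraction(sample, COMPUTING_TERMS)
```

On a full report, one would run the same function over the extracted text and compare the hit count (here, 670) against the total word count (50,676).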
Liz Lyon, U. Bath; Associate Director, UK Digital Curation Center. Generic Data Acquisition (GDA) software developed at Daresbury initially, now at Diamond Light Source.
Chris Jacobsen
What about networking? Difficult to price, but many experts estimate a doubling time of 9 months for network capacity, thanks to WDM and optical doping. 10 Gbps per user ≈ 100-1000× shared Internet throughput.
Port pricing is falling and density is rising, dramatically. Cost of 10GbE is approaching that of cluster HPC interconnects.
Chicago is an international networking hub
Chicago railroads, 1950 (http://www.encyclopedia.chicagohistory.org/pages/1774.html)
Motivated by enormous parallelism, massive data, complexity. Enabled by networks.
What's this got to do with that cloud thing? Recall that "cloud" is a term used to mean a few different things.
Next question: where does computing happen? Massive parallelism in computing and storage. Operations costs go up. Google data center in Oregon. Note also variation in cost of power: factor of 5.
Interestingly, if we look at the situation in business, things are quite different. There is a similarly long list of time-consuming tasks, and there is a large and growing SaaS industry that addresses many of them. If I start a business today, I can do it from a coffee shop; there is no need to acquire and run any IT at all. I can outsource …
Of course, people also make effective use of IaaS, but only for more specialized tasks
So let's look at that list again. My colleagues and I started an effort a little while ago aimed at applying SaaS to one of these tasks …
The result of this work is something called Globus Online. This is something new, not just more of the same Globus Toolkit stuff. Globus Toolkit hasn't changed: it has been around 15 years, and is still a toolkit for building custom grids such as LHC, TeraGrid, ESG, BIRN, LIGO, etc. Globus Online is focused on outsourcing the time-consuming activities associated with data transfer: register, transfer, monitor, and customize endpoints. Globus Online is a full Web 2.0-based solution. That means a few different things. First, it is architected using REST principles: important elements are exposed as resources, on which operations can be performed using HTTP operations. These operations can be used directly, or via powerful AJAX Web GUIs.
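The REST structure described here can be illustrated with a minimal sketch. The endpoint URL, paths, and field names below are hypothetical stand-ins, not the actual Globus Online API; the point is only that a transfer task is a resource, created with an HTTP POST.

```python
def build_transfer_request(source_endpoint, dest_endpoint, items):
    """Describe an HTTP request that would create a 'transfer task'
    resource, REST-style: the resource lives at a URL, and creating
    it is a POST to that URL. All names here are illustrative."""
    return {
        "method": "POST",
        "url": "https://transfer.example.org/v1/transfers",  # hypothetical
        "body": {
            "source": source_endpoint,
            "destination": dest_endpoint,
            "items": [{"src_path": s, "dst_path": d} for s, d in items],
        },
    }

# Example: move one file between two named endpoints.
req = build_transfer_request(
    "olcf#dtn", "nersc#dtn", [("/data/run1.h5", "/home/me/run1.h5")]
)
```

The same resource URL would then answer GET requests for monitoring, which is what makes the operations usable both directly and from an AJAX GUI.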
The deceptively simple task of moving data from one place to another. You might ask: what could be simpler? I simply stick it in the mail, right? But we're talking about data that is too large to email. Maybe I need to move 100,000 files totaling 10 terabytes from a federal laboratory where they were generated to my home institution. That sort of thing can be very difficult. Hai Ah Nam, a nuclear physicist from Oak Ridge, spoke at GlobusWorld in March 2010 about her struggles with moving data: initially transferring 1.6 TB (86 large files) from Oak Ridge to NERSC; changed from SCP to GridFTP to reduce the transfer from days to hours; reduced transferring 137 TB from months to days. But it was not easy...
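The days-to-hours improvement follows directly from back-of-the-envelope arithmetic. The effective throughput figures below are illustrative assumptions (a single SCP stream on a WAN often sustains well under 100 Mbps, while parallel GridFTP streams can approach multi-Gbps), not measurements from this specific transfer.

```python
def transfer_hours(terabytes, gbps):
    """Hours needed to move `terabytes` of data at an effective
    sustained throughput of `gbps` gigabits per second."""
    bits = terabytes * 1e12 * 8        # TB -> bits (decimal TB)
    seconds = bits / (gbps * 1e9)
    return seconds / 3600.0

# 1.6 TB, Oak Ridge to NERSC (throughputs are assumed, illustrative):
scp_hours = transfer_hours(1.6, 0.05)      # ~50 Mbps effective: ~3 days
gridftp_hours = transfer_hours(1.6, 5.0)   # ~5 Gbps effective: under an hour
```

The same arithmetic scales up: at 50 Mbps, 137 TB takes most of a year; at several Gbps, days. Hence the months-to-days reduction.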
Under the covers: built as a scale-out web application. Hosted on Amazon Web Services. Replicate state data over multiple storage servers. Dynamically scale the number of VMs.
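A minimal sketch of the kind of scaling rule such a service might apply; the thresholds, floor, and ceiling are assumptions for illustration, not Globus Online's actual policy.

```python
import math

def target_vm_count(queued_tasks, tasks_per_vm=50, min_vms=2, max_vms=20):
    """Choose a worker-VM count proportional to queue depth,
    clamped to a floor (for availability) and a ceiling (for cost)."""
    needed = math.ceil(queued_tasks / tasks_per_vm)
    return max(min_vms, min(max_vms, needed))
```

A controller loop would periodically compare this target to the number of running VMs and launch or terminate instances accordingly; replicated state storage means any VM can pick up any task.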
Explain attempts: a cornerstone of our failure mitigation strategy. Through repeated attempts, Globus Online was able to overcome transient errors at OLCF and Ranger. The expired host certs on BigRed were not updated until after the run had completed.
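The repeated-attempts strategy amounts to retrying an operation with backoff until it succeeds or a limit is hit. A minimal sketch (the parameters are illustrative, not the service's actual settings):

```python
import time

def with_retries(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run `operation` until it succeeds. Transient failures trigger
    another attempt after an exponentially growing delay; the final
    failure is re-raised so permanent errors still surface."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))

# Example: an operation that fails twice with a transient error, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("transient endpoint error")
    return "ok"

result = with_retries(flaky, sleep=lambda s: None)  # skip real delays in the demo
```

This handles transient errors (an endpoint briefly unreachable), but not permanent ones: an expired host certificate, as on BigRed, fails every attempt until someone fixes the certificate.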
3000 zebra fish mutants
Collect, move, store, index, analyze, share, update, iterate; millions of files; 1000s of experiments