Daniel Lopresti, Bill Gropp, Mark D. Hill, Katie Schuman, and I put together a white paper on "Building a National Discovery Cloud" for the Computing Community Consortium (http://cra.org/ccc). I presented these slides at a Computing Research Association "Best Practices on using the Cloud for Computing Research Workshop" (https://cra.org/industry/events/cloudworkshop/).
Abstract from White Paper:
The nature of computation and its role in our lives have been transformed in the past two decades by three remarkable developments: the emergence of public cloud utilities as a new computing platform; the ability to extract information from enormous quantities of data via machine learning; and the emergence of computational simulation as a research method on par with experimental science. Each development has major implications for how societies function and compete; together, they represent a change in technological foundations of society as profound as the telegraph or electrification. Societies that embrace these changes will lead in the 21st Century; those that do not, will decline in prosperity and influence. Nowhere is this stark choice more evident than in research and education, the two sectors that produce the innovations that power the future and prepare a workforce able to exploit those innovations, respectively. In this article, we introduce these developments and suggest steps that the US government might take to prepare the research and education system for its implications.
1. Crescat scientia; vita excolatur ("Let knowledge grow from more to more; and so be human life enriched")
A National Discovery Cloud
Ian Foster
The University of Chicago
Argonne National Laboratory
https://cra.org/ccc/wp-content/uploads/sites/2/2021/04/CCC-Whitepaper-National-Discovery-Cloud-2021.pdf
foster@uchicago.edu, @ianfoster
2. Tools for augmenting human intellect: 1962
“By ‘augmenting human intellect’
we mean increasing the capability
of a [person] to approach a
complex problem situation, to
gain comprehension to suit [their]
particular needs, and to derive
solutions to problems.” *
* Doug Engelbart, 1962 -- https://www.dougengelbart.org/content/view/138/
3. Tools for augmenting human intellect: 1962
"I don't get it – everything you've shown me today I can do on my ASR-33." – prominent prof, as reported by Andries van Dam +
+ https://www.theregister.com/2008/12/11/engelbart_celebration/
4. 2022: Three transformative technologies
• Numerical simulation: a scientific method on par with, and sometimes exceeding, experiment
• Public cloud: a new computing platform enabling new approaches to building and delivering services
• Sensors, data, ML: powerful methods for generating, and extracting information from, huge data
Sources: servecentric.com, visibleearth.nasa.gov, quantamagazine.org, e3sm.org
5. Challenge and opportunity in 2022:
Create new tools for augmenting human intellect
• Curated collections of observational, experimental,
and simulated data, plus derived ML models
• A global knowledge graph linking publications,
data, models—updated by computational agents
• Digital twins of complex systems, running on
powerful computers, plus ML surrogates
• Rich set of science services, with infrastructure to
simplify operations and incentives to sustain
See DOI: 10.1126/science.1110411, 2005, substituting “cloud” for “grid”
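The knowledge-graph bullet above can be made concrete with a minimal sketch: artifacts stored as subject–predicate–object triples that computational agents could update as new publications, datasets, and models appear. The entity names and predicates below are invented illustrations, not part of any proposed NDC design.

```python
# Minimal sketch of a knowledge graph linking publications, datasets, and
# ML models as subject-predicate-object triples. All entity names here are
# hypothetical examples, not real identifiers.
from collections import defaultdict

class KnowledgeGraph:
    def __init__(self):
        # index triples by subject for simple forward traversal
        self._edges = defaultdict(list)

    def add(self, subject, predicate, obj):
        self._edges[subject].append((predicate, obj))

    def neighbors(self, subject, predicate=None):
        """Objects linked from `subject`, optionally filtered by predicate."""
        return [o for (p, o) in self._edges[subject]
                if predicate is None or p == predicate]

kg = KnowledgeGraph()
# A computational agent might assert links like these as artifacts appear:
kg.add("paper:doi/10.1000/xyz", "describes", "dataset:climate-obs-v2")
kg.add("model:surrogate-v1", "trained_on", "dataset:climate-obs-v2")
kg.add("dataset:climate-obs-v2", "derived_from", "instrument:satellite-A")

print(kg.neighbors("paper:doi/10.1000/xyz", "describes"))
# → ['dataset:climate-obs-v2']
```

A production system would of course use a persistent triple store with standard identifiers (DOIs, ORCIDs); the point is only the shape of the links.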
6. Challenge and opportunity in 2022:
Create new tools for augmenting human intellect
Such tools are needed across all of research and education
The CS community should be:
• leading design
• creating tools for CS-specific needs
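The "ML surrogates" idea in the bullets above can be illustrated with a toy sketch: sample an expensive simulation, fit a cheap model to the samples, and answer subsequent queries from the surrogate. The `expensive_simulation` function below is a stand-in for a real digital twin, not any actual model.

```python
# Toy illustration of an ML surrogate for an expensive simulation.
# The "simulation" is a cheap stand-in function; a real digital twin would
# be a large numerical model running on powerful computers.
import numpy as np

def expensive_simulation(x):
    # placeholder for a costly physics code
    return np.sin(x) + 0.1 * x**2

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 5, size=200)
y_train = expensive_simulation(x_train)

# Fit a degree-6 polynomial as a cheap surrogate for the sampled runs
coeffs = np.polyfit(x_train, y_train, deg=6)
surrogate = np.poly1d(coeffs)

# The surrogate now answers queries without re-running the simulation
x_test = np.linspace(0, 5, 50)
max_err = np.max(np.abs(surrogate(x_test) - expensive_simulation(x_test)))
print(f"max surrogate error on [0, 5]: {max_err:.4f}")
```

Real surrogates are typically neural networks or Gaussian processes over high-dimensional inputs, but the workflow (sample, fit, query) is the same.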
7. Science services: globus.org
Operated by UChicago for researchers worldwide
Hosted on AWS
Made possible by the support of 150+ subscribers
8. Heatmap and clustering of the occurrence of Corynebacteria in study mgp128 (doi: 10.1093/bib/bbx105)
https://www.mg-rast.org
9. A National Discovery Cloud requires new capabilities
• The definition, creation, and curation of large reference datasets to fuel new
data-driven models of the natural world, economy, human physiology, healthcare
system, manufacturing processes, etc.
• A discovery cloud platform to enable the collaborative development of
value-added services that support NDC-powered scholarship and education
• New educational programs and curricula to prepare a generation for whom
programming and using NDC capabilities is second nature
• Substantial computing, storage, and network resources to host and compute
over enormous datasets, and to host and operate discovery cloud services that
enhance the value of datasets
• Innovative integrations of NDC capabilities with high-performance computers,
automated laboratories, and other elements of a 21st century discovery and
innovation ecosystem
• Privacy and security designed in from the beginning, rather than added after the
fact, and with integrated assurances and audit capabilities so that the NDC
advances rather than hinders computing in the public interest
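As a toy illustration of the "audit capabilities designed in" point above: a dataset registry in which every action, including reads, lands in an append-only audit trail. The `DatasetRegistry` class and its methods are hypothetical, not a proposed NDC interface.

```python
# Hypothetical sketch: a dataset registry that records an append-only audit
# trail for every action, illustrating privacy/security "designed in from
# the beginning" rather than bolted on afterwards.
import datetime

class DatasetRegistry:
    def __init__(self):
        self._datasets = {}
        self.audit_log = []  # append-only record of every action

    def _audit(self, actor, action, target):
        self.audit_log.append({
            "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "target": target,
        })

    def register(self, actor, name, metadata):
        self._datasets[name] = metadata
        self._audit(actor, "register", name)

    def read(self, actor, name):
        # even reads are audited, supporting after-the-fact assurance
        self._audit(actor, "read", name)
        return self._datasets[name]

reg = DatasetRegistry()
reg.register("alice", "climate-obs-v2", {"license": "CC-BY", "records": 10**6})
reg.read("bob", "climate-obs-v2")
print([entry["action"] for entry in reg.audit_log])
# → ['register', 'read']
```

A real service would add authentication, access policies, and tamper-evident log storage; the sketch shows only where the audit hook sits.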
10. Open issues and challenges include …
• Weaving the diverse capabilities just listed into a coherent whole that the US
R&D enterprise can harness for discovery, innovation, and workforce development
• Balancing needs for persistent resources to support R&E communities
vs. supporting innovation by those communities
• Enabling research at lower levels of the ‘stack’ (Touch’s Law: The
lowest level at which research is permitted in a testbed is also the
highest level at which it can occur)
• Privacy and security: Balancing “free and open” vs. “private and
secure” in data and services
• Building an NDC that contributes to environmental sustainability
• Appropriate balance between bespoke and private sector data centers
11. Summary: Let’s not underestimate public cloud
• An elastic source of computing and storage capacity – sure
• A cheap source of computing and storage capacity – maybe/not?
• A new technology to study and engineer – yes
• An immensely powerful platform for delivering scalable, reliable, and
democratizing digital services – absolutely!
• Our opportunity and challenge is a top-to-bottom rethink of what
computing means for research and education
Speaker notes
A National Discovery Cloud: Preparing the US for Global Competitiveness in the New Era of 21st Century Digital Transformation
Ian Foster, Daniel Lopresti, Bill Gropp, Mark D. Hill, Katie Schuman
Our message is: think big. We have a historic opportunity to rethink how computing is organized and applied to empower the research community
Look back 60 years to another historic moment, when Doug Engelbart demonstrated a computer system designed to enhance human abilities to tackle complex problems
His demonstration of what he called NLS (the oN-Line System) showed how, by integrating emerging technologies of the day (displays and telecommunications) and inventing new ones (the mouse, multiple windows), one could provide entirely new capabilities.
While we now see this demo as a defining moment in computing history, at the time 90% of the community thought he was a crackpot.
Today, humanity faces yet bigger problems, but we also have access to technologies hardly imagined by Engelbart. I’d like to highlight three:
1) A new computing platform that makes it possible for anyone to create and deploy powerful digital services for use by tens of users or by tens of millions
2) Sensors that can acquire enormous datasets, and powerful methods, including ML methods, that can extract information from those data
3) Simulation methods that can …
The opportunity is not just to provide increased access to computing power,
but to create new digital services that enhance human capabilities
A first example of an advanced digital service. Developed with NSF and DOE support over more than a decade.
Links storage resources at more than 1600 institutions.
Used by 10,000s of users to manage and share large data.
Two key points:
-- Leverages public cloud (AWS) to run a powerful, scalable national-scale service
-- Sustained by subscriptions from more than 150 institutions worldwide
A second example. Discipline-specific: used by 10,000s to process metagenomic data from environmental samples. Has transformed this field by allowing biological scientists without informatics resources or expertise to participate in metagenomics research, AND permitting large meta-metagenomics studies.
Difficulties: NOT hosted on cloud; no sustainability model
In the CCC white paper, we speak to the issues that I have already mentioned, and emphasize that to realize these new tools for augmenting human intellect, we need a range of new capabilities, including those listed here.
New capabilities and leadership are required if the US research and education enterprise is to effectively harness this new computational fabric for discovery, innovation, and workforce development. The challenge is to enable researchers, educators, students, and industrial collaborators to develop and use the value-added services that will underpin the society of tomorrow; aggregate the massive datasets required for AI-driven discoveries and innovation; and construct and run the simulation models used to understand future products and scenarios.
The big opportunity is to transform science processes, much as business and consumer relationships with IT are being transformed
Doing this right will require a top-to-bottom rethink of what computing means for science; how it should be delivered; how it should be funded; how contributors should be rewarded