1. eScience 2050
A Look Back from the Midpoint of the Century
Dennis Gannon, Professor Emeritus
School of Informatics, Computing and Engineering
Indiana University
& Microsoft Research (retired)
2. The Problem with Predictions
• Our vision of the future may look too much like today.
• Some predictions are obviously way off. Others not so much!
3. Extrapolating eScience from 1965 to 2019
• My first computer program: one Fortran line per card. High school was fun!
• How might I have predicted computing 50 years ahead? Programs and data stored on a paper tape a mile long!
4. Where was computing in the 1960s?
• 1961 - The programming language FORTRAN IV was created.
• 1964 - IBM introduced its System/360.
• 1965 - Gordon Moore formulated "Moore's Law."
• 1967 - IBM created the first floppy disk.
• 1969 - The ARPAnet (which became the Internet).
• Mostly used for email until 1980.
• 1970 - Edgar Codd invented the relational database idea.
• Not implemented until 1974.
5. The 30 years after 2019
• The cloud and supercomputing merge.
• Quantum computing as a service in the cloud.
• DNA data storage in the cloud.
• Neuromorphic computing.
• The explosion of AI as an eScience enabler.
6. The Cloud Supercomputer Convergence
• The converged supercloud merges the best of both worlds
• Support for thousands of concurrent interactive users
• Planet scale data resources supporting on-line personal agents and science gateways
• Capable of launching multiple exascale parallel computations
• By 2019 we were already moving there:
• Google TPU accelerators and Microsoft Azure mesh network of FPGA engines
• US NSF Supercomputer centers already exploring cloud tech such as Kubernetes
[Images: Google TPUs; the Azure FPGA mesh on top of servers]
7. Quantum in the Cloud
• In 2018, small programmable quantum processors were attached to the clouds from IBM and Google.
• Programmable!
• 200 stable qubits became real in 2030.
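• To make "programmable" concrete, here is a minimal sketch using Qiskit, the SDK for IBM's cloud quantum processors (a hypothetical toy, inspecting the statevector locally rather than running on cloud hardware):

    # Toy Qiskit sketch: prepare the Bell state (|00> + |11>)/sqrt(2).
    from qiskit import QuantumCircuit
    from qiskit.quantum_info import Statevector

    qc = QuantumCircuit(2)
    qc.h(0)      # Hadamard: put qubit 0 into an equal superposition
    qc.cx(0, 1)  # CNOT: entangle qubit 1 with qubit 0
    state = Statevector.from_instruction(qc)
    print(state.probabilities_dict())  # {'00': 0.5, '11': 0.5}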
8. DNA Storage and Neuromorphic Computing
• Long history of research in DNA storage.
• Longevity and density are amazing.
• By 2019, able to fully automate storage and retrieval.
• UW+MSR working on microfluidics devices.
• New ways in which DNA-encoded data could be searched and structured, much like relational databases (a toy encoding sketch follows below).
• By 2050, the standard for long-term cloud storage.
• Neuromorphic research is moving fast.
• In 2019 Intel released "Pohoiki Beach," a 64-Loihi-chip neuromorphic system capable of simulating eight million neurons.
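• A toy illustration of the basic encoding step only (the real UW+MSR schemes add error correction, addressing, and synthesis constraints, none of which appear here):

    # Map every 2 bits to one nucleotide and back: 1 byte -> 4 bases.
    BASES = "ACGT"

    def encode(data: bytes) -> str:
        # Most significant 2 bits of each byte become the first base.
        return "".join(BASES[(byte >> shift) & 0b11]
                       for byte in data
                       for shift in (6, 4, 2, 0))

    def decode(strand: str) -> bytes:
        out = bytearray()
        for i in range(0, len(strand), 4):
            byte = 0
            for base in strand[i:i + 4]:
                byte = (byte << 2) | BASES.index(base)
            out.append(byte)
        return bytes(out)

    strand = encode(b"eScience")
    assert decode(strand) == b"eScience"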
9. ML & AI becomes a standard tool of eScience
• Long history of Machine Learning in eScience
• By 2017 Generative Neural Networks
• GANs and VAEs: used to generate fakes (see the toy sketch below).
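• A minimal, hypothetical GAN sketch in PyTorch (a toy on 1-D data, not the models from the papers on the next slide): the generator learns to mimic samples from a Gaussian, while the discriminator learns to tell real samples from fakes.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
    D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCELoss()

    for step in range(2000):
        real = torch.randn(64, 1) * 1.25 + 4.0   # "real" data: N(4, 1.25^2)
        noise = torch.rand(64, 8)
        # Discriminator step: label real samples 1, generated samples 0.
        opt_d.zero_grad()
        loss_d = (bce(D(real), torch.ones(64, 1)) +
                  bce(D(G(noise).detach()), torch.zeros(64, 1)))
        loss_d.backward()
        opt_d.step()
        # Generator step: try to make D call the fakes real.
        opt_g.zero_grad()
        loss_g = bce(D(G(noise)), torch.ones(64, 1))
        loss_g.backward()
        opt_g.step()

    print(G(torch.rand(1000, 8)).mean().item())  # should drift toward 4.0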
10. Also useful for Science
• Applications in Astronomy, Biology, Cosmology, …
• Shahar Harel and Kira Radinsky, "Prototype-Based Compound Discovery Using Deep Generative Models."
• Mustafa Mustafa et al., "Creating Virtual Universes Using Generative Adversarial Networks" (synthetic galaxies).
11. The rise of Probabilistic Programming Languages
• An important new tool for eScience
• Used to make Bayesian inferences about the random behaviors that give rise to experimental outcomes:
• inferring the masses of subatomic particles from the results of collider experiments,
• or inferring the distribution of dark matter from its gravitational lensing effects on nearby galaxies,
• or unravelling complex models of gene expression that manifest as disease.
[Diagram: simulation code makes random draws X1 … Xn to produce outcomes y1 … yn, defining P(Y | X); a PPL inference compiler generates compiled inference code that, given the observed results, yields P(X | Y).]
Check out Gen from MIT and PyProb from Oxford. (A minimal sketch of the idea follows below.)
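• The sketch below shows the inference pattern from the diagram in plain NumPy likelihood weighting (not the actual Gen or PyProb APIs): draw latent X values from a prior, and weight each draw by how well it explains the observed Y.

    import numpy as np

    rng = np.random.default_rng(0)

    def likelihood(y, x):
        # Density of Y given X for a simulator Y = X + Normal(0, 0.5).
        return np.exp(-0.5 * ((y - x) / 0.5) ** 2)

    y_observed = 2.3                           # the experimental result
    xs = rng.normal(0.0, 2.0, size=100_000)    # draws from the prior P(X)
    weights = likelihood(y_observed, xs)
    weights /= weights.sum()

    # Weighted draws approximate the posterior P(X | Y = y_observed).
    posterior_mean = np.sum(weights * xs)
    print(posterior_mean)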
12. When does Machine Learning Become AI?
• Deep Neural Nets allow us to magnify our senses:
• look at thousands of things faster and see details better,
• do language translation in real time.
• Reinforcement Learning works for closed-world games like Go, chess and Pong.
• "Deep learning isn't a dangerous magic genie. It's just math."
• Oren Etzioni, CEO of the Allen Institute for Artificial Intelligence
• What about bots and “smart speakers”?
13. From Alexa to a tool for eScience Research
• In 2019, smart speaker bots are really dumb,
• but they work … for some things.
• How about for eScience tasks?
• Alexa, please find the metadata associated with experiment 32 and then compare it to the NIH standard. What are the important differences?
• Please read the recent papers on the evolution of dark matter halos. Are there simulation codes available?
• Recall that I asked you to look for any work in the biology community that uses methods similar to that astrophysics work. Any results yet?
14. Raj Reddy’s “Cognition Amplifier”
• I call it Research Assistant.
• Lives in the cloud and catalogs all my papers, notes, codes, and experimental results.
• “Knows” what topics are important and seeks out related research.
• Creates smart summaries that encapsulate key ideas
• Monitors ongoing computational experimental workflows
• Proposes new experiments to test my theories
• Verifies mathematical analysis
15. A Toy Example
• Basically a metasearch tool (a minimal sketch follows below).
• Does voice-to-text.
• Text parsed via a cloud service.
• Analysis extracts key actions from the parsed text.
• Searches Bing, Wikipedia, or arXiv.
• Wikipedia summary rendered as voice by Amazon Lex.
[Diagram: the Google Voice Kit converts voice to text; a cloud service parses the text; an analysis engine extracts actions and queries the Wikipedia API, Bing search, and the Cornell arXiv; results go to a web browser, and the Wikipedia summary is rendered back to voice by Amazon Lex.]
16. [image-only slide]
17. How close are we to making this real in 2019?
• Need a Personal Knowledge Graph (a toy sketch follows below)
• Document-oriented … Wikidata-like
• Google's KG is key to its search
• Start with a "general science" KG
• The RA auto-extends it as you use it.
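• A toy sketch of such a personal KG as subject-predicate-object triples (hypothetical; a real one would be a Wikidata-style document store):

    from collections import defaultdict

    class PersonalKG:
        """Tiny in-memory triple store the RA could auto-extend."""
        def __init__(self):
            self.by_subject = defaultdict(set)

        def add(self, subject, predicate, obj):
            self.by_subject[subject].add((predicate, obj))

        def about(self, subject):
            return sorted(self.by_subject[subject])

    kg = PersonalKG()
    kg.add("experiment-32", "uses-instrument", "cryo-EM")
    kg.add("experiment-32", "follows-standard", "NIH metadata standard")
    print(kg.about("experiment-32"))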
18. How about deep analysis of research papers?
• Text "comprehension" requires:
• A strong language model - we have that now with BERT transformers (see the sketch below)
• A knowledge graph
• An abductive inference engine
• Big challenges being worked on now:
• Relating technical diagrams and equations to text.
• Can you classify documents by the content of the mathematics used?
• Can you derive theory from observation?
The Allen Institute's Aristo system recently got an "A" on the N.Y. Regents 8th grade science exams.
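• A small sketch of the "strong language model" ingredient, using the standard bert-base-uncased checkpoint via the HuggingFace transformers pipeline (illustrative only; this is masked-word prediction, not full document comprehension):

    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="bert-base-uncased")
    prompt = "Gravitational lensing is caused by the [MASK] of massive objects."
    for candidate in fill_mask(prompt):
        print(candidate["token_str"], round(candidate["score"], 3))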
19. Progress …
• Hanalyzer (short for "high-throughput analyzer") uses natural language processing to automatically extract a semantic network from all PubMed papers relevant to a specific scientist.
• Eureqa (now part of DataRobot) does automatic AI-based time-series analysis, and DataRobot is a tool for automatically building ML models given only data.
• Michael Schmidt and Hod Lipson, "Distilling Free-Form Natural Laws from Experimental Data," Science, vol. 324, 3 April 2009. (A toy sketch of the idea follows below.)
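• A toy sketch in the spirit of Schmidt and Lipson's result (vastly simpler than Eureqa): search a tiny space of candidate power laws for the one that best fits simulated pendulum data, recovering T ≈ 2π·sqrt(L/g).

    import numpy as np

    rng = np.random.default_rng(1)
    L = np.linspace(0.1, 2.0, 50)                  # pendulum lengths (m)
    T = 2 * np.pi * np.sqrt(L / 9.81) + rng.normal(0, 0.01, L.size)

    best = None
    for p in (0.25, 0.5, 1.0, 2.0):                # candidate exponents
        c = np.sum(T * L**p) / np.sum(L**(2 * p))  # least-squares coefficient
        err = np.mean((T - c * L**p) ** 2)
        if best is None or err < best[0]:
            best = (err, c, p)

    err, c, p = best
    print(f"T = {c:.3f} * L^{p}")                  # expect c near 2.006, p = 0.5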
20. Conclusions
• The range of the tech revolution between 1960 and 2020 was huge.
• It made eScience possible
• The developments from 2020 to 2050 will be just as surprising.
• Clouds and Supercomputers merge
• Quantum becomes a standard attached accelerator
• DNA storage changes the dynamics of data science
• AI becomes our research assistant.
• eScience becomes Science.