This document summarizes Dr. Larry Smarr's presentation on building the Pacific Research Platform (PRP) to enable big data science across research universities on the West Coast. The PRP provides 100-1000 times the bandwidth of today's commodity Internet to support research fields from particle physics to climate change. In under 2 years, the prototype PRP has connected researchers and datasets across California through optical networks and is now expanding nationally and globally. The next step is to add machine learning capabilities to the PRP through GPU clusters to enable new discoveries from massive datasets.
Building the Pacific Research Platform: Supernetworks for Big Data Science
1. “Building the Pacific Research Platform:
Supernetworks for Big Data Science”
Steve Jones Internet Lecture
The 67th Annual Conference of the International Communication Association
San Diego, CA
May 26, 2017
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net
2. Abstract
In every field we see an exponential rise of Big Data, which in turn is demanding new
technological solutions in visualization, machine learning, and high performance
cyberinfrastructure. The rise of artificial intelligence will both be powered by these
developments and be essential for deriving understanding from the tsunami of data. I will
describe how my NSF-funded Pacific Research Platform, which provides an Internet
platform with 100-1000 times the bandwidth of today's commodity Internet to all the
research universities on the West Coast, is being designed from the application needs of
researchers from particle physics to climate to human health. Even fields like
archaeology, digital libraries, and social media analysis are engaged.
3. The Defining Issue in IT for the Coming Decades:
Machine Intelligence Coupled to Massive Data
May 5, 2015; August 25, 2015
4. Traffic Control for Autonomous Drone Air Delivery
is Under Development by NASA, Amazon, & Google
5. Self-Driving Cars From Multiple Companies
Use Advanced Sensors Coupled to Realtime Computing
7. Streaming Data From the Tesla Fleet Trains Self-Driving Algorithms:
The “Hive-Mind”
Note: Google Self-Driving Cars Have Only Driven 1.5 Million Miles
8. The Planetary-Scale Computer Fed by a Trillion Sensors
Will Drive a Global Industrial Internet
www-bsac.eecs.berkeley.edu/frontpagefiles/BSACGrowingMEMS_Markets_%20SEMI.ORG.html
Next Decade: One Trillion
“Within the next 20 years the Industrial Internet will have added to the global economy an additional $15 trillion.”
--General Electric
www.ge.com/docs/chapters/Industrial_Internet.pdf
9. How Can We Build an Academic Cyberinfrastructure
to Enable Collaborative Teams to Discover Patterns From Big Data?
10. We Have Been Working Toward the Pacific Research Platform for 15 Years:
OptIPuter, Quartzite, Prism
PI Papadopoulos, Co-PI Smarr: 2013-2015
PI Smarr, Co-PI DeFanti, Co-PI Papadopoulos: 2002-2009
PI Papadopoulos, Co-PI Smarr: 2004-2007
11. Giving Individual Researchers Optical Fibers
To Create an On-Campus Big Data Freeway System
NSF CC-NIE Prism@UCSD
Phil Papadopoulos, SDSC, Calit2, PI
CHERuB
UCSD’s 30,000+ Internet Users Travel Over One 10Gbps Fiber
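To make the motivation for dedicated research fibers concrete, here is a minimal sketch of the per-user share of that single 10Gbps link. It assumes all 30,000 users are active at once and share the link evenly, which is an illustrative simplification (real campus traffic is bursty); the function name is hypothetical.

```python
# Figures from the slide: 30,000+ users share one 10 Gbps campus fiber.
LINK_GBPS = 10
USERS = 30_000


def even_share_kbps(link_gbps: float, users: int) -> float:
    """Per-user share in kilobits per second, assuming perfectly even sharing."""
    bits_per_second = link_gbps * 1e9
    return bits_per_second / users / 1e3


share = even_share_kbps(LINK_GBPS, USERS)
print(f"{share:.0f} kb/s per user")
```

Under that assumption each user gets only a few hundred kilobits per second, which is why a researcher moving terabytes needs a separate path.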
12. PRISM is Connecting CERN’s CMS Experiment
To UCSD Physics Department
80 Gbps PRISM Connection Has Been Made
13. Big Data Science Data Transfer Nodes -
Flash I/O Network Appliances (FIONAs)
UCSD Designed FIONAs To Solve the Disk-to-Disk Data Transfer Problem at Full Speed on 10G, 40G and 100G Networks
FIONAs: 10/40G, $8,000
FIONette: 1G, $1,000
Phil Papadopoulos, SDSC &
Tom DeFanti, Joe Keefe & John Graham, Calit2
John Graham, Calit2
14. How Prism Optical Network Transforms Big Data Microbiome Science:
Preparing for Knight/Smarr 1 Million Core-Hour Analysis
Diagram labels: Knight Lab; FIONA; Gordon; Prism@UCSD; Data Oasis (7.5PB, 200GB/s); Knight 1024 Cluster in SDSC Co-Lo; CHERuB; Emperor & Other Vis Tools; 64Mpixel Data Analysis Wall
Link speeds shown: 10Gbps, 100Gbps, 120Gbps, 40Gbps, 1.3Tbps
15. NSF Has Funded Over 100 Campuses
to Build On-Campus Big Data Freeways
Red 2012 CC-NIE Awardees
Yellow 2013 CC-NIE Awardees
Green 2014 CC*IIE Awardees
Blue 2015 CC*DNI Awardees
Purple Multiple Time Awardees
Source: NSF
16. Logical Next Step: The Pacific Research Platform Creates
a Regional End-to-End Science-Driven “Big Data Superhighway” System
NSF Grant: $5M, 10/2015-10/2020
PI: Larry Smarr, UC San Diego Calit2
Letters of Commitment from:
• 50 Researchers from 15 Campuses
• 32 IT/Network Organization Leaders
17. PRP’s First 1.5 Years:
Connecting Campus Application Teams and Devices
18. Cancer Genomics Hub (UCSC) is Housed in SDSC:
Large Data Flows to End Users at UCSC, UCB, UCSF, …
Chart labels: 1G, 8G, 15G; Jan 2016
30,000 TB Per Year
Data Source: David Haussler, Brad Smith, UCSC
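For a sense of scale, the sketch below converts 30,000 TB per year into an average sustained network rate. It assumes decimal terabytes and a perfectly uniform flow over a 365-day year; both are simplifications, and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def average_gbps(terabytes_per_year: float) -> float:
    """Average rate in gigabits per second for a yearly volume in decimal TB."""
    bits_per_year = terabytes_per_year * 1e12 * 8
    return bits_per_year / SECONDS_PER_YEAR / 1e9


rate = average_gbps(30_000)
print(f"{rate:.1f} Gb/s sustained")
```

Real flows are bursty, so peak demand would sit well above this average.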
19. PRP Now Enables Distributed Virtual Reality
40G FIONAs; 20x40G PRP-connected
WAVE@UC San Diego, linked over PRP to WAVE@UC Merced
Transferring 5 CAVEcam Images from UCSD to UC Merced:
2 Gigabytes now takes 2 Seconds (8 Gb/sec)
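The quoted rate follows directly from the size and time on the slide. A minimal sketch of the arithmetic, with a comparison against a 1 Gb/s link added purely for illustration (the 1 Gb/s figure and the function names are assumptions, not from the talk):

```python
def throughput_gbps(gigabytes: float, seconds: float) -> float:
    """Effective throughput in gigabits per second (1 byte = 8 bits)."""
    return gigabytes * 8 / seconds


def transfer_seconds(gigabytes: float, link_gbps: float) -> float:
    """Ideal transfer time in seconds, ignoring protocol overhead."""
    return gigabytes * 8 / link_gbps


print(throughput_gbps(2, 2))    # 2 GB in 2 s, the slide's example
print(transfer_seconds(2, 1))   # same data over a hypothetical 1 Gb/s link
```

The first call reproduces the slide's 8 Gb/sec; the second shows the same images would need 16 seconds on the slower link, before any overhead.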
20. The Prototype PRP Has Attracted
New Application Drivers
Scott Sellars, Marty Ralph
Center for Western Weather and Water Extremes
Frank Vernon - Expansion of HPWREN
Tom Levy, Cultural Heritage
Cryo EM
21. Collaboration on Atmospheric Water in the West
Between UC San Diego and UC Irvine
Director: F. Martin Ralph; Website: cw3e.ucsd.edu
Big Data Collaboration with:
Director: Soroosh Sorooshian, UC Irvine; Website: http://chrs.web.uci.edu
Source: Scott Sellars, CW3E
22. Calit2’s FIONA
SDSC’s COMET
Calit2’s FIONA
Pacific Research Platform (10-100 Gb/s)
GPUs; GPUs
Complete workflow time: 20 days -> 20 hrs -> 20 Minutes!
UC Irvine; UC San Diego
Improvement of Over 1000x With PRP
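The "over 1000x" figure can be checked from the three workflow times on the slide. A small sketch, using only those durations (the helper name is hypothetical):

```python
from datetime import timedelta

# Workflow times quoted on the slide, slowest to fastest.
stages = [timedelta(days=20), timedelta(hours=20), timedelta(minutes=20)]


def speedup(before: timedelta, after: timedelta) -> float:
    """Ratio of two durations; dividing timedeltas yields a float."""
    return before / after


print(speedup(stages[0], stages[1]))  # days to hours
print(speedup(stages[1], stages[2]))  # hours to minutes
print(speedup(stages[0], stages[2]))  # overall
```

The two steps are 24x and 60x, giving 1440x overall, consistent with the slide's claim.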
23. Linking Cultural Heritage and Archaeology Datasets
at UCB, UCLA, UCM and UCSD with CAVEkiosks
48 Megapixel CAVEkiosk
UCSD Library
48 Megapixel CAVEkiosk
UCB Library
24 Megapixel CAVEkiosk
UCM Library
24. Expanding to National Research Platform
and Global Research Platform
PRP’s Current
International
Partners
25. Now that PRP Can Move Big Data Quickly,
Next Step is to Add Machine Learning
26. What is the Cyberinfrastructure Needed
For The World of Autonomous Machines?
• Supernetworks Connecting Big Data to GPU-Cloud for Training AI Nets
• Trained Neural Nets Downloaded onto Robots
• Robots Use Neural Nets to Navigate with Real-Time Data Streams
• Swarm Input to Update Training on Neural Nets
27. Plans for ~500 Game GPUs Deployed on the Pacific Research Platform
Devoted to Machine Learning
Caltech, UCB, UCI, UCR, UCSD, UCSC, Stanford, MSU, UCM, SDSU
High Speed “Cloud” of 320 GPUs for Training AI Algorithms on Big Data
SunCAVE 70 GPUs
48 GPUs for Applications
48 GPUs for Students
FIONA with 8-Game GPUs
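As a quick consistency check on the "~500" in the slide title, the sketch below adds up the four GPU pools named on the slide. Treating these four pools as the full planned deployment is an assumption; the slide does not say so explicitly.

```python
# GPU pools as labeled on the slide (assumed to be the complete plan).
gpu_pools = {
    "High Speed Cloud for training AI algorithms": 320,
    "SunCAVE": 70,
    "Applications": 48,
    "Students": 48,
}

total = sum(gpu_pools.values())
print(f"{total} GPUs planned across {len(gpu_pools)} pools")
```

The pools sum to 486, in line with the approximate figure in the title.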
28. For ¾ of a Century, Computing Has Relied
on von Neumann’s Architecture
29. The Future of Supercomputing Will Blend Traditional HPC and Data Analytics
Integrating Non-von Neumann Architectures
“High Performance Computing Will Evolve Towards a Hybrid Model, Integrating Emerging Non-von Neumann Architectures, with Huge Potential in Pattern Recognition, Streaming Data Analysis, and Unpredictable New Applications.”
Horst Simon, Deputy Director, U.S. Department of Energy’s Lawrence Berkeley National Laboratory
30. UC San Diego Creates
Center for Brain Activity Mapping
http://ucsdnews.ucsd.edu/feature/uc_san_diego_creates_center_for_brain_activity_mapping
From left, Nick Spitzer, Ralph Greenspan, and Terry Sejnowski.
Photos by Erik Jepsen/UC San Diego Publications
May 16, 2013
31. Reverse Engineering of the Brain:
Large Scale Microscopy of Mammal Brains Reveals Complex Connectivity
Source: Rat Cerebellum Image, Mark Ellisman, UCSD
Image labels: Neuron Cell Bodies; Neuronal Dendritic Overlap Region
32. Realtime Simulation of Human Brain Possible
Within the Next Ten Years With Exascale Supercomputer
Horst Simon, Deputy Director, Lawrence Berkeley National Laboratory’s National Energy Research Scientific Computing Center
Chart labels: Fastest Supercomputer Trend Line; Tianhe-2
33. The Rise of Brain-Inspired Computers:
Left & Right Brain Computing: Arithmetic vs. Pattern Recognition
Adapted from D-Wave
34. Brain-Inspired Processors
Are Accelerating the Non-von Neumann Architecture Era
“On the drawing board are collections of 64, 256, 1024, and 4096 chips.
‘It’s only limited by money, not imagination,’ Modha says.”
Source: Dr. Dharmendra Modha
Founding Director, IBM Cognitive Computing Group
August 8, 2014
36. AI is Advancing at an Unprecedented Pace:
Deep Learning Algorithms Working on Massive Datasets
1.5 Years!
Training on 30M Moves,
Then Playing Against Itself
Google Used TPUs to Achieve the Go Victory
37. Calit2’s Qualcomm Institute Has Established a Pattern Recognition Lab
For Machine Learning on Non-von Neumann Processors
UCSD ECE Professor Ken Kreutz-Delgado Brings the IBM TrueNorth Chip to Start Calit2’s Qualcomm Institute Pattern Recognition Laboratory
September 16, 2015
38. Contextual Robots Need Low Energy Neuromorphic Processors That
Can See and Learn Wirelessly Tied Into the Planetary Cloud Computer
Professor Tajana Rosing
39. Calit2 Has Students Creating 3D Printed Drones
Deploying Trained Neural Nets on Non-von Neumann Processors
40. DOD: “Perdix drones share one distributed brain for decision-making,
adapting to each other like swarms in nature.”
42. This Next Decade’s Computing Transition
Will Not Be Just About Technology
"Those disposed to dismiss an 'AI takeover' as science fiction may think again after reading this original and well-argued book." --Martin Rees, Past President, Royal Society
If our own extinction is a likely, or even possible, outcome of our technological development, shouldn't we proceed with great
"Success in creating AI would be the biggest event in human history. Unfortunately, it might also be the last, unless we learn how to avoid the risks." --Stephen Hawking
43. Our Support:
• US National Science Foundation (NSF) awards CNS-0821155, CNS-1338192,
CNS-1456638, ACI-1540112, and ACI-1541349
• University of California Office of the President CIO
• UCSD Chancellor’s Integrated Digital Infrastructure Program
• UCSD Next Generation Networking initiative
• Calit2 and Calit2 Qualcomm Institute
• CENIC, PacificWave and StarLight
• DOE ESnet