This document provides an overview of the Pacific Research Platform (PRP) after two years of operation. It describes several science drivers that are using the PRP, including biomedical research on cancer genomics and microbiomes, earth sciences such as earthquake modeling, and astronomy. It highlights how the PRP connects sites such as UC San Diego, UC Santa Cruz, and UC Berkeley so they can share and analyze large datasets over high-speed networks. The PRP is expanding to support new areas such as deep learning and cultural heritage projects, and to connect additional UC campuses through network upgrades.
PRP Two Years In: Linking Research Across California
1. “The Pacific Research Platform
Two Years In”
Welcome and Overview Talk
to the Pacific Research Platform “PRPv2” Workshop 2017
University of California, San Diego
February 21, 2017
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net
2. Initially Proposed PRP Multi-Campus Science Driver Teams
• Biomedical
– Cancer Genomics Hub/Browser (UCSC/SDSC project over, connecting PRP to U. Chicago)
– Microbiome and Integrative ‘Omics (UCSD, Caltech, UCSF, UCD)
– Integrative Structural Biology (UCSF, NERSC, SDSC)
• Earth Sciences
– Data Analysis and Simulation for Earthquakes and Natural Disasters (Phase II)
– Climate Modeling: NCAR/UCAR (UCSD, NCAR)
– California/Nevada Regional Climate Data Analysis (UCI, UCSD, NCAR)
– CO2 Subsurface Modeling (SDSC)
• Particle Physics (UCD, UCI, UCSC, UCSD, others soon)
• Astronomy and Astrophysics
– Telescope Surveys (NERSC connected to PRP) (Phase II)
– Galaxy Evolution (UCI, UCSC) (Phase II)
– Gravitational Wave Astronomy (Caltech, UCSD)
• Scalable Visualization, Virtual Reality, and Ultra-Res Video (UCB, UCLA, UCM, UCSD)
3. 100 Gbps FIONA at UCSC Connects the UCSC Hyades Cluster
to the NERSC Supercomputer at LBNL
Supporting UCSC Remote Access
to Large Data Subsets
of the Dark Energy Spectroscopic Instrument (DESI)
and AGORA Galaxy Simulation Data
Produced at NERSC.
250 images per night
800GB per night
Shawfeng Dong, UCSC Cyberengineer
UCSC Feb 7, 2017
4. Global Scientific Instruments Will Produce Ultralarge Datasets Continuously
Requiring Dedicated Optic Fiber and Supercomputers
Square Kilometer Array: https://tnc15.terena.org/getfile/1939
Large Synoptic Survey Telescope: www.lsst.org/sites/default/files/documents/DM%20Introduction%20-%20Kantor.pdf
Tracks ~40B Objects,
Creates 10M Alerts/Night
Within 1 Minute of Observing
2x40Gb/s
6. UC Merced’s VR CAVE:
Merced WAVE
• Transferring 5 CAVECam Images Over
10 Gbit/sec Fiber Path From UCSD to UC Merced:
– Total Data Size: 1.96 GBytes
– Transfer Took 2.17 seconds
– Transfer Rate: 924.49 MBytes/sec (~8Gbit/sec)
• This Transfer Would Have Taken:
– 21 Seconds Over 1Gbit/sec Connection
(Regular Ethernet)
– 5.35 Minutes Over 50Mbit/sec Connection
(Residential Internet)
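The comparison above can be checked with a short calculation. This is a minimal sketch, not the method used on the slide: the function name `transfer_seconds` is hypothetical, and it assumes "GBytes" means binary units (1 GByte = 1024 MBytes) and that links run at their full line rate with no protocol overhead. Under those assumptions the measured rate and the 50 Mbit/sec estimate match the slide, while the ideal 1 Gbit/sec figure comes out near 16 seconds, somewhat below the slide's 21 seconds, which may reflect real-world overhead (the slide does not say).

```python
# Ideal transfer times for the CAVEcam image set at different line rates.
# Assumption: 1 GByte = 1024 MBytes; no protocol overhead is modeled.

def transfer_seconds(size_mbytes: float, rate_mbit_per_s: float) -> float:
    """Return the ideal time in seconds to move size_mbytes at rate_mbit_per_s."""
    return size_mbytes * 8 / rate_mbit_per_s

SIZE_MBYTES = 1.96 * 1024           # total size of the 5 images, from the slide
measured_rate = SIZE_MBYTES / 2.17  # MBytes/sec achieved in 2.17 seconds

print(f"measured rate: {measured_rate:.0f} MBytes/sec")
for label, rate in [("1 Gbit/sec", 1000), ("50 Mbit/sec", 50)]:
    seconds = transfer_seconds(SIZE_MBYTES, rate)
    print(f"{label}: {seconds:.0f} s ({seconds / 60:.2f} min)")
```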
7. PRP Will Link the Laboratories of
the Pacific Earthquake Engineering Research Center
http://peer.berkeley.edu/
8. The Second FIONette was Deployed at the PEER Facility at UC Berkeley,
and its Performance is Being Monitored
John Graham Installing FIONette at PEER Feb 10, 2017
9. Cancer Genomics Hub (UCSC) is Housed in SDSC:
Large Data Flows to End Users at UCSC, UCB, UCSF, …
[Chart residue: labels 1G, 8G, 15G; Jan 2016]
30,000 TB Per Year
Data Source: David Haussler, Brad Smith, UCSC
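To put 30,000 TB per year in network terms, the volume can be converted to an average sustained rate. This is a rough sketch under stated assumptions: decimal terabytes (1 TB = 10^12 bytes), a 365-day year, and perfectly even traffic, none of which the slide specifies. The helper name `average_gbit_per_s` is hypothetical.

```python
# Convert a yearly data volume into an average sustained rate.
# Assumptions: decimal TB, 365-day year, traffic spread evenly over time.

SECONDS_PER_YEAR = 365 * 24 * 3600

def average_gbit_per_s(tbytes_per_year: float) -> float:
    """Return the average rate in Gbit/sec for a yearly volume in TB."""
    bits_per_year = tbytes_per_year * 1e12 * 8
    return bits_per_year / SECONDS_PER_YEAR / 1e9

rate = average_gbit_per_s(30_000)
print(f"30,000 TB/year is about {rate:.1f} Gbit/sec on average")
```

Real traffic is bursty, so peak rates would be higher than this average.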
11. Newly Added PRP Multi-Campus Science Driver Teams
• Biomedical
– Cryo-Electron Microscopy (UCB/LLNL, UCD, UCLA, UCSD, UCSF)
– Bioinformatics (UCD)
– High-Resolution Microscopy (UCR, UCSD, NSCC)
• Computer Science and Engineering / Electrical and Computer Engineering, etc.
– JupyterHub (UCB, UCSD)
– Deep Learning (UCB, UCSD, UIC)
– Drones, Terrestrial Modeling/GIS (UCSD, UCM)
– Contextual Robotics (new)
• High Performance Wireless Research and Education Networks
– UCSD/SIO, UCI, UCR, UCM, CENIC, others tbd.
• Humanities and Social Sciences
– Preserving Cultural Heritage
12. PRP First Application: Distributed IPython/Jupyter Notebooks:
Cross-Platform, Browser-Based Application Interleaves Code, Text, & Images
IJulia
IHaskell
IFSharp
IRuby
IGo
IScala
IMathics
IAldor
LuaJIT/Torch
Lua Kernel
IRKernel (for the R language)
IErlang
IOCaml
IForth
IPerl
IPerl6
IOctave
Calico Project
• kernels implemented in Mono,
including Java, IronPython, Boo,
Logo, BASIC, and many others
IScilab
IMatlab
ICSharp
Bash
Clojure Kernel
Hy Kernel
Redis Kernel
jove, a kernel for io.js
IJavascript
Calysto Scheme
Calysto Processing
idl_kernel
Mochi Kernel
Lua (used in Splash)
Spark Kernel
Skulpt Python Kernel
MetaKernel Bash
MetaKernel Python
Brython Kernel
IVisual VPython Kernel
Source: John Graham, QI
13. GPU JupyterHub:
2 x 14-core CPUs
256GB RAM
1.2TB FLASH
3.8TB SSD
Nvidia K80 GPU
Dual 40GbE NICs
And a Trusted Platform
Module
GPU JupyterHub:
1 x 18-core CPUs
128GB RAM
3.8TB SSD
Nvidia K80 GPU
Dual 40GbE NICs
And a Trusted Platform
Module
PRP UC-JupyterHub Backbone
UCB
UCSD
Next Step: Deploy Across PRP
Source: John Graham, Calit2
14. Cryo-electron Microscopy (cryo-EM)
Has Driven a “Resolution Revolution” in the Last Five Years
Exposure (every 60 seconds):
X & Y dimensions: 7420 x 7676 pixels
Frames per movie: 10 - 50
Size: 3 - 10 GB per movie
Every 24 hours:
Number of movies: ~1400
Data size: ~5 TB
Typical datasets:
Length of time: 2 - 6 days
Total size: 10 - 30 TB
Each Cryo-EM ‘Image’ is Actually a Movie
Source: Michael A. Cianfrocco,
Elizabeth Villa, & Andres Leschziner, UCSD
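The figures above are mutually consistent, which a short calculation makes visible. This is a minimal sketch assuming decimal units (1 TB = 10^12 bytes) and continuous round-the-clock acquisition; the function names are hypothetical and not taken from any cryo-EM software.

```python
# Cross-check the cryo-EM data volumes quoted on the slide.
# Assumptions: decimal TB/GB, microscope acquiring continuously for 24 hours.

SECONDS_PER_DAY = 24 * 3600

def movies_per_day(exposure_interval_s: float) -> float:
    """Return how many movies fit in a day at one exposure per interval."""
    return SECONDS_PER_DAY / exposure_interval_s

def sustained_mbit_per_s(tbytes_per_day: float) -> float:
    """Return the average rate in Mbit/sec needed to stream a daily volume."""
    return tbytes_per_day * 1e12 * 8 / SECONDS_PER_DAY / 1e6

upper_bound = movies_per_day(60)       # one exposure every 60 seconds
avg_movie_gb = 5 * 1000 / 1400         # ~5 TB spread over ~1400 movies
stream_rate = sustained_mbit_per_s(5)  # rate needed to keep up with ~5 TB/day

print(f"at most {upper_bound:.0f} movies/day")
print(f"average movie size: {avg_movie_gb:.1f} GB")
print(f"sustained rate for 5 TB/day: {stream_rate:.0f} Mbit/sec")
```

One exposure per minute caps the count at 1440 movies per day, in line with the quoted ~1400, and the implied average movie size falls within the 3 - 10 GB range. Streaming a single microscope's output in real time would need several hundred Mbit/sec sustained.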
15. Using PRP to Connect Cryo-EM across California
With End Users and Computational Facilities
[Map: ~20 microscopes in CA: UCLA; UC Davis; UC Santa Cruz; SF Bay (UC Berkeley, LBNL, UCSF, Stanford); San Diego (UCSD, TSRI, Salk)]
[Map: 3 supercomputer centers (*): SDSC, NERSC, Xstream]
Long term:
‣ Partner with cryo-EM facilities to stream data straight from microscopes (over PRP) to SDSC
‣ Perform all cryo-EM analysis (from micrographs to 3D models) via web browser on SDSC
‣ Expand computing to other XSEDE resources (e.g. Xstream)
Short term:
‣ Provide 2D and 3D analysis on particle stacks on Comet at SDSC
Source: Michael A. Cianfrocco, UCSD
cosmic-cryoem.org
16. PRP is NOT Just for Big Data Science and Engineering:
Linking Cultural Heritage and Archaeology Datasets
Building on CENIC’s Expansion to Libraries, Museums, and Cultural Sites
[Map labels: UCD, UCSF, Stanford, NASA AMES/NREN, UCSC, UCSB, Caltech, USC, UCLA, UCI, UCSD, SDSU, UCR, ESnet DoE Labs, UW/PNWGP Seattle, Berkeley, UCM, Los Nettos, Internet2, Internet2 Seattle]
* Institutions with Active Archaeology Programs
Note: This diagram represents a subset of sites and connections.
“In an ideal world – extremely high bandwidth to move large cultural heritage datasets around the PRP cloud for processing & viewing in CAVEs around PRP with unlimited storage for permanent archiving.”
- Tom Levy, UCSD
17. Linking Libraries at UCB, UCLA, UCM and UCSD with CAVE Kiosks
48 Megapixel CAVE Kiosk for UCSD Library
UCSD Library Review, June
24 Megapixel UCM Library Installation, July
18. PRP Backbone Sets Stage for 2017 Expansion
of HPWREN, Connected to CENIC, into Orange and Riverside Counties
• PRP CENIC 100G Link UCSD to SDSU
– DTN FIONAs Endpoints
– Data Redundancy
– Disaster Recovery
– High Availability
– Network Redundancy
• Anchor to CENIC at UCI
– PRP FIONA Connects to CalREN-HPR Network
– Data Replication Site
• Potential Future UCR CENIC Anchor
[Map labels: UCR, UCI, UCSD, SDSU]
Source: Frank Vernon,
Greg Hidley, UCSD
Editor's Notes
Campus Cyberinfrastructure – Network Infrastructure and Engineering (CC-NIE)
Campus Cyberinfrastructure – Infrastructure, Innovation, and Engineering (CC-IIE)
Campus Cyberinfrastructure – Data, Networking, and Innovation (CC-DNI) NSF 15-534 incorporates Data Infrastructure Building Blocks (CC-DNI-DIBBs) – Multi-Campus / Multi-Institution Model Implementation from Program Solicitation NSF 14-530
We already have 11 major research universities in California poised to partner.