CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu)
Collaborative Data Science in a Highly Networked World
İlkay ALTINTAŞ, Ph.D.
Chief Data Science Officer, San Diego Supercomputer Center
Division Director, Cyberinfrastructure Research, Education and Development
Founder and Director, Workflows for Data Science Center of Excellence
What is a network useful for?
Making connections
• People and communities
• Data and applications
• People and information
• People and services
• Learners and classes
• Ideas and masses
Advancing Communication and Collaboration
Any technology and application built on networking should be built around these concepts.
How do we conduct and teach data science in a highly networked world?
What is Data Science?
Ultimate Goal
Data Science: Big Data -> Insight -> Action
How does successful data science happen?
Insight Data Product
“Big” Data
Question
Exploratory Analysis and Modeling
Insight
Example: Book Recommendations
Inputs: Customer Demographic, Previous Purchases, Book reviews
Question: What kind of books does this customer like?
Output: Book recommendations
Find Potential Audience for a New Book
Inputs: Model of customer’s book preferences, New book information
Question: Who is likely to like this book?
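The question on this slide can be sketched in a few lines. The following is only an illustration, assuming each customer's preference model is a vector of genre weights and the new book is described in the same genre space; the customer names, the three genres, the cosine-similarity scoring and the 0.7 threshold are all hypothetical choices, not something the talk specifies.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def likely_audience(preferences, new_book, threshold=0.7):
    """Return (customer, score) pairs scoring at or above the threshold, best first."""
    scored = [(name, cosine(vec, new_book)) for name, vec in preferences.items()]
    keep = [pair for pair in scored if pair[1] >= threshold]
    return sorted(keep, key=lambda pair: pair[1], reverse=True)


# Hypothetical preference models: weights for (mystery, romance, science).
PREFERENCES = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.1, 0.8, 0.3],
    "carol": [0.6, 0.0, 0.6],
}
NEW_BOOK = [1.0, 0.0, 0.2]  # hypothetical description of the new book

AUDIENCE = likely_audience(PREFERENCES, NEW_BOOK)
AUDIENCE_NAMES = [name for name, _ in AUDIENCE]
```

The ranked list is the insight; deciding whom to market the book to is the action that follows from it.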
Market a New Book
Insight: Who is likely to like this book?
Action: Market the book to the right audience
Historical data Near real-time data
Prediction
Creating Actionable Information
Prediction
Action
Why the increased interest in Data Science?
Big Data + Scalable Computing
Anywhere Anytime
Data Science Today is Both a Big Data and a Big Compute Discipline
BIG DATA, COMPUTING AT SCALE
Enables dynamic data-driven applications
Smart Manufacturing
Computer-Aided Drug Discovery
Personalized Precision Medicine
Smart Cities
Smart Grid and Energy Management
Disaster Resilience and Response
Requires:
• Data management
• Data-driven methods
• Scalable & dynamic process coordination
• Resource optimization
• Skilled interdisciplinary workforce
New era of data science!
Nearly every problem today is transformed by big data.
Example: Geospatial Big Data
• Flood of new data sources and types
• Needs new data management, storage and analysis methods
• Too big for a single server, fast growing data volume
• Requires special database structures that can handle data variety
• Too continuous for analysis at a later time, with increasing streaming rate, i.e., velocity
• Varying degrees of uncertainty in measurements, and other veracity issues
• Provides opportunities for scientific understanding at different scales more than ever, i.e., potential high value
Real-time sensors
Weather forecast
Satellite imagery
Sea Surface Temperature Measurements
Drone imagery
Example: Biomedical Big Data http://nbcr.ucsd.edu
10^21
How do we amplify the value of Big Data?
How do we find the connections and answer questions that benefit society?
“We are drowning in information and starving for knowledge” – John Naisbitt
Source: Megatrends, 1982
Create an Ecosystem that Enables
Needs and Best Practices
• data-driven
• scalable
• dynamic
• process-driven
• collaborative
• accountable
• reproducible
• interactive
• heterogeneous
• includes many different kinds of expertise
What would such an ecosystem look like?
A Typical Collaborative Data Science Ecosystem
Data Management
Advanced Infrastructure
Data Analytics
Computational Science
Amplifying the Value of Data Related to X
Benefit Y for Science, Business, Society or Education
What if X was wildfires?
Collaborative Networked Science for Wildfires
How do we Better Predict Wildfire Behavior?
Fire is Part of the Natural Ecology… but requires Monitoring, Prediction and Resilience
• Wildfires are critical for ecology, but volatile
• Fuel load is high due to fire suppression over the last century
• Drought, higher temperatures
• Better prevention, prediction and maintenance of wildfires are needed
Disaster management of (ongoing) wildfires relies heavily on understanding their Direction and Rate of Spread (RoS).
Photo of Harris Fire (2007) by former Fire Captain Bill Clayton
WIFIRE: A Scalable Data-Driven Monitoring, Dynamic Prediction and Resilience Cyberinfrastructure for Wildfires
Big Data, Fire Modeling, Visualization, Monitoring
A dynamic system integration of real-time sensor networks, satellite imagery, near real-time data management tools, wildfire simulation tools, and connectivity to emergency command centers… before, during and after a firestorm.
Video available at: https://www.youtube.com/watch?v=N4LAROiW5c8&t=2s
HPWREN: High Performance Wireless Research and Education Network
http://hpwren.ucsd.edu/cameras
>160 Meteorological Sensors and Growing
Major success to bring internet to incident command in the field. Used in over 20 fires over time.
FARSITE
Most popular operational fire behavior modeling system.
Closing the Loop using Big Data
-- Wildfire Behavior Modeling and Data Assimilation --
• Computational costs for existing models too high for real-time analysis
• a priori -> a posteriori
• Parameter estimation to make adjustments to the (input) parameters
• State estimation to adjust the simulated fire front location with an a posteriori update/measurement of the actual fire front location
Conceptual Data Assimilation Workflow with Prediction and Update Steps using Sensor Data
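The prediction and update steps can be illustrated with a scalar Kalman-style filter. This is a minimal sketch of state estimation only, assuming the fire front is tracked as a single distance along one transect with Gaussian uncertainty; it is not the WIFIRE method, and the function names, the rate of spread, and the variances are hypothetical values chosen for easy arithmetic.

```python
def predict(position, variance, rate_of_spread, dt, process_var):
    """A priori step: advance the simulated fire front and grow its uncertainty."""
    return position + rate_of_spread * dt, variance + process_var


def update(prior_pos, prior_var, observed_pos, obs_var):
    """A posteriori step: blend the prediction with a measured fire front location."""
    gain = prior_var / (prior_var + obs_var)  # how much to trust the observation
    post_pos = prior_pos + gain * (observed_pos - prior_pos)
    post_var = (1.0 - gain) * prior_var
    return post_pos, post_var


# Hypothetical numbers: front starts at 0 m, spreads at 2 m/min, 10 min step.
PRIOR = predict(position=0.0, variance=4.0, rate_of_spread=2.0,
                dt=10.0, process_var=12.0)
# A sensor or perimeter measurement places the front at 26 m.
POSTERIOR = update(PRIOR[0], PRIOR[1], observed_pos=26.0, obs_var=16.0)
```

With equal prior and observation variances the gain is 0.5, so the corrected front lands halfway between the simulated and the measured location, and the variance is halved. Parameter estimation, the other bullet on the slide, would instead adjust inputs such as the rate of spread.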
Fire Modeling Workflows in WIFIRE
Real-time sensors
Weather forecast
Fire perimeter
Landscape data
Monitoring & fire mapping
Firemap Tool
• A web-based GIS environment:
• access information related to fire behavior
• analyze what-if scenarios
• model real-time fire behavior
• generate reports
• Powered by WIFIRE
Firemap Web Interface
WIFIRE Data Interfaces, WIFIRE Workflows
Computing Infrastructure
http://firemap.sdsc.edu
Data-Driven Fire Progression Prediction Over Three Hours
August 2016 – Blue Cut Fire
Collaboration with LA and SD Fire Departments
http://firemap.sdsc.edu
Tahoe and Nevada Bureau of Land Management Cameras: 20 cameras added with field-of-view
CA Fires 10/2017 through 12/2017
800K+ unique visitors and 8M+ hits
http://firemap.sdsc.edu
San Diego Airborne Intelligence Reconnaissance System (SDAIRS)
Lilac Fire Perimeter and WIFIRE Fire Progression Model in SCOUT
Thomas Fire: 12/04/2017- 01/12/2018
December 10, 2017
December 17, 2017
Real-time Satellite Detections During
Thomas Fire: 12/04/2017- 01/12/2018
Some Machine Learning Case Studies
• Smoke and fire perimeter detection based on imagery
• Prediction of Santa Ana and fire conditions specific to location
• Prediction of fuel buildup based on fire and weather history
• NLP for understanding local conditions based on radio communications
• Deep learning on multispectral imagery for high resolution fuel maps
• Classification project to generate more accurate fuel maps (using Planet Labs satellite data)
All require periodic, dynamic and programmatic access to data!
Classification project to generate more accurate fuel maps
• Accurate and up-to-date fuel maps are critical for modeling wildfire rate of spread and potential burn areas.
• Challenge:
• USGS LANDFIRE provides the best available fuel maps every two years.
• The WIFIRE system is limited by these inputs, which may be up to two years old. Fuel maps created at a higher temporal frequency are desired.
• Approach:
• Using high-resolution satellite imagery and deep learning methods, produce surface fuel maps of San Diego County and other regions in Southern California.
• Using LANDFIRE fuel maps as the target variable, the objective is to create a classification model that will provide fuel maps at greater frequency with a measure of uncertainty.
Cluster 1: Short Grass
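One common way to attach "a measure of uncertainty" to a per-pixel class prediction is to report the entropy of the predicted class probabilities. The sketch below assumes a trained model already produces one raw score per fuel class for a pixel; the scores, the label set apart from "short grass", and the entropy-based measure are illustrative assumptions, not the project's actual model.

```python
from math import exp, log


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [exp(s - peak) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_uncertainty(scores, labels):
    """Return (label, probability, uncertainty), where uncertainty is the
    entropy of the class probabilities scaled to the range 0..1."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    entropy = -sum(p * log(p) for p in probs if p > 0)
    return labels[best], probs[best], entropy / log(len(probs))


# Hypothetical fuel classes; only "short grass" appears on the slide.
FUEL_LABELS = ["short grass", "shrub", "timber litter"]

CONFIDENT_PIXEL = classify_with_uncertainty([4.0, 0.5, 0.2], FUEL_LABELS)
AMBIGUOUS_PIXEL = classify_with_uncertainty([1.0, 1.0, 1.0], FUEL_LABELS)
```

A fire model consuming such a map could fall back to the most recent LANDFIRE value wherever the uncertainty is high; that policy is again only a suggestion.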
WIFIRE Team: It takes a village!
• PhD level researchers
• Professional software developers
• 29 undergraduate students
• UC San Diego
• UC Merced
• MURPA University
• University of Queensland
• 1 high school student
• 5 MSc and 5 MAS students
• 2 PhD students (UMD)
• 1 postdoctoral researcher
• Partners from fire departments
• Advisory board with diverse expertise and affiliations
UMD - Fire modeling
UCSD MAE - Data assimilation
SDSC - Cyberinfrastructure, Workflows, Data engineering, Machine Learning, Information Visualization, HPWREN
Calit2/QI - Cyberinfrastructure, GIS, Advanced Visualization, Machine Learning, Urban Sustainability, HPWREN
SIO - HPWREN
ACQUIRE PREPARE ANALYZE REPORT ACT
Focus on the Process and Teamwork to Answer a Question
Scalable Drug Discovery
[Figure: cell proliferation bar charts for medium, no compound, Prima-1, stictic acid and candidate compounds (35ZWF, 25KKL, 22LSV, 32CTM and others); panels: cancer cell with p53-R175H mutant, no p53]
15 new reactivation compounds
Reactivation compounds kill cells with p53 cancer mutant
Ieong et al., 2014
[Workflow screenshot labels: AMBER GPU MD Tool, Minimization Actor]
BENEFITS:
• Increase reuse
• Reproducibility
• Scale execution, problem & solution
• Compare methods
• Train students
Using workflows for process integration…
Data Management
Advanced Infrastructure
Data Analytics
Computational Science
Networked Science Workflows – Early Examples –
2004, ROADNet Project
[Figure labels: 50, ORB]
Real-time Stream Processing
2005, ROADNet Project
Laser Strainmeter Channels in; Scientific Workflow; Earth-tide signal out
Straightforward Example: Seismic Waveforms
Sample Variance Plotting and Storage Workflow for Real-time Data
2006, ROADNet Project
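Computing a sample variance over real-time data usually means updating it one reading at a time rather than storing the whole stream. A minimal sketch using Welford's online algorithm follows; this is a standard technique chosen for illustration, not necessarily what the ROADNet workflow used, and the class name and sample readings are hypothetical.

```python
class RunningVariance:
    """Welford's online algorithm: mean and sample variance in one pass."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, value):
        """Fold one new reading into the running statistics."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def sample_variance(self):
        """Unbiased sample variance; 0.0 until at least two readings arrive."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


# Hypothetical stream of sensor readings arriving one at a time.
STREAM_STATS = RunningVariance()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    STREAM_STATS.push(reading)
```

Because each update touches only three numbers, the same object can feed a live plot and be stored periodically without keeping raw history in memory.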
Workflows for Data Science Center of Excellence at SDSC
WorDS.sdsc.edu
Goal: Methodology and tool development to build automated and operational workflow-driven solution architectures on big data and HPC platforms.
Focus on the question, not the technology!
Real-Time Hazards Management: wifire.ucsd.edu
Data-Parallel Bioinformatics: bioKepler.org
Scalable Automated Molecular Dynamics and Drug Discovery: nbcr.ucsd.edu
• Access and query data
• Support exploratory design
• Scale computational analysis
• Increase reuse and reproducibility
• Save time, energy and money
• Formalize and standardize
• Train
Balance of:
• team building
• process management
• performance optimization
• provenance tracking
• training and education
While working with experts on…
• domain expertise
• data modeling and integration
• data management services
• analytical methods
• communication and visualization
“The” Data Science Team
• Data engineer
• Data analyst
• Methods expert
• Scalability and operations expert
• Business manager
• Business analyst
• Scientist
• Visualization and dashboard developer
• Solution architect
• Story teller/coordinator
• Project manager
Expertise and skills often overlap, but nobody has it all!
Team Building
How can I get smart people to collaborate and communicate?
…to utilize data and infrastructure to generate insights and answer a question.
Focus on the question, not the technology!
Purpose to Lead to Insight
Focus on the question, not the technology!
LEAN METHOD: Minimize the total time through the loop
IDEAS -> BUILD -> CODE -> MEASURE -> DATA -> LEARN -> IDEAS
Data Science Process
Basic Steps in a Data Science Process
ACQUIRE PREPARE ANALYZE REPORT ACT
• ACQUIRE: Import raw dataset into your analytics platform
• PREPARE: Explore & Visualize; Perform Data Cleaning
• ANALYZE: Feature Selection; Model Selection; Analyze the results
• REPORT: Present your findings
• ACT: Use them
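The five steps can be read as a chain of small functions, each consuming the previous step's output. The sketch below is a toy illustration under stated assumptions: the in-memory sensor readings, the per-station mean as the "analysis", and the 30-degree alert threshold are all hypothetical stand-ins for real data sources, models and decisions.

```python
def acquire():
    """ACQUIRE: import the raw dataset (hard-coded, hypothetical readings)."""
    return [
        {"station": "A", "temp_c": 31.0},
        {"station": "A", "temp_c": None},  # a missing reading to clean up
        {"station": "B", "temp_c": 24.0},
        {"station": "A", "temp_c": 35.0},
    ]


def prepare(rows):
    """PREPARE: data cleaning, here simply dropping missing readings."""
    return [row for row in rows if row["temp_c"] is not None]


def analyze(rows):
    """ANALYZE: a stand-in analysis, the mean temperature per station."""
    by_station = {}
    for row in rows:
        by_station.setdefault(row["station"], []).append(row["temp_c"])
    return {name: sum(vals) / len(vals) for name, vals in by_station.items()}


def report(means):
    """REPORT: present the findings as a short text summary."""
    return "; ".join(f"{name}: {mean:.1f} C" for name, mean in sorted(means.items()))


def act(means, alert_above=30.0):
    """ACT: use the findings, here by listing stations that need an alert."""
    return sorted(name for name, mean in means.items() if mean > alert_above)


STATION_MEANS = analyze(prepare(acquire()))
SUMMARY = report(STATION_MEANS)
ALERTS = act(STATION_MEANS)
```

Keeping each step a separate function makes it easy to rerun or replace one step at a time as the analysis iterates.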
Computational Data Science, Data Engineering
ACQUIRE PREPARE ANALYZE REPORT ACT
Scale Scale Scale Scale
Many iterations and rollbacks between steps.
Build
Explore
Scale
Report Act
Computational Data Science, Data Engineering
ACQUIRE PREPARE ANALYZE REPORT ACT
Scale Scale Scale Scale
Programmability
Process for Practice of Data Science
Programmability: Ease of use, iteration, interaction, re-use, re-purpose
Scalability: From local experiments to large-scale runs
Reproducibility: Ability to validate, re-run, re-play
Data Product
Some P’s in PPoDS
Platforms
Process
People
Problem or Purpose?
Programmability
The insights need to be evaluated to turn them into action.
Platforms
Process
People
Purpose?
Programmability
Metrics Product
Insight
Action
Pod -> sub-process
Treat Each Step in the Solution Process as a Conceptual Pod
Defined by:
• Purpose and goal
• Stakeholders
• Expectations
• Key questions to be answered, production/consumption relationships, needs, dependencies, limits, …
• Contracts
• Performance, economic, accuracy, policy, privacy, reproducibility, political, …
• Knowns
• Known unknowns
Metrics for accountability should be built into the process.
Timeline, Purpose, Expectations, Planning of deliverables, Cost
Using the PPoDS Approach
• Each step in your data pipelines is a separate pod
• Define success metrics for calling each pod done
• Pods can be atomic or hierarchical
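The pod idea maps naturally onto a small data structure. The sketch below is one possible encoding, assuming a pod records its purpose, stakeholders, expectations and contracts, carries a success check for "calling the pod done", and may contain child pods; the `Pod` class, its field names and the accuracy numbers are hypothetical and not part of any PPoDS toolkit.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Pod:
    """One step of a solution process, with its own definition of done."""
    purpose: str
    stakeholders: List[str] = field(default_factory=list)
    expectations: List[str] = field(default_factory=list)
    contracts: List[str] = field(default_factory=list)
    done_when: Callable[[], bool] = lambda: True  # success metric for this pod
    children: List["Pod"] = field(default_factory=list)

    def is_done(self) -> bool:
        """A hierarchical pod is done only when it and all of its children are."""
        return self.done_when() and all(child.is_done() for child in self.children)


# Hypothetical metrics for two atomic pods inside an ANALYZE pod.
measured_accuracy, target_accuracy = 0.82, 0.90

exploration = Pod(purpose="Data Exploration", stakeholders=["data analyst"])
learning = Pod(
    purpose="Machine Learning",
    stakeholders=["methods expert"],
    contracts=["accuracy of at least 0.90"],
    done_when=lambda: measured_accuracy >= target_accuracy,
)
analyze_pod = Pod(purpose="ANALYZE", children=[exploration, learning])
```

Here `analyze_pod.is_done()` stays false until the learning pod meets its accuracy contract, which is the kind of built-in accountability metric the slide asks for.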
Zooming into a simple example…
PREPARE ANALYZE
Data Exploration
Schema Integration
Query Processing
Machine Learning
…
Creating A Solution Architecture for Networked Science Applications
Process-driven Solution Architectures and the Role of Workflows
COORDINATION AND WORKFLOW MANAGEMENT
DATA INTEGRATION AND PROCESSING
DATA MANAGEMENT AND STORAGE
COORDINATION AND WORKFLOW MANAGEMENT
DATA INTEGRATION AND PROCESSING
DATA MANAGEMENT AND STORAGE
COMMUNICATION AND FEEDBACK
EXPLORATION
SCALABILITY
PROVENANCE
SECURITY
ACQUIRE PREPARE ANALYZE REPORT ACT
Utilizing “Advanced Cyberinfrastructure”
Data Management
Advanced Infrastructure
Data Analytics
Computational Science
Compute + Storage + Network
SAN DIEGO SUPERCOMPUTER CENTER at UC San Diego
Providing Cyberinfrastructure for Research and Education
• Established as a national supercomputer resource center in 1985 by NSF
• A world leader in HPC, data-intensive computing, and scientific data management
• Current strategic focus on “Big Data”, “versatile computing”, and “life sciences applications”
Recent Innovative Architectures
• Gordon: First Flash-based Supercomputer for Data-intensive Apps
• Comet: Serving the Long Tail of Science
The Pacific Research Platform Creates a Regional End-to-End Science-Driven “Big Data Superhighway” System
Letters of Commitment from:
• 50 Researchers from 15 Campuses
• 32 IT/Network Organization Leaders
NSF CC*DNI Grant
$5M 10/2015-10/2020
PI: Larry Smarr, UC San Diego Calit2
Co-PIs:
• Camille Crittenden, UC Berkeley CITRIS
• Tom DeFanti, UC San Diego Calit2
• Philip Papadopoulos, UCSD SDSC
• Frank Wuerthwein, UCSD Physics and SDSC
Disk-to-Disk: 10-100 Gbps
Source: John Hess, CENIC; Larry Smarr, UCSD
New NSF CHASE-CI Grant Creates a Community Cyberinfrastructure
Adding a Machine Learning Layer Built on Top of the Pacific Research Platform
Caltech, UCB, UCI, UCR, UCSD, UCSC, Stanford, MSU, UCM, SDSU
NSF Grant for High Speed “Cloud” of 256 GPUs for 30 ML Faculty & Their Students at 10 Campuses for Training AI Algorithms on Big Data
Slide Source: Larry Smarr, UCSD
Next Step: Surrounding the CHASE-CI Machine Learning Platform With Clouds of GPUs and Non-von Neumann Processors
Microsoft Installs FPGAs into Bing Servers & 432 into TACC for Academic Access
64-TrueNorth Cluster
CHASE-CI
Slide Source: Larry Smarr, UCSD
WORKFLOW MANAGEMENT
Application Integration, Coordination, Optimization, Communication, Reporting
COMPOSABLE DATA SERVICES
Deep Learning, Analytics, HPC, Training, Notebooks
COMPOSABLE SYSTEMS
GPU, CPU, Big Data, Neuromorphic, Networks, Storage, …
PROVENANCE
SECURITY
RESOURCE MANAGEMENT
Kubernetes Container Cloud
COORDINATION AND WORKFLOW MANAGEMENT
http://kepler-project.org
Execution Platforms
Local Cluster Resources
National Resources (Gordon) (Comet) (Stampede) (Lonestar)
Cloud Resources
ACQUIRE PREPARE ANALYZE REPORT ACT
Dynamic data-driven coordination & resource optimization
Requires: Ability to explore and scale on multiple platforms
Dynamic operations research for science using workflows.
SOLUTION ARCHITECTURE
DOMAIN KNOWLEDGE
Parts of the Solution
• Stakeholders
• Datasets
• Compliance requirements
• Defined actions
• Analytical methods
• Technical infrastructure
Bias
Transparency
Verification
Accuracy
Ethics
Reproducibility
Cost
To summarize…
• Data science is a collaborative activity
• Focus on collaboration and communication from the problem definition stage
• Apply process management techniques where necessary
• Incorporate and formalize the definition of success from different perspectives
• Measurable automation should be the end goal
• Requires built-in programmable and scalable data pipelines
• Includes measurable and programmable networks
• Iterations based on pre-defined metrics help
• PPoDS is a methodology for collaborative data science application integration and iteration
• Toolkits for process automation, scalable execution, provenance tracking and reporting
Contact: Ilkay Altintas, Ph.D.
Email: ialtintas@ucsd.edu
Questions?
Parts of the presented work are funded by NSF, DOE, NIH, UC San Diego and various industry partners.

Weitere ähnliche Inhalte

Was ist angesagt?

Ryan C Goode. General Resume (3) (1)
Ryan C Goode. General Resume (3) (1)Ryan C Goode. General Resume (3) (1)
Ryan C Goode. General Resume (3) (1)Ryan Goode
 
An Obligatory Introduction to Data Science
An Obligatory Introduction to Data ScienceAn Obligatory Introduction to Data Science
An Obligatory Introduction to Data ScienceWesley Eldridge
 
Introduction to Big Data and Data Science
Introduction to Big Data and Data ScienceIntroduction to Big Data and Data Science
Introduction to Big Data and Data ScienceFeyzi R. Bagirov
 
Grid Computing in a Commodity World (KCCMG, 2005)
Grid Computing in a Commodity World (KCCMG, 2005)Grid Computing in a Commodity World (KCCMG, 2005)
Grid Computing in a Commodity World (KCCMG, 2005)Lorin Olsen
 
Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?Gregory Piatetsky-Shapiro
 
Top 5 Deep Learning and AI Stories - November 30, 2018
Top 5 Deep Learning and AI Stories - November 30, 2018Top 5 Deep Learning and AI Stories - November 30, 2018
Top 5 Deep Learning and AI Stories - November 30, 2018NVIDIA
 
Advancing Medical Imaging with Deep Learning
Advancing Medical Imaging with Deep LearningAdvancing Medical Imaging with Deep Learning
Advancing Medical Imaging with Deep LearningNVIDIA
 
Robotics: Current Topics
Robotics: Current TopicsRobotics: Current Topics
Robotics: Current TopicsSabbir Ahmmed
 
Top 5 Deep Learning and AI Stories - November 3, 2017
Top 5 Deep Learning and AI Stories - November 3, 2017Top 5 Deep Learning and AI Stories - November 3, 2017
Top 5 Deep Learning and AI Stories - November 3, 2017NVIDIA
 
Accumulo Summit 2014: Addressing big data challenges through innovative archi...
Accumulo Summit 2014: Addressing big data challenges through innovative archi...Accumulo Summit 2014: Addressing big data challenges through innovative archi...
Accumulo Summit 2014: Addressing big data challenges through innovative archi...Accumulo Summit
 
Machine Learning with Ayasdi
Machine Learning with AyasdiMachine Learning with Ayasdi
Machine Learning with AyasdiAyasdi
 
Emerging Dynamic TUW-ASE Summer 2015 - Distributed Systems and Challenges for...
Emerging Dynamic TUW-ASE Summer 2015 - Distributed Systems and Challenges for...Emerging Dynamic TUW-ASE Summer 2015 - Distributed Systems and Challenges for...
Emerging Dynamic TUW-ASE Summer 2015 - Distributed Systems and Challenges for...Hong-Linh Truong
 
Top 5 Deep Learning and AI Stories - September 14, 2018
Top 5 Deep Learning and AI Stories - September 14, 2018Top 5 Deep Learning and AI Stories - September 14, 2018
Top 5 Deep Learning and AI Stories - September 14, 2018NVIDIA
 
Transforming Operations Using the Results of the Tech Wave
Transforming Operations Using the Results of the Tech WaveTransforming Operations Using the Results of the Tech Wave
Transforming Operations Using the Results of the Tech WaveDavid Blankinship
 
Transforming Healthcare at GTC Silicon Valley
Transforming Healthcare at GTC Silicon ValleyTransforming Healthcare at GTC Silicon Valley
Transforming Healthcare at GTC Silicon ValleyNVIDIA
 
Towards a better measure of business proximity: Topic modeling for industry i...
Towards a better measure of business proximity: Topic modeling for industry i...Towards a better measure of business proximity: Topic modeling for industry i...
Towards a better measure of business proximity: Topic modeling for industry i...Gene Moo Lee
 
Big data and analytics
Big data and analyticsBig data and analytics
Big data and analyticsAtilla Elçi
 
Petroleum Data Analytics webinar
Petroleum Data Analytics webinarPetroleum Data Analytics webinar
Petroleum Data Analytics webinarPetroTeach1
 

Was ist angesagt? (20)

Ryan C Goode. General Resume (3) (1)
Ryan C Goode. General Resume (3) (1)Ryan C Goode. General Resume (3) (1)
Ryan C Goode. General Resume (3) (1)
 
An Obligatory Introduction to Data Science
An Obligatory Introduction to Data ScienceAn Obligatory Introduction to Data Science
An Obligatory Introduction to Data Science
 
Introduction to Big Data and Data Science
Introduction to Big Data and Data ScienceIntroduction to Big Data and Data Science
Introduction to Big Data and Data Science
 
Grid Computing in a Commodity World (KCCMG, 2005)
Grid Computing in a Commodity World (KCCMG, 2005)Grid Computing in a Commodity World (KCCMG, 2005)
Grid Computing in a Commodity World (KCCMG, 2005)
 
Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?
 
Top 5 Deep Learning and AI Stories - November 30, 2018
Top 5 Deep Learning and AI Stories - November 30, 2018Top 5 Deep Learning and AI Stories - November 30, 2018
Top 5 Deep Learning and AI Stories - November 30, 2018
 
Advancing Medical Imaging with Deep Learning
Advancing Medical Imaging with Deep LearningAdvancing Medical Imaging with Deep Learning
Advancing Medical Imaging with Deep Learning
 
Analytics Education in the era of Big Data
Analytics Education in the era of Big DataAnalytics Education in the era of Big Data
Analytics Education in the era of Big Data
 
Robotics: Current Topics
Robotics: Current TopicsRobotics: Current Topics
Robotics: Current Topics
 
Top 5 Deep Learning and AI Stories - November 3, 2017
Top 5 Deep Learning and AI Stories - November 3, 2017Top 5 Deep Learning and AI Stories - November 3, 2017
Top 5 Deep Learning and AI Stories - November 3, 2017
 
Accumulo Summit 2014: Addressing big data challenges through innovative archi...
Accumulo Summit 2014: Addressing big data challenges through innovative archi...Accumulo Summit 2014: Addressing big data challenges through innovative archi...
Accumulo Summit 2014: Addressing big data challenges through innovative archi...
 
Machine Learning with Ayasdi
Machine Learning with AyasdiMachine Learning with Ayasdi
Machine Learning with Ayasdi
 
Emerging Dynamic TUW-ASE Summer 2015 - Distributed Systems and Challenges for...
Emerging Dynamic TUW-ASE Summer 2015 - Distributed Systems and Challenges for...Emerging Dynamic TUW-ASE Summer 2015 - Distributed Systems and Challenges for...
Emerging Dynamic TUW-ASE Summer 2015 - Distributed Systems and Challenges for...
 
Top 5 Deep Learning and AI Stories - September 14, 2018
Top 5 Deep Learning and AI Stories - September 14, 2018Top 5 Deep Learning and AI Stories - September 14, 2018
Top 5 Deep Learning and AI Stories - September 14, 2018
 
Transforming Operations Using the Results of the Tech Wave
Transforming Operations Using the Results of the Tech WaveTransforming Operations Using the Results of the Tech Wave
Transforming Operations Using the Results of the Tech Wave
 
Transforming Healthcare at GTC Silicon Valley
Transforming Healthcare at GTC Silicon ValleyTransforming Healthcare at GTC Silicon Valley
Transforming Healthcare at GTC Silicon Valley
 
Towards a better measure of business proximity: Topic modeling for industry i...
Towards a better measure of business proximity: Topic modeling for industry i...Towards a better measure of business proximity: Topic modeling for industry i...
Towards a better measure of business proximity: Topic modeling for industry i...
 
Big data and analytics
Big data and analyticsBig data and analytics
Big data and analytics
 
Petroleum Data Analytics webinar
Petroleum Data Analytics webinarPetroleum Data Analytics webinar
Petroleum Data Analytics webinar
 
BigDataCSEKeyNote_2012
BigDataCSEKeyNote_2012BigDataCSEKeyNote_2012
BigDataCSEKeyNote_2012
 

Collaborative Data Science In A Highly Networked World

  • 1. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu) Collaborative Data Science in a Highly Networked World. İlkay ALTINTAŞ, Ph.D. Chief Data Science Officer, San Diego Supercomputer Center; Division Director, Cyberinfrastructure Research, Education and Development; Founder and Director, Workflows for Data Science Center of Excellence
  • 2. What is a network useful for?
  • 3. Making connections • People and communities • Data and applications • People and information • People and services • Learners and classes • Ideas and masses
  • 4. Advancing Communication and Collaboration
  • 5. Any technology and application built on networking should be built around these concepts.
  • 6. How do we conduct and teach data science in a highly networked world?
  • 7. What is Data Science?
  • 8. Ultimate Goal of Data Science: Big Data → Insight → Action
  • 9. How does successful data science happen? Diagram labels: “Big” Data, Question, Exploratory Analysis and Modeling, Insight, Data Product
  • 10. Example: Book Recommendations. Customer Demographic, Previous Purchases, Book reviews. What kind of books does this customer like? Book recommendations
  • 11. Find Potential Audience for a New Book. Model of customer’s book preferences, New book information. Who is likely to like this book?
  • 12. Market a New Book. Who is likely to like this book? Action to market the book to the right audience
  • 13. Market a New Book. Insight: Who is likely to like this book? Action: Action to market the book to the right audience
  • 14. Creating Actionable Information. Historical data, Near real-time data, Prediction
  • 15. Prediction → Action
  • 16. Why the increased interest in Data Science?
  • 17. Big Data + Scalable Computing, Anywhere, Anytime
  • 18. Data Science Today is Both a Big Data and a Big Compute Discipline. BIG DATA and COMPUTING AT SCALE. Enables dynamic data-driven applications: Smart Manufacturing, Computer-Aided Drug Discovery, Personalized Precision Medicine, Smart Cities, Smart Grid and Energy Management, Disaster Resilience and Response. Requires: • Data management • Data-driven methods • Scalable & dynamic process coordination • Resource optimization • Skilled interdisciplinary workforce. New era of data science!
  • 19. Nearly every problem today is transformed by big data.
  • 20. Example: Geospatial Big Data • Flood of new data sources and types • Needs new data management, storage and analysis methods • Too big for a single server, fast growing data volume • Requires special database structures that can handle data variety • Too continuous for analysis at a later time, with increasing streaming rate, i.e., velocity • Varying degrees of uncertainty in measurements, and other veracity issues • Provides opportunities for scientific understanding at different scales more than ever, i.e., potential high value. Data sources shown: Real-time sensors, Weather forecast, Satellite imagery, Sea Surface Temperature Measurements, Drone imagery
  • 21. Example: Biomedical Big Data http://nbcr.ucsd.edu
  • 22. 1021
  • 23. How do we amplify the value of Big Data?
  • 24. How do we find the connections and answer questions that benefit society? “We are drowning in information and starving for knowledge” – John Naisbitt. Source: Megatrends, 1982
  • 25. Create an Ecosystem that Enables Needs and Best Practices • data-driven • scalable • dynamic • process-driven • collaborative • accountable • reproducible • interactive • heterogeneous • includes many different kinds of expertise
  • 26. What would such an ecosystem look like?
  • 27. A Typical Collaborative Data Science Ecosystem: Data Management, Advanced Infrastructure, Data Analytics, Computational Science
  • 28. Amplifying the Value of Data Related to X. Benefit Y for Science, Business, Society or Education. What if X was wildfires?
  • 29. Collaborative Networked Science for Wildfires
  • 30. How do we Better Predict Wildfire Behavior? • Wildfires are critical for ecology, but volatile • Fuel load is high due to fire suppression over the last century • Drought, higher temperatures • Better prevention, prediction and maintenance of wildfires are needed. Photo of Harris Fire (2007) by former Fire Captain Bill Clayton. Disaster management of (ongoing) wildfires heavily relies on understanding their Direction and Rate of Spread (RoS). Fire is Part of the Natural Ecology … but requires Monitoring, Prediction and Resilience
  • 31. WIFIRE: A Scalable Data-Driven Monitoring, Dynamic Prediction and Resilience Cyberinfrastructure for Wildfires. Components: Big Data, Fire Modeling, Visualization, Monitoring
  • 32. A dynamic system integration of real-time sensor networks, satellite imagery, near-real time data management tools, wildfire simulation tools, and connectivity to emergency command centers … before, during and after a firestorm.
  • 33. Video available at: https://www.youtube.com/watch?v=N4LAROiW5c8&t=2s
  • 34. High Performance Wireless Research and Education Network (HPWREN) and FARSITE. http://hpwren.ucsd.edu/cameras >160 Meteorological Sensors and Growing. Major success in bringing internet to incident command in the field. Used in over 20 fires over time. Most popular operational fire behavior modeling system.
  • 35. Closing the Loop using Big Data: Wildfire Behavior Modeling and Data Assimilation • Computational costs for existing models are too high for real-time analysis • a priori -> a posteriori • Parameter estimation to make adjustments to the (input) parameters • State estimation to adjust the simulated fire front location with an a posteriori update/measurement of the actual fire front location. Figure: Conceptual Data Assimilation Workflow with Prediction and Update Steps using Sensor Data
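The prediction and update steps named on slide 35 can be illustrated with a minimal sketch. It is not the WIFIRE implementation: it assumes a single fire front position tracked along one transect, a constant rate of spread during each step, and a scalar Kalman-style update. All names (`FrontEstimate`, `predict`, `update`) and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FrontEstimate:
    """Fire front position along one transect, with its uncertainty."""
    position_m: float   # estimated distance of the front from the origin (metres)
    variance_m2: float  # variance of that estimate (square metres)


def predict(est: FrontEstimate, rate_of_spread_m_per_min: float,
            dt_min: float, process_var_m2: float) -> FrontEstimate:
    """A priori step: advance the front with the model and grow the uncertainty."""
    return FrontEstimate(
        position_m=est.position_m + rate_of_spread_m_per_min * dt_min,
        variance_m2=est.variance_m2 + process_var_m2,
    )


def update(est: FrontEstimate, observed_position_m: float,
           obs_var_m2: float) -> FrontEstimate:
    """A posteriori step: pull the simulated front toward the observed front."""
    # The gain weighs model uncertainty against measurement uncertainty.
    gain = est.variance_m2 / (est.variance_m2 + obs_var_m2)
    return FrontEstimate(
        position_m=est.position_m + gain * (observed_position_m - est.position_m),
        variance_m2=(1.0 - gain) * est.variance_m2,
    )


# Hypothetical numbers: simulate 6 minutes at 10 m/min, then assimilate
# one sensor-derived front location.
prior = predict(FrontEstimate(0.0, 100.0), rate_of_spread_m_per_min=10.0,
                dt_min=6.0, process_var_m2=300.0)
posterior = update(prior, observed_position_m=80.0, obs_var_m2=100.0)
```

The posterior lands between the simulated and the observed front, closer to whichever has the smaller variance. Parameter estimation, the other bullet on the slide, would instead adjust inputs such as the rate of spread; it is not shown here.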
  • 36. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Fire Modeling Workflows in WIFIRE Real-time sensors Weather forecast Fire perimeter Landscape data Monitoring & fire mapping
  • 37. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Firemap Tool • A web-based GIS environment: • access information related to fire behavior • analyze what-if scenarios • model real-time fire behavior • generate reports • Powered by WIFIRE Firemap Web Interface WIFIRE Data Interfaces WIFIRE Workflows Computing Infrastructure http://firemap.sdsc.edu
  • 38. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Data-Driven Fire Progression Prediction Over Three Hours Collaboration with LA and SD Fire Departments http://firemap.sdsc.edu August 2016 – Blue Cut Fire Tahoe and Nevada Bureau of Land Management Cameras: 20 cameras added with field-of-view
  • 39. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) CA Fires 10/2017 through 12/2017 800K+ unique visitors and 8M+ hits http://firemap.sdsc.edu
  • 40. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) San Diego Airborne Intelligence Reconnaissance System (SDAIRS) Lilac Fire Perimeter and WIFIRE Fire Progression Model in SCOUT
  • 41. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Thomas Fire: 12/04/2017- 01/12/2018 December 10, 2017 December 17, 2017
  • 42. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Real-time Satellite Detections During Thomas Fire: 12/04/2017- 01/12/2018
  • 43. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Some Machine Learning Case Studies • Smoke and fire perimeter detection based on imagery • Prediction of Santa Ana and fire conditions specific to location • Prediction of fuel buildup based on fire and weather history • NLP for understanding local conditions based on radio communications • Deep learning on multispectral imagery for high-resolution fuel maps • Classification project to generate more accurate fuel maps (using Planet Labs satellite data) All require periodic, dynamic and programmatic access to data!
  • 44. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Classification project to generate more accurate fuel maps • Accurate and up-to-date fuel maps are critical for modeling wildfire rate of spread and potential burn areas. • Challenge: • USGS LANDFIRE provides the best available fuel maps every two years. • The WIFIRE system is limited by these inputs, which can be up to two years old. Fuel maps created at a higher temporal frequency are desired. • Approach: • Using high-resolution satellite imagery and deep learning methods, produce surface fuel maps of San Diego County and other regions in Southern California. • Using LANDFIRE fuel maps as the target variable, the objective is to create a classification model that will provide fuel maps at greater frequency with a measure of uncertainty. Cluster 1: Short Grass
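One way to attach "a measure of uncertainty" to a per-pixel fuel class is to report the normalized entropy of the classifier's class probabilities. The sketch below is an assumption for illustration only; the slide does not say which uncertainty measure the project uses. The function name and the class names other than short grass are hypothetical.

```python
import math


def label_with_uncertainty(class_scores):
    """Return (most likely fuel class, uncertainty in [0, 1]).

    class_scores maps class name -> non-negative score for one pixel.
    Uncertainty is the Shannon entropy of the normalized scores divided by
    its maximum possible value (log of the number of classes).
    """
    total = sum(class_scores.values())
    probs = {name: score / total for name, score in class_scores.items()}
    label = max(probs, key=probs.get)
    entropy = sum(-p * math.log(p) for p in probs.values() if p > 0)
    max_entropy = math.log(len(probs)) if len(probs) > 1 else 1.0
    return label, entropy / max_entropy


if __name__ == "__main__":
    # Hypothetical model output for two pixels.
    print(label_with_uncertainty({"short_grass": 0.9, "shrub": 0.05, "timber": 0.05}))
    print(label_with_uncertainty({"short_grass": 0.4, "shrub": 0.3, "timber": 0.3}))
```

A value near 0 means the model strongly prefers one class; a value near 1 means the classes are nearly indistinguishable for that pixel.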
  • 45. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) WIFIRE Team: It takes a village! • PhD level researchers • Professional software developers • 29 undergraduate students • UC San Diego • UC Merced • MURPA University • University of Queensland • 1 high school student • 5 MSc and 5 MAS students • 2 PhD students (UMD) • 1 postdoctoral researcher • Partners from fire departments • Advisory board with diverse expertise and affiliations UMD - Fire modeling UCSD MAE - Data assimilation SDSC - Cyberinfrastructure, Workflows, Data engineering, Machine Learning, Information Visualization, HPWREN Calit2/QI- Cyberinfrastructure, GIS, Advanced Visualization, Machine Learning, Urban Sustainability, HPWREN SIO - HPWREN
  • 46. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) ACQUIRE PREPARE ANALYZE REPORT ACT Focus on the Process and Team Work to Answer a Question …
  • 47. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Scalable Drug Discovery • [Figure: cell proliferation charts for medium, no p53, and cancer cell with p53-R175H mutant, comparing no compound, Prima-1, stictic acid and candidate compounds] • 15 new reactivation compounds • Reactivation compounds kill cells with p53 cancer mutant (Ieong et al., 2014) • AMBER GPU MD Tool, Minimization Actor • BENEFITS: • Increase reuse • Reproducibility • Scale execution, problem & solution • Compare methods • Train students
  • 48. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Using workflows for process integration… Data Management, Advanced Infrastructure, Data Analytics, Computational Science
  • 49. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Networked Science Workflows – Early Examples –
  • 50. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) 2004, ROADNet Project 50 ORB
  • 51. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Real-time Stream Processing 2005, ROADNet Project Laser Strainmeter Channels in; Scientific Workflow; Earth-tide signal out Straightforward Example: Seismic Waveforms
  • 52. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Sample Variance Plotting and Storage Workflow for Real-time Data 2006, ROADNet Project
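Computing a sample variance over a real-time stream means the statistic has to be updated one reading at a time, without storing the whole series. A minimal sketch using Welford's online algorithm follows. This is a standard technique chosen for illustration; the slide does not state how the ROADNet workflow computed its variance, and the class name is hypothetical.

```python
class RunningVariance:
    """Incrementally tracks mean and sample variance of a stream (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def push(self, value):
        """Fold one new reading into the running statistics."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def sample_variance(self):
        """Unbiased sample variance, or None until two readings have arrived."""
        if self.count < 2:
            return None
        return self._m2 / (self.count - 1)


if __name__ == "__main__":
    stats = RunningVariance()
    for reading in [1.0, 2.0, 3.0]:  # stand-in for values arriving from a sensor
        stats.push(reading)
        print(stats.count, stats.mean, stats.sample_variance)
```

Each `push` costs constant time and memory, so the same object can feed a live plot and a storage step as readings arrive.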
  • 53. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Workflows for Data Science Center of Excellence at SDSC Goal: Methodology and tool development to build automated and operational workflow-driven solution architectures on big data and HPC platforms. Focus on the question, not the technology! Real-Time Hazards Management wifire.ucsd.edu Data-Parallel Bioinformatics bioKepler.org Scalable Automated Molecular Dynamics and Drug Discovery nbcr.ucsd.edu WorDS.sdsc.edu • Access and query data • Support exploratory design • Scale computational analysis • Increase reuse and reproducibility • Save time, energy and money • Formalize and standardize • Train
  • 54. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Balance of: • team building • process management • performance optimization • provenance tracking • training and education
  • 55. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) While working with experts on… • domain expertise • data modeling and integration • data management services • analytical methods • communication and visualization
  • 56. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) “The” Data Science Team • Data engineer • Data analyst • Methods expert • Scalability and operations expert • Business manager • Business analyst • Scientist • Visualization and dashboard developer • Solution architect • Story teller/coordinator • Project manager Expertise and skills often overlap, but nobody has it all!
  • 57. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) How can I get smart people to collaborate and communicate? …to utilize data and infrastructure to generate insights and solve a question. Focus on the question, not the technology! Team Building
  • 58. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Purpose to Lead to Insight Focus on the question, not the technology! Purpose LEAN METHOD Minimize the total time through the loop CODE LEARN BUILD MEASURE DATA IDEAS ?
  • 59. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Data Science Process
  • 60. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) ACQUIRE PREPARE ANALYZE REPORT ACT Basic Steps in a Data Science Process • Import raw dataset into your analytics platform • Explore & Visualize • Perform Data Cleaning • Feature Selection • Model Selection • Analyze the results • Present your findings • Use them ACQUIRE PREPARE ANALYZE REPORT ACT
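The five steps on this slide can be read as a chain of small functions, each consuming the previous step's output. The following is a toy sketch under stated assumptions: the in-memory readings, the summary statistics and the alert threshold are all made up for illustration, and a real pipeline would read from an actual data source and fit an actual model.

```python
from statistics import mean


def acquire():
    """ACQUIRE: import the raw dataset (here, hypothetical sensor readings;
    None marks a dropped sample)."""
    return [21.5, None, 22.0, 23.5, None, 25.0]


def prepare(raw):
    """PREPARE: explore and clean; this sketch only drops missing values."""
    return [value for value in raw if value is not None]


def analyze(clean):
    """ANALYZE: stand-in for feature and model selection; computes summary statistics."""
    return {"n": len(clean), "mean": mean(clean), "max": max(clean)}


def report(summary):
    """REPORT: present the findings in a human-readable form."""
    return f"{summary['n']} readings, mean {summary['mean']:.1f}, max {summary['max']:.1f}"


def act(summary, threshold=24.0):
    """ACT: turn the insight into a decision (the threshold is an assumption)."""
    return "alert" if summary["max"] > threshold else "no action"


def run_pipeline():
    summary = analyze(prepare(acquire()))
    return report(summary), act(summary)


if __name__ == "__main__":
    print(run_pipeline())
```

Keeping each step as its own function makes it possible to iterate on or roll back one step without touching the others.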
  • 61. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Computational Data Science, Data Engineering ACQUIRE PREPARE ANALYZE REPORT ACT Scale Scale Scale Scale Many iterations and rollbacks between steps.
  • 62. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Build Explore Scale Report Act
  • 63. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Computational Data Science, Data Engineering ACQUIRE PREPARE ANALYZE REPORT ACT Scale Scale Scale Scale Programmability
  • 64. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Process for Practice of Data Science Programmability Ease of use, iteration, interaction, re-use, re-purpose Scalability From local experiments to large-scale runs Reproducibility Ability to validate, re-run, re-play Data Product
  • 65. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Some P’s in PPoDS Platforms Process People Problem or Purpose ? Programmability
  • 66. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) The insights need to be evaluated to turn them into action. Platforms Process People Purpose? Programmability Metrics Product Insight Action ?
  • 67. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Pod → sub-process Treat Each Step in the Solution Process as a Conceptual Pod Defined by: • Purpose and goal • Stakeholders • Expectations • Key questions to be answered, production/consumption relationships, needs, dependencies, limits, … • Contracts • Performance, economic, accuracy, policy, privacy, reproducibility, political, … • Knowns • Known unknowns Metrics for accountability should be built into the process. Timeline Purpose Expectations Planning of deliverables Cost Using the PPODS Approach • Each step in your data pipelines is a separate pod • Define success metrics for calling each pod done • Pods can be atomic or hierarchical
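The pod idea on this slide (a purpose, stakeholders, success metrics, and optional nesting) can be written down as a small data structure. This is a hypothetical sketch and not part of any PPODS toolkit; the class, field and metric names and the thresholds are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Pod:
    name: str
    purpose: str
    stakeholders: List[str] = field(default_factory=list)
    # Each success metric is a named check over this pod's measurements.
    success_metrics: Dict[str, Callable[[dict], bool]] = field(default_factory=dict)
    # A pod with no children is atomic; with children it is hierarchical.
    children: List["Pod"] = field(default_factory=list)

    def is_done(self, measurements: Dict[str, dict]) -> bool:
        """A pod is done when its own metrics pass and every child pod is done."""
        own = measurements.get(self.name, {})
        if not all(check(own) for check in self.success_metrics.values()):
            return False
        return all(child.is_done(measurements) for child in self.children)


def example_pipeline() -> Pod:
    """Builds a two-step hierarchical pod with made-up metrics."""
    prepare = Pod(
        name="prepare",
        purpose="Clean and integrate the input data",
        stakeholders=["data engineer"],
        success_metrics={"few_missing": lambda m: m.get("missing_rate", 1.0) < 0.05},
    )
    analyze = Pod(
        name="analyze",
        purpose="Fit and validate a model",
        stakeholders=["data analyst", "scientist"],
        success_metrics={"accurate": lambda m: m.get("accuracy", 0.0) >= 0.8},
    )
    return Pod(name="pipeline", purpose="Answer the question",
               children=[prepare, analyze])


if __name__ == "__main__":
    pipeline = example_pipeline()
    print(pipeline.is_done({"prepare": {"missing_rate": 0.01},
                            "analyze": {"accuracy": 0.85}}))
```

Because the check is recursive, a parent pod is only reported as done once every nested pod has met its own success metrics.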
  • 68. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Zooming into a simple example… PREPARE ANALYZE Data Exploration Schema Integration Query Processing Machine Learning …
  • 69. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Creating A Solution Architecture for Networked Science Applications
  • 70. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) COORDINATION AND WORKFLOW MANAGEMENT DATA INTEGRATION AND PROCESSING DATA MANAGEMENT AND STORAGE Process-driven Solution Architectures and the Role of Workflows
  • 71. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) … COORDINATION AND WORKFLOW MANAGEMENT DATA INTEGRATION AND PROCESSING DATA MANAGEMENT AND STORAGE COMMUNICATION AND FEEDBACK EXPLORATION SCALABILITY PROVENANCE SECURITY ACQUIRE PREPARE ANALYZE REPORT ACT
  • 72. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Utilizing “Advanced Cyberinfrastructure” Data Management, Advanced Infrastructure, Data Analytics, Computational Science Compute + Storage + Network
  • 73. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) SAN DIEGO SUPERCOMPUTER CENTER at UC San Diego Providing Cyberinfrastructure for Research and Education • Established as a national supercomputer resource center in 1985 by NSF • A world leader in HPC, data-intensive computing, and scientific data management • Current strategic focus on “Big Data”, “versatile computing”, and “life sciences applications” Recent Innovative Architectures • Gordon: First Flash-based Supercomputer for Data-intensive Apps • Comet: Serving the Long Tail of Science
  • 74. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) The Pacific Research Platform Creates a Regional End-to-End Science-Driven “Big Data Superhighway” System Letters of Commitment from: • 50 Researchers from 15 Campuses • 32 IT/Network Organization Leaders NSF CC*DNI Grant $5M 10/2015-10/2020 PI: Larry Smarr, UC San Diego Calit2 Co-PIs: • Camille Crittenden, UC Berkeley CITRIS • Tom DeFanti, UC San Diego Calit2 • Philip Papadopoulos, UCSD SDSC • Frank Wuerthwein, UCSD Physics and SDSC Disk-to-Disk: 10-100 Gbps Source: John Hess, CENIC; Larry Smarr, UCSD
  • 75. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) New NSF CHASE-CI Grant Creates a Community Cyberinfrastructure Adding a Machine Learning Layer Built on Top of the Pacific Research Platform Caltech UCB UCI UCR UCSD UCSC Stanford MSU UCM SDSU NSF Grant for High Speed “Cloud” of 256 GPUs For 30 ML Faculty & Their Students at 10 Campuses for Training AI Algorithms on Big Data Slide Source: Larry Smarr, UCSD
  • 76. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Next Step: Surrounding the CHASE-CI Machine Learning Platform With Clouds of GPUs and Non Von Neumann Processors Microsoft Installs FPGAs into Bing Servers & 432 into TAAC for Academic Access 64-TrueNorth Cluster CHASE-CI Slide Source: Larry Smarr, UCSD
  • 77. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) WORKFLOW MANAGEMENT Application Integration, Coordination, Optimization, Communication, Reporting COMPOSABLE DATA SERVICES Deep Learning, Analytics, HPC, Training, Notebooks COMPOSABLE SYSTEMS GPU, CPU, Big Data, Neuromorphic, Networks, Storage, … PROVENANCE SECURITY RESOURCE MANAGEMENT Kubernetes Container Cloud
  • 78. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) COORDINATION AND WORKFLOW MANAGEMENT … http://kepler-project.org National Resources (Gordon) (Comet) (Stampede)(Lonestar) Cloud Resources Execution Platforms Local Cluster Resources ACQUIRE PREPARE ANALYZE REPORT ACT
  • 79. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Dynamic data-driven coordination & resource optimization Requires: Ability to explore and scale on multiple platforms Dynamic operations research for science using workflows.
  • 81. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) SOLUTION ARCHITECTURE DOMAIN KNOWLEDGE
  • 82. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Parts of the Solution • Stakeholders • Datasets • Compliance requirements • Defined actions • Analytical methods • Technical infrastructure Bias Transparency Verification Accuracy Ethics Reproducibility Cost
  • 83. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) To summarize… • Data science is a collaborative activity • Focus on collaboration and communication from the problem definition stage • Apply process management techniques where necessary • Incorporate and formalize the definition of success from different perspectives • Measurable automation should be the end goal • Requires built-in programmable and scalable data pipelines • Includes measurable and programmable networks • Iterations based on pre-defined metrics help • PPODS is a methodology for collaborative data science application integration and iteration • Toolkits for process automation, scalable execution, provenance tracking and reporting
  • 84. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu ) Contact: Ilkay Altintas, Ph.D. Email: ialtintas@ucsd.edu Questions? Parts of the presented work is funded by NSF, DOE, NIH, UC San Diego and various industry partners.