Present: Our lives, and every field of business and society, are continuously transformed by our ability to collect meaningful data systematically and turn it into value. We are increasingly connected to data sources, have unprecedented distributed infrastructure capabilities, and continuously improve our scientific and analytical capabilities. Renewed interest in an evolved field of data science has emerged in response to these advances.
Potential: The state of the art and present challenges come with many opportunities. They not only push for new and innovative capabilities in composable data management and analytical methods that can run anytime, anywhere, but also require methods to bridge the gap between applications and such capabilities. However, we often lack a collaborative culture and effective methodologies to translate these newest advances into impactful solution architectures that can transform science, society, and education.
Future: A Collaborative Networked World as a Part of the Data Science Process: Any solution architecture for data science today depends on the effectiveness of a multi-disciplinary data science team, made up not only of humans but also of analytical systems and infrastructure, which are inter-related parts of the solution. Focusing on collaboration and communication between people, and on dynamic, predictable and programmable interfaces to systems and scalable infrastructure, from the beginning of any activity is critical. This talk will provide an overview of some of our recent work on networked application architectures for dynamic data-driven wildfire modeling and smart cities. It will also explain how focusing on (1) some P’s in the planning phases of a data science activity and (2) creating a measurable process that spans multiple perspectives and success metrics was effective in making these solutions scalable. Lastly, it will introduce the PPODS methodology and family of composable tools for team-based data science process management and training.
Collaborative Data Science In A Highly Networked World
1. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Collaborative Data Science in a Highly Networked World
İlkay ALTINTAŞ, Ph.D.
Chief Data Science Officer, San Diego Supercomputer Center
Division Director, Cyberinfrastructure Research, Education and Development
Founder and Director, Workflows for Data Science Center of Excellence
2. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
What is a network
useful for?
?
3. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Making connections
• People and communities
• Data and applications
• People and information
• People and services
• Learners and classes
• Ideas and masses
4. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Advancing
Communication
and
Collaboration
5. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Any technology and
application built on
networking should be built
around these concepts.
6. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
How do we conduct and
teach data science in a
highly networked world?
?
7. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
What is Data Science?
?
9. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
How does successful data science happen?
Insight Data Product
“Big” Data
Question
Exploratory
Analysis
and
Modeling
Insight
10. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Customer
Demographic
Previous
Purchases
Book reviews
What kind of
books does this
customer like?
Book
recommendations
Example: Book Recommendations
11. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Model of customer’s
book preferences
New book
information
Who is likely to
like this book?
Find Potential Audience for a New Book
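The step from a model of customer preferences plus new book information to "who is likely to like this book?" can be sketched as a similarity search. This is a minimal illustration, not the method behind the slide: the genre vectors, customer names, and the `threshold` value are all invented for the example, and cosine similarity stands in for whatever preference model is actually used.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def potential_audience(preferences, book_vector, threshold=0.8):
    """Return customers whose preference vector is close to the book's vector,
    ordered from most to least similar."""
    scored = {name: cosine(vec, book_vector) for name, vec in preferences.items()}
    matches = [name for name, score in scored.items() if score >= threshold]
    return sorted(matches, key=lambda name: scored[name], reverse=True)


# Hypothetical preference model: weights over (mystery, romance, science) genres.
preferences = {
    "ana": [0.9, 0.1, 0.0],
    "ben": [0.1, 0.8, 0.3],
    "chen": [0.7, 0.0, 0.6],
}
new_book = [1.0, 0.0, 0.2]  # invented genre profile of the new book

audience = potential_audience(preferences, new_book)
```

The resulting list is the insight; marketing the book to those customers is the action that follows from it.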
12. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Action to market
the book to the
right audience
Who is likely to
like this book?
Market a New Book
13. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Action to market
the book to the
right audience
Who is likely to
like this book?
Insight Action
Market a New Book
14. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Historical data Near real-time data
Prediction
Creating Actionable Information
16. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Why the increased interest
in Data Science?
?
17. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
+
Big Data
Scalable Computing
Anywhere Anytime
18. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Data Science Today is Both a Big Data and a Big Compute Discipline
BIG DATA
COMPUTING AT
SCALE
Enables dynamic data-driven applications
Smart Manufacturing
Computer-Aided Drug Discovery
Personalized Precision Medicine
Smart Cities
Smart Grid and Energy Management
Disaster Resilience and Response
Requires:
• Data management
• Data-driven methods
• Scalable & dynamic
process coordination
• Resource optimization
• Skilled interdisciplinary
workforce
New era of
data science!
19. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Nearly every problem today is
transformed by big data.
20. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Example: Geospatial Big Data
• Flood of new data sources and types
• Needs new data management, storage and analysis
methods
• Too big for a single server, fast growing data volume
• Requires special database structures that can handle
data variety
• Too continuous for analysis at a later time, with
increasing streaming rate, i.e., velocity
• Varying degrees of uncertainty in measurements, and
other veracity issues
• Provides opportunities for scientific understanding at
different scales more than ever, i.e., potential high value
Real-time sensors
Weather forecast
Satellite imagery
Sea Surface Temperature
Measurements
Drone imagery
21. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Example: Biomedical Big Data http://nbcr.ucsd.edu
23. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
How do we amplify the value of Big Data?
24. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
How do we find the connections
and answer questions that
benefit society?
“We are drowning in
information and
starving for knowledge”
– John Naisbitt
Source: Megatrends, 1982
25. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Create an Ecosystem that Enables
Needs and Best Practices
• data-driven
• scalable
• dynamic
• process-driven
• collaborative
• accountable
• reproducible
• interactive
• heterogeneous
• includes many different kinds of expertise
26. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
What would such an
ecosystem look like?
?
27. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Data Management
Advanced Infrastructure
Data Analytics
Computational Science
A Typical Collaborative Data Science Ecosystem
28. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Amplifying the
Value of Data
Related to X
Benefit Y for
Science,
Business,
Society or
Education
What if X was wildfires?
30. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
How do we Better Predict Wildfire Behavior?
• Wildfires are critical for ecology, but volatile
• Fuel load is high due to fire suppression over the
last century
• Drought, higher temperatures
• Better prevention, prediction and maintenance of
wildfires are needed
Photo of Harris Fire (2007) by former Fire Captain Bill
Clayton
Disaster management of (ongoing) wildfires heavily relies on
understanding their Direction and Rate of Spread (RoS).
Fire is Part of the Natural Ecology….
… but requires Monitoring, Prediction and Resilience
31. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Big Data Fire Modeling
Visualization
Monitoring
WIFIRE: A Scalable Data-Driven Monitoring, Dynamic
Prediction and Resilience Cyberinfrastructure for Wildfires
32. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
A dynamic system integration of
real-time sensor networks, satellite imagery, near-real
time data management tools, wildfire simulation tools,
and connectivity to emergency command centers
… before, during and after a firestorm.
34. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
High Performance Wireless
Research and Education Network FARSITE
http://hpwren.ucsd.edu/cameras
>160 Meteorological Sensors and Growing
Major success to bring
internet to incident
command in the field. Used
in over 20 fires over time.
Most popular
operational fire
behavior
modeling system.
35. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Closing the Loop using Big Data
-- Wildfire Behavior Modeling and Data Assimilation --
• Computational costs for existing
models too high for real-time
analysis
• a priori -> a posteriori
• Parameter estimation to make
adjustments to the (input) parameters
• State estimation to adjust the
simulated fire front location with an a
posteriori update/measurement of the
actual fire front location
Conceptual Data Assimilation Workflow with
Prediction and Update Steps using Sensor Data
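The prediction and update steps named on this slide can be sketched as a one-dimensional Kalman-style filter. This is a minimal illustration under strong assumptions, not the WIFIRE implementation: the fire front is reduced to a single distance along one direction, and the rate of spread, variances, and observation are invented values. It shows state estimation only; parameter estimation would instead adjust inputs such as `rate_of_spread`.

```python
from dataclasses import dataclass


@dataclass
class FrontEstimate:
    position_m: float  # estimated fire-front distance from ignition, in metres
    variance: float    # uncertainty of that estimate, in metres squared


def predict(est, rate_of_spread, dt_s, model_variance):
    """A priori step: advance the front using the model's rate of spread."""
    return FrontEstimate(
        position_m=est.position_m + rate_of_spread * dt_s,
        variance=est.variance + model_variance,
    )


def update(est, observed_m, obs_variance):
    """A posteriori step: blend the prediction with an observed front location."""
    gain = est.variance / (est.variance + obs_variance)
    return FrontEstimate(
        position_m=est.position_m + gain * (observed_m - est.position_m),
        variance=(1.0 - gain) * est.variance,
    )


# Hypothetical numbers: 0.5 m/s spread over 10 minutes, then one sensor reading.
prior = predict(FrontEstimate(0.0, 100.0), rate_of_spread=0.5,
                dt_s=600.0, model_variance=300.0)
posterior = update(prior, observed_m=340.0, obs_variance=400.0)
```

With equal prediction and observation variances, the corrected position lands halfway between the simulated and observed front, and the variance shrinks after the update.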
36. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Fire Modeling Workflows in WIFIRE
Real-time sensors
Weather forecast
Fire perimeter
Landscape data
Monitoring &
fire mapping
37. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Firemap Tool
• A web-based GIS environment:
• access information related to fire
behavior
• analyze what-if scenarios
• model real-time fire behavior
• generate reports
• Powered by WIFIRE
Firemap
Web Interface
WIFIRE Data Interfaces WIFIRE Workflows
Computing Infrastructure
http://firemap.sdsc.edu
38. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Data-Driven Fire Progression
Prediction Over Three Hours
Collaboration with LA and
SD Fire Departments
http://firemap.sdsc.edu
August 2016 – Blue Cut Fire
Tahoe and Nevada Bureau
of Land Management
Cameras: 20 cameras added
with field-of-view
39. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
CA Fires 10/2017 through 12/2017
800K+ unique visitors and 8M+ hits
http://firemap.sdsc.edu
40. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
San Diego Airborne Intelligence
Reconnaissance System (SDAIRS)
Lilac Fire Perimeter and
WIFIRE Fire Progression
Model in SCOUT
41. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Thomas Fire: 12/04/2017- 01/12/2018
December 10, 2017
December 17, 2017
42. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Real-time Satellite Detections During
Thomas Fire: 12/04/2017- 01/12/2018
43. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Some Machine Learning Case Studies
• Smoke and fire perimeter detection based on imagery
• Prediction of Santa Ana and fire conditions specific to location
• Prediction of fuel build up based on fire and weather history
• NLP for understanding local conditions based on radio
communications
• Deep learning on multispectral imagery for high resolution fuel maps
• Classification project to generate more accurate fuel maps (using
Planet Labs satellite data)
All require periodic,
dynamic and
programmatic
access to data!
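Periodic, programmatic access to data can be sketched as a polling loop. This is an illustrative sketch only: `fetch` and `handle` are hypothetical callables supplied by the caller, records are assumed to carry an `id` field, and no real WIFIRE or sensor API is implied.

```python
import time


def poll(fetch, handle, cycles, interval_s=60.0, sleep=time.sleep):
    """Call fetch() once per cycle and pass each unseen record to handle().

    Returns the number of records handled. Records are assumed to be dicts
    with a unique "id" key (an assumption made for this sketch).
    """
    seen = set()
    handled = 0
    for cycle in range(cycles):
        for record in fetch():
            if record["id"] in seen:
                continue  # skip records already delivered in an earlier cycle
            seen.add(record["id"])
            handle(record)
            handled += 1
        if cycle < cycles - 1:
            sleep(interval_s)
    return handled


# Stand-in data source: two batches, the second repeating the first record.
batches = iter([
    [{"id": 1, "temp_c": 31.0}],
    [{"id": 1, "temp_c": 31.0}, {"id": 2, "temp_c": 33.5}],
])
received = []
count = poll(lambda: next(batches), received.append, cycles=2,
             sleep=lambda seconds: None)  # no real waiting in the example
```

Injecting `fetch` and `sleep` keeps the loop testable and lets the same code wrap a file, a queue, or a web service.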
44. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Classification project to generate more
accurate fuel maps
• Accurate and up-to-date fuel maps are critical for
modeling wildfire rate of spread and potential burn
areas.
• Challenge:
• USGS Landfire provides the best available fuel maps
every two years.
• The WIFIRE system is limited by these potentially 2-year
old inputs. Fuel maps created at a higher temporal
frequency are desired.
• Approach:
• Using high-resolution satellite imagery and deep
learning methods, produce surface fuel maps of San
Diego County and other regions in Southern California.
• Use LandFire fuel maps as the target variable; the
objective is to create a classification model that will
provide fuel maps at greater frequency with a measure
of uncertainty.
Cluster 1: Short Grass
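The idea of training on existing fuel-map labels and reporting a measure of uncertainty with each prediction can be sketched as follows. Note the substitutions: a nearest-centroid classifier replaces the deep learning methods named on the slide, the two-band pixel values and class names are invented, and the softmax over negative distances is an illustrative score rather than a calibrated probability.

```python
import math
from collections import defaultdict


def fit_centroids(pixels, labels):
    """Average the band values of the training pixels for each fuel class."""
    grouped = defaultdict(list)
    for pixel, label in zip(pixels, labels):
        grouped[label].append(pixel)
    return {
        label: [sum(band) / len(members) for band in zip(*members)]
        for label, members in grouped.items()
    }


def classify(pixel, centroids):
    """Return (best label, class scores, normalized entropy in [0, 1])."""
    labels = sorted(centroids)
    scores = [-math.dist(pixel, centroids[label]) for label in labels]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # shifted for stability
    total = sum(weights)
    probs = {label: w / total for label, w in zip(labels, weights)}
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    uncertainty = entropy / math.log(len(labels)) if len(labels) > 1 else 0.0
    return max(probs, key=probs.get), probs, uncertainty


# Hypothetical training pixels (two spectral bands) labeled from a fuel map.
train_pixels = [[0.2, 0.6], [0.3, 0.7], [0.8, 0.2], [0.7, 0.1]]
train_labels = ["short_grass", "short_grass", "shrub", "shrub"]
centroids = fit_centroids(train_pixels, train_labels)

label_clear, probs_clear, u_clear = classify([0.2, 0.6], centroids)
label_mid, probs_mid, u_mid = classify([0.5, 0.4], centroids)
```

A pixel close to one class centroid gets a lower uncertainty than a pixel midway between classes, which is the kind of signal that could flag map cells needing review.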
45. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
WIFIRE Team: It takes a village!
• PhD level researchers
• Professional software
developers
• 29 undergraduate students
• UC San Diego
• UC Merced
• MURPA University
• University of Queensland
• 1 high school student
• 5 MSc and 5 MAS students
• 2 PhD students (UMD)
• 1 postdoctoral researcher
• Partners from fire departments
• Advisory board with diverse
expertise and affiliations
UMD - Fire modeling
UCSD MAE - Data assimilation
SDSC -
Cyberinfrastructure,
Workflows,
Data engineering,
Machine Learning,
Information Visualization,
HPWREN
Calit2/QI-
Cyberinfrastructure, GIS,
Advanced Visualization,
Machine Learning,
Urban Sustainability,
HPWREN
SIO - HPWREN
46. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
ACQUIRE PREPARE ANALYZE REPORT ACT
Focus on the Process and Team Work
to Answer a Question
…
47. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Scalable Drug Discovery
Figure: cell proliferation (axis 0 to 1.4) by compound, with Prima-1 and stictic acid among the labeled compounds, shown for cells with no p53 and for cancer cells with the p53-R175H mutant
15 new reactivation compounds
reactivation
compounds kill
cells with p53
cancer mutant
Ieong et al., 2014
AMBER GPU
MD Tool
Minimization Actor
BENEFITS:
• Increase reuse
• Reproducibility
• Scale execution,
problem & solution
• Compare methods
• Train students
48. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Using workflows for process integration…
Data Management
Advanced Infrastructure
Data Analytics
Computational Science
52. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Sample Variance Plotting and Storage
Workflow for Real-time Data
2006,
ROADNet
Project
53. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Workflows for Data Science
Center of Excellence at SDSC
Goal: Methodology and tool
development to build automated
and operational workflow-driven
solution architectures on big data
and HPC platforms.
Focus on the
question,
not the
technology!
Real-Time Hazards Management
wifire.ucsd.edu
Data-Parallel Bioinformatics
bioKepler.org
Scalable Automated Molecular Dynamics and Drug Discovery
nbcr.ucsd.edu
WorDS.sdsc.edu
• Access and query data
• Support exploratory design
• Scale computational analysis
• Increase reuse and reproducibility
• Save time, energy and money
• Formalize and standardize
• Train
54. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Balance of:
• team building
• process management
• performance optimization
• provenance tracking
• training and education
55. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
While working with experts on…
• domain expertise
• data modeling and integration
• data management services
• analytical methods
• communication and visualization
56. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
“The” Data Science Team
• Data engineer
• Data analyst
• Methods expert
• Scalability and operations expert
• Business manager
• Business analyst
• Scientist
• Visualization and dashboard developer
• Solution architect
• Story teller/coordinator
• Project manager
Expertise and skills often overlap,
but nobody has it all!
57. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
How can I get smart people
to collaborate and
communicate?
…to utilize data and infrastructure to
generate insights and solve a question.
Focus on the
question,
not the
technology!
Team Building
58. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Purpose to Lead to Insight
Focus on the
question,
not the
technology!
Purpose
LEAN METHOD
Minimize the
total time through the loop
CODE
LEARN BUILD
MEASURE
DATA
IDEAS
?
59. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Data Science Process
60. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
ACQUIRE PREPARE ANALYZE REPORT ACT
Basic Steps
in a Data
Science
Process
• Import raw dataset into your analytics
platform
• Explore & Visualize
• Perform Data Cleaning
• Feature Selection
• Model Selection
• Analyze the results
• Present your findings
• Use them
ACQUIRE
PREPARE
ANALYZE
REPORT
ACT
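The five steps above can be sketched as a chain of small functions. This is a toy illustration, not a prescribed implementation: the readings are invented, a mean stands in for feature and model selection, and the alert threshold is an arbitrary assumption.

```python
from statistics import mean


def acquire():
    """ACQUIRE: import a raw dataset (hard-coded here; values are invented)."""
    return ["21.5", "22.0", "", "bad", "23.5"]


def prepare(raw):
    """PREPARE: clean the data by dropping entries that are not numbers."""
    cleaned = []
    for item in raw:
        try:
            cleaned.append(float(item))
        except ValueError:
            continue
    return cleaned


def analyze(values):
    """ANALYZE: compute a simple summary in place of model selection."""
    return {"count": len(values), "mean": mean(values)}


def report(summary):
    """REPORT: present the findings as a readable sentence."""
    return f"{summary['count']} valid readings, mean {summary['mean']:.2f}"


def act(summary, threshold=22.0):
    """ACT: turn the insight into a decision."""
    return "alert" if summary["mean"] > threshold else "no action"


summary = analyze(prepare(acquire()))
message = report(summary)
decision = act(summary)
```

Keeping each step a separate function makes it easy to rerun or replace one step without touching the others, which matters when the process involves many iterations.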
61. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Computational Data Science
Data Engineering
ACQUIRE PREPARE ANALYZE REPORT ACT
Scale Scale Scale Scale
Many iterations and rollbacks between steps.
64. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Process for Practice of
Data Science
Programmability
Ease of use, iteration, interaction, re-use, re-purpose
Scalability
From local experiments to large-scale runs
Reproducibility
Ability to validate, re-run, re-play
Data
Product
65. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Some P’s in PPoDS
Platforms
Process
People
Problem
or
Purpose
?
Programmability
66. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
The insights need to be evaluated to
turn them into action.
Platforms
Process
People
Purpose?
Programmability
Metrics Product
Insight
Action
?
67. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Pod → sub-process
Treat Each Step in the Solution
Process as a Conceptual Pod
Defined by:
• Purpose and goal
• Stakeholders
• Expectations
• Key questions to be answered,
production/consumption relationships, needs,
dependencies, limits, …
• Contracts
• Performance, economic, accuracy, policy, privacy,
reproducibility, political, …
• Knowns
• Known unknowns
Metrics for accountability should be built into
the process.
Timeline
Purpose Expectations
Planning of deliverables
Cost
Using the PPODS Approach
• Each step in your data pipelines is a
separate pod
• Define success metrics for calling
each pod done
• Pods can be atomic or hierarchical
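One way to picture a pod with built-in success metrics is a small recursive data structure. This is a hypothetical sketch, not the PPODS toolkit: the `Pod` class, its field names, and the example metrics and thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Pod:
    name: str
    purpose: str
    # Each metric maps a name to a check over this pod's measured results.
    success_metrics: Dict[str, Callable[[dict], bool]] = field(default_factory=dict)
    # An atomic pod has no children; a hierarchical pod contains sub-pods.
    children: List["Pod"] = field(default_factory=list)

    def is_done(self, results: Dict[str, dict]) -> bool:
        """A pod is done when its own metrics pass and all child pods are done."""
        measured = results.get(self.name, {})
        own_ok = all(check(measured) for check in self.success_metrics.values())
        return own_ok and all(child.is_done(results) for child in self.children)


cleaning = Pod(
    "cleaning", "Remove invalid rows",
    {"few_missing": lambda r: r.get("missing_fraction", 1.0) <= 0.05},
)
exploration = Pod(
    "exploration", "Understand the data",
    {"columns_profiled": lambda r: r.get("profiled", 0) >= 10},
)
prepare_pod = Pod("prepare", "Make data analysis-ready",
                  children=[cleaning, exploration])

results = {"cleaning": {"missing_fraction": 0.02}, "exploration": {"profiled": 12}}
done = prepare_pod.is_done(results)
```

Because the metrics are data rather than prose, the definition of "done" for each step can be agreed with stakeholders up front and checked automatically on every iteration.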
68. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Zooming into a simple example…
PREPARE ANALYZE
Data
Exploration
Schema
Integration
Query
Processing
Machine
Learning
…
69. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Creating A Solution Architecture for
Networked Science Applications
70. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
COORDINATION AND
WORKFLOW MANAGEMENT
DATA INTEGRATION
AND PROCESSING
DATA MANAGEMENT
AND STORAGE
Process-driven
Solution
Architectures
and the Role of
Workflows
71. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
…
COORDINATION AND
WORKFLOW MANAGEMENT
DATA INTEGRATION
AND PROCESSING
DATA MANAGEMENT
AND STORAGE
COMMUNICATION AND FEEDBACK
EXPLORATION
SCALABILITY
PROVENANCE
SECURITY
ACQUIRE PREPARE ANALYZE REPORT ACT
72. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Utilizing “Advanced Cyberinfrastructure”
Data Management
Advanced Infrastructure
Data Analytics
Computational Science
Compute
+
Storage
+
Network
73. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
SAN DIEGO SUPERCOMPUTER CENTER at UC San Diego
Providing Cyberinfrastructure for Research and Education
• Established as a national supercomputer
resource center in 1985 by NSF
• A world leader in HPC, data-intensive computing,
and scientific data management
• Current strategic focus on “Big Data”, “versatile
computing”, and “life sciences applications”
Recent Innovative Architectures
• Gordon: First Flash-based
Supercomputer for Data-intensive
Apps
• Comet: Serving the Long Tail of
Science
74. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
The Pacific Research Platform Creates
a Regional End-to-End Science-Driven “Big Data Superhighway” System
Letters of Commitment from:
• 50 Researchers from 15 Campuses
• 32 IT/Network Organization Leaders
NSF CC*DNI Grant
$5M 10/2015-10/2020
PI: Larry Smarr, UC San Diego Calit2
Co-PIs:
• Camille Crittenden, UC Berkeley CITRIS,
• Tom DeFanti, UC San Diego Calit2,
• Philip Papadopoulos, UCSD SDSC,
• Frank Wuerthwein, UCSD Physics and SDSC
Disk-to-Disk: 10-100 Gbps
Source: John Hess, CENIC
Larry Smarr, UCSD
75. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
New NSF CHASE-CI Grant Creates a Community Cyberinfrastructure
Adding a Machine Learning Layer Built on Top of the Pacific Research Platform
Caltech
UCB
UCI UCR
UCSD
UCSC
Stanford
MSU
UCM
SDSU
NSF Grant for High Speed “Cloud” of 256 GPUs
For 30 ML Faculty & Their Students at 10 Campuses
for Training AI Algorithms on Big Data Slide Source: Larry Smarr, UCSD
76. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Next Step: Surrounding the CHASE-CI Machine Learning Platform
With Clouds of GPUs and Non Von Neumann Processors
Microsoft Installs FPGAs into Bing Servers &
432 into TACC for Academic Access
64-TrueNorth
Cluster
CHASE-CI
Slide Source: Larry Smarr, UCSD
82. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
Parts of the Solution
• Stakeholders
• Datasets
• Compliance requirements
• Defined actions
• Analytical methods
• Technical infrastructure
Bias
Transparency
Verification
Accuracy
Ethics
Reproducibility
Cost
83. CENIC 2018 Keynote – Ilkay Altintas, PhD (ialtintas@ucsd.edu )
To summarize…
• Data science is a collaborative activity
• Focus on collaboration and communication from the problem definition stage
• Apply process management techniques where necessary
• Incorporate and formalize definition of success from different perspectives
• Measurable automation should be the end goal
• Requires built-in programmable and scalable data pipelines
• Includes measurable and programmable networks
• Iterations based on pre-defined metrics help
• PPODS is a methodology for collaborative data science application
integration and iteration
• Toolkits for process automation, scalable execution, provenance tracking and
reporting