A changing market landscape and open source innovations are having a dramatic impact on the consumability and ease of use of data science tools. Join this session to learn about the impact these trends and changes will have on the future of data science. If you are a data scientist, or if your organization relies on cutting-edge analytics, you won't want to miss this!
2. The digital age has changed the way we Live, Play, Learn and Work…
3. Transformation is Critical…
Companies must shift to a Data-Driven Business
72% are vulnerable to disruption within three years
4. Why we're all vulnerable to seismic shifts
Estimated worldwide startups each day: 274,000
External Threats
Born-on-digital companies that steal market
share or rewrite customer expectations
New business models that reinvent our industry
and change the game altogether
Internal Threats
Siloed data and systems
Gaps in expertise and skills
Inability to react quickly
4Group Name / DOC ID / Month XX, 2017 SOURCE cited in notes
5. Unleashing your data and making the shift to a Data-Driven Organization
[Figure: Value plotted against Uses of Data]
Efficiency Modernization Data Decision Monetization
Operations
Reporting & Data Warehousing
Self-Service Analytics
New Business Models
Data Science
6. What is Data Science?
Data science is a "concept to unify statistics, data analysis and their related methods" in order to "understand and analyze actual phenomena" with data.
7. What makes a Data Scientist?
Data Science Projects Require Multiple Skills
[Venn diagram: Math & Stats, Computer Science, Domain Expertise; center: Unicorn]
Scripting, SQL; Python, R, Scala; Data Pipelines; Big Data/Apache Spark
Mathematics; Computational
Domain Knowledge: Supply Chain, CRM, Financials, Networking
8. What makes a Data Scientist?
Data Science Projects Require Multiple Skills
[Venn diagram: Math & Stats, Computer Science, Domain Expertise; overlaps: Machine Learning, Research, Engineering; center: Unicorn]
Scripting, SQL; Python, R, Scala; Data Pipelines; Big Data/Apache Spark
Mathematics; Computational
Domain Knowledge: Supply Chain, CRM, Financials, Networking
9. Data Science is a Team Sport.
Business Analysts
Data Scientists
Application Developers
Data Engineers
10. Steps to put Data Science to work
Clearly Articulate Use Case
Gather all the Data
Apply Machine Learning
Prepare Data
Digital Application
Evaluate
11. Machine Learning… What is it?
"Computers that learn without being explicitly programmed"
"Using algorithms to understand patterns in data"
Data -> Algorithms -> Predictions & Insight
12. Machine Learning - Process
Data Ingestion
Data Cleaning and Transformation
Model Training
Testing and Validation
Deployment
Model Selection
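The process steps on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the building records are synthetic, the two candidate models (a mean baseline and a one-variable least-squares line) are stand-ins chosen for brevity, and every function name here is hypothetical rather than taken from the talk or from any library.

```python
import random
import statistics


def ingest():
    """Data ingestion: synthetic (floor_area, energy_use) records (made-up data)."""
    rng = random.Random(0)
    rows = []
    for _ in range(200):
        area = rng.uniform(50, 500)
        energy = 3.0 * area + 40 + rng.gauss(0, 25)
        rows.append((area, energy))
    # Two deliberately incomplete records to give the cleaning step something to do.
    rows.append((None, 120.0))
    rows.append((300.0, None))
    return rows


def clean(rows):
    """Data cleaning and transformation: drop records with missing values."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def split(rows, test_fraction=0.25, seed=1):
    """Hold out part of the data for testing and validation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_mean(train):
    """Candidate 1: always predict the mean of the training targets."""
    mean_y = statistics.fmean([y for _, y in train])
    return lambda x: mean_y


def train_linear(train):
    """Candidate 2: ordinary least squares with a single feature."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def rmse(model, rows):
    """Root mean squared error of a model on held-out rows."""
    return (sum((model(x) - y) ** 2 for x, y in rows) / len(rows)) ** 0.5


def run_process():
    rows = clean(ingest())
    train, test = split(rows)
    candidates = {"mean_baseline": train_mean, "linear_regression": train_linear}
    models = {name: trainer(train) for name, trainer in candidates.items()}  # model training
    scores = {name: rmse(model, test) for name, model in models.items()}     # testing and validation
    best = min(scores, key=scores.get)                                       # model selection
    # "Deployment" is reduced here to handing back a callable model.
    return best, scores, models[best]
```

For brevity the sketch uses one held-out set for both validation and model selection; a more careful setup would keep a separate test set that plays no part in choosing the model.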
13. History of Democratizing Data Science
Innovation: Calculator, Spreadsheet, SQL, Machine Learning
Calculator: 1700s Mechanical, 1960s Digital
Spreadsheet: 1960s IBM, 1980s Desktop
SQL: 1970s IBM, 1990s OO
Machine Learning: 1980s IBM, 2010s Open Source, 2020s AI
Timeline marker: 2017
14. Future - Democratizing Machine Learning & AI…
Future of Data Science is in Democratizing Machine Learning and AI in the Cloud
[Venn diagram: Math & Stats, Computer Science, Domain Expertise; overlaps: Machine Learning, Research, Engineering; center: Unicorn]
Scripting, SQL; Python, R, Scala; Data Pipelines; Big Data/Apache Spark
Mathematics; Computational
Domain Knowledge: Supply Chain, CRM, Financials, Networking
15. Example of Machine Learning
Building a Model to Predict Energy Consumption of Buildings.
16. Example of Machine Learning in Action
Chat Bot to estimate energy cost from an image of a building.
"Great, thanks for that picture! Looks like your building is made of stone and has large windows."
"I estimate your building has a high energy usage intensity (EUI), with a 97.01% probability."
17. Data Science technology trends
[Chart: Trends in Google Searches (September 2nd 2016) for SPSS, SAS, Python, R, Scala]
18. Data Science is Driving the Database to Big Data Evolution
[Chart: Databases, Big Data, Hadoop, Spark]
Source: Google Trends
19. The Convergence of Big Data & Data Science
Open R ->
Big Data ->
Python ->
20. IBM is all-in on Spark
Launch Spark Technology Cluster: www.Spark.tc
Contribute to Community
Infuse Portfolio: Integrate Apache Spark throughout IBM's portfolio; Used by Watson
Foster Community: Educate and grow data scientist community; www.BigDataUniversity.com
"It's like Spark just got blessed by the enterprise rabbi." Ben Horowitz
21. IBM Contributions - Driving Data Science at Scale…
38,500 Spark LOC
863 Spark JIRAs
253 SystemML JIRAs
422 Commits in Spark 2.0
[Chart: Contribution Progress, 2015 to 2016; vertical axis 0 to 800]
Top 3 Driving Data Science
• ML
• PySpark
• SQL
22. IBM impact on SparkML / MLlib 2.0
• Spark Machine Learning (ML) provides a toolset to create pipelines of different ML-related transformations on your data
• IBM is the #1 contributor to Spark ML
[Chart: Top 10 Contributing Companies to Spark ML/MLlib 2.0.0; vertical axis 0 to 140]
[Chart: Contributions to Spark ML 2.0.0; recovered labels: 34%, Hortonworks, 16%, Databricks, 13%, Intel, 9%]
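The idea of a pipeline of transformations mentioned on this slide can be illustrated with a small plain-Python sketch. It is not Spark ML's API: the `Pipeline` class, the stage functions, and the sample row below are all hypothetical, and Spark ML's own pipelines are richer (for example, they also include stages that are fitted to data). The sketch only shows the chaining idea, where each stage takes the output of the previous one.

```python
class Pipeline:
    """Hypothetical minimal pipeline: an ordered list of stage functions.

    Each stage takes a list of row dicts and returns a new list of row dicts.
    """

    def __init__(self, stages):
        self.stages = list(stages)

    def transform(self, rows):
        # Apply the stages in order, feeding each stage's output to the next.
        for stage in self.stages:
            rows = stage(rows)
        return rows


def tokenize(rows):
    """Add a 'words' field holding the lower-cased, whitespace-split text."""
    return [dict(row, words=row["text"].lower().split()) for row in rows]


def remove_stop_words(rows):
    """Drop a few common words (an illustrative, made-up stop list)."""
    stop = {"the", "a", "of", "is"}
    return [dict(row, words=[w for w in row["words"] if w not in stop]) for row in rows]


def count_words(rows):
    """Add an 'n_words' feature derived from the remaining words."""
    return [dict(row, n_words=len(row["words"])) for row in rows]


pipeline = Pipeline([tokenize, remove_stop_words, count_words])
result = pipeline.transform([{"text": "The future of data science"}])
```

Because each stage is an independent function, stages can be reordered, swapped, or reused across pipelines without touching the others.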
23. Introducing IBM Data Science Experience
Be a better data scientist
IBM Data Science Experience is an environment that brings together everything that a Data Scientist needs to be more productive, including tools, data and content
24. IBM Data Science Experience
http://datascience.ibm.com
Learn: Built-in learning to get started or go the distance with advanced tutorials
Create: The best of open source and IBM value-add to create state-of-the-art data products
Collaborate: Community and social features that provide meaningful collaboration
• Find tutorials and datasets
• Connect with Data Scientists
• Ask questions
• Read articles and papers
• Fork and share projects
• Watson Machine Learning
• SPSS Modeler Canvas
• Advanced Visualizations
• Projects and Version Control
• Managed Spark Service
• Code in Scala/Python/R
• Jupyter Notebooks
• RStudio IDE and Shiny
• Apache Spark
• Your favorite libraries
25. Open source is a powerful engine, but as with any engine, it needs the full system to accomplish any work
We provide:
• Hosting: Tools are ready to go, no install necessary
• Security: SSO and code hardening to reduce security gaps
• Version Currency: We keep up-to-date as open source quickly iterates
• Data Connectivity: Connect to data sources
• Scalability: Makes tools designed for desktops scalable to enterprise workloads
26. Notebooks are browser-based interactive and collaborative development environments for data science
Notebooks are interactive computational environments, in which you can combine code execution, rich text, mathematics, plots and rich media.
27. Projects are shared, collaborative workspaces that gather all assets & content in a single area
Internal and external collaborators can be added, with relevant roles / permissions set by the project owner
Any type of analytical asset can be part of a project; clicking on an asset opens it in the right tool and in project context
Each project provides its own separate storage space, available to collaborators only
[Diagram: Project, containing People, Artifacts, Data]
28. Divide by function: Similar to a surgical team, notebooks enable work to be partitioned functionally, by skill level
Surgeon: Executes all other pre and post work
Attending Surgeon: Executes most delicate procedures requiring greatest skill
Resident: Preps the patient and assists
Data Scientist: Exploratory analysis, feature selection, deployment
Sr. Data Scientist: Builds advanced models, reviews earlier work
Business Analyst: Articulates problem, finds and prepares data
© 2016 IBM Corporation
29. Watson Machine Learning capabilities overview
(1) Build model: Auto-generate model from input data, testing various algorithms for best fit (e.g. CADS)
(2) Deploy model: Deploy model into production - scale, manage and monitor
(3) Refresh model: Detect loss of predictive power and refresh model, subject to preferences
Import Sources:
• DSx Notebooks
• DSx Flow UI
• External tools
[Figure labels: Predictive Power, 100%, Capacity, Model Builder (CADS)]
Model Automation Model Deployment
Model
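The "detect loss of predictive power and refresh model, subject to preferences" step can be pictured with a minimal monitoring sketch. This is not Watson Machine Learning's API: the `ModelMonitor` class, the sliding-window accuracy check, and the threshold values are assumptions made for illustration, with the window size and minimum accuracy standing in for the user's preferences.

```python
from collections import deque


class ModelMonitor:
    """Hypothetical monitor that tracks accuracy over the most recent predictions."""

    def __init__(self, window=50, min_accuracy=0.8):
        # deque(maxlen=...) silently discards the oldest outcome once full.
        self.outcomes = deque(maxlen=window)
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        """Store whether a prediction matched the value observed later."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Share of correct predictions in the window, or None if it is empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_refresh(self):
        """Signal a refresh only once the window is full and accuracy is too low."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.min_accuracy


def simulate():
    """Feed the monitor a healthy period followed by a run of wrong predictions."""
    monitor = ModelMonitor(window=10, min_accuracy=0.8)
    for _ in range(10):
        monitor.record("high", "high")
    healthy = monitor.needs_refresh()
    for _ in range(5):
        monitor.record("high", "low")
    degraded = monitor.needs_refresh()
    return healthy, degraded, monitor.accuracy()
```

Waiting for a full window before signalling avoids retraining on the strength of a handful of early mistakes; when `needs_refresh()` returns true, a retraining job would be triggered on recent data.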
30. The full range of Watson Cognitive services will be accessible within DSx
Categories: Language, Speech, Vision, Data Insights
Services: Alchemy Language, Conversation, Dialog, Document Conversion, Language Translator, Natural Language Classifier, Natural Language Understanding, Personality Insights, Retrieve and Rank, Tone Analyzer, Speech to Text, Visual Recognition, Text to Speech, AlchemyData News, Discovery, Discovery News, Tradeoff Analytics
31. We've been recognized for our vision
Gartner Magic Quadrant 2017: Data Science Platforms
Forrester Wave 2017: Predictive Analytics & Machine Learning
DeveloperWeek 2017 Devie
Source: https://www.gartner.com/doc/reprints?id=1-3TKD8OH&ct=170215&st=sb
http://www.developerweek.com/awards/2017-devies-award-winners/
32. IBM Data Science Experience
https://www.youtube.com/watch?v=HPzXlFp4rKE
Demo
33. Get Started with Data Science Experience Today!
Sign up, Learn, Try it!
• DSx is available for personal use for free, with enough power to learn data science and try most examples
• Follow the example outlined in a blog post, with a link to the full GitHub repo and step-by-step instructions (see README in directory)
• http://datascience.ibm.com/blog/modeling-energy-usage-in-new-york-city/
• Additional tutorials and reference materials within the community section of DSx
• Find your own use case and try it, or find other relevant examples within DSx
Cloud: datascience.ibm.com
Desktop: datascience.ibm.com/desktop
Local: datascience.ibm.com/local
34. Trademarks and notes
© IBM Corporation 2017
IBM, the IBM logo, ibm.com, and Watson are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information."
• Other company, product, and service names may be trademarks or service marks of others.
• References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.