Future of Data Science
Rethinking business, technology and data.
Carlo Appugliese
Data Science Evangelist
The digital age has changed the way we live, play, learn and work…
Companies must shift to a Data-Driven Business
72% are vulnerable to disruption within three years
Transformation is Critical…
274,000 estimated worldwide startups each day
Why we’re all vulnerable
to seismic shifts
External Threats
Born-on-digital companies that steal market share or rewrite customer expectations
New business models that reinvent our industry and change the game altogether
Internal Threats
Siloed data and systems
Gaps in expertise and skills
Inability to react quickly
Value
Uses of Data
Efficiency Modernization Data Decision Monetization
Unleashing your data and making the shift to a Data-Driven Organization
Operations
Reporting & Data Warehousing
Self-Service Analytics
New Business Models
Data Science
Data science is a "concept to unify statistics, data analysis and their related methods" in order to "understand and analyze actual phenomena" with data.
What is Data Science?
Math & Stats
Computer Science
Domain Expertise
Scripting, SQL, Python, R, Scala
Data Pipelines
Big Data / Apache Spark
Mathematics
Computational
Domain Knowledge
Supply Chain
CRM
Financials
Networking
What makes a Data Scientist?
Unicorn
Data Science Projects Require multiple Skills
Machine Learning
Research
Engineering
Data Science is a Team Sport.
Business
Analysts
Data
Scientists
Application
Developers
Data
Engineers
Clearly Articulate Use Case
Gather all the Data
Prepare Data
Apply Machine Learning
Evaluate
Digital Application
Steps to put Data Science to work…
Data Predictions
& Insight
“Computers that learn without being explicitly programmed”
“Using algorithms to understand patterns in data”
Algorithms
Machine Learning… What is it?
Machine Learning - Process
Data Ingestion
Data Cleaning and Transformation
Model Training
Testing and Validation
Deployment
Model Selection
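The process stages above can be sketched end to end in a few lines of Python. This is only an illustration under stated assumptions: the data is synthetic (a made-up floor-area to energy-use relationship), all function names are hypothetical, and the two candidate models (a least-squares line and a predict-the-mean baseline) stand in for whatever algorithms a real project would compare. For brevity, model selection and validation share one held-out set.

```python
import random
import statistics


def ingest(n=200, seed=0):
    """Data ingestion: generate hypothetical (floor_area, energy_use) records."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        area = rng.uniform(50, 500)
        # Assumed relationship, invented for this sketch only.
        energy = 3.0 * area + 40.0 + rng.gauss(0, 10)
        rows.append((area, energy))
    # Add two incomplete records so the cleaning step has work to do.
    rows += [(None, 100.0), (120.0, None)]
    return rows


def clean(rows):
    """Data cleaning and transformation: drop incomplete records."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def split(rows, test_fraction=0.25, seed=1):
    """Hold out part of the data for testing and validation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_linear(rows):
    """Model training: ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_mean(rows):
    """Baseline candidate: always predict the mean of the training targets."""
    return 0.0, statistics.fmean([y for _, y in rows])


def rmse(model, rows):
    """Testing and validation: root mean squared error on held-out rows."""
    slope, intercept = model
    squared = [(slope * x + intercept - y) ** 2 for x, y in rows]
    return (sum(squared) / len(squared)) ** 0.5


def run_pipeline():
    train, test = split(clean(ingest()))
    candidates = {"linear": train_linear(train), "mean": train_mean(train)}
    scores = {name: rmse(model, test) for name, model in candidates.items()}
    best = min(scores, key=scores.get)  # model selection: lowest error wins
    slope, intercept = candidates[best]

    def predict(area):
        """Deployment: expose the selected model as a plain callable."""
        return slope * area + intercept

    return best, scores, predict


best, scores, predict = run_pipeline()
```

Calling `predict(100)` then returns the selected model's energy estimate for a hypothetical 100-unit floor area.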
History of Democratizing Data Science
1960s Digital
Calculator, Spreadsheet, SQL, Machine Learning
1960s IBM
1980s Desktop
1700s Mechanical
Innovation
1970s IBM
1990s OO
1980s IBM
2010s Open Source
2017
2020s AI
Future of Data Science is in Democratizing Machine Learning and AI in the Cloud
Future - Democratizing Machine Learning & AI…
Example of Machine Learning
Building Model to Predict Energy Consumption of Buildings.
Example of Machine Learning in Action
Chat Bot to estimate energy cost from an image of building.
Great, thanks for that picture! Looks like your building is made of stone and has large windows
I estimate your building has a high energy usage intensity (EUI), with a 97.01% probability
Data Science technology trends..
SPSS SAS
Python R Scala
Trends in Google Searches (September 2nd 2016)
Data Science is Driving the Database to Big Data Evolution..
Databases
Big Data
Source: Google Trends
Hadoop
Spark
Open R ->
Big Data ->
Python ->
The Convergence of Big Data & Data Science
Launch Spark Technology
Cluster
www.Spark.tc
Contribute to
Community
Infuse
Portfolio
Integrate Apache Spark
throughout IBM’s
portfolio
Used by Watson
Foster
Community
Educate and grow data
scientist community
www.BigDataUniversity.com
"It's like Spark just got blessed
by the enterprise rabbi."
Ben Horowitz
IBM is all-in on Spark
IBM Contributions – Driving Data Science at Scale…
38,500 Spark LOC
863 Spark JIRAs
253 SystemML JIRAs
422 Commits in Spark 2.0
[Chart: Contribution Progress, 2015 and 2016]
Top 3
Driving Data Science
• ML
• PySpark
• SQL
§ Spark Machine Learning (ML) provides a toolset to create pipelines of different ML-related transformations on your data
§ IBM is the #1 contributor to Spark ML
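The pipeline idea, a chain of transformations fitted and applied in order, can be illustrated without Spark. The sketch below is plain Python with hypothetical stage classes (`Scale`, `Threshold`) and a hypothetical `Pipeline` class. It is not the Spark ML API, which provides its own `Pipeline`, `Transformer` and `Estimator` abstractions over DataFrames; it only mirrors the fit-then-transform pattern.

```python
class Scale:
    """Hypothetical stage: learns min/max in fit, rescales to [0, 1] in transform."""

    def fit(self, values):
        self.lo, self.hi = min(values), max(values)
        return self

    def transform(self, values):
        span = (self.hi - self.lo) or 1.0  # avoid division by zero
        return [(v - self.lo) / span for v in values]


class Threshold:
    """Hypothetical stage: turns scaled values into 0/1 labels."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def fit(self, values):
        return self  # nothing to learn

    def transform(self, values):
        return [1 if v >= self.cutoff else 0 for v in values]


class Pipeline:
    """Chains stages: each stage is fitted on the output of the previous one."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, values):
        for stage in self.stages:
            values = stage.fit(values).transform(values)
        return self

    def transform(self, values):
        for stage in self.stages:
            values = stage.transform(values)
        return values


# Fit on training values, then reuse the fitted stages on new data.
pipeline = Pipeline([Scale(), Threshold(0.5)]).fit([10.0, 20.0, 30.0, 40.0])
labels = pipeline.transform([10.0, 25.0, 40.0])
print(labels)  # [0, 1, 1]
```

The design point is that the whole chain is fitted once and then reused as a single unit, so new data passes through exactly the same transformations as the training data.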
IBM impact on SparkML / MLlib 2.0
[Chart: Top 10 Contributing Companies to Spark ML/MLlib 2.0.0]
34%
Hortonworks
16%Databricks
13%
Intel
9%
Contributions to Spark ML 2.0.0
IBM Data Science Experience is an environment that brings together everything that a Data Scientist needs to be more productive, including tools, data and content
Be a better data scientist
Introducing IBM Data Science Experience
Learn: Built-in learning to get started or go the distance with advanced tutorials
Create: The best of open source and IBM value-add to create state-of-the-art data products
Collaborate: Community and social features that provide meaningful collaboration
http://datascience.ibm.com
IBM Data Science Experience
• Find tutorials and datasets
• Connect with Data Scientists
• Ask questions
• Read articles and papers
• Fork and share projects
• Watson Machine Learning
• SPSS Modeler Canvas
• Advanced Visualizations
• Projects and Version Control
• Managed Spark Service
• Code in Scala/Python/R
• Jupyter Notebooks
• RStudio IDE and Shiny
• Apache Spark
• Your favorite libraries
Open source is a powerful engine, but as with any engine, it needs
the full system to accomplish any work
We provide:
§ Hosting – Tools are ready to go, no install necessary
§ Security – SSO and code hardening to reduce security gaps
§ Version Currency – We keep up-to-date as open source quickly iterates
§ Data Connectivity – Connect to data sources
§ Scalability – Makes tools designed for desktops scalable to enterprise workloads
Notebooks are browser-based interactive and collaborative development
environments for data science
Notebooks are interactive computational environments, in which you can combine code execution, rich text, mathematics, plots and rich media.
Projects are shared, collaborative workspaces that gather all assets &
content in a single area
Internal and external collaborators can be added, with relevant roles / permissions set by the project owner
Any type of analytical asset can be part of a project; clicking on an asset opens it in the right tool and in project context
Each project provides its own separate storage space, available to collaborators only
People
Artifacts
Data
Project
Divide by function: Similar to a surgical team, notebooks enable
work to be partitioned functionally, by skill level
Surgeon: Executes all other pre and post work
Attending Surgeon: Executes most delicate procedures requiring greatest skill
Resident: Preps the patient and assists
Data Scientist: Exploratory analysis, feature selection, deployment
Sr. Data Scientist: Builds advanced models, reviews earlier work
Business Analyst: Articulates problem, finds and prepares data
© 2016 IBM Corporation
Watson Machine Learning capabilities overview
Predictive
Power
100%
Capacity
Model Builder
(CADS)
1. Build model
2. Deploy model
3. Refresh model
Import Sources:
§ DSx Notebooks
§ DSx Flow UI
§ External tools
Auto-generate model from input data, testing various algorithms for best fit (e.g. CADS)
Detect loss of predictive power and refresh model, subject to preferences
Deploy model into production - scale, manage and monitor
Model Automation Model Deployment
Model
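The "detect loss of predictive power and refresh model" step can be pictured as a rolling accuracy check. The sketch below is a hypothetical illustration in plain Python, not the Watson Machine Learning implementation or API: the class name, the window size and the accuracy threshold are all assumptions standing in for the "preferences" mentioned above.

```python
from collections import deque


class RefreshMonitor:
    """Tracks recent prediction outcomes and flags when accuracy drops too low."""

    def __init__(self, window=50, min_accuracy=0.8):
        # Only the most recent `window` outcomes are kept.
        self.outcomes = deque(maxlen=window)
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        """Store whether a deployed model's prediction matched the observed value."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Share of correct predictions in the current window, or None if empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_refresh(self):
        """Signal a refresh only once the window is full and accuracy is below target."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.min_accuracy


monitor = RefreshMonitor(window=10, min_accuracy=0.8)
for _ in range(10):
    monitor.record("high", "high")   # ten correct predictions
healthy = monitor.needs_refresh()    # accuracy is 1.0, no refresh needed
for _ in range(3):
    monitor.record("high", "low")    # three misses push out three hits
degraded = monitor.needs_refresh()   # accuracy is now 0.7, below 0.8
```

Waiting for a full window before judging avoids triggering a retrain on the first few unlucky predictions; a production system would also need labeled outcomes to arrive after deployment, which this sketch simply assumes.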
The full range of Watson Cognitive services will be accessible
within DSx
AlchemyLanguage
Conversation Dialog
Document Conversion
Language Translator
Natural Language Classifier
Natural Language Understanding
Personality Insights
Retrieve and Rank
Tone Analyzer
Speech to Text
Visual Recognition
Text to Speech
AlchemyData News
Discovery
Discovery News
Tradeoff Analytics
Speech
Vision
Data Insights
Language
We’ve been recognized for our vision
Source: https://www.gartner.com/doc/reprints?id=1-3TKD8OH&ct=170215&st=sb
http://www.developerweek.com/awards/2017-devies-award-winners/
Gartner Magic Quadrant 2017
Data Science Platforms
DeveloperWeek 2017
Devie
Forrester Wave 2017
Predictive Analytics & Machine Learning
IBM Data Science Experience
https://www.youtube.com/watch?v=HPzXlFp4rKE
Demo
Get Started with Data Science Experience Today!
§ DSx is available for personal use for free, with enough power to learn data science and try most examples
§ Follow the example outlined in a blog post, with link to the full GitHub repo and step by step instructions (see README in directory)
§ http://datascience.ibm.com/blog/modeling-energy-usage-in-new-york-city/
§ Additional tutorials and reference materials within the community section of DSx
§ Find your own use case and try it, or find other relevant examples within DSx
Sign up
Learn
Try it!
Cloud: datascience.ibm.com
Desktop: datascience.ibm.com/desktop
Local: datascience.ibm.com/local
© IBM Corporation 2017
IBM, the IBM logo, ibm.com, and Watson are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on
their first occurrence in this information with the appropriate symbol (® or ™), these symbols indicate U.S. registered or
common law trademarks owned by IBM at the time this information was published. Such trademarks may also be
registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at
“Copyright and trademark information.”
• Other company, product, and service names may be trademarks or service marks of others.
• References in this publication to IBM products or services do not imply that IBM intends to make them available in all
countries in which IBM operates.
Trademarks and notes
The Future of Data Science

Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

The Future of Data Science

  • 1. Future of Data Science Rethinking business, technology and data. Carlo Appugliese Data Science Evangelist
  • 2. The digital age has changed the way we Live, Play, Learn and Work…
  • 3. Transformation is Critical… 72% are vulnerable to disruption within three years. Companies must shift to a Data-Driven Business.
  • 4. Why we’re all vulnerable to seismic shifts. Estimated worldwide startups each day: 274,000. External Threats: born-on-digital companies that steal market share or rewrite customer expectations; new business models that reinvent our industry and change the game altogether. Internal Threats: siloed data and systems; gaps in expertise and skills; inability to react quickly.
  • 5. Value Uses of Data Efficiency Modernization Data Decision Monetization Unleashing your data and making the shift to a Data-Driven Organization Operations Reporting & Data Warehousing Self-Service Analytics New Business Models Data Science
  • 6. Data science is a "concept to unify statistics, data analysis and their related methods" in order to "understand and analyze an actual phenomena" with data. What is Data Science?
  • 7. Math & Stats Computer Science Domain Expertise Scripting, SQL Python, R Scala Data Pipelines Big Data/ Apache Spark Mathematics Computational Domain Knowledge Supply Chain CRM Financials Networking What makes a Data Scientist? Unicorn Data Science Projects Require multiple Skills
  • 8. Math & Stats Computer Science Domain Expertise Data Science Projects Require multiple Skills What makes a Data Scientist? Unicorn Machine Learning Research Engineering Scripting, SQL Python, R Scala Data Pipelines Big Data/ Apache Spark Mathematics Computational Domain Knowledge Supply Chain CRM Financials Networking
  • 9. Data Science is a Team Sport. Business Analysts Data Scientists Application Developers Data Engineers
  • 10. Clearly Articulate Use Case Gather all the Data Apply Machine Learning Prepare Data Digital Application Evaluate Steps to put Data Science to work..
  • 11. Data Predictions & Insight “Computers that learn without being explicitly programmed” “Using algorithms to understand patterns in data” Algorithms Machine Learning… What is it?
  • 12. Machine Learning - Process Data Ingestion Data Cleaning and Transformation Model Training Testing and Validation Deployment Model Selection
  • 13. History of Democratizing Data Science 1960s Digital Calculator Spreadsheet SQL Machine Learning 1960s IBM 1980s Desktop 1700s Mechanical Innovation 1970s IBM 1990s OO 1980s IBM 2010s Open Source 2017 2020s AI
  • 14. Math & Stats Computer Science Domain Expertise Scripting, SQL Python, R Scala Data Pipelines Big Data/ Apache Spark Mathematics Computational Domain Knowledge Supply Chain CRM Financials Networking Future of Data Science is in Democratizing Machine Learning and AI in the Cloud Future - Democratizing Machine Learning & AI… Unicorn Machine Learning Research Engineering
  • 15. Example of Machine Learning Building Model to Predict Energy Consumption of Buildings.
  • 16. Example of Machine Learning in Action Chat Bot to estimate energy cost from an image of building. Great, thanks for that picture! Looks like your building is made of stone and has large windows I estimate your building has a high energy usage intensity (EUI), with a 97.01% probability
  • 17. Data Science technology trends.. SPSS SAS Python R Scala Trends in Google Searches (September 2nd 2016)
  • 18. Data Science is Driving the Database to Big Data Evolution.. Databases Big Data Source: Google Trends Hadoop Spark
  • 19. Open R -> Big Data -> Python -> The Convergence of Big Data & Data Science
  • 20. Launch Spark Technology Cluster www.Spark.tc Contribute to Community Infuse Portfolio Integrate Apache Spark throughout IBM’s portfolio Used by Watson Foster Community Educate and grow data scientist community www.BigDataUniversity.com "It's like Spark just got blessed by the enterprise rabbi." Ben Horowitz IBM is all-in on Spark
  • 21. IBM Contributions – Driving Data Science at Scale… 38,500 Spark LOC; 863 Spark JIRAs; 253 SystemML JIRAs; 422 Commits in Spark 2.0. Chart: Contribution Progress, 2015 to 2016. Top 3 Driving Data Science • ML • PySpark • SQL
  • 22. IBM impact on SparkML / MLlib 2.0 • Spark Machine Learning (ML) provides a toolset to create pipelines of different ML-related transformations on your data • IBM is the #1 contributor to Spark ML. Chart: Top 10 Contributing Companies to Spark ML/MLlib 2.0.0. Chart: Contributions to Spark ML 2.0.0 (labels as extracted: 34% Hortonworks 16% Databricks 13% Intel 9%)
  • 23. IBM Data Science Experience is an environment that brings together everything that a Data Scientist needs to be more productive, including tools, data and content Be a better data scientist Introducing IBM Data Science Experience
  • 24. Built-in learning to get started or go the distance with advanced tutorials Learn The best of open source and IBM value-add to create state-of-the-art data products Create Community and social features that provide meaningful collaboration Collaborate http://datascience.ibm.com IBM Data Science Experience • Find tutorials and datasets • Connect with Data Scientists • Ask questions • Read articles and papers • Fork and share projects • Watson Machine Learning • SPSS Modeler Canvas • Advanced Visualizations • Projects and Version Control • Managed Spark Service • Code in Scala/Python/R • Jupyter Notebooks • RStudio IDE and Shiny • Apache Spark • Your favorite libraries
  • 25. Open source is a powerful engine, but as with any engine, it needs the full system to accomplish any work. We provide: • Hosting – Tools are ready to go, no install necessary • Security – SSO and code hardening to reduce security gaps • Version Currency – We keep up-to-date as open source quickly iterates • Data Connectivity – Connect to data sources • Scalability – Makes tools designed for desktops scalable to enterprise workloads
  • 26. Notebooks are browser-based interactive and collaborative development environments for data science Notebooks are interactive computational environments, in which you can combine code execution, rich text, mathematics, plots and rich media.
  • 27. Projects are shared, collaborative workspaces that gather all assets & content in a single area Internal and external collaborators can be added, with relevant roles / permissions set by project owner Any type of analytical asset can be part of a project, clicking on asset opens it in the right tool and in project context Each project provides its own separate storage space, available to collaborators only People Artifacts Data Project
  • 28. Divide by function: Similar to a surgical team, notebooks enable work to be partitioned functionally, by skill level Surgeon: Executes all other pre and post work Attending Surgeon: Executes most delicate procedures requiring greatest skill Resident: Preps the patient and assists Data Scientist: Exploratory analysis, feature selection, deployment Sr. Data Scientist: Builds advanced models, reviews earlier work Business Analyst: Articulates problem, finds and prepares data © 2016 IBM Corporation
  • 29. Watson Machine Learning capabilities overview Predictive Power 100% Capacity Model Builder (CADS) 1. Build model 2. Deploy model 3. Refresh model Import Sources: • DSx Notebooks • DSx Flow UI • External tools Auto-generate model from input data, testing various algorithms for best fit (e.g. CADS) Detect loss of predictive power and refresh model, subject to preferences Deploy model into production - scale, manage and monitor Model Automation Model Deployment Model
  • 30. The full range of Watson Cognitive services will be accessible within DSx Alchemy Language Conversation Dialog Document Conversion Language Translator Natural Language Classifier Natural Language Understanding Personality Insights Retrieve and Rank Tone Analyzer Speech to Text Visual Recognition Text to Speech AlchemyData News Discovery Discovery News Tradeoff Analytics Speech Vision Data Insights Language
  • 31. We’ve been recognized for our vision Source: https://www.gartner.com/doc/reprints?id=1-3TKD8OH&ct=170215&st=sb http://www.developerweek.com/awards/2017-devies-award-winners/ Gartner Magic Quadrant 2017 Data Science Platforms DeveloperWeek 2017 Devie Forrester Wave 2017 Predictive Analytics & Machine Learning
  • 32. IBM Data Science Experience https://www.youtube.com/watch?v=HPzXlFp4rKE Demo
  • 33. Get Started with Data Science Experience Today! • DSx is available for personal use for free, with enough power to learn data science and try most examples • Follow the example outlined in a blog post, with link to the full GitHub repo and step by step instructions (see README in directory) • http://datascience.ibm.com/blog/modeling-energy-usage-in-new-york-city/ • Additional tutorials and reference materials within the community section of DSx • Find your own use case and try it, or find other relevant examples within DSx Sign up Learn Try it! Cloud: datascience.ibm.com Desktop: datascience.ibm.com/desktop Local: datascience.ibm.com/local
  • 34. Trademarks and notes © IBM Corporation 2017 IBM, the IBM logo, ibm.com, and Watson are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information.” • Other company, product, and service names may be trademarks or service marks of others. • References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.
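
The machine learning process listed on slide 12 (data ingestion, cleaning, model training, testing and validation, model selection, deployment) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the workflow used in the deck: the data is synthetic, the "floor area predicts energy use" relationship is invented for the example, and every function name below is hypothetical.

```python
import random
from statistics import mean


def ingest(n=200, seed=0):
    """Data ingestion: synthetic (floor_area, energy_use) rows (assumed data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        area = rng.uniform(50, 500)
        energy = 3.0 * area + 20.0 + rng.gauss(0, 10)
        # Simulate roughly 5% missing energy readings.
        rows.append((area, None if rng.random() < 0.05 else energy))
    return rows


def clean(rows):
    """Data cleaning: drop rows whose target value is missing."""
    return [(x, y) for x, y in rows if y is not None]


def split(rows, test_fraction=0.25):
    """Hold out the last part of the data for testing and validation."""
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def train_mean(train):
    """Baseline model: always predict the mean of the training targets."""
    avg = mean(y for _, y in train)
    return lambda x: avg


def train_linear(train):
    """Simple least-squares line fitted to the training rows."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def mse(model, rows):
    """Validation metric: mean squared error on held-out rows."""
    return mean((model(x) - y) ** 2 for x, y in rows)


def run_pipeline():
    rows = clean(ingest())
    train, test = split(rows)
    candidates = {"mean_baseline": train_mean(train), "linear": train_linear(train)}
    # Model selection: keep the candidate with the lowest held-out error.
    scores = {name: mse(model, test) for name, model in candidates.items()}
    best = min(scores, key=scores.get)
    # "Deployment" here is just returning a callable that serves predictions.
    return best, scores, candidates[best]


if __name__ == "__main__":
    best, scores, predict = run_pipeline()
    print(best, scores, predict(100.0))
```

For brevity the sketch selects the model on the same held-out rows it reports scores for; a real project would usually keep a separate validation set for selection.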