In my talk, “The Practice of Data Science,” I provided a high-level overview of what it means to practice data science by taking a look at the people, processes and tools that underlie the field of data science. You can view this talk (and many others) by registering for the free online conference from Metis here: https://www.thisismetis.com/demystifying-data-science
The Practice of Data Science - Demystifying Data Science Conference
1. The Practice of Data Science:
People, Processes and Tools
Bob E. Hayes, PhD
bob@businessoverbroadway.com
@bobehayes
Presented at Metis’ Demystifying Data Science: A FREE Online Conference for Aspiring Data Scientists – Sept 27, 2017
2. Bob E. Hayes, PhD
Email: bob@businessoverbroadway.com
Web: www.businessoverbroadway.com
Twitter: @bobehayes
• Author of three books on customer experience
management and analytics
• PhD in industrial-organizational psychology
• #6 blogger overall on CustomerThink
(http://customerthink.com/author/bobehayes/)
• #3 blogger on the topic of customer analytics
(http://customerthink.com/top-authors-category/)
• Top expert in Big Data and Data Science
• https://www.maptive.com/the-top-100-big-data-experts/
• http://www.kdnuggets.com/2015/02/top-big-data-influencers-brands.html
3. Outline
• Why now?
• Definition of Data Science
• The People: Data Science Skills
• The Process: From Data to Insight
• The Tools
• Education Requirements
• Gender Diversity
5. Analytics Skills Gap is Huge*
* From PwC: Investing in America’s Data Science and Analytics Talent
6. Data Science Defined
Data science is a way of extracting insights from data using the powers of computer science and statistics applied to data from a specific field of study.
8. Job Roles in Data Science
*Researcher (e.g., researcher, scientist, statistician); Business Management (e.g., leader, business person, entrepreneur); Creative
(e.g., jack of all trades, artist, hacker); Developer (e.g., developer, engineer)
9. Three Skill Domains of Data Science
• Domain Knowledge
• Math / Statistics
• Technology / Programming
10. 25 Data Science Skills
Top 10 Data Science Skills
1. Communication
2. Managing structured data
3. Data mining and visualization tools
4. Science / Scientific method
5. Math
6. Project management
7. Data management
8. Statistics and statistical modeling
9. Product design and development
10. Business development
Data are based on responses to AnalyticsWeek and Business Over Broadway Data Science Survey. From September 2015.
11. Skill Proficiency Varies by Data Science Role
[Chart: proficiency (axis 0 to 80) on each of 25 skills by data science role. Legend: Domain Expert, Developer, Researcher, Proficiency Standard. Skills are grouped into three domains: Math / Statistics, Tech / Programming, Domain Knowledge.]
Skills shown: Business development, Budgeting, Governance and Compliance, Optimization, Math, Graphical Models, Algorithms, Bayesian Statistics, Machine Learning, Data Mining and Viz Tools, Statistics and statistical modeling, Science/Scientific Method, Communication, Unstructured data, Structured data, NLP and text mining, Data Management, Big and distributed data, Systems Administration, Database Administration, Cloud Management, Back-end Programming, Front-end Programming, Product Design, Project management
Data are based on responses to AnalyticsWeek and Business Over Broadway Data Science Survey. From September 2015.
12. In Search of the Data Science Unicorn
I wish I knew
some Python.
Data are based on responses to AnalyticsWeek and Business Over Broadway Data Science Survey. From 2015.
13. Analytics, Data Mining and Data Science Methods
S = Start with Strategy
M = Measure Metrics and Data
A = Apply Analytics
R = Report Results
T = Transform your Business
From “CRISP-DM, still the top methodology for analytics, data mining, or data science projects”
http://www.kdnuggets.com/2014/10/crisp-dm-top-methodology-analytics-data-mining-data-science-projects.html
14. From Data to Insight
• Cross Industry Standard Process for Data Mining (CRISP-DM) (IBM, Teradata, Daimler AG, NCR Corporation and OHRA)
• Knowledge Discovery in Databases (KDD)
• SEMMA (SAS)
For more information on these methods, see: https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining; https://en.wikipedia.org/wiki/SEMMA; https://en.wikipedia.org/wiki/Data_mining
15. Getting Insight from Data: The Scientific Method
1. Formulate questions
2. Generate hypothesis / hunch
3. Gather / generate data
4. Analyze data / test hypothesis
5. Take action / communicate results
• Start with a problem statement.
• What are your hunches / hypotheses?
• Be sure your hypotheses are testable.
• You can use an experimental or observational approach to analyzing data.
• Integrate your data silos to ask bigger questions; connect the dots and get a 360 degree view of the phenomenon you’re studying.
• Employ predictive analytics / inferential statistics to test hypotheses.
• Employ machine learning to quickly surface insights.
• Implement your findings; inform decision-makers; optimize algorithms.
• Use prescriptive analytics to guide course of action.
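As one illustration of step 4 (analyze data / test hypothesis), here is a minimal Python sketch of a permutation test for an experimental comparison. It is not taken from the talk: the data, the function name `permutation_test`, and its parameters are hypothetical, and a permutation test is only one of many inferential techniques that could be used here.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference (mean of b minus mean of a) and an
    approximate p-value estimated from random relabelings of the data.
    """
    rng = random.Random(seed)
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so shuffle the pooled values and split them again.
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1

    # The add-one correction keeps the estimate strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical data: minutes watched per session under two page designs.
control = [31, 28, 35, 30, 27, 33, 29, 32]
variant = [36, 39, 34, 41, 38, 35, 40, 37]

observed_diff, p_value = permutation_test(control, variant)
print(f"observed difference in means: {observed_diff:.2f}")
print(f"approximate p-value: {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to come from chance relabeling alone. Whether it justifies action (step 5) still depends on the size of the effect and the cost of being wrong.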
16. Iterative Process of Discovery
Image from Netflix Tech Blog: https://medium.com/netflix-techblog/a-b-testing-and-beyond-improving-the-netflix-streaming-experience-with-experimentation-and-data-5b0ae9295bdf
19. Top Data Science Tools
Rexer Analytics Data Science Survey 2015
For a comprehensive overview of different data science tools, please see: http://r4stats.com/articles/popularity/
20. Data Science Ecosystem
Gartner Magic Quadrant (2017); Forrester Wave
Leaders: IBM, SAS, RapidMiner, KNIME
For a good review of data science platforms, please see:
https://thomaswdinsmore.com/2017/02/28/gartner-looks-at-data-science-platforms/
30. Advice for Data Scientists
• Be specific when talking about “data scientists”
• There are different types – defined by what they do and the skills they possess
• Work with other data professionals who have complementary skills. Teamwork is key to successful data science projects.
• Learn to use data mining and visualization tools
• R, Python, SPSS, SAS, graphics, mapping, web-based data visualization
• Be an advocate for women in the field of data science
Editor's notes
Involves the collection, analysis and interpretation of data to extract empirically-based insights that augment and enhance human decisions and algorithms.
SEMMA is an acronym that stands for Sample, Explore, Modify, Model, and Assess. It is a list of sequential steps developed by SAS Institute. Besides CRISP-DM, SEMMA was the only other data mining approach named in the KDnuggets polls. However, SAS Institute clearly states that SEMMA is not a data mining methodology, but rather a "logical organization of the functional tool set of SAS Enterprise Miner."
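To make the five SEMMA steps concrete, here is a rough Python sketch that walks through them in order on synthetic data. It is an illustration only, not SAS Enterprise Miner and not from the talk: the data set, the variable names, and the choice of a one-predictor least-squares model are all assumptions.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)

# Hypothetical population: (ad_spend, sales) pairs with a roughly linear
# relationship plus Gaussian noise. Purely synthetic.
population = [(x, 3.0 * x + 10.0 + rng.gauss(0, 5)) for x in range(1, 201)]

# Sample: draw a manageable subset and hold part of it out for assessment.
sample = rng.sample(population, 60)
train, holdout = sample[:40], sample[40:]

# Explore: summarize each variable before modeling.
xs = [x for x, _ in train]
ys = [y for _, y in train]
print(f"x: mean={mean(xs):.1f}, sd={stdev(xs):.1f}")
print(f"y: mean={mean(ys):.1f}, sd={stdev(ys):.1f}")

# Modify: center the predictor so the intercept equals the mean of y.
x_bar, y_bar = mean(xs), mean(ys)
xc = [x - x_bar for x in xs]

# Model: ordinary least squares with a single predictor.
slope = sum(a * (y - y_bar) for a, y in zip(xc, ys)) / sum(a * a for a in xc)
intercept = y_bar


def predict(x):
    """Predict y for a raw (uncentered) x value."""
    return intercept + slope * (x - x_bar)


# Assess: mean absolute error on the held-out rows.
mae = mean(abs(predict(x) - y) for x, y in holdout)
print(f"slope={slope:.2f}, holdout MAE={mae:.2f}")
```

In practice each step is usually far richer (stratified sampling, visual exploration, feature engineering, model comparison), but the order of operations is the same.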
The term Knowledge Discovery in Databases, or KDD for short, refers to the broad process of finding knowledge in data, and emphasizes the "high-level" application of particular data mining methods.