1. Data Science: not just for Big Data
Gregory Piatetsky, @kdnuggets
Analytics, Big Data,
Data Mining, and Data Science Resources
© KDnuggets 2013
2. What do we call it?
Same Core Idea: Finding Useful Patterns in Data
Different Emphasis:
• Statistics, 1830
• Data mining, 1980
• Knowledge Discovery in Data (KDD), 1989
• Business Analytics, 1997
• Predictive Analytics, 2002
• Data Analytics, 2011
• Data Science, 2011
• Big Data, 2012
3. Big Data > Data Mining >> Predictive Analytics, Data Science
[Chart: search interest for "Data Mining" and "Big Data". Google Trends search, Jan 2008 - Sep 2013, Worldwide]
4. Data Science before "Big Data"
• Ancient astronomers
• Kepler's laws of planetary motion (1609), derived from observations by Tycho Brahe
• Genetics: Gregor Mendel found patterns in inheritance of pea plants
• Western Medicine
• …
5. Ignaz Semmelweis: early data scientist (1818-1865)
[Graph from Wikipedia]
Semmelweis found that the main difference between the clinics was that the 1st had medical students who also examined cadavers, and inferred that the students carried something on their hands from the autopsy. He proposed washing hands after autopsy, but his proposal was rejected and he died in an insane asylum.
7. Data Science Application: Process, not one step
[Figure: CRISP-DM process]
Building Predictive Models: most fun for data scientists, but only a small part of the process
8. Data Science Basic Principles & Ideas
• Focus on actionable patterns
• Build predictive models - supervised learning (train, test, cross-validate)
• Avoid overfitting
• Calculating similarity of objects - unsupervised learning
• Avoid information leakers
• Select important variables/features
• Model accuracy vs lift: how much more prevalent a pattern is than would be expected by chance
• Estimate probability and cost/gain of actions
• Help optimize decisions
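The train/test/cross-validate idea and the overfitting warning above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not something from the slides: the toy data set with 20% flipped labels, the 1-nearest-neighbor model (chosen because it memorizes its training data), and all function names.

```python
import random

def make_noisy_data(n=200, flip=0.2, seed=1):
    """Toy data (assumed): label is x > 0.5, with a fraction of labels flipped."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [(x > 0.5) != (rng.random() < flip) for x in xs]
    return xs, ys

def predict_1nn(train_x, train_y, x):
    """Predict the label of the closest training point (memorizes the training set)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def accuracy(train_x, train_y, test_x, test_y):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(predict_1nn(train_x, train_y, x) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_x)

def cross_validated_accuracy(xs, ys, k=5, seed=0):
    """Mean accuracy over k folds, each scored only on points held out of training."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        scores.append(accuracy([xs[i] for i in train], [ys[i] for i in train],
                               [xs[i] for i in fold], [ys[i] for i in fold]))
    return sum(scores) / k

xs, ys = make_noisy_data()
train_acc = accuracy(xs, ys, xs, ys)       # scored on the data it memorized
cv_acc = cross_validated_accuracy(xs, ys)  # scored on held-out folds only
print(f"training accuracy: {train_acc:.2f}, cross-validated accuracy: {cv_acc:.2f}")
```

The gap between the two numbers is the point: a model that reproduces its training labels perfectly has also memorized the noise, and only the held-out score shows how it would do on new data.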
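Lift and the cost/gain of actions can be made concrete with a small sketch. The customer counts, the gain of 40 per response, and the cost of 5 per contact are invented for illustration, and the function names are hypothetical; only the definition of lift (how much more prevalent a pattern is than expected by chance) comes from the slide.

```python
def lift(records, in_segment, is_target):
    """Lift = P(target | segment) / P(target)."""
    base_rate = sum(is_target(r) for r in records) / len(records)
    segment = [r for r in records if in_segment(r)]
    segment_rate = sum(is_target(r) for r in segment) / len(segment)
    return segment_rate / base_rate

def expected_gain(p_response, gain_if_response, cost_of_action):
    """Expected net gain of taking the action once."""
    return p_response * gain_if_response - cost_of_action

# Made-up data: each record is (in_segment, responded).
# 800 customers, 100 responders overall; the segment holds 200 customers, 50 responders.
records = ([(True, True)] * 50 + [(True, False)] * 150 +
           [(False, True)] * 50 + [(False, False)] * 550)

segment_lift = lift(records,
                    in_segment=lambda r: r[0],
                    is_target=lambda r: r[1])

# Assumed economics: a response is worth 40, each contact costs 5.
gain_contact_everyone = expected_gain(100 / 800, 40, 5)
gain_contact_segment = expected_gain(50 / 200, 40, 5)

print(f"lift of segment: {segment_lift:.1f}")
print(f"expected gain per contact, everyone: {gain_contact_everyone:.2f}")
print(f"expected gain per contact, segment:  {gain_contact_segment:.2f}")
```

With these assumed numbers the segment responds at twice the base rate, and that difference is what turns a break-even campaign into one with a positive expected gain per contact, which is the sense in which a pattern is actionable.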
9. What Changes in Data Science with Big Data?
• Data munging becomes much more complex
• New algorithms, technology needed to deal with Big Data Volume, Velocity, & Variety
• New, effective algorithms that require Big Data: e.g., deep belief networks, recommendations
• Predictions become (somewhat) more accurate
• New things become visible: social networks, recommendations, mobility, knowledge?
• However, basic principles remain