Data Science not just for Big Data
Gregory Piatetsky, @kdnuggets
Analytics, Big Data, Data Mining, and Data Science Resources

© KDnuggets 2013

1
What do we call it?
Same Core Idea: Finding Useful Patterns in Data
Different Emphasis:
• Statistics, 1830
• Data mining, 1980
• Knowledge Discovery in Data (KDD), 1989
• Business Analytics, 1997
• Predictive Analytics, 2002
• Data Analytics, 2011
• Data Science, 2011
• Big Data, 2012
2
Big Data > Data Mining >> Predictive Analytics, Data Science
[Chart: Google Trends search interest, Jan 2008 to Sep 2013, worldwide]

3
Data Science before “Big Data”
• Ancient astronomers
• Kepler's laws of planetary motion (1609), derived from observations by Tycho Brahe
• Genetics: Gregor Mendel found patterns in the inheritance of pea plants
• Western medicine
• …

4
Ignaz Semmelweis – early data scientist (1818-1865)
[Graph from Wikipedia]

Semmelweis found that the main difference between the two clinics was that the first had medical students who also examined cadavers, and he inferred that the students carried something on their hands from the autopsies. He proposed washing hands after autopsies, but his proposal was rejected, and he died in an insane asylum.

5
Data Science Application:
Process, not one step
[Diagram: CRISP-DM process]

6
Data Science Application:
Process, not one step
[Diagram: CRISP-DM process, highlighting "Building Predictive Models"]
Building predictive models: most fun for data scientists, but only a small part of the process
7
Data Science Basic Principles & Ideas
• Focus on actionable patterns
• Build predictive models (supervised learning): train, test, cross-validate
• Avoid overfitting
• Calculate similarity of objects (unsupervised learning)
• Avoid information leakers (fields that unintentionally reveal the outcome being predicted)
• Select important variables/features
• Model accuracy vs. lift: lift measures how much more prevalent a pattern is than would be expected by chance
• Estimate probability and cost/gain of actions
• Help optimize decisions
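The train/test/cross-validate and overfitting bullets can be illustrated with a small, dependency-free sketch. It assumes a hand-written k-nearest-neighbor classifier and hypothetical synthetic data (the label follows x0 + x1 > 1, with about 20% of labels flipped as noise); the function names are illustrative and not from the slides. A 1-neighbor model memorizes its training data, so its training accuracy is perfect, while 5-fold cross-validation exposes how much worse it does on rows it has not seen.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, n_neighbors):
    """Majority label among the n_neighbors training rows closest to x (squared Euclidean distance)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )[:n_neighbors]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def accuracy(train_X, train_y, test_X, test_y, n_neighbors):
    """Fraction of test rows whose k-NN prediction matches the true label."""
    hits = sum(
        knn_predict(train_X, train_y, x, n_neighbors) == label
        for x, label in zip(test_X, test_y)
    )
    return hits / len(test_y)


def cross_val_accuracy(X, y, n_neighbors, k=5):
    """Mean held-out accuracy: each fold serves once as the test set, the rest as training data."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        scores.append(accuracy(
            [X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx],
            n_neighbors,
        ))
    return mean(scores)


# Hypothetical noisy data: the label is 1 when x0 + x1 > 1, but about 20% of labels are flipped.
rng = random.Random(42)
X = [(rng.random(), rng.random()) for _ in range(200)]
y = [int(a + b > 1) ^ (rng.random() < 0.2) for a, b in X]

for n_neighbors in (1, 15):
    train_acc = accuracy(X, y, X, y, n_neighbors)
    cv_acc = cross_val_accuracy(X, y, n_neighbors)
    print(f"{n_neighbors:>2}-NN: training accuracy {train_acc:.2f}, 5-fold CV accuracy {cv_acc:.2f}")
```

The comparison is the point: a large gap between training accuracy and cross-validated accuracy is a symptom of overfitting, and averaging over several folds gives a steadier estimate than a single train/test split.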
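For the similarity bullet, here is a minimal sketch of one common measure, cosine similarity, used to find each object's most similar neighbor without any labels. Cosine similarity is a choice made for this example (the slide does not name a measure), and the customer names and purchase counts are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def most_similar(target, objects):
    """Name of the object most similar to `target`, excluding `target` itself."""
    return max(
        (name for name in objects if name != target),
        key=lambda name: cosine_similarity(objects[target], objects[name]),
    )


# Hypothetical customer profiles: number of purchases in four product categories.
customers = {
    "alice": [5, 0, 1, 0],
    "bob": [4, 0, 2, 0],
    "carol": [0, 3, 0, 4],
    "dave": [0, 4, 1, 5],
}

for name in customers:
    print(name, "->", most_similar(name, customers))
```

With these numbers, alice and bob pair up, as do carol and dave; the same nearest-neighbor idea underlies simple clustering and recommendation.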

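The lift and cost/gain bullets reduce to simple arithmetic. This sketch uses hypothetical direct-marketing numbers (10,000 customers, 200 responders, and a model whose top 1,000 scored customers contain 80 of them) plus an assumed gain of 50 per response and cost of 1.5 per contact; the function names are illustrative. It also shows why accuracy alone can mislead on rare events: predicting "no response" for everyone is 98% accurate yet finds nobody.

```python
def lift(hits_in_segment, segment_size, total_hits, population_size):
    """Response rate in a segment divided by the base rate in the whole population.

    A lift of 1.0 means the pattern is no more prevalent than chance would predict.
    """
    segment_rate = hits_in_segment / segment_size
    base_rate = total_hits / population_size
    return segment_rate / base_rate


def expected_value(p_response, gain, cost):
    """Expected profit of acting on one case: earn `gain` with probability p, always pay `cost`."""
    return p_response * gain - cost


# Hypothetical campaign numbers, assumed for illustration only.
population, responders = 10_000, 200
top_segment, responders_in_top = 1_000, 80
gain, cost = 50.0, 1.5

print(f"Accuracy of always predicting 'no response': {1 - responders / population:.0%}")
print(f"Lift of top segment: {lift(responders_in_top, top_segment, responders, population):.1f}")

segments = {
    "top segment": responders_in_top / top_segment,
    "everyone else": (responders - responders_in_top) / (population - top_segment),
}
for name, p in segments.items():
    ev = expected_value(p, gain, cost)
    decision = "act" if ev > 0 else "skip"
    print(f"{name}: P(response) = {p:.3f}, expected value per contact = {ev:+.2f} -> {decision}")
```

With these assumed numbers the top segment responds at 8% against a 2% base rate (lift 4) and has a positive expected value (0.08 × 50 − 1.5 = 2.5 per contact), while contacting everyone else would lose about 0.83 per contact on average.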
8
What Changes in Data Science with Big Data?
• Data munging becomes much more complex
• New algorithms and technology are needed to deal with Big Data Volume, Velocity, and Variety
• New, effective algorithms that require Big Data, e.g., deep belief networks, recommendations
• Predictions become (somewhat) more accurate
• New things become visible: social networks, recommendations, mobility, knowledge?
• However, the basic principles remain

9
