Data Science Isn’t a Fad
Let’s Keep It That Way


     Presentation to Research Triangle Analysts
                 February 21, 2013
                www.rtpanalysts.org
Data Science: Buyer Beware
Forbes article, “Data Science: Buyer Beware”:
“This is a management fad.”

Me: I’ve been doing this for 16 years. It
  isn’t a fad. You keep renaming it.


Result: Great conversation, and another Forbes article.
Obligatory Definition
 Wikipedia: Data science is a novel term that is often
 used interchangeably with competitive intelligence or
 business analytics, although it is becoming more
 common. Data science seeks to use all available and
 relevant data to effectively tell a story that can be easily
 understood by non-practitioners.
 Sexiest job of the 21st century. --Thomas H. Davenport
 and DJ Patil
 Pseudo science performed by rock-star unicorns.
 --The Internet
Data SCIENCE
Data: emphasizes the transformation of raw
information into actionable results.
Science: emphasizes the commitment to a verifiable and
repeatable process.
Data Science: The discipline of transforming raw
information into actionable results in a manner that is
verifiable and repeatable.


“Information is cheap. Meaning is expensive.”
           --George Dyson, 2011
Data Science Is...

Google’s Search Engine
Fraud Framework
Spotfire Operations Analytics
Analytics in Production
Once upon a time...
Information was VERY expensive.
Data Science and Statistics

 The statistical methods you learn as an undergraduate
 were optimized to make efficient use of small data
 samples.
 Data is a unique resource: the more you have, the
 more valuable each individual piece becomes, provided
 you can extract meaning from the information.
“Big Data” = New Problems

Dynamic environment: relationships change.
Constant sampling means you will have false positives.
Large numbers of variables and data points mean you
have to rely on automated tools.
Not all automated tools are created equal.
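
The false-positive point can be made concrete with a little arithmetic. The sketch below is an illustration, not part of the original talk. It assumes independent tests, a significance level of 0.05, and p-values that are uniform when nothing real is going on. The function names are hypothetical.

```python
import random


def chance_of_any_false_positive(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n independent
    tests when every null hypothesis is actually true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def count_false_positives(n_tests: int, alpha: float, seed: int = 0) -> int:
    """Simulate n tests on pure noise.

    Assumption: under a true null hypothesis the p-value is uniform on
    [0, 1), so each test fires with probability alpha.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(n_tests) if rng.random() < alpha)


if __name__ == "__main__":
    # The more often you look, the more likely something fires by chance.
    for looks in (1, 10, 100):
        print(looks, round(chance_of_any_false_positive(0.05, looks), 3))
    # Roughly 5% of 10,000 tests on noise come back "significant".
    print(count_false_positives(10_000, 0.05))
```

With a single look the chance of a false alarm is the nominal 5%. With 100 looks at the same noise it is nearly certain that at least one test fires.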
Cue Shameless Plug....
              John Sall
   Co-Founder & EVP of SAS Institute
           Director of JMP

     “From Big Data to Big Statistics”
           March 21, 6:30pm
           Louie and Charlies
        www.louieandcharlies.com
Raw Information to Actionable Results


 The results of the analysis must answer the business
 question(s).
 The results of the analysis must provide a course of
 action.
Actionable

Click on this link.
Check this person’s file.
Stop/encourage this activity.
Look at this pattern.
Verifiable

 The assumptions of the underlying methods must be
 stated and shown to hold.
 Outlier cases must be documented and handled
 effectively.
   Options: a different analysis, an error table, or an excluded point.
Anscombe’s Quartet
Y = 3.0017 + 0.499X, Corr = 0.8199
Linear regression assumes a straight-line
relationship and normally distributed errors.

Anscombe’s Quartet
Y = 3.0017 + 0.499X, Corr = 0.8199
This line has the same statistics as the one
before. But the relationship is not a straight line.

Anscombe’s Quartet
Y = 3.0017 + 0.499X, Corr = 0.8199
An outlier is affecting the equation.

Anscombe’s Quartet
Y = 3.0017 + 0.499X, Corr = 0.8199
One outlier drives the entire relationship.
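
The four slides above can be reproduced in a few lines. This sketch is not from the talk. It fits an ordinary least-squares line to each of Anscombe's published datasets using only the standard library. The helper names `fit_line` and `max_abs_residual` are hypothetical. Computed this way, every dataset comes out near an intercept of 3.0, a slope of 0.5, and a correlation of about 0.82, in line with the figures on the slides.

```python
from math import sqrt

# Anscombe (1973): datasets I to III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(xs, ys):
    """Return (intercept, slope, correlation) for a least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, sxy / sqrt(sxx * syy)


def max_abs_residual(xs, ys):
    """Largest distance between an observed y and the fitted line."""
    intercept, slope, _ = fit_line(xs, ys)
    return max(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        intercept, slope, corr = fit_line(xs, ys)
        print(name, round(intercept, 2), round(slope, 2), round(corr, 2),
              round(max_abs_residual(xs, ys), 2))
```

The summary statistics cannot tell the four datasets apart. The residuals can, which is why the assumptions have to be checked rather than taken on trust.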
Repeatable


When I do this again with data that meets the stated
assumptions, I should get the same answers.
Small changes in the data should NOT break the
algorithm.



          Easier said than done.
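
One way to probe whether small changes break an analysis is to drop each point in turn and refit. The sketch below is an illustration under stated assumptions, not the speaker's method. It uses the fourth Anscombe dataset, and `slope_or_none` and `leave_one_out_slopes` are hypothetical helper names.

```python
# Anscombe's fourth dataset: every x is 8 except a single point at 19.
XS = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
YS = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]


def slope_or_none(xs, ys):
    """Least-squares slope, or None when x has no variation at all."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # the slope is undefined; report that instead of crashing
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def leave_one_out_slopes(xs, ys):
    """Refit the line once per observation, each time leaving that one out."""
    return [
        slope_or_none(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        for i in range(len(xs))
    ]


if __name__ == "__main__":
    for i, slope in enumerate(leave_one_out_slopes(XS, YS)):
        print(i, XS[i], slope)
```

Dropping any of the ten points at x = 8 barely moves the slope. Dropping the one point at x = 19 leaves no slope to estimate, which shows that a single observation carries the whole result.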
Making Results Repeatable
Automated verification of assumptions.
Good coding practices (no matter the language).
Out-of-sample testing.
  Do the same analysis with similar data.
Failure conditions.
  Document what should happen when bad data goes
  into the algorithm.
  Run the algorithm with bad data.
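
The checklist above might look like this in practice. This is a minimal sketch, not a definitive implementation. It wraps a simple line fit with automated checks and documents what happens on bad input. The names `fit_line_checked` and `fails_cleanly`, and the minimum of three points, are assumptions made for illustration.

```python
from math import isfinite

MIN_POINTS = 3  # hypothetical threshold, chosen only for this example


def fit_line_checked(xs, ys):
    """Fit y = intercept + slope * x and return (intercept, slope).

    Documented failure conditions (each raises ValueError):
      - xs and ys differ in length
      - fewer than MIN_POINTS observations
      - any value is NaN or infinite
      - x has no variation, so the slope is undefined
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(xs) < MIN_POINTS:
        raise ValueError(f"need at least {MIN_POINTS} observations")
    if not all(isfinite(v) for v in list(xs) + list(ys)):
        raise ValueError("inputs must be finite numbers")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variation; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fails_cleanly(xs, ys):
    """Run the algorithm with bad data and confirm the documented error."""
    try:
        fit_line_checked(xs, ys)
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    print(fit_line_checked([0, 1, 2, 3], [1, 3, 5, 7]))
    print(fails_cleanly([1, 1, 1], [2, 3, 4]))            # no variation in x
    print(fails_cleanly([0, 1, 2], [1.0, float("nan"), 3.0]))  # NaN value
```

Each bad-data case becomes a regression test, so the failure behavior stays the same as the code changes.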
This is the endpoint of the analysis.
Companies that hire data scientists use the results
to make decisions.
Repeatable: Closing the Loop With Users
It is the data scientist’s responsibility to make sure the
results are used effectively.
Involve users at the beginning of the process.
Use iterative feedback to make sure results are:
  Actionable
  Verifiable
  Repeatable
Why Bother?
           “Beware the Big Errors of Big Data”


  “Big Data is Falling into the
   Trough of Disillusionment”


         “If you asked me to describe the rising
          philosophy of the day, I would say it’s
                       data-ism...”
Really, Then, Why Bother?

     “...the Oakland A's' front
office ...fielded a team that could
  compete successfully against
   richer competitors in Major
     League Baseball (MLB).”
Because What We Do Matters
         “Refugees United...uses mobile and
        web technologies to help refugees find
              their missing loved ones.”
                    --datakind.org


      “Predictive analytics is saving lives and
        taxpayer dollars in New York City.”
    --Alex Howard, Michael Flowers interview
That’s Enough From Me
What do you think about me?


               mthielbar@gmail.com

          melindathielbar.wordpress.com
               info@rtpanalysts.org
                    THANK YOU!
All photos the property of their respective owners.

Data Science Isn't a Fad: Let's Keep it That Way

  • 1. Data Science Isn’t a Fad Let’s Keep It That Way Presentation to Research Triangle Analysts February 21, 2013 www.rtpanalysts.org
  • 2. Data Science: Buyer Beware Forbes article: Data Science: Buyer Beware “This is a management fad.” Me: I’ve been doing this for 16 years. It isn’t a fad. You keep renaming it. Result: Great conversation, and another Forbes article.
  • 3. Obligatory Definition Wikipedia: Data science is a novel term that is often used interchangeably with competitive intelligence or business analytics, although it is becoming more common. Data science seeks to use all available and relevant data to effectively tell a story that can be easily understood by non-practitioners. Sexiest job of the 21st century. --Thomas H. Davenport and DJ Patil Pseudo science performed by rock-star unicorns. -- The Internet
  • 4. Data SCIENCE Data: emphasizes the transformation of raw information into actionable results. Science: emphasizes the commitment to verifiable and repeatable process. Data Science: The discipline of transforming raw information into actionable results in a manner that is verifiable and repeatable. “Information is cheap. Meaning is expensive.” --George Dyson, 2011
  • 5. Data Science Is.... Google’s Search Engine. Fraud Framework. Spotfire Operations Analytics. Analytics in Production.
  • 6. Once upon a time... Information was VERY expensive.
  • 7. Data Science and Statistics The statistical methods you learn as an undergraduate were optimized to make efficient use of small data samples. Data is a unique resource: The more you have, the more valuable each individual piece becomes. Provided you can extract meaning from the information.
  • 8. “Big Data” = New Problems Dynamic environment: relationships change. Constant sampling means you will have false positives. Large numbers of variables and data points means you have to rely on automated tools. Not all automated tools are created equal.
  • 9. Cue Shameless Plug.... John Sall Co-Founder & EVP of SAS Institute Director of JMP “From Big Data to Big Statistics” March 21, 6:30pm Louie and Charlies www.louieandcharlies.com
  • 10. Raw Information to Actionable Results The results of the analysis must answer the business question(s). The results of the analysis must provide a course of action.
  • 11. Actionable Click on this link. Check this person’s file. Stop/encourage this activity. Look at this pattern.
  • 12. Verifiable The assumptions from the underlying methods must be stated and shown to be true. Outlier cases must be documented and handled effectively. Different analysis, error table, excluded point.
  • 13. Y = 3.0017 + 0.499X Corr = 0.8199 Anscombe’s Quartet Linear regression assumes a straight line relationship and normally distributed errors.
  • 14. Y = 3.0017 + 0.499X Corr = 0.8199 Anscombe’s Quartet This line has the same statistics as the one before. But the relationship is not a straight line.
  • 15. Y = 3.0017 + 0.499X Corr = 0.8199 Anscombe’s Quartet An outlier is affecting the equation.
  • 16. Y = 3.0017 + 0.499X Corr = 0.8199 Anscombe’s Quartet One outlier drives the entire relationship.
  • 17. Repeatable When I do this again with data that meets the stated assumptions, I should get the same answers. Small changes in the data should NOT break the algorithm. Easier said than done.
  • 18. Making Results Repeatable Automated verification of assumptions. Good coding practices (no matter the language). Out-of-sample testing: Do the same analysis with similar data. Failure conditions: Document what should happen when bad data goes into the algorithm. Run the algorithm with bad data.
  • 19. This is the endpoint of the analysis. Companies who hire data scientists use the results to make decisions.
  • 20. Repeatable: Closing the Loop With Users It is the data scientist’s responsibility to make sure the results are used effectively. Involve users at the beginning of the process. Use iterative feedback to make sure results are: Actionable, Verifiable, Repeatable.
  • 21. Why Bother? “Beware the Big Errors of Big Data” “Big Data is Falling into the Trough of Disillusionment” “If you asked me to describe the rising philosophy of the day, I would say it’s data-sim...”
  • 22. Really, Then, Why Bother? “...the Oakland A's front office ...fielded a team that could compete successfully against richer competitors in Major League Baseball (MLB).”
  • 23. Because What We Do Matters “Refugees United...uses mobile and web technologies to help refugees find their missing loved ones.” --datakind.org “Predictive analytics is saving lives and taxpayer dollars in New York City.” --Alex Howard, Michael Flowers interview
  • 24. That’s Enough From Me What do you think about me? mthielbar@gmail.com melindathielbar.wordpress.com info@rtpanalysts.org THANK YOU! All photos the property of their respective owners.
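
The Anscombe's Quartet slides (13 to 16) can be checked with a short sketch. This is not code from the presentation: the data values are the ones published in Anscombe's 1973 paper, and the names `QUARTET`, `fit_line`, and `max_abs_residual` are made up for illustration. It fits a least-squares line to each of the four datasets and reports the intercept, slope, and correlation, which should come out close to the equation shown on the slides for all four, even though the underlying relationships differ. The largest residual is printed as one simple, automatable hint that an assumption may be violated.

```python
from math import sqrt

# Anscombe's quartet (values from Anscombe, 1973); not taken from the slides.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope, correlation)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        # Documented failure condition: a constant column has no defined fit.
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    corr = sxy / sqrt(sxx * syy)
    return intercept, slope, corr


def max_abs_residual(xs, ys, intercept, slope):
    """Largest absolute distance between an observed y and the fitted line."""
    return max(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        intercept, slope, corr = fit_line(xs, ys)
        worst = max_abs_residual(xs, ys, intercept, slope)
        print(f"{name}: Y = {intercept:.4f} + {slope:.4f}X, "
              f"corr = {corr:.4f}, max |residual| = {worst:.2f}")
```

The summary statistics alone cannot tell the four datasets apart, which is the point of the slides. A residual check like the one above is only a first step toward the "automated verification of assumptions" listed on slide 18, and plotting the data remains the more reliable test.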