Galit Shmueli
Israel Statistical Association
& Tel Aviv University
July 9, 2012


To Explain or To Predict?
Points for discussion: goo.gl/gcjlN

Twitter: #explainpredict
Road Map
Definitions
Explanatory-dominated social sciences
Explanatory ≠ predictive modeling
 Why?
 Different modeling paths
 Explanatory vs. predictive power


So what?
Definitions

Explanatory modeling:
Theory-based, statistical testing of
causal hypotheses

Explanatory power:
Strength of relationship in statistical
model
Definitions

Predictive modeling:
Empirical method for predicting new
observations

Predictive power:
Ability to accurately predict new
observations
Statistical modeling in
     social science research



Purpose: test causal theory (“explain”)
           Association-based statistical models
                         Prediction nearly absent
Explanatory modeling à la social sciences
Start with a causal theory

Generate causal hypotheses on constructs

Operationalize constructs → Measurable variables

Fit statistical model

Statistical inference → Causal conclusions
In the social sciences, data analysis is mainly used for testing causal theory.

“If it explains, it predicts”
“Empirical prediction alone is un-scientific”

Some statisticians share this view:

   The two goals in analyzing data... I prefer to describe
   as “management” and “science”. Management seeks
   profit... Science seeks truth.

                        - Parzen, Statistical Science 2001
52 “predictive” articles among 1,072
in Information Systems top journals
Why Predict? for Scientific Research
          new theory
          develop measures
          compare theories
          improve theory
          assess relevance
          predictability

Shmueli & Koppius, “Predictive Analytics in IS Research”
(MISQ, 2011)
“A good explanatory model will also
predict well”
“You must understand the underlying
causes in order to predict”
Philosophy of Science
“Explanation and prediction have the
same logical structure”
                Hempel & Oppenheim, 1948

  “It becomes pertinent to investigate the
  possibilities of predictive procedures
  autonomous of those used for explanation”
                             Helmer & Rescher, 1959

         “Theories of social and human behavior
         address themselves to two distinct goals of
         science: (1) prediction and (2) understanding”
                                Dubin, Theory Building, 1969
Why statistical

explanatory modeling
       differs from

predictive modeling
   Shmueli (2010), Statistical Science
Theory vs. its manifestation




                     ?
Notation

Theoretical constructs: 𝒳, 𝒴
Causal theoretical model: 𝒴 = ℱ(𝒳)
Measurable variables: X, Y
Statistical model: E(Y) = f(X)
Four aspects
𝒴 = ℱ(𝒳)
E(Y) = f(X)
1. Theory – Data
2. Causation – Association
3. Retrospective – Prospective
4. Bias – Variance
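
The fourth aspect can be made concrete with the standard decomposition of expected prediction error under squared-error loss, which Shmueli (2010) uses for this argument. This is a sketch under stated assumptions: the data follow Y = f(x) + ε with E(ε) = 0, and f̂ is a model estimated from a sample. The symbols f̂ and ε are introduced here for illustration and do not appear on the slides.

```latex
\begin{aligned}
\mathrm{EPE}(x) &= E\big[\,(Y - \hat f(x))^2\,\big] \\
  &= \underbrace{\operatorname{Var}(Y)}_{\text{irreducible noise}}
   + \underbrace{\big(E[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
   + \underbrace{\operatorname{Var}\big(\hat f(x)\big)}_{\text{estimation variance}}
\end{aligned}
```

Explanatory modeling concentrates on keeping the bias term small so that f stays close to the theory, while predictive modeling minimizes the sum of the last two terms and may accept some bias in exchange for lower variance.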
“The goal of finding models that are
predictively accurate differs from the
goal of finding models that are true.”
Point #1
Best explanatory model


              ≠
       Best predictive model
Four aspects
𝒴 = ℱ(𝒳)
E(Y) = f(X)
1. Theory – Data
2. Causation – Association
3. Retrospective – Prospective
4. Bias – Variance
Predict ≠ Explain
“we tried to benefit from an extensive
set of attributes describing each of the
movies in the dataset. Those attributes
certainly carry a significant signal and
can explain some of the user behavior.
However ... they could not help at all
for improving the [predictive]
accuracy.”
                                         Bell et al., 2008
Predict ≠ Explain
The FDA considers two products
bioequivalent if the 90% CI of the
relative mean of the generic to brand
formulation is within 80%-125%




“We are planning to ... develop predictive models for bioavailability
and bioequivalence”
                                           Lester M. Crawford, 2005
                                Acting Commissioner of Food & Drugs
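
The bioequivalence rule quoted on this slide is an inferential (explanatory-style) criterion, and it can be sketched in a few lines. The following is a simplified illustration, not the regulatory procedure: it assumes paired per-subject log-ratios, takes the t critical value as an input, and uses hypothetical data. The function names `ratio_confidence_interval` and `is_bioequivalent` are invented for this example.

```python
import math
import statistics


def ratio_confidence_interval(log_ratios, t_crit):
    """CI for the geometric mean ratio (generic / brand) from per-subject log-ratios.

    t_crit is the critical value for a two-sided 90% interval with
    len(log_ratios) - 1 degrees of freedom; the caller supplies it (assumption).
    """
    n = len(log_ratios)
    mean = statistics.fmean(log_ratios)
    std_err = statistics.stdev(log_ratios) / math.sqrt(n)
    return math.exp(mean - t_crit * std_err), math.exp(mean + t_crit * std_err)


def is_bioequivalent(ci_low, ci_high, lower=0.80, upper=1.25):
    """True when the whole interval lies inside [lower, upper]."""
    return lower <= ci_low and ci_high <= upper


# Hypothetical log(generic / brand) exposure ratios for 6 subjects.
log_ratios = [0.05, -0.02, 0.10, 0.01, -0.04, 0.06]
# 2.015 is the 0.95 quantile of the t distribution with 5 degrees of freedom.
ci_low, ci_high = ratio_confidence_interval(log_ratios, t_crit=2.015)
print(ci_low, ci_high, is_bioequivalent(ci_low, ci_high))
```

Note that nothing in this check measures how well the generic's effect on a new patient would be predicted, which is the gap the Crawford quote points at.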
Goal Definition → Design & Data Collection → Data Preparation → EDA → Variables? Methods? → Evaluation, Validation & Model Selection → Model Use & Reporting
Study design & data collection
Observational or experiment?
Primary or secondary data?
Instrument (reliability + validity vs. measurement accuracy)
How much data?
How to sample?
Hierarchical data
Data Preprocessing
Missing values
Reduced-feature models
Partitioning
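
Partitioning, the last item on this slide, is the step that makes predictive power measurable at all. A minimal sketch follows, assuming a simple random split; the function name `partition`, the 30% holdout share, and the seed are illustrative choices, not taken from the talk.

```python
import random


def partition(records, holdout_fraction=0.3, seed=0):
    """Randomly split records into (training, holdout) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


training, holdout = partition(range(10))
print(len(training), len(holdout))
```

The model is then fitted on `training` only, and `holdout` is touched once, to estimate how well the model predicts new observations.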
Data exploration & reduction
Interactive visualization
PCA
SVD
Which Variables?
Endogeneity
Ex-post availability
Causation / associations
Multicollinearity? A, B, A*B?
Methods / Models
Blackbox / interpretable
Mapping to theory
Variance / bias
Shrinkage models
Ensembles
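
Shrinkage models are the clearest case of trading bias for variance. The sketch below uses ridge regression with a single predictor, where the slope has a closed form; ridge is chosen here only as one familiar shrinkage method, and the data and the name `ridge_slope` are hypothetical.

```python
def ridge_slope(x, y, penalty):
    """Slope of a one-predictor ridge regression with an unpenalized intercept.

    penalty = 0 gives the ordinary least-squares slope; larger values shrink
    the slope toward zero (more bias, less variance).
    """
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    return s_xy / (s_xx + penalty)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical data
for penalty in (0.0, 5.0, 50.0):
    print(penalty, round(ridge_slope(x, y, penalty), 3))
```

A shrunken coefficient is a deliberately biased estimate, which is hard to defend when the goal is testing a causal hypothesis but can be the better choice when the goal is holdout accuracy.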
Evaluation, Validation & Model Selection

Model fit ≠ Validation
Theoretical model ↔ Empirical model: Validation
Empirical model ↔ Data: Model fit
Explanatory power

Empirical model: Training data, Holdout data
Over-fitting analysis
Predictive power
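
An over-fitting analysis compares error on the training data with error on the holdout data. The sketch below makes the gap extreme on purpose by using a 1-nearest-neighbor predictor, which reproduces its training data perfectly; the simulated data and all function names are assumptions for illustration.

```python
import math
import random


def one_nearest_neighbor(train):
    """Return a predictor that copies the y of the closest training x."""
    def predict(x):
        return min(train, key=lambda pair: abs(pair[0] - x))[1]
    return predict


def rmse(predict, data):
    """Root mean squared error of predict over (x, y) pairs."""
    return math.sqrt(sum((y - predict(x)) ** 2 for x, y in data) / len(data))


rng = random.Random(1)
data = []
for _ in range(200):  # hypothetical data: y = 2x + noise
    x = rng.uniform(0, 10)
    data.append((x, 2 * x + rng.gauss(0, 1)))
train, holdout = data[:140], data[140:]

model = one_nearest_neighbor(train)
print("training RMSE:", rmse(model, train))
print("holdout RMSE:", rmse(model, holdout))
```

The training error says nothing about the holdout error here, which is why fit to the data at hand cannot stand in for predictive power.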
Model Use

test causal theory:
Inference
Null hypothesis

new theory, develop measures, compare theories, improve theory, assess relevance, predictability:
Predictive performance
Naïve/baseline
Over-fitting analysis
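
Predictive performance is only meaningful relative to a naïve baseline. The following sketch compares a fitted line against the simplest benchmark, predicting the training mean for every new case, with both scored on holdout data. The simulated data, the 70/30 split, and the function names are assumptions made for this example.

```python
import math
import random


def fit_line(train):
    """Least-squares (intercept, slope) for a list of (x, y) pairs."""
    n = len(train)
    x_mean = sum(x for x, _ in train) / n
    y_mean = sum(y for _, y in train) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in train)
             / sum((x - x_mean) ** 2 for x, _ in train))
    return y_mean - slope * x_mean, slope


def holdout_rmse(predict, holdout):
    """Root mean squared error of predict on holdout (x, y) pairs."""
    return math.sqrt(sum((y - predict(x)) ** 2 for x, y in holdout) / len(holdout))


rng = random.Random(7)
data = []
for _ in range(100):  # hypothetical data: y = 3 + x + noise
    x = rng.uniform(0, 10)
    data.append((x, 3 + x + rng.gauss(0, 1)))
train, holdout = data[:70], data[70:]

intercept, slope = fit_line(train)
train_mean = sum(y for _, y in train) / len(train)

model_rmse = holdout_rmse(lambda x: intercept + slope * x, holdout)
naive_rmse = holdout_rmse(lambda x: train_mean, holdout)
print("model:", model_rmse, "naive baseline:", naive_rmse)
```

If a model cannot beat this baseline on holdout data, a statistically significant coefficient alone does not show that it is useful for prediction.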
Point #2
Explanatory Power ≠ Predictive Power
Cannot infer one from the other
Performance Metrics
out-of-sample, interpretation
p-values, R², goodness-of-fit, type I, II errors
prediction accuracy, costs, training vs. holdout, over-fitting
[Figure: Predictive Power vs. Explanatory Power]
The predictive power of an
explanatory model has important
scientific value


Relevance, reality check, predictability
In “explanatory” fields
Prediction underappreciated

Distinction blurred
Unfamiliar with predictive
modeling/assessment
“While the value of scientific prediction ... is beyond
question ... the inexact sciences [do not] have ... the
use of predictive expertise well in hand.”
                               Helmer & Rescher, 1959
How does all this impact
   Scientific Research?
What can be done?

   acknowledge
incorporate prediction into
         curriculum
What happens in other fields?

     Epidemiology
         Engineering
             Life sciences

What about “predictive only” fields?
http://goo.gl/gcjlN
Shmueli (2010), “To Explain or To Predict?”, Statistical Science
Shmueli & Koppius (2011), “Predictive analytics in IS research”, MISQ

Shmueli

  • 1. Galit ShmuĂ©li Ij Israel Statistical Association & Tel Aviv University July 9, 2012 To Explain or To Predict? ?‫ŚœŚ”ŚĄŚ‘Ś™Śš ŚŚ• ŚœŚ Ś‘Śâ€Ź
  • 2. Points for discussion: goo.gl/gcjlN Twitter: #explainpredict
  • 3. Road Map Definitions Explanatory-dominated social sciences Explanatory ≠ predictive modeling Why? Different modeling paths Explanatory vs. predictive power So what?
  • 4. Definitions Explanatory modeling: Theory-based, statistical testing of causal hypotheses Explanatory power: Strength of relationship in statistical model
  • 5. Definitions Predictive modeling: Empirical method for predicting new observations Predictive power: Ability to accurately predict new observations
  • 6. Statistical modeling in social science research Purpose: test causal theory (“explain”) Association-based statistical models Prediction nearly absent
  • 7. Explanatory modeling Ă -la social sciences Start with a causal theory Generate causal hypotheses on constructs Operationalize constructs → Measurable variables Fit statistical model Statistical inference → Causal conclusions
  • 8. In the social sciences, data analysis is mainly used for testing causal theory. “If it explains, it predicts”
  • 9. “Empirical prediction alone is un-scientific” Some statisticians share this view: The two goals in analyzing data... I prefer to describe as “management” and “science”. Management seeks profit... Science seeks truth. - Parzen, Statistical Science 2001
  • 10. 52 “predictive” articles among 1,072 in Information Systems top journals
  • 11. Why Predict? for Scientific Research new theory develop measures compare theories improve theory assess relevance predictability Shmueli & Koppius, “Predictive Analytics in IS Research” (MISQ, 2011)
  • 12. “A good explanatory model will also predict well” “You must understand the underlying causes in order to predict”
  • 13. Philosophy of Science “Explanation and prediction have the same logical structure” Hempel & Oppenheim, 1948 “It becomes pertinent to investigate the possibilities of predictive procedures autonomous of those used for explanation” Helmer & Rescher, 1959 “Theories of social and human behavior address themselves to two distinct goals of science: (1) prediction and (2) understanding” Dubin, Theory Building, 1969
  • 14. Why statistical explanatory modeling differs from predictive modeling Shmueli (2010), Statistical Science
  • 15. Theory vs. its manifestation ?
  • 16. Notation Theoretical constructs: $\mathcal{X}$, $\mathcal{Y}$ Causal theoretical model: $\mathcal{Y}=\mathcal{F}(\mathcal{X})$ Measurable variables: $X$, $Y$ Statistical model: $E(Y)=f(X)$
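  • 17. Four aspects separating the causal theoretical model $\mathcal{Y}=\mathcal{F}(\mathcal{X})$ from the statistical model $E(Y)=f(X)$: 1. Theory – Data 2. Causation – Association 3. Retrospective – Prospective 4. Bias – Variance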
  • 18. “The goal of finding models that are predictively accurate differs from the goal of finding models that are true.”
  • 19. Point #1 Best explanatory model ≠ Best predictive model
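  • 20. Four aspects separating the causal theoretical model $\mathcal{Y}=\mathcal{F}(\mathcal{X})$ from the statistical model $E(Y)=f(X)$: 1. Theory – Data 2. Causation – Association 3. Retrospective – Prospective 4. Bias – Variance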
  • 21. Predict ≠ Explain “we tried to benefit from an extensive set of attributes describing each of the movies in the dataset. Those attributes certainly carry a significant signal and can explain some of the user behavior. However ... they could not help at all for improving the [predictive] accuracy.” Bell et al., 2008
  • 22. Predict ≠ Explain The FDA considers two products bioequivalent if the 90% CI of the relative mean of the generic to brand formulation is within 80%-125%. “We are planning to develop predictive models for bioavailability and bioequivalence” Lester M. Crawford, 2005, Acting Commissioner of Food & Drugs
  • 23. Goal Definition → Design & Collection → Data Preparation → EDA → Variables? → Methods? → Evaluation, Validation & Model Selection → Model Use & Reporting
  • 24. Study design & data collection Observational or experiment? Primary or secondary data? Instrument (reliability + validity vs. measurement accuracy) How much data? How to sample? Hierarchical data
  • 25. Data Preprocessing: missing values, reduced-feature models, partitioning
  • 26. Data exploration & reduction Interactive visualization PCA SVD
  • 27. Which Variables? endogeneity ex-post availability causation associations Multicollinearity? A, B, A*B?
  • 28. Methods / Models Blackbox / interpretable Mapping to theory variance bias Shrinkage models ensembles
  • 29. Model fit ≠ Validation (Evaluation, Validation & Model Selection). Explanatory power: Theoretical model, Empirical model, Data. Predictive power: Empirical model, Training data, Holdout data, Over-fitting analysis
  • 30. Model Use. Explanatory: test causal theory (Inference, Null hypothesis). Predictive: new theory, develop measures, compare theories, improve theory, assess relevance, predictability (Predictive performance, Naïve/baseline, Over-fitting analysis)
  • 31. Point #2 Explanatory Power ≠ Predictive Power. Cannot infer one from the other
  • 32. Performance Metrics. Explanatory: interpretation, p-values, R², goodness-of-fit, type I, II errors. Predictive: out-of-sample prediction accuracy, costs, training vs. holdout, over-fitting
  • 33. Predictive Power Explanatory Power
  • 34. The predictive power of an explanatory model has important scientific value Relevance, reality check, predictability
  • 35. In “explanatory” fields Prediction underappreciated Distinction blurred Unfamiliar with predictive modeling/assessment “While the value of scientific prediction ... is beyond question ... the inexact sciences [do not] have ... the use of predictive expertise well in hand.” Helmer & Rescher, 1959
  • 36. How does all this impact Scientific Research?
  • 37. What can be done? acknowledge incorporate prediction into curriculum
  • 38. What happens in other fields? Epidemiology Engineering Life sciences What about “predictive only” fields? http://goo.gl/gcjlN
  • 39. Shmueli (2010), “To Explain or To Predict?”, Statistical Science Shmueli & Koppius (2011), “Predictive analytics in IS research”, MISQ
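
Point #2 (slides 29 to 32) says that in-sample fit and holdout performance measure different things, and that one cannot be inferred from the other. The Python sketch below illustrates this on simulated data. Everything in it is an assumption made for illustration and is not taken from the talk: the linear data-generating process, the noise level, the sample sizes, the helper names (`r_squared`, `fit_line`, `fit_nearest_neighbor`, `simulate`), and the choice of a 1-nearest-neighbor predictor as the over-fitting example.

```python
import random


def r_squared(y_true, y_pred):
    """1 - SSE/SST. On holdout data this can be negative."""
    mean_y = sum(y_true) / len(y_true)
    sse = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    sst = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - sse / sst


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a predict function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_nearest_neighbor(xs, ys):
    """1-nearest-neighbor: memorizes the training data."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def simulate(n, rng):
    """Hypothetical data: a weak linear signal plus Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 + 0.5 * x + rng.gauss(0, 2.0) for x in xs]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is reproducible
train_x, train_y = simulate(50, rng)
hold_x, hold_y = simulate(50, rng)

results = {}
for name, fit in [("line", fit_line), ("1-NN", fit_nearest_neighbor)]:
    predict = fit(train_x, train_y)
    train_r2 = r_squared(train_y, [predict(x) for x in train_x])
    hold_r2 = r_squared(hold_y, [predict(x) for x in hold_x])
    results[name] = (train_r2, hold_r2)
    print(f"{name:>5}: training R2 = {train_r2:.3f}, holdout R2 = {hold_r2:.3f}")
```

The 1-nearest-neighbor predictor reproduces every training point, so its training R² is 1 by construction, yet that number says nothing about how it predicts new observations. Comparing the training and holdout columns for each model is a small-scale version of the over-fitting analysis listed on slide 29.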