Galit Shmueli
Israel Statistical Association
& Tel Aviv University
July 9, 2012


 To Explain or To Predict?
Points for discussion: goo.gl/gcjlN

Twitter: #explainpredict
Road Map
Definitions
Explanatory-dominated social sciences
Explanatory ≠ predictive modeling
  Why?
  Different modeling paths
  Explanatory vs. predictive power
So what?
Definitions

Explanatory modeling: theory-based statistical testing of causal hypotheses
Explanatory power: strength of the relationship in a statistical model

Predictive modeling: empirical method for predicting new observations
Predictive power: ability to accurately predict new observations
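The two kinds of power can be measured on the same fitted model. A minimal Python sketch, assuming a linear least-squares model, a random 70/30 split, and R² as the yardstick for both (one common choice among several); the function names are illustrative, not from the talk:

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE / SST, with SST taken around the mean of y."""
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - sse / sst

def explanatory_vs_predictive_r2(X, y, train_frac=0.7, seed=0):
    """Fit OLS on a random training subset; return (in-sample R², holdout R²)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    train, test = idx[:n_train], idx[n_train:]
    # Intercept column plus predictors; solve least squares on training rows only.
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
    r2_in = r_squared(y[train], A[train] @ beta)
    # Holdout R² (baseline = holdout mean here) measures accuracy on new observations;
    # unlike in-sample R², it can be negative.
    r2_out = r_squared(y[test], A[test] @ beta)
    return r2_in, r2_out
```

Because the two numbers answer different questions, they can diverge sharply, for example when many predictors are fitted to a small sample.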
Statistical modeling in social science research
Purpose: test causal theory (“explain”)
Association-based statistical models
Prediction nearly absent
Explanatory modeling à la social sciences
1. Start with a causal theory
2. Generate causal hypotheses on constructs
3. Operationalize constructs → measurable variables
4. Fit statistical model
5. Statistical inference → causal conclusions
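The "fit statistical model" and "statistical inference" steps can be sketched in Python: fit a linear regression as the statistical model and test, for each coefficient, the null hypothesis that it is zero. This is a minimal sketch assuming a linear model with independent, constant-variance errors; the function name and the normal approximation used for p-values are illustrative choices, not the talk's method.

```python
import math
import numpy as np

def ols_inference(X, y):
    """OLS fit with classical standard errors, t statistics and two-sided p-values.

    p-values use a normal approximation to the t distribution (adequate for
    large n); a t-based p-value would be exact under Gaussian errors.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])    # intercept + predictors
    k = A.shape[1]
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - k)         # unbiased residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)    # classical OLS covariance matrix
    se = np.sqrt(np.diag(cov))
    t = beta / se
    # Two-sided p-value under N(0, 1): P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    p = np.array([math.erfc(abs(ti) / math.sqrt(2.0)) for ti in t])
    return beta, se, t, p
```

A small p-value here supports an association in the fitted model; moving from that to a causal conclusion rests on the theory and study design, not on the regression itself.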
In the social sciences, data analysis is mainly used for testing causal theory.

“If it explains, it predicts”
“Empirical prediction alone is un-scientific”

Some statisticians share this view:

   The two goals in analyzing data... I prefer to describe as “management” and “science”. Management seeks profit... Science seeks truth.
   - Parzen, Statistical Science 2001
52 “predictive” articles among 1,072 in top Information Systems journals

Why Predict? For Scientific Research
  new theory
  develop measures
  compare theories
  improve theory
  assess relevance
  predictability

Shmueli & Koppius, “Predictive Analytics in IS Research” (MISQ, 2011)
“A good explanatory model will also predict well”
“You must understand the underlying causes in order to predict”
Philosophy of Science

“Explanation and prediction have the same logical structure”
- Hempel & Oppenheim, 1948

“It becomes pertinent to investigate the possibilities of predictive procedures autonomous of those used for explanation”
- Helmer & Rescher, 1959

“Theories of social and human behavior address themselves to two distinct goals of science: (1) prediction and (2) understanding”
- Dubin, Theory Building, 1969
Why statistical explanatory modeling differs from predictive modeling
Shmueli (2010), Statistical Science
Theory vs. its manifestation
Notation

Theoretical constructs: 𝒳, 𝒴
Causal theoretical model: 𝒴 = F(𝒳)
Measurable variables: X, Y
Statistical model: E(Y) = f(X)
Four aspects: 𝒴 = F(𝒳) vs. E(Y) = f(X)
1. Theory – Data
2. Causation – Association
3. Retrospective – Prospective
4. Bias – Variance
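The bias-variance aspect can be made precise with the standard decomposition of expected squared prediction error at a point x. This assumes a new observation Y = f(x) + ε with noise variance σ², and an estimate f̂ fitted on independent training data:

```latex
\mathrm{EPE}(x) = \mathbb{E}\big[(Y - \hat f(x))^2\big]
  = \underbrace{\sigma^2}_{\text{noise}}
  + \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathrm{Var}\big(\hat f(x)\big)}_{\text{estimation variance}}
```

Explanatory modeling aims for low bias so that f̂ faithfully represents f; for prediction, a somewhat biased f̂ with much lower variance can have the smaller EPE.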
“The goal of finding models that are
predictively accurate differs from the
goal of finding models that are true.”
Point #1
Best explanatory model ≠ Best predictive model
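Point #1 can be illustrated with a small simulation. This is our own example, not taken from the talk: data come from y = 1 + x1 + β2·x2 + noise, and we compare the average holdout error of the correctly specified model with that of a model omitting x2. All parameter values are assumptions, chosen so that x2's effect is small relative to the noise and the training sample is small.

```python
import numpy as np

def _holdout_mse(A, y, n_train):
    """Fit OLS on the first n_train rows of A; return MSE on the remaining rows."""
    b, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
    resid = y[n_train:] - A[n_train:] @ b
    return float(np.mean(resid ** 2))

def compare_true_vs_underspecified(n_train=20, n_test=1000, beta2=0.1,
                                   sigma=2.0, n_reps=2000, seed=0):
    """Average holdout MSE of the correctly specified model (intercept, x1, x2)
    and of an underspecified model (intercept, x1), when the data come from
    y = 1 + x1 + beta2 * x2 + N(0, sigma^2) noise."""
    rng = np.random.default_rng(seed)
    n = n_train + n_test
    total_full = total_reduced = 0.0
    for _ in range(n_reps):
        x = rng.normal(size=(n, 2))
        y = 1.0 + x[:, 0] + beta2 * x[:, 1] + rng.normal(scale=sigma, size=n)
        A_full = np.column_stack([np.ones(n), x])           # true specification
        A_reduced = np.column_stack([np.ones(n), x[:, 0]])  # omits x2
        total_full += _holdout_mse(A_full, y, n_train)
        total_reduced += _holdout_mse(A_reduced, y, n_train)
    return total_full / n_reps, total_reduced / n_reps
```

With these settings the "wrong" model predicts better on average; with a larger n_train or a larger beta2 the ranking typically flips, because the extra estimation variance shrinks while the omitted-variable bias does not.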
Four aspects: 𝒴 = F(𝒳) vs. E(Y) = f(X)
1. Theory – Data
2. Causation – Association
3. Retrospective – Prospective
4. Bias – Variance
Predict ≠ Explain

“we tried to benefit from an extensive set of attributes describing each of the movies in the dataset. Those attributes certainly carry a significant signal and can explain some of the user behavior. However… they could not help at all for improving the [predictive] accuracy.”
- Bell et al., 2008
Predict ≠ Explain

The FDA considers two products bioequivalent if the 90% CI of the relative mean of the generic to brand formulation is within 80%-125%.

“We are planning to… develop predictive models for bioavailability and bioequivalence”
- Lester M. Crawford, 2005, Acting Commissioner of Food & Drugs
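The bioequivalence criterion quoted above is an inference rule applied to a confidence interval. A minimal Python sketch, assuming paired log-scale measurements and a normal approximation for the interval (regulatory analyses typically use t-based intervals from crossover-design models); the function names are hypothetical:

```python
import math
import statistics

def ratio_ci_90(log_test, log_ref):
    """Approximate 90% CI for the ratio of geometric means (test / reference)
    from paired log-scale measurements, using a normal approximation."""
    diffs = [t - r for t, r in zip(log_test, log_ref)]
    mean = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    z = 1.6448536269514722  # 95th percentile of N(0, 1), giving a two-sided 90% interval
    # Back-transform from the log scale to a ratio.
    return math.exp(mean - z * se), math.exp(mean + z * se)

def is_bioequivalent(ci_low, ci_high, lower=0.80, upper=1.25):
    """The whole 90% CI must lie within [80%, 125%]."""
    return lower <= ci_low and ci_high <= upper
```

Note that the rule says nothing about how well either formulation's bioavailability can be predicted for a new patient, which is the gap the quoted plan to "develop predictive models" points to.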
Goal Definition → Design & Collection → Data Preparation → EDA → Variables? → Methods? → Evaluation, Validation & Model Selection → Model Use & Reporting
Study design & data collection
Observational or experiment?
Primary or secondary data?
Instrument (reliability + validity vs. measurement accuracy)
How much data?
How to sample?
Hierarchical data
Data Preprocessing
Missing values
Reduced-feature models
Partitioning
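Partitioning is what makes out-of-sample evaluation possible. A minimal Python sketch; the three-way training/validation/holdout split and the 50/30/20 fractions are illustrative assumptions, not a recommendation from the talk:

```python
import numpy as np

def partition(n, fractions=(0.5, 0.3, 0.2), seed=0):
    """Randomly split row indices 0..n-1 into training, validation and holdout sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    idx = np.random.default_rng(seed).permutation(n)
    # Cut points after the first len(fractions) - 1 parts; the last part takes the rest.
    cuts = np.cumsum([int(round(f * n)) for f in fractions[:-1]])
    return np.split(idx, cuts)
```

The validation set is used to choose among models; the holdout set is touched only once, to estimate the predictive power of the chosen model.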
Data exploration & reduction
Interactive visualization
PCA
SVD
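PCA and SVD, listed above as reduction tools, are closely linked: the principal components of a centered data matrix are its right singular vectors. A minimal NumPy sketch (the function name and return layout are our own):

```python
import numpy as np

def pca_svd(X, n_components=2):
    """PCA via SVD of the column-centered data matrix.

    Returns (scores, components, explained_variance_ratio) for the leading components."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # singular values in descending order
    var = s ** 2 / (len(X) - 1)             # variance along each principal axis
    ratio = var / var.sum()
    scores = Xc @ Vt[:n_components].T       # projections onto the leading axes
    return scores, Vt[:n_components], ratio[:n_components]
```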
Which Variables?
Causation vs. associations
Endogeneity
Ex-post availability
Multicollinearity? A, B, A*B?
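Multicollinearity is commonly diagnosed with variance inflation factors (VIFs). It inflates coefficient standard errors, which matters for inference, while it need not degrade predictive accuracy. A minimal NumPy sketch of the VIF (the function name is ours):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X: 1 / (1 - R_j^2), where R_j^2
    comes from regressing column j on the other columns plus an intercept."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ b
        r2 = 1.0 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out
```

An interaction term A*B built from uncentered A and B is often strongly correlated with A and B themselves; centering A and B before forming the product usually lowers those VIFs.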
Methods / Models
Blackbox / interpretable
Mapping to theory
Variance vs. bias
Shrinkage models, ensembles
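Shrinkage models accept some bias in exchange for lower variance. Ridge regression is one example of a shrinkage model (choosing it, rather than for instance the lasso, is our own choice). A minimal NumPy sketch of its closed-form solution:

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge regression: minimize ||y - b0 - X b||^2 + lam * ||b||^2.

    Predictors and response are centered so the intercept b0 is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    b0 = y_mean - x_mean @ b
    return b0, b
```

With lam = 0 this reduces to ordinary least squares; as lam grows the coefficients shrink toward zero, which biases them but can reduce prediction error on new data.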
Evaluation, Validation & Model Selection
Model fit ≠ Validation
Explanatory power: theoretical model, empirical model, data
Predictive power: empirical model; training data vs. holdout data; over-fitting analysis
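An over-fitting analysis compares error on training data with error on holdout data as model complexity grows. A minimal sketch, assuming polynomial regression of increasing degree as the family of models (an illustrative choice):

```python
import numpy as np

def overfitting_curve(x, y, n_train, degrees=range(1, 7)):
    """Fit polynomials of increasing degree on the first n_train points; return
    a list of (degree, training RMSE, holdout RMSE)."""
    xtr, ytr, xte, yte = x[:n_train], y[:n_train], x[n_train:], y[n_train:]
    rows = []
    for d in degrees:
        coefs = np.polyfit(xtr, ytr, d)
        rmse_tr = np.sqrt(np.mean((ytr - np.polyval(coefs, xtr)) ** 2))
        rmse_te = np.sqrt(np.mean((yte - np.polyval(coefs, xte)) ** 2))
        rows.append((d, float(rmse_tr), float(rmse_te)))
    return rows
```

Training error can only fall as the degree rises, since the models are nested, so model fit alone always favors the most complex model; holdout error is what reveals over-fitting.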
Model Use
Explanatory: test causal theory (inference, null hypothesis)
Predictive: new theory, develop measures, compare theories, improve theory, assess relevance, predictability (predictive performance, naïve/baseline, over-fitting analysis)
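Comparing a model against a naïve baseline can be expressed as a skill score. A minimal sketch, assuming squared error and the training-set mean as the naïve benchmark (other baselines, such as the last observed value in a time series, are also common); the function name is hypothetical:

```python
import numpy as np

def skill_vs_naive(y_train, y_test, y_pred):
    """Relative improvement in holdout MSE over a naive benchmark that predicts
    the training mean for every new observation: 1 - MSE_model / MSE_naive."""
    mse_model = np.mean((y_test - y_pred) ** 2)
    mse_naive = np.mean((y_test - np.mean(y_train)) ** 2)
    return 1.0 - mse_model / mse_naive
```

A score of 1 means perfect predictions, 0 means no better than the baseline, and negative values mean worse than simply predicting the mean.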
Point #2
Explanatory power ≠ Predictive power
Cannot infer one from the other

Performance metrics
Explanatory: p-values, R², goodness-of-fit, type I/II errors, interpretation
Predictive: out-of-sample prediction accuracy, costs, training vs. holdout, over-fitting
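On the predictive side, accuracy is measured on holdout data and can incorporate the costs of different errors. A minimal sketch of such metrics; the asymmetric linear cost and its parameter names are illustrative assumptions:

```python
import numpy as np

def prediction_metrics(y_true, y_pred, cost_under=1.0, cost_over=1.0):
    """Holdout prediction-accuracy metrics, including an asymmetric linear cost.

    cost_under: cost per unit when the prediction is below the actual value.
    cost_over:  cost per unit when the prediction is above the actual value."""
    err = np.asarray(y_true, float) - np.asarray(y_pred, float)  # > 0 means under-prediction
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "avg_cost": float(np.mean(np.where(err > 0, cost_under * err, -cost_over * err))),
    }
```

For example, if running out of stock costs five times as much as overstocking, setting cost_under=5.0 ranks models by expected cost rather than by symmetric error.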
[Figure: predictive power vs. explanatory power]
The predictive power of an explanatory model has important scientific value: relevance, reality check, predictability
In “explanatory” fields:
Prediction underappreciated
Distinction blurred
Unfamiliar with predictive modeling/assessment

   “While the value of scientific prediction… is beyond question… the inexact sciences [do not] have… the use of predictive expertise well in hand.”
   - Helmer & Rescher, 1959
How does all this impact scientific research?
What can be done?
  Acknowledge
  Incorporate prediction into curriculum
What happens in other fields?
  Epidemiology
  Engineering
  Life sciences

What about “predictive only” fields?
http://goo.gl/gcjlN
Shmueli (2010), “To Explain or To Predict?”, Statistical Science
Shmueli & Koppius (2011), “Predictive analytics in IS research”, MISQ

More Related Content

What's hot (20)

HYPOTHESIS
HYPOTHESISHYPOTHESIS
HYPOTHESIS
 
Lecture2
Lecture2Lecture2
Lecture2
 
Research Methods: Scientific Thinking
Research Methods: Scientific ThinkingResearch Methods: Scientific Thinking
Research Methods: Scientific Thinking
 
Exploratory factor analysis
Exploratory factor analysisExploratory factor analysis
Exploratory factor analysis
 
Hypothesis
HypothesisHypothesis
Hypothesis
 
A Summary on "Using Thematic Analysis in Psychology"
A Summary on "Using Thematic Analysis in Psychology"A Summary on "Using Thematic Analysis in Psychology"
A Summary on "Using Thematic Analysis in Psychology"
 
Factor analysis ppt
Factor analysis pptFactor analysis ppt
Factor analysis ppt
 
Business Research Methods
Business Research MethodsBusiness Research Methods
Business Research Methods
 
Research developing theoretical and conceptual frameworks
Research  developing theoretical and conceptual frameworksResearch  developing theoretical and conceptual frameworks
Research developing theoretical and conceptual frameworks
 
REDUNDANT PUBLICATION IN RESEARCH
REDUNDANT PUBLICATION IN RESEARCHREDUNDANT PUBLICATION IN RESEARCH
REDUNDANT PUBLICATION IN RESEARCH
 
Theory building lecture-3
Theory building lecture-3Theory building lecture-3
Theory building lecture-3
 
Research ethics
Research ethicsResearch ethics
Research ethics
 
Ethical issues in research
Ethical issues in researchEthical issues in research
Ethical issues in research
 
Research paradigm
Research paradigmResearch paradigm
Research paradigm
 
Formulating hypothesis
Formulating hypothesisFormulating hypothesis
Formulating hypothesis
 
Research and Theory
Research and TheoryResearch and Theory
Research and Theory
 
Selective Reporting and Misrepresentation.pptx
Selective Reporting and Misrepresentation.pptxSelective Reporting and Misrepresentation.pptx
Selective Reporting and Misrepresentation.pptx
 
Research methodology Chapter 5
Research methodology Chapter 5Research methodology Chapter 5
Research methodology Chapter 5
 
5. sampling design
5. sampling design5. sampling design
5. sampling design
 
Moral judgement
Moral judgementMoral judgement
Moral judgement
 

Viewers also liked

Predictive Model Selection in PLS-PM (SCECR 2015)
Predictive Model Selection in PLS-PM (SCECR 2015)Predictive Model Selection in PLS-PM (SCECR 2015)
Predictive Model Selection in PLS-PM (SCECR 2015)Galit Shmueli
 
What is Predictive About Partial Least Squares?
What is Predictive About Partial Least Squares?What is Predictive About Partial Least Squares?
What is Predictive About Partial Least Squares?Galit Shmueli
 
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma...
Big Data - To Explain or To Predict?  Talk at U Toronto's Rotman School of Ma...Big Data - To Explain or To Predict?  Talk at U Toronto's Rotman School of Ma...
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma...Galit Shmueli
 
Interpreting machine learning models
Interpreting machine learning modelsInterpreting machine learning models
Interpreting machine learning modelsandosa
 
To Explain Or To Predict?
To Explain Or To Predict?To Explain Or To Predict?
To Explain Or To Predict?Galit Shmueli
 
SAP HANA SPS09 - Predictive Analysis Library
SAP HANA SPS09 - Predictive Analysis LibrarySAP HANA SPS09 - Predictive Analysis Library
SAP HANA SPS09 - Predictive Analysis LibrarySAP Technology
 
DATA SCIENCE Lesson 5 Data Science Predictive Modeling and Modelling Methodol...
DATA SCIENCE Lesson 5 Data Science Predictive Modeling and Modelling Methodol...DATA SCIENCE Lesson 5 Data Science Predictive Modeling and Modelling Methodol...
DATA SCIENCE Lesson 5 Data Science Predictive Modeling and Modelling Methodol...Jean-Antoine Moreau
 
SAP HANA SPS10- Predictive Analysis Library and Application Function Modeler
SAP HANA SPS10- Predictive Analysis Library and Application Function ModelerSAP HANA SPS10- Predictive Analysis Library and Application Function Modeler
SAP HANA SPS10- Predictive Analysis Library and Application Function ModelerSAP Technology
 
Who to follow and why: link prediction with explanations
Who to follow and why: link prediction with explanationsWho to follow and why: link prediction with explanations
Who to follow and why: link prediction with explanationsNicola Barbieri
 
ロジスティック回帰分析の入門 -予測モデル構築-
ロジスティック回帰分析の入門 -予測モデル構築-ロジスティック回帰分析の入門 -予測モデル構築-
ロジスティック回帰分析の入門 -予測モデル構築-Koichiro Gibo
 
Research report purposes and classifications
Research report purposes and classificationsResearch report purposes and classifications
Research report purposes and classificationsAnn Vitug
 
Research Methodology
Research MethodologyResearch Methodology
Research Methodologysh_neha252
 
Gender Report Infographic: Elsevier 2017
Gender Report Infographic: Elsevier 2017Gender Report Infographic: Elsevier 2017
Gender Report Infographic: Elsevier 2017Elsevier
 
Research methodology ppt babasab
Research methodology ppt babasab Research methodology ppt babasab
Research methodology ppt babasab Babasab Patil
 
Research Methods: Basic Concepts and Methods
Research Methods: Basic Concepts and MethodsResearch Methods: Basic Concepts and Methods
Research Methods: Basic Concepts and MethodsAhmed-Refat Refat
 

Viewers also liked (20)

Predictive Model Selection in PLS-PM (SCECR 2015)
Predictive Model Selection in PLS-PM (SCECR 2015)Predictive Model Selection in PLS-PM (SCECR 2015)
Predictive Model Selection in PLS-PM (SCECR 2015)
 
What is Predictive About Partial Least Squares?
What is Predictive About Partial Least Squares?What is Predictive About Partial Least Squares?
What is Predictive About Partial Least Squares?
 
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma...
Big Data - To Explain or To Predict?  Talk at U Toronto's Rotman School of Ma...Big Data - To Explain or To Predict?  Talk at U Toronto's Rotman School of Ma...
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma...
 
Interpreting machine learning models
Interpreting machine learning modelsInterpreting machine learning models
Interpreting machine learning models
 
To Explain Or To Predict?
To Explain Or To Predict?To Explain Or To Predict?
To Explain Or To Predict?
 
SAP HANA SPS09 - Predictive Analysis Library
SAP HANA SPS09 - Predictive Analysis LibrarySAP HANA SPS09 - Predictive Analysis Library
SAP HANA SPS09 - Predictive Analysis Library
 
DATA SCIENCE Lesson 5 Data Science Predictive Modeling and Modelling Methodol...
DATA SCIENCE Lesson 5 Data Science Predictive Modeling and Modelling Methodol...DATA SCIENCE Lesson 5 Data Science Predictive Modeling and Modelling Methodol...
DATA SCIENCE Lesson 5 Data Science Predictive Modeling and Modelling Methodol...
 
SAP HANA SPS10- Predictive Analysis Library and Application Function Modeler
SAP HANA SPS10- Predictive Analysis Library and Application Function ModelerSAP HANA SPS10- Predictive Analysis Library and Application Function Modeler
SAP HANA SPS10- Predictive Analysis Library and Application Function Modeler
 
Who to follow and why: link prediction with explanations
Who to follow and why: link prediction with explanationsWho to follow and why: link prediction with explanations
Who to follow and why: link prediction with explanations
 
ロジスティック回帰分析の入門 -予測モデル構築-
ロジスティック回帰分析の入門 -予測モデル構築-ロジスティック回帰分析の入門 -予測モデル構築-
ロジスティック回帰分析の入門 -予測モデル構築-
 
Explanation Slides
Explanation SlidesExplanation Slides
Explanation Slides
 
Research report purposes and classifications
Research report purposes and classificationsResearch report purposes and classifications
Research report purposes and classifications
 
Employee Engagement
Employee EngagementEmployee Engagement
Employee Engagement
 
Employee engagement
Employee engagementEmployee engagement
Employee engagement
 
Research Methodology Lecture for Master & Phd Students
Research Methodology  Lecture for Master & Phd StudentsResearch Methodology  Lecture for Master & Phd Students
Research Methodology Lecture for Master & Phd Students
 
Research Methodology
Research MethodologyResearch Methodology
Research Methodology
 
Gender Report Infographic: Elsevier 2017
Gender Report Infographic: Elsevier 2017Gender Report Infographic: Elsevier 2017
Gender Report Infographic: Elsevier 2017
 
Research methodology ppt babasab
Research methodology ppt babasab Research methodology ppt babasab
Research methodology ppt babasab
 
Research methodology notes
Research methodology notesResearch methodology notes
Research methodology notes
 
Research Methods: Basic Concepts and Methods
Research Methods: Basic Concepts and MethodsResearch Methods: Basic Concepts and Methods
Research Methods: Basic Concepts and Methods
 

Similar to To explain or to predict

To Explain, To Predict, or To Describe?
To Explain, To Predict, or To Describe?To Explain, To Predict, or To Describe?
To Explain, To Predict, or To Describe?Galit Shmueli
 
Statistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, DescribingStatistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, DescribingGalit Shmueli
 
Statistical Modeling in 3D: Describing, Explaining and Predicting
Statistical Modeling in 3D: Describing, Explaining and PredictingStatistical Modeling in 3D: Describing, Explaining and Predicting
Statistical Modeling in 3D: Describing, Explaining and PredictingGalit Shmueli
 
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)Predictive analytics in Information Systems Research (TSWIM 2015 keynote)
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)Galit Shmueli
 
1.model building
1.model building1.model building
1.model buildingVinod Sahu
 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testingpraveen3030
 
Research Methodology
Research MethodologyResearch Methodology
Research MethodologyAneel Raza
 
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...jemille6
 
Deliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docx
Deliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docxDeliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docx
Deliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docxtheodorelove43763
 
Inverse Modeling for Cognitive Science "in the Wild"
Inverse Modeling for Cognitive Science "in the Wild"Inverse Modeling for Cognitive Science "in the Wild"
Inverse Modeling for Cognitive Science "in the Wild"Aalto University
 
Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Miningbutest
 
2 types of research
2 types of research2 types of research
2 types of researchNaveed Saeed
 
Lec 2 types of research
Lec 2 types of researchLec 2 types of research
Lec 2 types of researchNaveed Saeed
 
Bps managing dissertation
Bps managing dissertationBps managing dissertation
Bps managing dissertationChuck Eesley
 
D. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severelyD. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severelyjemille6
 
TPCMFinalACone
TPCMFinalAConeTPCMFinalACone
TPCMFinalAConeAdam Cone
 
How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkStats Statswork
 
Theory Building in Business Research
Theory Building in Business ResearchTheory Building in Business Research
Theory Building in Business ResearchRajesh Timane, PhD
 

Similar to To explain or to predict (20)

To Explain, To Predict, or To Describe?
To Explain, To Predict, or To Describe?To Explain, To Predict, or To Describe?
To Explain, To Predict, or To Describe?
 
Statistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, DescribingStatistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, Describing
 
Statistical Modeling in 3D: Describing, Explaining and Predicting
Statistical Modeling in 3D: Describing, Explaining and PredictingStatistical Modeling in 3D: Describing, Explaining and Predicting
Statistical Modeling in 3D: Describing, Explaining and Predicting
 
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)Predictive analytics in Information Systems Research (TSWIM 2015 keynote)
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)
 
1.model building
1.model building1.model building
1.model building
 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
 
Research Methodology
Research MethodologyResearch Methodology
Research Methodology
 
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
 
Deliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docx
Deliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docxDeliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docx
Deliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docx
 
Inverse Modeling for Cognitive Science "in the Wild"
Inverse Modeling for Cognitive Science "in the Wild"Inverse Modeling for Cognitive Science "in the Wild"
Inverse Modeling for Cognitive Science "in the Wild"
 
Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Mining
 
2 types of research
2 types of research2 types of research
2 types of research
 
Lec 2 types of research
Lec 2 types of researchLec 2 types of research
Lec 2 types of research
 
man0 ppt.pptx
man0 ppt.pptxman0 ppt.pptx
man0 ppt.pptx
 
Bps managing dissertation
Bps managing dissertationBps managing dissertation
Bps managing dissertation
 
D. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severelyD. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severely
 
Mgmt 802 week 1(1)
Mgmt 802 week 1(1)Mgmt 802 week 1(1)
Mgmt 802 week 1(1)
 
TPCMFinalACone
TPCMFinalAConeTPCMFinalACone
TPCMFinalACone
 
How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - Statswork
 
Theory Building in Business Research
Theory Building in Business ResearchTheory Building in Business Research
Theory Building in Business Research
 

More from Galit Shmueli

“Improving” prediction of human behavior using behavior modification
“Improving” prediction of human behavior using behavior modification“Improving” prediction of human behavior using behavior modification
“Improving” prediction of human behavior using behavior modificationGalit Shmueli
 
Repurposing Classification & Regression Trees for Causal Research with High-D...
Repurposing Classification & Regression Trees for Causal Research with High-D...Repurposing Classification & Regression Trees for Causal Research with High-D...
Repurposing Classification & Regression Trees for Causal Research with High-D...Galit Shmueli
 
Behavioral Big Data & Healthcare Research
Behavioral Big Data & Healthcare ResearchBehavioral Big Data & Healthcare Research
Behavioral Big Data & Healthcare ResearchGalit Shmueli
 
Reinventing the Data Analytics Classroom
Reinventing the Data Analytics ClassroomReinventing the Data Analytics Classroom
Reinventing the Data Analytics ClassroomGalit Shmueli
 
Behavioral Big Data & Healthcare Research: Talk at WiDS Taipei
Behavioral Big Data & Healthcare Research: Talk at WiDS TaipeiBehavioral Big Data & Healthcare Research: Talk at WiDS Taipei
Behavioral Big Data & Healthcare Research: Talk at WiDS TaipeiGalit Shmueli
 
Repurposing predictive tools for causal research
Repurposing predictive tools for causal researchRepurposing predictive tools for causal research
Repurposing predictive tools for causal researchGalit Shmueli
 
Workshop on Information Quality
Workshop on Information QualityWorkshop on Information Quality
Workshop on Information QualityGalit Shmueli
 
Behavioral Big Data: Why Quality Engineers Should Care
Behavioral Big Data: Why Quality Engineers Should CareBehavioral Big Data: Why Quality Engineers Should Care
Behavioral Big Data: Why Quality Engineers Should CareGalit Shmueli
 
Researcher Dilemmas using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...
Researcher Dilemmas  using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...Researcher Dilemmas  using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...
Researcher Dilemmas using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...Galit Shmueli
 
Prediction-based Model Selection in PLS-PM
Prediction-based Model Selection in PLS-PMPrediction-based Model Selection in PLS-PM
Prediction-based Model Selection in PLS-PMGalit Shmueli
 
When Prediction Met PLS: What We learned in 3 Years of Marriage
When Prediction Met PLS: What We learned in 3 Years of MarriageWhen Prediction Met PLS: What We learned in 3 Years of Marriage
When Prediction Met PLS: What We learned in 3 Years of MarriageGalit Shmueli
 
A Tree-Based Approach for Addressing Self-selection in Impact Studies with B...
A Tree-Based Approach  for Addressing Self-selection in Impact Studies with B...A Tree-Based Approach  for Addressing Self-selection in Impact Studies with B...
A Tree-Based Approach for Addressing Self-selection in Impact Studies with B...Galit Shmueli
 
Research Using Behavioral Big Data: A Tour and Why Mechanical Engineers Shoul...
Research Using Behavioral Big Data: A Tour and Why Mechanical Engineers Shoul...Research Using Behavioral Big Data: A Tour and Why Mechanical Engineers Shoul...
Research Using Behavioral Big Data: A Tour and Why Mechanical Engineers Shoul...Galit Shmueli
 
A Tree-Based Approach for Addressing Self-Selection in Impact Studies with Bi...
A Tree-Based Approach for Addressing Self-Selection in Impact Studies with Bi...A Tree-Based Approach for Addressing Self-Selection in Impact Studies with Bi...
A Tree-Based Approach for Addressing Self-Selection in Impact Studies with Bi...Galit Shmueli
 
Research Using Behavioral Big Data (BBD)
Research Using Behavioral Big Data (BBD)Research Using Behavioral Big Data (BBD)
Research Using Behavioral Big Data (BBD)Galit Shmueli
 
Analyzing Behavioral Big Data: Methodological, Practical, Ethical & Moral Issues
Analyzing Behavioral Big Data: Methodological, Practical, Ethical & Moral IssuesAnalyzing Behavioral Big Data: Methodological, Practical, Ethical & Moral Issues
Analyzing Behavioral Big Data: Methodological, Practical, Ethical & Moral IssuesGalit Shmueli
 
Information Quality: A Framework for Evaluating Empirical Studies
Information Quality: A Framework for Evaluating Empirical Studies Information Quality: A Framework for Evaluating Empirical Studies
Information Quality: A Framework for Evaluating Empirical Studies Galit Shmueli
 
E.SUN Academic Award presentation (Jan 2016)
E.SUN Academic Award presentation (Jan 2016)E.SUN Academic Award presentation (Jan 2016)
E.SUN Academic Award presentation (Jan 2016)Galit Shmueli
 
Big Data & Analytics in the Digital Creative Industries
Big Data & Analytics in the Digital Creative IndustriesBig Data & Analytics in the Digital Creative Industries
Big Data & Analytics in the Digital Creative IndustriesGalit Shmueli
 
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)Galit Shmueli
 

More from Galit Shmueli (20)

“Improving” prediction of human behavior using behavior modification
“Improving” prediction of human behavior using behavior modification“Improving” prediction of human behavior using behavior modification
“Improving” prediction of human behavior using behavior modification
 
Repurposing Classification & Regression Trees for Causal Research with High-D...
Repurposing Classification & Regression Trees for Causal Research with High-D...Repurposing Classification & Regression Trees for Causal Research with High-D...
Repurposing Classification & Regression Trees for Causal Research with High-D...
 
Behavioral Big Data & Healthcare Research
Behavioral Big Data & Healthcare ResearchBehavioral Big Data & Healthcare Research
Behavioral Big Data & Healthcare Research
 
Reinventing the Data Analytics Classroom
Reinventing the Data Analytics ClassroomReinventing the Data Analytics Classroom
Reinventing the Data Analytics Classroom
 
Behavioral Big Data & Healthcare Research: Talk at WiDS Taipei
Behavioral Big Data & Healthcare Research: Talk at WiDS TaipeiBehavioral Big Data & Healthcare Research: Talk at WiDS Taipei
Behavioral Big Data & Healthcare Research: Talk at WiDS Taipei
 
Repurposing predictive tools for causal research
Repurposing predictive tools for causal researchRepurposing predictive tools for causal research
Repurposing predictive tools for causal research
 
Workshop on Information Quality
Workshop on Information QualityWorkshop on Information Quality
Workshop on Information Quality
 
Behavioral Big Data: Why Quality Engineers Should Care

To explain or to predict

  • 1. Galit Shmuéli, Israel Statistical Association & Tel Aviv University, July 9, 2012. To Explain or To Predict?
  • 2. Points for discussion: goo.gl/gcjlN Twitter: #explainpredict
  • 3. Road Map: Definitions; Explanatory-dominated social sciences; Explanatory ≠ predictive modeling (Why? Different modeling paths; Explanatory vs. predictive power); So what?
  • 4. Definitions. Explanatory modeling: theory-based statistical testing of causal hypotheses. Explanatory power: strength of relationship in the statistical model.
  • 5. Definitions. Predictive modeling: empirical method for predicting new observations. Predictive power: ability to accurately predict new observations.
  • 6. Statistical modeling in social science research. Purpose: test causal theory (“explain”). Association-based statistical models. Prediction nearly absent.
  • 7. Explanatory modeling à la social sciences: start with a causal theory → generate causal hypotheses on constructs → operationalize constructs into measurable variables → fit a statistical model → statistical inference → causal conclusions.
  • 8. In the social sciences, data analysis is mainly used for testing causal theory. “If it explains, it predicts.”
  • 9. “Empirical prediction alone is un-scientific.” Some statisticians share this view: “The two goals in analyzing data... I prefer to describe as ‘management’ and ‘science’. Management seeks profit... Science seeks truth.” (Parzen, Statistical Science, 2001)
  • 10. 52 “predictive” articles among 1,072 in top Information Systems journals.
  • 11. Why predict? Uses in scientific research: new theory; develop measures; compare theories; improve theory; assess relevance; predictability. (Shmueli & Koppius, “Predictive Analytics in IS Research”, MISQ, 2011)
  • 12. “A good explanatory model will also predict well.” “You must understand the underlying causes in order to predict.”
  • 13. Philosophy of science: “Explanation and prediction have the same logical structure” (Hempel & Oppenheim, 1948). “It becomes pertinent to investigate the possibilities of predictive procedures autonomous of those used for explanation” (Helmer & Rescher, 1959). “Theories of social and human behavior address themselves to two distinct goals of science: (1) prediction and (2) understanding” (Dubin, Theory Building, 1969).
  • 14. Why statistical explanatory modeling differs from predictive modeling Shmueli (2010), Statistical Science
  • 15. Theory vs. its manifestation?
  • 16. Notation. Theoretical constructs: $\mathcal{X}, \mathcal{Y}$. Causal theoretical model: $\mathcal{Y} = \mathcal{F}(\mathcal{X})$. Measurable variables: $X, Y$. Statistical model: $E(Y) = f(X)$.
  • 17. Four aspects ($\mathcal{Y} = \mathcal{F}(\mathcal{X})$ vs. $E(Y) = f(X)$): 1. Theory vs. data; 2. Causation vs. association; 3. Retrospective vs. prospective; 4. Bias vs. variance.
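The fourth aspect, bias vs. variance, can be made concrete with the standard decomposition of expected prediction error under squared-error loss. As a sketch, assume a new observation at $x$ satisfies $Y = f(x) + \varepsilon$ with $E(\varepsilon) = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, independent of the data used to estimate $\hat f$:

```latex
\mathrm{EPE}(x) = E\big[(Y - \hat f(x))^2\big]
  = \underbrace{\sigma^2}_{\text{irreducible}}
  + \underbrace{\big(E[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\big(\hat f(x)\big)}_{\text{variance}}
```

Explanatory modeling aims to minimize bias so that $f$ stays faithful to the theory $\mathcal{F}$; predictive modeling targets the total, so a deliberately biased but more stable $\hat f$ can predict better.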
  • 18. “The goal of finding models that are predictively accurate differs from the goal of finding models that are true.”
  • 19. Point #1: Best explanatory model ≠ best predictive model.
  • 20. Four aspects ($\mathcal{Y} = \mathcal{F}(\mathcal{X})$ vs. $E(Y) = f(X)$): 1. Theory vs. data; 2. Causation vs. association; 3. Retrospective vs. prospective; 4. Bias vs. variance.
  • 21. Predict ≠ Explain: “We tried to benefit from an extensive set of attributes describing each of the movies in the dataset. Those attributes certainly carry a significant signal and can explain some of the user behavior. However… they could not help at all for improving the [predictive] accuracy.” (Bell et al., 2008)
  • 22. Predict ≠ Explain: The FDA considers two products bioequivalent if the 90% CI of the relative mean of the generic-to-brand formulation is within 80%-125%. “We are planning to… develop predictive models for bioavailability and bioequivalence.” (Lester M. Crawford, 2005, Acting Commissioner of Food & Drugs)
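The bioequivalence rule above is itself a decision criterion on an interval estimate. A minimal sketch of the check, assuming the "relative mean" is the ratio of geometric means computed from hypothetical per-subject log differences (log generic minus log brand, as in a crossover study where each subject takes both), and using a normal approximation instead of the t-based interval a full regulatory analysis would use. The function names `ratio_ci` and `is_bioequivalent` and the sample data are illustrative assumptions.

```python
import math
from statistics import NormalDist, mean, stdev


def ratio_ci(log_ratios, level=0.90):
    """Approximate CI for the generic-to-brand ratio of geometric means.

    log_ratios: hypothetical per-subject values of log(generic) - log(brand).
    Uses a normal approximation, not the t-based crossover interval.
    """
    n = len(log_ratios)
    m = mean(log_ratios)
    se = stdev(log_ratios) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 when level=0.90
    # Build the interval on the log scale, then back-transform to a ratio.
    return math.exp(m - z * se), math.exp(m + z * se)


def is_bioequivalent(log_ratios, lower=0.80, upper=1.25):
    """True only if the whole 90% CI lies inside [lower, upper]."""
    lo, hi = ratio_ci(log_ratios, level=0.90)
    return lower <= lo and hi <= upper


close = [0.01, -0.02, 0.03, 0.00, -0.01, 0.02, 0.01, -0.01]
shifted = [0.40, 0.35, 0.45, 0.38, 0.42, 0.37]
print(is_bioequivalent(close), is_bioequivalent(shifted))  # prints: True False
```

Note that the criterion is about where an interval falls, not about predicting a new patient's response; that gap is what the quoted plan for "predictive models" points at.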
  • 23. Modeling path: Goal definition → Design & data collection → Data preparation → EDA → Variables? → Methods? → Evaluation, validation & model selection → Model use & reporting.
  • 24. Study design & data collection: Observational or experiment? Primary or secondary data? Instrument (reliability + validity vs. measurement accuracy). How much data? How to sample? Hierarchical data.
  • 25. Data preprocessing: missing values (reduced-feature models); partitioning.
  • 26. Data exploration & reduction: interactive visualization; PCA; SVD.
  • 27. Which variables? Endogeneity; ex-post availability; causation vs. associations; multicollinearity? A, B, A*B?
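Multicollinearity is listed as a question because it matters differently for the two goals: correlated predictors inflate the standard errors of the individual coefficients that carry a causal interpretation, while the combined prediction can remain accurate. A minimal sketch, assuming a model with exactly two predictors, where regressing one predictor on the other gives $R^2 = r^2$ and hence a variance inflation factor of $1/(1 - r^2)$. The helper names and data are illustrative assumptions.

```python
import math


def pearson_r(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_predictors(a, b):
    """Variance inflation factor for either predictor in a two-predictor model.

    Regressing one predictor on the other gives R^2 = r^2, so VIF = 1 / (1 - r^2).
    Undefined (division by zero) when the predictors are perfectly collinear.
    """
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)


a = [1, 2, 3, 4, 5]
b = [2, 1, 4, 3, 5]
print(round(pearson_r(a, b), 3), round(vif_two_predictors(a, b), 3))  # prints: 0.8 2.778
```

A VIF near 2.8 means each coefficient's sampling variance is almost three times what it would be with uncorrelated predictors; that hurts inference on A and B separately more than it hurts prediction of Y.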
  • 28. Methods / models: black-box vs. interpretable; mapping to theory; bias, variance; shrinkage models; ensembles.
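Shrinkage models illustrate the bias-variance aspect directly: they pull coefficient estimates toward zero, adding bias in exchange for lower variance. As one familiar example (the slide does not name a specific method), a one-predictor ridge regression with an unpenalized intercept has the closed-form slope $S_{xy}/(S_{xx} + \lambda)$. The function name and data below are illustrative assumptions.

```python
def ridge_slope(x, y, lam):
    """Slope of a one-predictor ridge regression with an unpenalized intercept.

    Centering x and y removes the intercept, leaving S_xy / (S_xx + lam);
    lam = 0 reproduces the ordinary least-squares slope.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / (sxx + lam)


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
for lam in (0.0, 1.0, 10.0):
    # Larger lam shrinks the slope further toward zero.
    print(lam, round(ridge_slope(x, y, lam), 4))
```

For explanation, the shrunken slope is a biased estimate of the effect; for prediction, that bias can be worth paying if it stabilizes the fitted model on new data.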
  • 29. Model fit ≠ validation (evaluation, validation & model selection). Explanatory power: theoretical model → empirical model → data. Predictive power: empirical model fitted on training data, then evaluated on holdout data; over-fitting analysis.
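The training/holdout logic can be sketched on simulated data (an assumption for illustration). In-sample R² stands in for goodness-of-fit, holdout RMSE for predictive power, and a naive baseline that always predicts the training mean plays the role of the naïve/baseline comparison from slide 30. A 1-nearest-neighbor model memorizes the training set, so it scores a perfect in-sample R², yet its holdout error is typically worse than the simple linear model's. All function names, the seed, and the 70/30 split are hypothetical choices.

```python
import random


def fit_linear(x, y):
    """Ordinary least squares with one predictor; returns a prediction function."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


def fit_1nn(x, y):
    """1-nearest-neighbor regression: memorizes the training data."""
    pairs = list(zip(x, y))
    return lambda v: min(pairs, key=lambda p: abs(p[0] - v))[1]


def fit_naive(x, y):
    """Naive baseline: always predict the training mean."""
    m = sum(y) / len(y)
    return lambda v: m


def r_squared(model, x, y):
    """In-sample goodness of fit: 1 - SSE/SST."""
    my = sum(y) / len(y)
    sse = sum((b - model(a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1.0 - sse / sst


def rmse(model, x, y):
    """Root mean squared prediction error."""
    return (sum((b - model(a)) ** 2 for a, b in zip(x, y)) / len(y)) ** 0.5


# Simulated data: linear signal plus Gaussian noise.
rng = random.Random(2012)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [1.0 + 0.5 * v + rng.gauss(0, 1) for v in x]

# Random partition into training (70%) and holdout (30%) sets.
idx = list(range(len(x)))
rng.shuffle(idx)
train, holdout = idx[:140], idx[140:]
x_tr, y_tr = [x[i] for i in train], [y[i] for i in train]
x_ho, y_ho = [x[i] for i in holdout], [y[i] for i in holdout]

results = {}
for name, fit in [("naive", fit_naive), ("linear", fit_linear), ("1-NN", fit_1nn)]:
    model = fit(x_tr, y_tr)
    results[name] = (r_squared(model, x_tr, y_tr), rmse(model, x_ho, y_ho))
    print(f"{name:>6}: in-sample R^2 = {results[name][0]:.3f}, "
          f"holdout RMSE = {results[name][1]:.3f}")
```

Ranking by the first column and ranking by the second need not agree, which is the slide's point: fit to the training data is not validation.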
  • 30. Model use. Explanatory: test causal theory (inference, null hypothesis). Predictive: new theory; develop measures; compare theories; improve theory; assess relevance; predictability (predictive performance, naïve/baseline, over-fitting analysis).
  • 31. Point #2: Explanatory power ≠ predictive power. Cannot infer one from the other.
  • 32. Performance metrics. Explanatory: interpretation; p-values; R²; goodness-of-fit; type I, II errors. Predictive: out-of-sample prediction accuracy; costs; training vs. holdout; over-fitting.
  • 33. [Figure: predictive power vs. explanatory power]
  • 34. The predictive power of an explanatory model has important scientific value: relevance, reality check, predictability.
  • 35. In “explanatory” fields: prediction underappreciated; distinction blurred; unfamiliar with predictive modeling/assessment. “While the value of scientific prediction… is beyond question… the inexact sciences [do not] have… the use of predictive expertise well in hand.” (Helmer & Rescher, 1959)
  • 36. How does all this impact Scientific Research?
  • 37. What can be done? Acknowledge; incorporate prediction into the curriculum.
  • 38. What happens in other fields? Epidemiology Engineering Life sciences What about “predictive only” fields? http://goo.gl/gcjlN
  • 39. Shmueli (2010), “To Explain or To Predict?”, Statistical Science Shmueli & Koppius (2011), “Predictive analytics in IS research”, MISQ