6. Statistical modeling in
social science research
Purpose: test causal theory ("explain")
Association-based statistical models
Prediction nearly absent
7. Explanatory modeling à la social sciences
Start with a causal theory
Generate causal hypotheses on constructs
Operationalize constructs → Measurable variables
Fit statistical model
Statistical inference → Causal conclusions
8. In the social sciences, data analysis is mainly used for testing causal theory.
"If it explains, it predicts"
9. "Empirical prediction alone is un-scientific"
Some statisticians share this view:
The two goals in analyzing data... I prefer to describe as "management" and "science". Management seeks profit... Science seeks truth.
- Parzen, Statistical Science 2001
11. Why Predict? for Scientific Research
new theory
develop measures
compare theories
improve theory
assess relevance
predictability
Shmueli & Koppius, "Predictive Analytics in IS Research" (MISQ, 2011)
12. "A good explanatory model will also predict well"
"You must understand the underlying causes in order to predict"
13. Philosophy of Science
"Explanation and prediction have the same logical structure"
Hempel & Oppenheim, 1948
"It becomes pertinent to investigate the possibilities of predictive procedures autonomous of those used for explanation"
Helmer & Rescher, 1959
"Theories of social and human behavior address themselves to two distinct goals of science: (1) prediction and (2) understanding"
Dubin, Theory Building, 1969
20. Four aspects Y=F(X)
Y=f(X)
1. Theory - Data
2. Causation - Association
3. Retrospective - Prospective
4. Bias - Variance
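The fourth aspect can be made concrete with the standard decomposition of expected squared prediction error. This is a sketch under the assumption that Y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², where \hat{f} denotes the model estimated from data; the symbols ε, σ² and \hat{f} are introduced here for illustration and do not appear on the slide.

```latex
\begin{aligned}
\mathrm{EPE}(x) &= E\big[(Y - \hat{f}(x))^2\big] \\
                &= \underbrace{\sigma^2}_{\text{irreducible noise}}
                 + \underbrace{\big(E[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
                 + \underbrace{\mathrm{Var}\big(\hat{f}(x)\big)}_{\text{estimation variance}}
\end{aligned}
```

Read this way, an explanatory analysis concentrates on keeping the bias term small, while a predictive analysis is free to accept some bias if that lowers the sum of bias² and variance.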
21. Predict ≠ Explain
"we tried to benefit from an extensive set of attributes describing each of the movies in the dataset. Those attributes certainly carry a significant signal and can explain some of the user behavior. However… they could not help at all for improving the [predictive] accuracy."
Bell et al., 2008
22. Predict ≠ Explain
The FDA considers two products bioequivalent if the 90% CI of the relative mean of the generic to brand formulation is within 80%-125%
"We are planning to… develop predictive models for bioavailability and bioequivalence"
Lester M. Crawford, 2005
Acting Commissioner of Food & Drugs
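The 80%-125% rule above can be sketched in a few lines of Python. This is only an illustration: the function names and the data are hypothetical, and the interval uses a normal approximation on per-subject log ratios, which is an assumption made here for brevity (real bioequivalence analyses typically use a t-based interval from a crossover design).

```python
import math
from statistics import NormalDist, mean, stdev


def ratio_ci_90(log_ratios):
    """Approximate 90% CI for the generic/brand mean ratio.

    log_ratios: per-subject log(generic / brand) values (hypothetical input).
    Assumption: normal approximation instead of a t-based interval.
    """
    n = len(log_ratios)
    center = mean(log_ratios)
    se = stdev(log_ratios) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.95)  # two-sided 90% interval
    return math.exp(center - z * se), math.exp(center + z * se)


def is_bioequivalent(ci_low, ci_high, lower=0.80, upper=1.25):
    """True if the whole interval lies inside the 80%-125% limits."""
    return lower <= ci_low and ci_high <= upper


# Made-up log ratios for eight subjects, for illustration only.
example_log_ratios = [0.02, -0.05, 0.01, 0.04, -0.03, 0.00, 0.03, -0.01]
ci = ratio_ci_90(example_log_ratios)
print(ci, is_bioequivalent(*ci))
```

Note that the rule is a statement about an interval estimate of a population mean ratio, which is an inferential quantity rather than a prediction for an individual patient.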
23. Goal Definition
Design & Collection
Data Preparation
EDA
Variables?
Methods?
Evaluation, Validation & Model Selection
Model Use & Reporting
24. Study design
& data collection
Observational or experiment?
Primary or secondary data?
Instrument (reliability + validity vs. measurement accuracy)
How much data?
How to sample?
Hierarchical data
27. Which Variables?
endogeneity
ex-post
availability
causation vs. associations
Multicollinearity? A, B, A*B?
28. Methods / Models
Blackbox / interpretable
Mapping to theory
variance vs. bias
Shrinkage models
ensembles
29. Model fit ≠ Validation
Evaluation, Validation & Model Selection
Explanatory power: theoretical model vs. empirical model, assessed on the data
Predictive power: empirical model, training data vs. holdout data, over-fitting analysis
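The gap between model fit and validation can be illustrated with a small holdout experiment. The sketch below uses simulated data and two deliberately simple models, both chosen here for illustration and not taken from the slides: a least-squares line and a 1-nearest-neighbor rule that memorizes the training data.

```python
import math
import random


def fit_ols(xs, ys):
    """Least-squares line y = a + b*x; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_1nn(xs, ys):
    """1-nearest-neighbor: predict the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def rmse(model, xs, ys):
    """Root mean squared error of model predictions on (xs, ys)."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Simulated data (an assumption for this sketch): linear signal plus noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(300)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]
train_x, train_y = xs[:200], ys[:200]
hold_x, hold_y = xs[200:], ys[200:]

results = {}
for name, fit in [("OLS", fit_ols), ("1-NN", fit_1nn)]:
    model = fit(train_x, train_y)
    results[name] = (rmse(model, train_x, train_y), rmse(model, hold_x, hold_y))
    print(name, "train RMSE:", results[name][0], "holdout RMSE:", results[name][1])
```

The memorizing model reproduces its training data exactly, so its training error says nothing about how it will do on new records; only the holdout column speaks to predictive power.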
30. Model Use
test causal theory: Inference, Null hypothesis
new theory, develop measures, compare theories, improve theory, assess relevance, predictability: Predictive performance, Naïve/baseline, Over-fitting analysis
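Comparing predictive performance against a naïve baseline can be sketched as a simple skill score on holdout data. The function names, the choice of the training mean as the naïve predictor, and the numbers are all assumptions made for this illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def skill_vs_naive(train_y, holdout_y, model_pred):
    """1 - MSE(model) / MSE(naive), both measured on the holdout set.

    The naive predictor (an assumption here) is the training-set mean.
    Values above 0 mean the model beats the baseline; 0 means no gain.
    """
    naive = sum(train_y) / len(train_y)
    mse_naive = mse(holdout_y, [naive] * len(holdout_y))
    return 1.0 - mse(holdout_y, model_pred) / mse_naive


# Tiny made-up example: training outcomes, holdout outcomes, model predictions.
score = skill_vs_naive([1.0, 2.0, 3.0, 4.0], [2.0, 4.0], [2.5, 3.5])
print(round(score, 3))  # 0.8
```

A score near zero on holdout data would suggest the model adds little beyond the baseline, whatever its in-sample fit or statistical significance.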
34. The predictive power of an
explanatory model has important
scientific value
Relevance, reality check, predictability
35. In "explanatory" fields
Prediction underappreciated
Distinction blurred
Unfamiliar with predictive modeling/assessment
"While the value of scientific prediction… is beyond question… the inexact sciences [do not] have… the use of predictive expertise well in hand."
Helmer & Rescher, 1959