Week 1
Intro & Stats I Review
Applied Statistical Analysis II
Jeffrey Ziegler, PhD
Assistant Professor in Political Science & Data Science
Trinity College Dublin
Spring 2023
Road map for today
Welcome and introduction
- About module, structure
- Review of last term
- Bridging terms
Alternative derivations of least squares
Frequentist model of likelihood
Intro to GLMs & MLE
Next time: Framework of generalised linear models (GLMs)
By next week, please...
- Fork GitHub repository
- Read assigned chapters
General Info About Course
Instructor Jeffrey Ziegler, PhD
Email zieglerj@tcd.ie
In-Person Sessions 16:00 - 18:00 Mondays AP2.05
Office Hours W/Th 13:00-14:00
Review of Tools: Necessity of R
R is the statistical programming language
Perform stats analysis, data manipulation, plotting, etc.
Notes on R code:
- ALWAYS comment your code
  Worry more about saying too little than too much
- Use indentation to visually clarify blocks of code, such as
  multiple lines for one command or multiple commands that
  produce one logical step
- Plan for your code to be run from source files
  Interactive analysis is great for exploration, terrible for
  (re)analysis, on which your job and/or reputation may depend
- Create command source files with all your analysis (.R file)
- Final plots go to files, not the screen (dev.off())
Review of Tools: GitHub & LaTeX
LaTeX is the word processor
Include output, figures, and code from R
Using Word will result in a deduction
GitHub is how we’ll share our work with each other
Fork repository
Keep up-to-date
Keep organised
As a reminder, you can access the syllabus here...
Direct link to syllabus, or find on course website
Required Materials
Texts All readings are provided, don’t buy books!
R and LaTeX Should have installed
Rstudio and TexStudio Should have installed
Problem Sets (50%)
Typically assigned every other week(ish), and you will
generally have two weeks to do assignment (this will vary)
Problem sets require R and should be written in LaTeX (must
include tables, figures, and code within the text)
All problem sets will be posted on GitHub
Evaluated by me or Martyn
I will publish correct answers each week, but look at others’
GitHubs so we can learn from each other
The lowest PS grade will be dropped, so “I have been so busy
with other classes” is not a legitimate reason not to turn in a
problem set!
Exams (50%)
In class: February 27
Exam is cumulative
Exam is multiple choice and open-response questions
Exam is graded by me, instructor
You will be allowed a formula sheet
Make-up exams are only allowed by written approval
Replication (25%): Example
Does Having Daughters Cause Judges to Rule for Women’s Issues?
Table: Number of Children and Girls for U.S. Courts of Appeals Judges
Participating in Gender-Related Cases, 1996-2002¹

Number of Children   0   1   2   3   4   5   6   7   8   9
Democrat            12  13  33  24  15   4   0   1   0   1
Republican          13   8  44  30  15   7   3   0   1   0
Number of Girls      0   1   2   3   4   5   6   7   8   9
Democrat            26  35  29  10   1   2   -   -   -   -
Republican          36  43  31   9   2   0   -   -   -   -

¹ Adam Glynn & Maya Sen, “Does Having Daughters Cause Judges to Rule for Women’s Issues?” (AJPS, 2015)
Data: Daughters
[Figure: Number of Girls by Partisan Leaning — conditional percent of judges (0.0–1.0) with 0–5 girls, Democrat vs. Republican]
Data: Judge Demography
Table: Demographics of U.S. Court of Appeal Judges who voted on
gender-related cases (1996-2002)
All Democrats Republicans Women Men
Mean No. Children 2.47 2.40 2.54 1.58 2.66
Mean No. Girls 1.24 1.33 1.16 0.71 1.34
0 children 0.11 0.12 0.11 0.29 0.08
1 child 0.09 0.13 0.07 0.21 0.07
2 children 0.34 0.32 0.36 0.26 0.36
3 children 0.24 0.23 0.25 0.13 0.26
4 children 0.13 0.15 0.12 0.08 0.15
5 children or more 0.08 0.06 0.09 0.03 0.05
Proportion Female 0.17 0.26 0.09 1.00 0.00
Proportion Republican 0.54 0.00 1.00 0.29 0.59
Proportion White 0.91 0.78 0.99 0.93 0.91
Mean Year Born 1932.55 1931.23 1933.43 1938.57 1931.49
Data: Judge Demography
[Figure: Percent of Female Judges by Party Affiliation (0.00–1.00), Democrat vs. Republican]
Data: Cases
Table: Distribution of the number of gender-related cases heard per
judge, 1996-2002
Min. 1st Qu. Median Mean 3rd Qu. Max.
All Judges 1.00 5.00 8.00 11.10 14.00 46.00
Democrats 1.00 5.00 7.00 10.12 13.00 39.00
Republicans 1.00 5.00 9.00 11.94 14.00 46.00
Data: Cases
[Figure: Proportion of cases decided in a feminist direction (0.0–1.0, less to more feminist), shown for All judges, Democrats, and Republicans]
Model
Predict the probability that a judge will vote in a feminist
direction in any given gender-related case
Pr(y_i = 1) = logit^(−1)(β_0 + β_k X_i)
yi: judge-level votes in individual cases
Xi: vector of individual-level predictors
Main covariate of interest is # of biological daughters,
conditioned on total # of children (categorical variable)
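A quick sketch of the inverse logit (not from the original slides): it maps any linear predictor onto the (0, 1) probability scale.

```r
# Inverse logit: maps a linear predictor eta to a probability in (0, 1)
inv_logit <- function(eta) 1 / (1 + exp(-eta))

inv_logit(0)         # 0.5: a linear predictor of zero gives even odds
inv_logit(c(-2, 2))  # approx. 0.12 and 0.88
```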
Model: Base comparison
# R code presented in AJPS online replication files
base_model <- zelig(progressive.vote ~ as.factor(girls)
                    + as.factor(child), model = "logit",
                    data = subset(women.cases, child < 5 & child > 0))
summary(base_model) # number of observations = 1974
Dependent variable:
progressive.vote
as.factor(girls)1 0.384∗∗∗
(0.128)
Observations 1,974
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
Model: Casewise delete
## case-wise delete to get same n in primary model ##
# subset for judges with at least 1 child,
# but less than 5 (n=2448)
women.subset <- women.cases[women.cases$child < 5
                            & women.cases$child > 0, ]
# subset for those judges who have girls (n=1975)
women.subset <- women.subset[complete.cases(women.subset$girls), ]
# subset for rows with a progressive.vote value (minus NAs)
# (n=1974)
women.subset <- women.subset[complete.cases(
                  women.subset$progressive.vote), ]
Model: Casewise delete (same results)
# re-run with subsetted data
case_delete <- glm(progressive.vote ~ as.factor(girls)
                   + as.factor(child),
                   family = binomial(link = "logit"),
                   data = women.subset)
summary(case_delete) # number of observations = 1974
Table: Re-run with subset
Dependent variable:
progressive.vote
as.factor(girls)1 0.384∗∗∗
(0.128)
Observations 1,974
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
Diagnostics: Basic model
### check Pearson residuals ###
sum(residuals(base_model, type = "pearson")^2)
# check deviance
pchisq(deviance(base_model), df.residual(base_model),
       lower = F)
# resulting p-value = 0, not a good model fit
Diagnostics: Basic model
# get preds for all subjects and take inverse logit
predValues <- ilogit(predict(base_model))
# mean likelihood of judge voting for plaintiff
mean(predValues) # (original mean lib vote = 0.433)
# create binary outcome: above/below the mean lib vote
predBinary <- ifelse(predValues > 0.433, 1, 0)
# create table to show predictive error
# (observeBinary: observed binary votes, not defined on this slide)
table(observeBinary, predBinary)
# 1109/1974 (56.2%) accurately predicted
Table: Estimation of judge below or above mean lib vote
0 1
0 858 333
1 532 251
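The 56.2% accuracy figure can be recovered directly from the table above (a small check, not part of the original replication code):

```r
# Confusion matrix from the slide: rows = observed, cols = predicted
conf <- matrix(c(858, 532, 333, 251), nrow = 2)
sum(diag(conf)) / sum(conf)  # (858 + 251) / 1974 = 0.562
```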
Other course policies to consider
Absences for religious holidays are excused
Talk to me ASAP if you have any illness or family emergencies
All students with special accommodations should notify me
as soon as possible
I Documentation from the Trinity Office of Disability Services is
required
The schedule posted on the syllabus is tentative and subject
to change
Reminder: Approach Toward Learning
Preparation + synthesis + practice = learning
Individual preparedness: Reading and reviewing lectures before
class
In class:
- Discussion and Q&A on important concepts
- Tutorial: Advanced theoretical problems
Office hours: Review and correct mistakes
Problem sets: Individual homework assignments
Exam and replication: Showcase knowledge
This term: Extending Modelling & Estimation
What is a model?
How do we estimate its parameters?
What are the properties of the estimator?
Use what we learned, extend to non-continuous outcomes
Social Science and Parametric Models
Goal of social science is parsimonious explanation of social
phenomena
- Parsimonious because we can never explain every detail
- Explanation because we want more than mere description
Compare with non-parametric approach
Non-parametric regression smoother
[Figure: 2021–2022 Sinn Féin polls with a lowess nonparametric fit; x-axis: Date (2021–2022), y-axis: Percent of Respondents that Support Party (≈0.26–0.32)]
How did I do that in R?
# load data
polling_data <- read.csv("https://raw.githubusercontent.com/ASDS-TCD/StatsII_Spring2023/main/datasets/long_IPI.csv")
attach(polling_data)
Date <- as.Date(Date, "%d/%m/%Y")
# open up non-parametric plot
pdf("../graphics/nonparameter_example.pdf", width = 9.25)
plot(Date, SF, type = "n",
     main = "2021-2022 Sinn Fein polls \n with a lowess nonparametric fit",
     xlab = "Date", ylab = "Percent of Respondents that Support Party")
points(Date, jitter(SF), pch = 1, cex = .6, col = "red")
lines(lowess(SF ~ Date, f = 1/10), col = "blue")
abline(0, 0)
dev.off()
# Open an empty plot: type = "n" suppresses points and lines but
# scales axes correctly for the x and y variables.
# The "\n" in the main title is the line-break command that splits
# the title across two lines.
# Plot the points, adding a little noise to reduce overprinting of
# data; use plot character 1 (open circle), set the size
# to .6 of normal, and colour the points red
Non-parametric Models: Virtues and Vices
Benefits:
Very flexible, can fit any pattern of data
Makes minimal (virtually no) assumptions about data
Can reveal unexpected patterns and departures from linear
assumptions
Drawbacks:
Too flexible, sensitive to overfitting
Without parameters there is no simple interpretation of
effects
Hard to incorporate substantive theory and tests
A non-parametric future?
A great deal of research on modern nonparametric methods
is going on, lots of new developments
But for social scientists, perhaps not the wave of the future
The reason is that parametric models can do a lot for us
What is a parametric model?
We begin with specification of a specific distribution
describing behaviour under study
Specification requires theoretical understanding
Specification also requires making assumptions explicit
While this places a considerable burden on our theory, it
forces us to confront limits of our knowledge and helps
avoid making implicit and unwarranted assumptions
Specification should make our assumptions clear to all,
including ourselves
Specification
We are concerned with the estimation of parametric models of
the form:
y_i ∼ f(θ, x_i)
where:
θ is a vector of parameters
x_i is a vector of exogenous characteristics of the ith observation
The specific functional form, f, provides an almost unlimited
choice of specific models
Examples of specific models
Poisson:
y_i ∼ f(k; λ) = Pr(X = k) = λ^k e^(−λ) / k!
where:
k is # of occurrences (k = 0, 1, 2, . . . )
e is Euler’s number (e = 2.71828...)
! is the factorial function
λ is a positive real number, equal to both the expected value of X
and its variance:
λ = E(X) = Var(X)
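As a sanity check (not in the original slides), the pmf above matches R's built-in dpois(), and the mean–variance equality shows up in a large simulated sample:

```r
lambda <- 3.5
k <- 0:10
# pmf by hand vs. built-in
manual  <- lambda^k * exp(-lambda) / factorial(k)
builtin <- dpois(k, lambda)
all.equal(manual, builtin)  # TRUE
# E(X) = Var(X) = lambda, approximately, in a large sample
set.seed(123)
draws <- rpois(1e5, lambda)
mean(draws); var(draws)  # both close to 3.5
```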
Estimation: Maximum Likelihood
Likelihood: proportional to probability of observing data,
treating parameters of distribution as variables and data as
fixed (and assuming independent observations)
L(θ|Y) ∝ ∏_(i=1)^N p(Y_i|θ)
Maximum likelihood estimate is that value of parameter θ
for which likelihood of observed sample is a maximum
Alternatively, ML estimate is mode of likelihood function
ML estimator turns out to have several useful properties, as
we shall see
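A minimal illustration (using simulated Poisson data, not from the slides): maximising the log-likelihood numerically recovers the closed-form Poisson MLE, the sample mean.

```r
set.seed(42)
y <- rpois(200, lambda = 4)  # simulated data with true lambda = 4
# log-likelihood: sum of log p(y_i | lambda), assuming independence
loglik <- function(lambda) sum(dpois(y, lambda, log = TRUE))
mle <- optimize(loglik, interval = c(0.01, 20), maximum = TRUE)$maximum
c(numerical = mle, closed_form = mean(y))  # nearly identical
```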
We aren’t doing Bayesian inference!
Big difference to what we’re doing with MLE!
Given that y ∼ p(y|θ), how can we make inferences about the
value of θ?
The Bayesian approach reverses the probability problem (given θ,
what can we say about the distribution of y?)
- Sometimes called inverse probability
Instead, the Bayesian seeks the distribution p(θ|y), the distribution
of the unknown parameter conditional on the observed data
Minimising least squares, assumptions?
We required no assumptions about the distribution of y, x or
u in order to compute least squares coefficients
- If all we care about is fitting data, then we can stop here
If we want to make inferences about θ, however, we need
some more assumptions
For example, what is relationship between θ̂ and θ in the
population model?
To this point, none whatsoever!
- If we want to talk about θ, as opposed to θ̂, we need some
  more assumptions
Reminder: Gauss-Markov Assumptions
1. y_i = x_i θ + u_i
2. x is fixed and full rank (linear independence)
3. E(u_i) = E(ū) = µ_u = 0
4. E(u_i²) = σ²
5. E(u_i u_j) = 0, ∀ i ≠ j
6. u ∼ normal
Why make G-M assumptions?
In order to do inference, we must say how data are
generated (Assumptions 1 and 6)
Must specify the parameterization of data generating
process (Assumptions 1, 3–6)
Must prove that the estimator has desirable properties
(1 is crucial, while 3–6 are necessary for hypothesis testing)
What to notice about OLS
Most of assumptions are about inherently unobservable
term, ui
Only assumption explicitly about yi is the first
Specification strongly encourages us to think of ui as “error”,
rather than intrinsic variability in outcomes, yi
Key idea: Minimize sum of squared errors
- This only indirectly considers the data generating process
  that creates the observed y_i
Properties of LS estimators come as an after-thought
- We must derive them for each case as assumptions differ
  (think GLS vs. OLS, for example)
New way of thinking: ML models
Specification of distribution of outcome variable, this shift in
focus is conceptual, but powerful
ML requires an explicit choice of distribution
- While some may be ruled out easily, the final choice is
  inherently subjective and uncertain
In defending our choices, we are forced to think through
nature of data generating process, which is at core of our
substantive theory
Outcome Variable
For least squares application, we wrote
y_i = x_i θ + u_i
and
u ∼ N(0, σ²I)
but we never said anything about distribution of yi!
Seems odd, we have more substantive knowledge about yi than
we can possibly have about unobservable ui
Outcome Variable
Implicitly, however, is that we have said something about
distribution of yi
Because u_i is normally distributed, and because y_i is a linear
combination of u_i (and of x_i θ, which is fixed since x is fixed),
we can conclude that y_i is also normally distributed
Recall a theorem from intro stats: If u ∼ N(µ, σ²) and a, b are
constants, then the linear function of u
v = a + bu
is normal also:
v ∼ N(a + bµ, b²σ²)
Outcome Variable
We could write the usual linear model as
y_i ∼ N(µ_i, σ²)
and
µ_i = x_i θ
Now notice that least squares model and this model are
exactly same thing
Hence we can express usual OLS model as an equivalent ML
model by focusing on distribution of yi and data generating
process, rather than on minimizing squared error
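A small sketch of this equivalence (simulated data, variable names made up): lm() minimises squared error, while glm() with a Gaussian family fits the same model by ML, and the coefficient estimates coincide.

```r
set.seed(7)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
ols <- lm(y ~ x)                                         # least squares
ml  <- glm(y ~ x, family = gaussian(link = "identity"))  # ML, normal model
all.equal(coef(ols), coef(ml))  # TRUE
```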
Specifying Data Generating Process
Specification of distribution of y is most crucial and
controversial step in maximum likelihood modeling
First, it is a decision which is to some extent subjective
Second, it matters a lot for results (especially predicted
values)
Ex: Parliamentary Committees
A reasonable question to ask about parliamentary
committees is how ’productive’ they are
# of bills voted out of a congressional committee gives some
hints as to an appropriate distribution
Do committees vary in how much legislation they process?
- How do we model this?
Ex: Parliamentary Committees
# of bills is discrete and non-negative
- This means we can rule out any distribution which is either
  continuous or which allows negative values
Are there any systematic features that account for this
variation?
How do historical changes in committee rules or structure
affect productivity within and between committees?
Thus, normal distribution cannot be a candidate for
describing this process, since normal is defined over real
number line from −∞ to +∞
Ex: Parliamentary Committees
The binomial, for example, might be one candidate
The committee considers N bills each session
From these it reports y bills out
If probability of reporting each bill is p, then probability
model is
y_i ∼ (n choose k) p^k (1 − p)^(n−k)
where:
- k successes
- n independent Bernoulli trials
- (n choose k) = n! / (k!(n − k)!)
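The binomial probability can be computed either from the formula above or with R's dbinom() (the numbers here are purely illustrative, not from the slides):

```r
n <- 20   # bills considered in a session
p <- 0.3  # probability each bill is reported out
k <- 6    # bills actually reported
choose(n, k) * p^k * (1 - p)^(n - k)  # by hand
dbinom(k, size = n, prob = p)         # built-in, same value
```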
Ex: Parliamentary Committees
But are bills really limited to N?
If we think the supply of bills is effectively unlimited, because
MEPs will find bills to sponsor if there is slack in the system,
then we might wish to model the process as... a Poisson
distribution
y_i ∼ λ^k e^(−λ) / k!
This distribution is also discrete and non-negative
Issues with MLE
Choice of a particular distribution is not always clear
- Yet that choice must be made, for without it there is no model
Some criticize ML for this
- They point to the subjective and somewhat arbitrary choice
  of distribution, and to the fact that if you pick the wrong
  distribution you are estimating a misspecified model
This is a lot of assumptions, and perhaps social science
theory is not up to the challenge
- This is a valid concern
- In an ideal world, we would have better knowledge of the
  appropriate distribution and would not have so much
  discretion
Necessary Choices in Applied Stats
Don’t delude ourselves into thinking that there’s an escape
from these dilemmas of statistical modeling
Any statistical model must specify both structure and
distribution of its variables
Those who rely on OLS, then, are actually doing ML, but are
assuming that every model is a continuous, normal model
Surely it is preferable to adopt the most persuasive ML
specification, even if it is subjective, than to always adopt
this particular ML regression model regardless of the substance
of the problem!
Class business
Read required (and suggested) online materials
Fork GitHub repository
These slides are available on the course website
Next time, we’ll talk about GLMs!