Lectures prepared for sections of Political Science 699 Winter 2010, when I served as a Graduate Student Instructor (TA) for this course, taught by Rob Franzese, at the University of Michigan. The last part of the last slide was cut off and I have not fixed it.
1. PS 699 Section March 18, 2010
Megan Reif
Graduate Student Instructor, Political Science
Professor Rob Franzese
University of Michigan
Regression Diagnostics for Extreme
Values (also known as extreme value
diagnostics, influence diagnostics,
leverage diagnostics, case diagnostics)
2. Review of (often iterative) Modeling Process
• EDA helps identify obvious
violations of CLRM
• Address trade-offs between
corrections
•Numeric & Graphic, Formal &
Informal Diagnostics
•Influence
•Normality
•Collinearity
•Non-Sphericality
• Exploratory Data Analysis
(EDA) of Empirical
Distribution (center, spread,
skewness, tail length, outliers)
• Uni-& Bivariate
• Numeric & Graphic, Formal
& Informal
• Include your prior info
about population
distribution & variance
• Data-generating process
• Assumptions
[Flowchart] THEORY FORMULATION & MODEL SPECIFICATION → DATA (Measure, Sample, Collect, Clean) → MODEL ESTIMATION & INFERENCE → POST-ESTIMATION ANALYSIS / CRITICAL ASSESSMENT OF ASSUMPTIONS
But don't start dropping observations at this stage!
Treat Outliers as INFO, not NUISANCE: Explain them, don't hide them.
2(c) Megan Reif
3. I. Pre-Modeling Exploratory Data Analysis (EDA)
(Review/Checklist)
• Not to be confused with data-mining – Arrive at data with your theory in
hand
• Because multivariate analysis builds on uni- and bivariate analysis, begin
with univariate analysis, followed by bivariate, before proceeding.
• These notes assume knowledge of the production of descriptive statistics, but
provide basic commands and output as a sort of checklist.
• Don't forget to start by using Stata's "describe", "summarize",
"codebook", and "inspect" commands to understand (a) how the
variables are labeled and coded, (b) basic distributions, and (c) how much
missing data there are for each variable.
• To think about possible effect of missing data on your model, use “list if”
command
list yvar xvar1 xvar2 xvar3 if yvar==.
list yvar xvar1 xvar2 xvar3 if xvar1==.
and so on
• Recode and label your variables for easier interpretation before
proceeding, particularly the uniqueid variable (such as country-year,
individual 1-n, etc.) for easy labeling of points (choose a short name).
4. I.A Exploratory Data Analysis
(EDA):Univariate & Bivariate Analysis
1. Summarize Basic Univariate and Bivariate Distributions for
Theoretical Model Variables for data structure:
1. Location (Mean, Median)
2. Spread (Range, Variance, Quartiles)
3. Genuine Skewness vs. Outliers
The most efficient way to obtain this information is to use
Stata's "tabstat" command with the statistics you desire
for your model variables, and then inspect:
• Histograms (do not forget to explore different bin
sizes, using between 5 and 20 bins, since histogram shapes
are sensitive to bin size)
• Boxplots
• Matrix Scatterplots
5. Univariate Outliers
• Distinguish between GENUINE skewness in the population distribution &
subsequently the empirical distribution, as opposed to unusual behavior (outliers)
in one of the tails. Your theory about the population may guide you on this.
• Do not leave univariate outliers out of your model or model them explicitly based
on descriptive statistics until you have done post-estimation diagnostics to
determine whether they are also MULTIVARIATE outliers (or correct them if they
are due to obvious typos or missing data or non-response codes like “999”).
• A UNIVARIATE outlier is a data point which is distant from the main body of the
data (say, middle 50%). One way to measure this distance is the Inter-Quartile
range (range of the middle 50% of the data). A data point x_o is an outlier if:

x_o < Q_L − 1.5·IQR  or  x_o > Q_U + 1.5·IQR

and a FAR outlier if:

x_o < Q_L − 3.0·IQR  or  x_o > Q_U + 3.0·IQR

– OBSERVE whether the middle 50 percent of the data ALSO manifest skewness.
– If the IQR is skewed, a transformation such as a log or square may be called for; IF
NOT, focus on the outliers.
– Use a Box Plot to check the location of the median in relation to the quartiles.
In Stata, a Box-Plot will show outliers
(1.5IQR criteria) as points if they are
present in the data.
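The 1.5·IQR fence rule above can be sketched in plain Python (a minimal illustration with made-up numbers; the deck's own workflow uses Stata's box plots):

```python
def iqr_outliers(xs, k=1.5):
    """Return points lying more than k*IQR outside the quartiles.

    k=1.5 flags ordinary outliers; k=3.0 flags far outliers."""
    s = sorted(xs)
    n = len(s)

    def quantile(p):  # linear interpolation between order statistics
        i = p * (n - 1)
        lo = int(i)
        hi = min(lo + 1, n - 1)
        return s[lo] + (i - lo) * (s[hi] - s[lo])

    q_l, q_u = quantile(0.25), quantile(0.75)
    iqr = q_u - q_l
    return [x for x in xs if x < q_l - k * iqr or x > q_u + k * iqr]

print(iqr_outliers([10, 11, 12, 13, 14, 100]))         # -> [100]
print(iqr_outliers([10, 11, 12, 13, 14, 100], k=3.0))  # still a far outlier here
```

With 10–14 tightly clustered, the fences sit near 11.25 ± 3.75, so only the extreme point is flagged at either multiplier.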
6. Tanzania Revenue Data
tabstat rev rexp dexp t, s(mean median sd var count min max iqr)
stats | rev rexp dexp t
---------+----------------------------------------
mean | 3728.381 4030.048 1693.619 80
p50 | 3544 3891 1549 80
sd | 817.1005 821.3014 894.879 6.204837
variance | 667653.2 674535.9 800808.3 38.5
N | 21 21 21 21
min | 2549 2899 586 70
max | 5433 5627 3589 90
iqr | 926 1127 1379 10
--------------------------------------------------
EXAMPLE Data: Tanzania.dta (Mukherjee et al.)
REV: Gov Recurrent Revenue  REXP: Gov Recurrent Expenditure
DEXP: Gov Development Expenditure  Year (T) 1970-1990
Decade: 0=1970s, 1=1980s, 2=1990
8. Univariate Box Plots & Histograms
graph box rev
• Notice that the inter-
quartile range manifests
skewness, in addition to the
maximum being much
further from the middle
50% of the observations
• Note how different the
histogram for Revenue
appears for 4, 6, 8, and 10
bins (21 observations)
• See histogram help file to
ensure you properly display
histograms for continuous
vs. discrete variables.
[Figure: "Histogram of Tanzania Annual Revenue, Different Bin Sizes": four histograms of rev with 4, 6, 8, and 10 bins]
9. graph box rev if decade ==0 |
decade==1, over(decade)
histogram rev, by(decade)
• Box Plot of Revenue by decade
(1970s and 1980s)
• Note that the IQR is less
skewed for the 1970s than the
1980s
• Since there are no dots in the
boxplot we know there are no
formal univariate outliers.
• We also know from other
financial data that skewness
may be something to correct
for with a log transformation.
[Figure: box plot of rev by decade (0 and 1); histograms of rev by decade]
Bivariate Box Plots & Histograms: Inspecting by Subgroups or
Categorical Transformations of Continuous Variables
10. Scatterplot Matrices and Cross-Tabulations
• Use these prior to ever running regression to
see differences and reveal potential violations
of CLRM
[Figure: scatterplot of y against x with two subgroups, Group 1 and Group 2: they may have the same relationship to Y on average, but something else is going on.]
11. The four panels form “Anscombe’s Quartet”—a famous demonstration by statistician Francis Anscombe in 1973. By
creating the four plots he was able to check the assumptions of his linear regression model, and found them wanting
for three of the four data sets (all but the top left). As Epstein et al. write, “Anscombe’s point, of course, was to
underscore the importance of graphing data before analyzing it” (24).
F.J. Anscombe, 1973. "Graphs in Statistical Analysis," American Statistician 27(1): 17-21, cited in Lee Epstein, Andrew D. Martin, and Matthew M. Schneider,
2006. "On the Effective Communication of the Results of Empirical Studies, Part I." Paper presented at the Vanderbilt Law Review Symposium on Empirical
Legal Scholarship, February 17.
Remember that looking at
correlations alone will conceal
curvilinear relationships,
heteroskedasticity, outliers, and
distributional shape. For example,
THE DATA IN THE FOUR PLOTS HAVE
THE SAME:
1) means for both y and x variables
2) slope and intercept estimates in a
regression of y on x.
3) R2 and F values (statistics we will
come to later).
Bi-Variate Correlations/Regressions: The NEED TO GRAPH
data: Same Statistics, Different Relationships
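The point can be checked numerically with the first two of Anscombe's four data sets (a NumPy sketch; the x and y values below are as published in Anscombe 1973, not part of these slides):

```python
import numpy as np

# First two of Anscombe's four data sets (x is shared between them).
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

for y in (y1, y2):
    slope, intercept = np.polyfit(x, y, 1)
    # Nearly identical summary statistics despite very different scatterplots.
    print(round(float(y.mean()), 2), round(float(slope), 2), round(float(intercept), 2))
```

Both sets print a mean of about 7.5, a slope of about 0.5, and an intercept of about 3.0, even though one relationship is linear and the other curvilinear; only a graph reveals the difference.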
12. Scatterplot Matrices
graph matrix rev rexp dexp t, half
• Allows you to look at bivariate relationships between your model
variables, think about possible collinearity between explanatory
variables, non-linearity in relationships, etc.
• Notice time trend of all three
financial variables—consider
autocorrelation
• Extreme Points: We may want to
inspect the scatterplots for rev –
dexp and rexp – dexp for
observations that seem to be
unusual given our theory that
development expenditure would
be a function of revenue (the
observations have high
development expenditure but
low revenue).

[Figure: scatterplot matrix (lower half) of rev, rexp, dexp, and t]
13. A Closer Look: Scatterplot with Labels
scatter dexp rev, mlabel(t)
• Note that in 1990, revenue
was middling but
development expenditures
were low. What might cause
this?
13
70
7172 73
74
75
7677
78
79
80
81
82
83
848586
87
88
89
90
01000200030004000
dexp
2000 3000 4000 5000 6000
rev
3243
34973426
3756
3409
4169
44824498
54245433
4506
4112
3603
3470
2972
2746
2623
2549
2906
3544
3928
20003000400050006000
rev
70 75 80 85 90
t
scatter rev t, mlabel(rev)
• Scatter of revenue over time suggests a trend
and possible autocorrelation. It is also curious
that 1979 and 1980 have almost identical (and
high) levels of revenue. Possible data error or
real stagnation in revenue? There was a war
between Uganda and Tanzania in 1979. Note
how inspecting the data can lead to case-
specific information that may require modeling
adjustments (e.g., war dummies). And we didn’t
know a thing about Tanzania!
14. Cross-Tabulations (Contingency Tables)
• Recode continuous variables into categories (see notes from March 11), which
enables you to summarize continuous variables by categories (below) and inspect
test statistics for inter-group differences in means and variances (next slide)
gen revcat=rev
recode revcat 2549/3500=1 3501/4500=2 4501/max=3
label define revcat 1 "low" 2 "med" 3 "high"
label values revcat revcat
tab revcat decade, sum(dexp)
• We want to see if the mean and sd of development expenditure vary by revenue
level and decade, for example, in order to see if one decade is responsible for all of
the high-revenue observations, etc. Remember how important sub-group size is
when using interaction terms. Cross-tabs are an important way of exploring whether the
same small subgroup is driving the key results of estimation. Remember the 13
educated women in the dummy model (Feb 25 notes).
| decade
revcat | 0 1 2 | Total
-----------+---------------------------------+----------
low | 1497.25 934.16667 . | 1159.4
| 188.20977 275.87274 . | 372.3422
| 4 6 0 | 10
-----------+---------------------------------+----------
med | 2205.75 1587.6667 586 | 1771.5
| 439.05989 850.62408 0 | 782.53526
| 4 3 1 | 8
-----------+---------------------------------+----------
high | 3352 3096 . | 3266.6667
| 335.16861 0 . | 279.31046
| 2 1 0 | 3
-----------+---------------------------------+----------
Total | 2151.6 1346.4 586 | 1693.619
| 774.83134 822.12451 0 | 894.87896
| 10 10 1 | 21
15. • Inspect test statistics for inter-group
differences in means and variances
• Categories of low, medium, and high
revenue levels are not disproportionately
distributed in any one decade at
statistically significant levels (one
period alone will probably not be driving
statistically significant results for revenue
effects), with the caveat that our
categories need to be meaningful, perhaps
coded at natural breaks in the
data, quartiles, etc. However, outliers
that do not fall in subgroups will not
show up with this method. It is still
useful to consider possible clusters of
data that will influence our model.
tab revcat decade, column row chi2 lrchi2 V exact gamma taub
decade
revcat | 0 1 2 | Total
-----------+---------------------------------+----------
low | 4 6 0 | 10
| 40.00 60.00 0.00 | 100.00
| 40.00 60.00 0.00 | 47.62
-----------+---------------------------------+----------
med | 4 3 1 | 8
| 50.00 37.50 12.50 | 100.00
| 40.00 30.00 100.00 | 38.10
-----------+---------------------------------+----------
high | 2 1 0 | 3
| 66.67 33.33 0.00 | 100.00
| 20.00 10.00 0.00 | 14.29
-----------+---------------------------------+----------
Total | 10 10 1 | 21
| 47.62 47.62 4.76 | 100.00
| 100.00 100.00 100.00 | 100.00
Pearson chi2(4) = 2.6075 Pr = 0.625
likelihood-ratio chi2(4) = 2.8982 Pr = 0.575
Cramér's V = 0.2492
gamma = -0.2000 ASE = 0.327
Kendall's tau-b = -0.1183 ASE = 0.197
Fisher's exact = 0.645
Cross-Tabulations (Contingency Tables)
16. II. Post-Estimation Diagnostics: OLS
Estimator is a (Sensitive) Mean
• The sample mean is a least squares estimator of
the location of the center of the data, but the
mean is not a resistant estimator in that it is
sensitive to the presence of outliers in the
sample. That is, changing a small part of the data
can change the value of the estimator
substantially, leading us astray.
• This is particularly problematic if we are unsure
about the actual shape of the population
distribution from which our data are drawn.
17. II. Post-Estimation Diagnostics
Extreme Points (start here, since extreme points will
affect formal testing procedures) Also called case
diagnostics, case deletion diagnostics.
• In multivariate analysis, extreme data points create
more complex problems than in univariate analysis.
– A UNIVARIATE outlier is simply a value of x
far from the mean of X (unconditionally unusual, but
it may not be a REGRESSION outlier).
– An outlier in simple bivariate regression is an
observation whose dependent variable value is
UNUSUAL GIVEN the value of the independent
variable (conditionally unusual).
18. II. Bivariate Regression Extreme Points
• A point with an atypical or anomalous X value has LEVERAGE.
If it is otherwise in trend, it affects model summary statistics
(e.g., R2, standard error), but has little effect on the
regression coefficient estimates.
• An INFLUENCE point has an unusual Y value (AND maybe an
extreme x value). It is characterized by having a noticeable
impact on the estimated regression coefficients (i.e., if
removing it from the sample would markedly change the
slope and direction of the regression line).
• A RESIDUAL OUTLIER has large VERTICAL distance of a data
point from the regression line. IMPORTANT NOTE: An outlier
in X or Y is NOT necessarily associated with a large residual,
and vice versa.
21. II.A.1.b Extreme Observations in X
NOTE: These examples reveal that it is most typically observations extreme in BOTH
x AND y that have influence (second graph on these two slides) but it is not always the case.
22. Summary Table: Model Effects for Outliers, Leverage, Influence

Type of Extreme Value        | Y DIRECTION | X DIRECTION | LEVERAGE | INFLUENCE | EFFECT ON INTERCEPT/COEFFICIENTS/UNCERTAINTY?*
-----------------------------+-------------+-------------+----------+-----------+-----------------------------------------------
Outlier in y (yi far from Y) | Unusual     | In Trend    | No       | No        | Yes/No/Yes
                             | Unusual     | Unusual     | Yes      | Yes       | Yes/Large/Yes
Outlier in x (xi far from X) | In Trend    | Unusual     | Yes      | No        | No/No/Yes (tends to reduce uncertainty)
                             | Unusual     | Unusual     | Yes      | Yes       | Yes/Large/Yes
Outlier in Residual          | Yes         | Possibly    | Possible but not necessarily | Possible but not necessarily | No/No/Yes

*Note that influence can refer to several things: (1) effect on the y-intercept; (2) on a particular
coefficient; (3) on all coefficients; (4) on estimated standard errors. Thus we have a variety of
procedures to evaluate influence.
23. 1. OUTLIERS are not
necessarily influential
2. BUT they can be,
depending on leverage
3. Yet high LEVERAGE points
are not always influential
4. And INFLUENTIAL points
are not necessarily outliers PLOT OUTLIER LEVERAGE INFLUENCE
1 Yes No No
2 Yes Yes Yes
3 No Yes No
4 No Yes Yes
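Two rows of this taxonomy can be illustrated numerically (a Python sketch with made-up data, not the plots on the original slide): a y-outlier at the center of the x-range has no leverage and leaves the slope unchanged, while a point unusual in both x and y has leverage and drags the slope sharply.

```python
import numpy as np

def slope(xs, ys):
    return np.polyfit(xs, ys, 1)[0]

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x                                   # exact line with slope 2

# "Outlier in y, x in trend": y-outlier at the center of the x-range.
x1, y1 = np.append(x, 3.0), np.append(y, 16.0)
# "Unusual in both x and y": a high-leverage, influential point.
x2, y2 = np.append(x, 15.0), np.append(y, 0.0)

print(slope(x1, y1))   # slope stays 2: the outlier shifts the intercept, not the slope
print(slope(x2, y2))   # slope dragged far below 2: leverage plus influence
```

Because the first outlier sits exactly at the mean of x, it cannot rotate the fitted line; the second point, far out in x, rotates it strongly.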
24. II.A Multivariate Extreme Points
• Influence in multivariate regression results from a
particular combination of values on all variables
in the regression, not necessarily just from
unusual values on one or two of the variables,
but the concepts from the bivariate case apply.
• When there are 2 or greater explanatory
variables X, scatterplots may not reveal
multivariate outliers, which are separated from
the centroid of all the Xs but do not appear in
bivariate relations of any two of them.
25. Residual Analysis: A Caution
• Recall that the residuals e are just an estimate of an unobservable
vector with given distributional properties. Assessing the
appropriateness of the model for a given problem may entail
the use of the residuals in the absence of ε, but since e
is by construction orthogonal to the regressors (Cov(X,e)=0)
with E(e)=0, one cannot use the residuals to test
these assumptions of the CLRM.

Sample: e (Residuals; estimated)
Population: ε (Error/Disturbance Term/Stochastic Component; the unobserved quantity we try to estimate)

The difference between these means that you are never totally confident that e is a good estimate
of ε: if you meet all assumptions of the CLRM, then e is an unbiased, efficient, and consistent
estimate of ε.
26. II.A.1 The “Hat” Matrix (Least Squares
Projection Matrix / Fitted Value Maker)
• DeNardo calls it P (because it is the Projection matrix for the
Predictor Space / least squares projection matrix; see
http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_projection_matrix.htm for a lovely geometric
explanation); Rob calls it N ("fitted value maker"), Cook calls it V, and
Belsley calls it H. I use H since most of the books on diagnostics seem to use H.
• The hat matrix is

H = X(X'X)⁻¹X'

• Since b = (X'X)⁻¹X'y and, by definition, the vector of
fitted values is ŷ = Xb, it follows that

ŷ = Hy

• The individual diagonal elements h_1, h_2, ..., h_i, ..., h_n of H
can thus be related to the distance between each row of
explanatory variables and the row vector of explanatory-variable
means x̄, where x_i is the ith row of the matrix X.
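These definitions are easy to verify directly (a NumPy sketch with made-up data, not the Tanzania set):

```python
import numpy as np

# Hypothetical design matrix: intercept plus one regressor, last x extreme.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([2.1, 3.9, 6.2, 8.1, 19.5])

H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix H = X(X'X)^-1 X'
b = np.linalg.solve(X.T @ X, X.T @ y)     # b = (X'X)^-1 X'y
assert np.allclose(H @ y, X @ b)          # "fitted value maker": Hy = Xb = y-hat
assert np.allclose(H @ H, H)              # H is a projection matrix (idempotent)
print(np.diag(H))                         # leverages h_ii; largest for the extreme x = 10
```

The trace of H equals the number of estimated parameters (here 2), which is what makes the average hat value (k+1)/n.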
28. II.A.1 Hat Matrix, cont.
H = X(X'X)⁻¹X' (the matrix)
h_ii = x_i'(X'X)⁻¹x_i (the diagonal elements),
which equal ∂ŷ_i/∂y_i. This is the effect of the ith element on its own predicted value.

In scalar form, the hat (leverage) for the ith observation (note the adjustment
for the number of observations: as n grows larger, the individual leverage of any
one observation diminishes) is

h_i = 1/n + (x_i − x̄)² / Σ_j (x_j − x̄)²

and the off-diagonal elements are h_ij = x_i'(X'X)⁻¹x_j.

h serves as a measure of the leverage of the ith data point because its numerator
is the squared distance of the ith data point from its mean in the X direction,
while its denominator is a measure of the overall variability of the data points
along the X-axis. It is therefore the distance of the data point relative to the
overall variation in the X direction.
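For simple regression (intercept plus one x), the scalar formula agrees with the diagonal of the hat matrix; a quick NumPy check with made-up values:

```python
import numpy as np

# Made-up regressor values; the last one is far from the mean.
x = np.array([2.0, 4.0, 5.0, 7.0, 12.0])
n = len(x)
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Scalar leverage formula: h_i = 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2
h_scalar = 1.0 / n + (x - x.mean()) ** 2 / ((x - x.mean()) ** 2).sum()
assert np.allclose(np.diag(H), h_scalar)
print(np.round(h_scalar, 3))   # x = 12, farthest from the mean, has the largest leverage
```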
29. II.A.1 Hat Matrix, cont.
• Because H is a projection matrix, it is symmetric and idempotent, and its
diagonal elements are bounded: 1/n ≤ h_ii ≤ 1
(for proof see Belsley et al., 1980, Appendix 2A).
• It is possible to express the fitted values in
terms of the observed values (scalar form):

ŷ_j = h_1j·y_1 + h_2j·y_2 + ... + h_jj·y_j + ... + h_nj·y_n = Σ_i h_ij·y_i

• h_ij therefore captures the extent to which y_i is
close to the fitted value ŷ_j. If it is large, then the
i-th observation had a substantial impact on
the j-th fitted value. The hat value summarizes
the potential influence of y_i on ALL the fitted values.
30. II.A.1.a Hat and the Residuals
• Since e = y − ŷ and ŷ = Hy,

e = (I − H)y, where I is the identity matrix.

Substituting Xβ + ε for y,

e = (I − H)(Xβ + ε) = (I − H)ε (because HX = X),

or, in scalar form, e_i = ε_i − Σ_j h_ij·ε_j for i = 1, 2, ..., n.

• The relationship between the residual and the true
stochastic component therefore depends on H. If the h_ij's are
sufficiently small, e is a reasonable estimate of ε.
• Note the interesting situation in which a better
"fit", if based on extreme values, may signal an
underestimate of the randomness in the world.
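Both identities (e = (I − H)y, and (I − H)Xβ = 0 because HX = X) can be verified directly (NumPy sketch, hypothetical data):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0, 9.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=len(x))   # hypothetical DGP

H = X @ np.linalg.inv(X.T @ X) @ X.T
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b                                    # ordinary residuals

assert np.allclose(e, (np.eye(len(x)) - H) @ y)  # e = (I - H) y
assert np.allclose(H @ X, X)                     # HX = X, so (I - H) X beta = 0
```

The second identity is why e depends only on (I − H)ε: the systematic part Xβ is annihilated.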
31. II.A.1.a Hat and the Residuals
• The variance of e is also related to H (see DeNardo):

Var(e_i) = σ²(1 − h_ii)

• For high-leverage cases, in which h approaches its
upper bound of one, the residual value will tend to zero.
• This means that the residuals will not be a
reliable means of detecting influential points, so
we need to transform them, leading us to the
subject of studentized (jackknifed) residuals.
32. II.A.1.a Hat / Studentized Residuals
PURPOSE: Detection of Multivariate Outliers
• Adjust residuals to make them conspicuous so they are reliable for detecting
leverage and influential points.
• DeNardo's "internally Studentized residual" is called the "standardized" or
"normalized" residual in other contexts; it can disguise outliers.
• The "externally" Studentized residual uses the Standard Error of the Regression
(Residual Sum of Squares/(n−k) = e'e/(n−k)) calculated with the i-th observation
deleted, which allows solving for h, the measure of leverage:

r*_i = e_i / ( s(i)·sqrt(1 − h_i) )

where s(i) is the Standard Error of the Estimate/Regression calculated after
deleting the ith observation.

• These residuals are distributed as Student's t with n−k d.f., so "a test" of each
outlier can be made, with each studentized residual representing a t-value for its
observation.
• This is an application of the jackknife method, whereby observations are omitted
and estimation iterated to arrive at the studentized residuals (just one of many
applications of the jackknife). Also called the "jackknife residual."
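The formula can be checked against a literal leave-one-out computation (a NumPy sketch with hypothetical data; Stata's equivalent is `predict estu, rstudent`):

```python
import numpy as np

def studentized_residuals(X, y):
    """Externally studentized residuals r*_i = e_i / (s(i) * sqrt(1 - h_i)),
    with s(i) computed by literally deleting case i (the jackknife idea)."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)
    e = y - H @ y
    r = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        bi = np.linalg.solve(Xi.T @ Xi, Xi.T @ yi)
        ei = yi - Xi @ bi
        s_i = np.sqrt(ei @ ei / (n - 1 - p))     # regression SE with case i deleted
        r[i] = e[i] / (s_i * np.sqrt(1.0 - h[i]))
    return r

# Hypothetical data: points near y = x, plus one aberrant final case.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 10.0])
r = studentized_residuals(X, y)
print(np.round(r, 2))   # the final residual dwarfs the rest
```

Because s(i) excludes the suspect case, the aberrant point cannot inflate its own denominator, which is exactly why the external version does not disguise outliers.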
33. II.A.1.a. continued
Steps for Assessing Studentized Residuals
1. Studentized residuals correspond to the t-statistic we would obtain by including
in the regression a dummy predictor coded 1 for that observation and 0 for all
others. One can then test the null hypothesis that coefficient δ equals zero (Ho:
δ=0) in:

E(y_i) = β_0 + β_1·x_i1 + β_2·x_i2 + ... + β_(k−1)·x_i,k−1 + δ·I_i

This tests whether case i causes a shift in the regression intercept.
2. We set a significance level α for our overall Type I error risk: the probability
of rejecting the null when it is in fact true. According to the Bonferroni
inequality [Pr(a set of events occurring) cannot exceed the sum of the individual
probabilities of the events], the probability that at least one of the cases is a
statistically significant outlier (when the null hypothesis is actually true) cannot
exceed nα, so...
3. We want to run n tests (one for each case), testing each residual at the α/n level
(call this α*). Suppose we set α = .05 and we have 21 observations. To test
whether ANY case in a sample of n = 21 is a significant outlier at level α, we
check whether the maximum studentized residual max|r_i| is significant at α* =
.05/21 = .0024 (given a t-distribution with df = n−K−1 = 21−2−1 = 19). Most t-tables
do not include such small tail probabilities, so a computer is required.
34. Tanzania Revenue Data
regress rexp rev (Expenditure as function of Revenue)
Source | SS df MS Number of obs = 21
-------------+------------------------------ F( 1, 19) = 55.16
Model | 10034268 1 10034268 Prob > F = 0.0000
Residual | 3456450.93 19 181918.47 R-squared = 0.7438
-------------+------------------------------ Adj R-squared = 0.7303
Total | 13490719 20 674535.948 Root MSE = 426.52
------------------------------------------------------------------------------
rexp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rev | .8668668 .1167207 7.43 0.000 .6225675 1.111166
_cons | 798.038 445.0211 1.79 0.089 -133.4019 1729.478
------------------------------------------------------------------------------
predict resid, resid (creates variable with ORDINARY RESIDUALS)
predict estu, rstudent (STUDENTIZED RESIDUALS)
EXAMPLE Data: Tanzania.dta (Mukherjee et al.)
REV: Gov Recurrent Revenue  REXP: Gov Recurrent Expenditure
DEXP: Gov Development Expenditure  Year (T) 1970-1990
35. 4. Identify the largest and smallest residuals. As a rule of thumb, we should pay attention to
residuals with absolute values greater than 2, be worried about those with values greater
than 2.5, and most concerned about those exceeding 3. There are a variety of ways to
identify/inspect these residuals. See
http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htm for more
options. The fastest in a small dataset is to list the observations with a studentized residual
exceeding + or -2. We see here that 1980 is an outlier. We can use Stata to carry out the
Bonferroni Outlier Test as follows:
list if abs(estu)>2
rev rexp dexp t resid estu decade revcat |
|-----------------------------------------------------------------|
11. | 4506 5627 3096 80 922.8602 2.590934 1 high |
+-----------------------------------------------------------------+
The maximum studentized residual of 2.59 is our t-value, and n = 21. For 1980 to be a
significant outlier at α = .05 (that is, to cause a significant shift in the intercept),
t = 2.59 must be significant at .05/21:

display .05/21
.00238095

display 2*ttail(19, 2.59)
.01796427

The obtained P-value (P = .01796) is NOT below α/n = .00238, so 1980 is NOT a significant outlier at
α = .05.

II.A.1.a. continued: Assessing Studentized Residuals
Bonferroni Outlier Test (test for outlier influence on the y-intercept)
36. II.A.1.b Hat Matrix and Leverage: Outlier influence on fitted
values (recall that fit is overly dependent on these outliers)
• Note that if h_ii = 1, then ŷ_i = y_i; that is, e_i = 0, and the i-th case would be fit exactly.
• This means that, if no observations are exact replicates, one parameter is dedicated
to one data point, which would make it impossible to obtain the determinant needed to
invert X'X and to obtain OLS estimates.
• This rarely occurs, so the value of h_ii will rarely reach its upper bound of 1.
• The MAGNITUDE of h_ii is bounded by

1/n ≤ h_ii ≤ 1/c

where c is the number of times the ith row x_i' of X is replicated (generally, then, h
will range from 1/n to 1, but in survey data it is possible to have duplicate responses
for multiple respondents, so you can check this in Stata with the "duplicates" command).
The higher the value of h_ii, the higher the leverage of the ith data point.
• The average hat value is E(h) = (k+1)/n, where k is the number of regressors. We
therefore proceed by looking at the maximum hat value. A hat value has leverage if it
is more than twice the mean hat value.
• Huber (1981) suggests another rule of thumb for interpreting h_ii, but this might
overlook more than one large hat value:

max(h_i) ≤ .2: little to worry about
.2 < max(h_i) ≤ .5: risky
max(h_i) > .5: too much leverage
37. 70
7172
73
74
75
7677
7879
80
81
8283
84
85
86
87
88
89 90
.05.1.15.2.25
Leverage
2000 3000 4000 5000 6000
rev
predict h, hat
summarize h
Variable | Obs Mean Std. Dev. Min Max
-------------+-------------------------------------------------
h | 21 .0952381 .0638661 .0476762 .2652265
display 2/21
.0952381
list if h>2*.0952381
 9. | rev 5424 | rexp 5058 | dexp 3115 | t 78 | resid -441.9235 | estu -1.222458 | h .2629347 |
10. | rev 5433 | rexp 5571 | dexp 3589 | t 79 | resid 63.27473  | estu .1685843  | h .2652265 |
scatter h rev, mlabel(t)
• Use predict command to create the
hat values for each observation.
• Summarize or calculate to get the
mean.
• List observations whose h values
exceed 2 times E(h). We see that 1978
and 1979 have leverage.
• We can graph the hat values against
the values of the independent
variable(s) The leverage points are
well above 0.2 and more than twice
their mean. Recall that we identified
from EDA that something might be
different for 1978 and 1979. This
means that too much of the sample’s
information about the X-Y
relationship may come from a single
case.
II.A.1.b Hat Matrix and Leverage: Outlier influence of X
values on fitted values, continued
38. II.A.1.c The DFBETA Statistic (depends on X and Y values; measures how much
case i influences the coefficients; not a formal test statistic with a hypothesis test)

The regression coefficient on X_k is b_k. Let b_k(i) represent the same coefficient
when the ith case is deleted. Deleting the ith case therefore changes the coefficient
on X_k by b_k − b_k(i). We can express this change in standard errors:

DFBETAS_ik = ( b_k − b_k(i) ) / ( s_e(i) / sqrt(RSS_k) )

where s_e(i) represents the residual standard deviation with the ith case deleted, and
RSS_k is the residual sum of squares from the auxiliary regression of X_k on all the
other X variables (without deleting the ith case). The denominator therefore modifies
the usual estimate of the standard error of the coefficient b_k if the ith case is
deleted. DFBETA can also be expressed in terms of the Hat statistic (see DeNardo).

• Interpreting the direction of influence with DFBETAS:

If DFBETAS_ik > 0, case i increases the magnitude of b_k.
If DFBETAS_ik < 0, case i decreases the magnitude of b_k.

• The size of influence: DFBETAS tells us "By how many standard errors
does the coefficient change if we drop case i?"
• A DFBETA of +1.34, for example, means that if case i were deleted,
the coefficient for regressor k would be 1.34 standard errors lower.
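The case-wise deletion that Stata's `dfbeta` performs can be sketched by brute force (NumPy, hypothetical data; note that (X'X)⁻¹_kk = 1/RSS_k, so the scaling below matches the formula above):

```python
import numpy as np

def dfbetas(X, y):
    """DFBETAS_ik = (b_k - b_k(i)) / (s(i) * sqrt[(X'X)^-1_kk]): the change in
    each coefficient, in standard errors, when case i is deleted (brute force)."""
    n, p = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    se_scale = np.sqrt(np.diag(np.linalg.inv(X.T @ X)))   # = 1/sqrt(RSS_k)
    out = np.empty((n, p))
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        bi = np.linalg.solve(Xi.T @ Xi, Xi.T @ yi)
        ei = yi - Xi @ bi
        s_i = np.sqrt(ei @ ei / (n - 1 - p))              # residual SD, case i deleted
        out[i] = (b - bi) / (s_i * se_scale)
    return out

# Hypothetical data: last case extreme in x and off the trend of the others.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 12.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([2.0, 4.1, 5.9, 8.0, 10.1, 12.0, 15.0])
d = dfbetas(X, y)
cutoff = 2 / np.sqrt(len(x))          # rule-of-thumb threshold, 2/sqrt(n)
print(np.abs(d[:, 1]) > cutoff)       # the final case clearly moves the slope
```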
39. [Figure: histogram of _dfbeta_1 (Dfbeta rev) with reference lines at ±.4364]
dfbeta
_dfbeta_1: dfbeta(rev)
list _dfbeta_1

    | _dfbeta_1 |
1. | .1004588 |
2. | .0401034 |
3. | .0582458 |
4. | -.0044781 |
5. | .1422126 |
6. | -.1596971 |
7. | -.1744603 |
8. | -.1502057 |
9. | -.6607218 |
10. | .0917439 |
11. | .5789033 |
12. | .1527179 |
13. | -.059624 |
14. | -.0800557 |
15. | -.0528694 |
16. | -.0164145 |
17. | .1149248 |
18. | .0945607 |
19. | -.064976 |
20. | -.0342036 |
21. | .0475227 |
display 2/sqrt(21)
.43643578
histogram _dfbeta_1, bin(10) frequency xline(-.4364 .4364) xlabel(#10)
(bin=10, start=-.66072184, width=.12396252)
• Stata’s dfbeta command creates the
DFBETA statistic for each of the
regressors in the model, then list for all
of our observations. A rule of thumb for
large datasets where listing and
inspecting all of the DFBETA values
would be difficult is to inspect all
DFBETAs in excess of 2/sqrt(n)
• Since DFBETAs are obtained by case-
wise deletion, they do not account for
situations where a number of
observations may cluster together,
jointly pulling the regression line in a
direction, but not individually showing
up as influential. You should not rely
solely on DFBETA, then, to test for
influence. A histogram of DFBETA can
reveal groups of influential cases (the
one displayed at left uses reference
lines for + or – 2/sqrt(n) = .4364). Two
observations fall outside the safe range.
II.A.1.c The DFBETA Statistic
40. scatter _dfbeta_1 t, ylabel(-1(.5)1) yline(.4364 -.4364) mlabel(t)
list t _dfbeta_1 rev rexp if t==78 | t==80
+------------------------------+
| t _dfbeta_1 rev rexp |
|------------------------------|
9. | 78 -.6607218 5424 5058 |
11. | 80 .5789033 4506 5627 |
• Now that we know there are two potential
observations to worry about, it is useful to
use another plot to identify which they
are (this is most useful for multivariate
regression – it is rather obvious for the
single regressor case).
• We see that 1978 and 1980 are influential.
• Note that 1978 and 1979 had leverage,
but only 1978 is also influential. 1980 is
Influential but did not have leverage
(review Slide 23).
• 1978 decreases the coefficient on revenue
by -.66 standard errors and 1980 increases
it by .58 standard errors.
II.A.1.c The DFBETA Statistic (continued)

[Figure: scatter of _dfbeta_1 (Dfbeta rev) against t with year labels and reference lines at ±.4364]
41. II.A.1.d Influence of a Case on Model as a Whole (Cook’s
Distance and DFFITS Statistics)
• Returning to the Hat statistic, if
we want to know the effect of
case i on the predicted values, we
can use the DFFITS statistic,
which does not depend on the
coordinate system used to form
the regression model.
• Rule-of-thumb cutoffs are to inspect observations whose |DFFITS| exceeds the following values (and to re-run the regression without those observations to see by how much the coefficient estimates change):
The change in fit for case i when that case is deleted is

$$DFFIT_i = \hat{y}_i - \hat{y}_{i(i)} = \mathbf{x}_i\left[\mathbf{b} - \mathbf{b}_{(i)}\right] = \frac{h_i\, e_i}{1-h_i}$$

To scale the measure, one can divide by the standard deviation of the fit, $s_{(i)}\sqrt{h_i}$, where $s_{(i)}^2$ is our estimate of variance with observation i deleted:

$$DFFITS_i = \frac{\hat{y}_i - \hat{y}_{i(i)}}{s_{(i)}\sqrt{h_i}}$$

Since $r_i^{*} = \dfrac{e_i}{s_{(i)}\sqrt{1-h_i}}$, DFFITS can be written as

$$DFFITS_i = \left(\frac{h_i}{1-h_i}\right)^{1/2} r_i^{*}$$

This is intuitive in that the first term increases the greater the hat statistic (and therefore the leverage) for case i, and the second term increases the larger the studentized residual (outlier).

Then we want to know what the scaled changes in fit for the model are for the values other than the ith row:

$$\frac{\hat{y}_j - \hat{y}_{j(i)}}{s_{(i)}\sqrt{h_j}} = \frac{\mathbf{x}_j\left[\mathbf{b} - \mathbf{b}_{(i)}\right]}{s_{(i)}\sqrt{h_j}} = \frac{h_{ij}\, e_i}{s_{(i)}\sqrt{h_j}\,(1-h_i)}$$

The absolute value of this change in fit for the remaining cases will be less than the absolute value of the change attributed to the fitted value $\hat{y}_i$ when the ith value is deleted. $DFFITS_i$ is the number of standard errors that the fitted value for case i changes if the ith observation is deleted from the data.

Small to medium datasets: $|DFFITS_i| > 1$
Large datasets: $|DFFITS_i| > 2\sqrt{(k+1)/n}$
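The derivation above can be checked numerically. The sketch below (Python with made-up data, not the slide's Stata output) verifies that the scaled change in fit for case i equals the studentized residual times sqrt(h_i/(1−h_i)):

```python
import numpy as np

# Numerical check that the two DFFITS expressions on the slide agree:
# (yhat_i - yhat_i(i)) / (s(i) sqrt(h_i))  ==  r*_i sqrt(h_i / (1 - h_i))
rng = np.random.default_rng(1)
n, k = 21, 1                            # n cases, k regressors plus intercept
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix; h_i on its diagonal
h = np.diag(H)
e = y - H @ y                           # OLS residuals
yhat = H @ y

dffits_direct = np.empty(n)
dffits_formula = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    resid_i = y[keep] - X[keep] @ b_i
    s_i = np.sqrt(resid_i @ resid_i / (n - 1 - (k + 1)))   # s(i)
    # scaled change in fit when case i is deleted
    dffits_direct[i] = (yhat[i] - X[i] @ b_i) / (s_i * np.sqrt(h[i]))
    # studentized residual times sqrt(h/(1-h))
    r_star = e[i] / (s_i * np.sqrt(1 - h[i]))
    dffits_formula[i] = r_star * np.sqrt(h[i] / (1 - h[i]))

print(np.allclose(dffits_direct, dffits_formula))
```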
42. DFFITS and Hat vs. DFFIT PLOT
(from Tanzania Model)
predict dffit, dfits
list t rev rexp dffit
| t rev rexp dffit |
|------------------------------|
1. | 70 3243 3304 -.1932092 |
2. | 71 3497 3569 -.143909 |
3. | 72 3426 3480 -.1642726 |
4. | 73 3756 3809 -.1293692 |
5. | 74 3409 3122 -.3824875 |
|------------------------------|
6. | 75 4169 3891 -.3301978 |
7. | 76 4482 4352 -.2539932 |
8. | 77 4498 4417 -.216292 |
9. | 78 5424 5058 -.7301378 |
10. | 79 5433 5571 .1012859 |
|------------------------------|
11. | 80 4506 5627 .8291757 |
12. | 81 4112 4932 .3522714 |
13. | 82 3603 4594 .3838607 |
14. | 83 3470 4261 .2597122 |
15. | 84 2972 3476 .0768232 |
|------------------------------|
16. | 85 2746 3202 .0211414 |
17. | 86 2623 2929 -.1417075 |
18. | 87 2549 2899 -.1141464 |
19. | 88 2906 3431 .0905055 |
20. | 89 3544 4149 .1518263 |
|------------------------------|
21. | 90 3928 4558 .1956947 |
+------------------------------+
scatter h dffit, mlabel(t)
• No observation has a DFFITS statistic larger than 1 in this small dataset. The largest is .8291757, for 1980.
• Note that as a function of hat and the studentized
residuals, DFFITS is a kind of measure of
OUTLIERNESS*LEVERAGE
• A graphical alternative to the influence measures is to plot hat against the studentized residuals to look for observations for which both are big (only 1979 approaches this criterion, but it is well under the DFFITS cutoff):
$$DFFITS_i = \left(\frac{h_i}{1-h_i}\right)^{1/2} r_i^{*}$$

[Scatter plot of leverage against DFFITS, labeled by year (70–90).]
43. • Cook’s D is similar to the DFFITS statistic, but DFFITS gives relatively more weight to leverage points, since it shows the effect on an observation’s fitted value when that particular one is dropped.
• Cook’s Distance “tests the hypothesis” that the true slope coefficients are equal in the aggregate to the slope coefficients estimated with observation i deleted (Ho: β = b(i)). It is more a rule of thumb that produces a measure of distance independent of how the variables are measured than a formal F-test. A point is influential if Di exceeds the median of the F distribution with k parameters [(Fk, n-k)(.5)].
• Observations with larger D values than the rest of the data are
those that have unusual leverage.
• While there are numerical rules for assessing Cook’s D authors
differ in their advice.
• Some argue that it is best to graph Cook’s D values to see whether
any one or two points have a much bigger Di than the others.
II.A.1.d Influence of a Case on Model as a Whole (Cook’s Distance)

$$D_i = \frac{h_i\, e_i^2}{k\, s^2 (1-h_i)^2}$$

Since $r_i^{*} = \dfrac{e_i}{s_{(i)}\sqrt{1-h_i}}$, Cook's $D$ can be rewritten as

$$D_i = \frac{r_i^{*2}\, h_i}{k\,(1-h_i)}$$
44. Cook’s D, Continued
predict cooksd, cooksd
• We can then look up the median value of the F-distribution with k+1 numerator and n-k-1 denominator degrees of freedom:
display invFtail(2,19, .5)
.71906057
For the Tanzania data, no observations are this large:
list t rev rexp if cooksd>.71906057
• Some authors suggest looking at the five most influential, which can be
done in Stata by (NOTE: last term is a lowercase “L” for last observation.).
list t rev rexp cooksd dffit _dfbeta_1 in -5/l
| t rev rexp cooksd dffit _dfbeta_1 |
|-----------------------------------------------------|
17. | 81 4112 4932 .0589684 .3522714 .1527179 |
18. | 82 3603 4594 .0670656 .3838607 -.059624 |
19. | 74 3409 3122 .067792 -.3824875 .1422126 |
20. | 78 5424 5058 .2597905 -.7301378 -.6607218 |
21. | 80 4506 5627 .2642971 .8291757 .5789033 |
+-----------------------------------------------------+
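As a numerical cross-check of the closed form given on slide 43, the Python sketch below (made-up data; here dividing by the number of parameters p = k+1, since conventions differ on whether k or k+1 appears in the denominator) confirms that Cook's D computed from the aggregate change in all fitted values matches the closed form in h_i and e_i:

```python
import numpy as np

# Cook's D two ways: (1) definitional -- the scaled aggregate squared change
# in every fitted value when case i is deleted; (2) closed form in h_i, e_i.
rng = np.random.default_rng(2)
n, p = 21, 2                       # n cases, p = k + 1 parameters
x = rng.normal(size=n)
y = 3 - x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
e = y - H @ y
s2 = e @ e / (n - p)               # full-sample error variance estimate

D_def = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    D_def[i] = np.sum((H @ y - X @ b_i) ** 2) / (p * s2)

D_closed = h * e ** 2 / (p * s2 * (1 - h) ** 2)
print(np.allclose(D_def, D_closed))
```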
45. Proportional Plots for Influence Statistics
• It is useful to graph Cook’s D and DFFITS
with Residual vs. Fitted Plots, with
symbols proportional to the size of
Cook’s D. First we have to predict the
fitted values:
predict yhat
(option xb assumed; fitted values)
• Then weight the symbols by the value of
the influence statistic of interest:
graph twoway scatter resid yhat[aweight =
cooksd], msymbol(Oh) yline(0)
saving(Dprop) NOTE: Prop Plot with weights
disallows labeling, so I create two versions, one with
labels, one with proportions, and use ppt to overlay.
graph twoway scatter resid yhat[aweight =
cooksd], mlabel(t) yline(0)
saving(Dlabe
• We can also plot the studentized residuals vs. HAT (leverage, not the fitted values), with symbols proportional to Cook’s D, to look at outlierness, leverage, and influence at the same time. Same command as above except the variables are estu h (or whatever you have named your studentized residuals and hat).
[Left: residuals vs. fitted values; right: studentized residuals vs. leverage. Symbols weighted by Cook’s D.]
46. • Recall that by increasing the variance of one or more Xs, a high-
leverage observation will decrease the standard error of the
coefficient(s), even if it does not influence the magnitude. Though
this may be considered beneficial, it may also exaggerate our
confidence in our estimate, especially if we don’t know if the high-
leverage outlier is representative of the population distribution, or
due entirely to stochastic factors or error (sampling, coding, etc. –
that is, a true outlier).
• Using the COVRATIO statistic, we can examine the impact of
deleting each observation in turn on the size of the joint-confidence
region (in n-space) for β, since the size of this region is equivalent to
the length of the confidence interval for an individual coefficient,
which is proportional to its standard error. The squared length of
the CI is therefore proportional to the sampling variance for b. The
squared size of a joint confidence region is proportional to the
variance for a set of coefficients (“generalized variance”) (Fox 1991, 31; see Belsley et al., pp. 22-24, for the derivation).
II.A.1.d Influence of a Case on Precision of the Estimates (COVRATIO)
$$COVRATIO_i = \left[\left(\frac{n-k-2+r_i^{*2}}{n-k-1}\right)^{k+1}(1-h_i)\right]^{-1}$$
47. COVRATIO
• Look for values that differ substantially from 1.
• A small COVRATIO (below 1) means that the
generalized variance of the model would be SMALLER
without the ith observation (i is reducing precision of
estimates)
• A big COVRATIO (above 1) means the generalized variance would be LARGER without the ith case, but if it is a high-leverage point, it may be making us overly confident in the precision of our estimated coefficients.
• Belsley et al. suggest that a COVRATIO should be examined when:
$$\left|COVRATIO_i - 1\right| \ge \frac{3(k+1)}{n}$$
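The COVRATIO definition can also be verified numerically. The sketch below (Python, made-up data; an illustration, not the slide's Stata) computes the ratio of generalized variances directly from determinants of the coefficient covariance matrices and compares it with the closed form in r*_i and h_i:

```python
import numpy as np

# COVRATIO two ways: (1) ratio of determinants of the deleted and full
# coefficient covariance matrices; (2) the closed form in r*_i and h_i.
rng = np.random.default_rng(3)
n, k = 21, 1                      # n cases, k regressors; p = k + 1 parameters
p = k + 1
x = rng.normal(size=n)
y = 1 + x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

XtX_inv = np.linalg.inv(X.T @ X)
h = np.diag(X @ XtX_inv @ X.T)
e = y - X @ (XtX_inv @ (X.T @ y))
s2 = e @ e / (n - p)

cov_det = np.empty(n)
cov_closed = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    Xi, yi = X[keep], y[keep]
    b_i = np.linalg.lstsq(Xi, yi, rcond=None)[0]
    ei = yi - Xi @ b_i
    s2_i = ei @ ei / (n - 1 - p)                     # s(i)^2
    # ratio of generalized variances (determinants of the two covariances)
    cov_det[i] = (np.linalg.det(s2_i * np.linalg.inv(Xi.T @ Xi))
                  / np.linalg.det(s2 * XtX_inv))
    r_star2 = e[i] ** 2 / (s2_i * (1 - h[i]))        # squared studentized resid
    cov_closed[i] = 1.0 / ((((n - k - 2 + r_star2) / (n - k - 1)) ** p)
                           * (1 - h[i]))

print(np.allclose(cov_det, cov_closed))
```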
48. COVRATIO example
[Two scatter plots labeled by year (70–90): leverage vs. COVRATIO, and DFFITS vs. COVRATIO.]
predict covratio, covratio
list t covratio rev rexp if abs(covratio-1)>(3*3)/21
+-----------------------------+
| t covratio rev rexp |
|-----------------------------|
4. | 79 1.511605 5433 5571 |
+-----------------------------+
• We see that 1979’s COVRATIO is large, and 1979 has therefore perhaps exaggerated our certainty.
• Plotting COVRATIO against hat reveals that 1979 has leverage, but plotting it against DFFITS shows that its DFFITS is not greater than one. 1979 does not affect the magnitude of our coefficient estimates, but it may affect our hypothesis testing and conclusions.
49. A Summary of Tests / Statistics for Extreme Values (note sample size dependence)
Statistic / Formula / Use / Critical Value Rule of Thumb:

• Studentized Residual — $r_i^{*} = \dfrac{e_i}{s_{(i)}\sqrt{1-h_i}}$ — Outliers’ effect on intercept. Critical values (higher than the usual t-test) are recommended for exploratory diagnosis. Rules of thumb: $|r_i^{*}| > 2$ pay attention; $|r_i^{*}| > 2.5$ cause for worry; $|r_i^{*}| > 3$ cause for greatest concern.

• Hat Statistic (h) — $h_i = \dfrac{1}{n} + \dfrac{(x_i-\bar{x})^2}{\sum_j (x_j-\bar{x})^2}$ in the bivariate case; in general $h_i = \mathbf{x}_i^{T}(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{x}_i$ — Leverage. Bounded by 1/n to 1 (assumes no replicates—check this in survey data). Higher value = higher leverage (depends on X values). Rules of thumb: $h_i > \dfrac{2(k+1)}{n}$; or: max(h) ≤ .2 little to worry about; .2 < max(h) ≤ .5 risky; max(h) > .5 too much leverage.

• DFBETA — $DFBETAS_{ik} = \dfrac{b_k - b_{k(i)}}{s_{e(i)}/\sqrt{RSS_k}}$, where $RSS_k$ is the residual sum of squares from regressing $x_k$ on the other regressors — Influence of a case on a particular coefficient. Calculate for each regressor. Rule of thumb: under 2/√n the point has no influence; over, the point is influential (depends on both X AND Y values). The value of DFBETA is the number of s.e.s by which case i increases or decreases the coefficient for regressor k: if $DFBETAS_{ik} > 0$, case i increases the magnitude of $b_k$; if $DFBETAS_{ik} < 0$, it decreases it.

• Cook’s Distance — $D_i = \dfrac{r_i^{*2}\, h_i}{k(1-h_i)}$ — Influence of a case on the model. Measure of the aggregate impact of the ith case on the group of regression coefficients as well as the group of fitted values (sometimes called the forecasting effect). A point is influential if $D_i$ exceeds the median of the F distribution with k parameters [(Fk, n-k)(.5)].

• DFFITS — $DFFITS_i = r_i^{*}\left(\dfrac{h_i}{1-h_i}\right)^{1/2}$ — Influence of a case on the model. The number of s.e.s by which the fitted value $\hat{y}_i$ changes if the ith observation is deleted. Small/medium datasets: $|DFFITS_i| > 1$; large datasets: $|DFFITS_i| > 2\sqrt{(k+1)/n}$.

• COVRATIO — $COVRATIO_i = \left[\left(\dfrac{n-k-2+r_i^{*2}}{n-k-1}\right)^{k+1}(1-h_i)\right]^{-1}$ — Influence of a case on model standard errors. Measures how the precision of the parameter estimates (generalized variance) changes with removal of the ith observation. Inspect if $\left|COVRATIO_i - 1\right| \ge \dfrac{3(k+1)}{n}$.

Note: n is the sample size; k is the number of regressors; the subscript (i) (i.e., with parentheses) indicates an estimate from the sample omitting observation i. In each case you should use the absolute value of the calculated statistic.
50. III. Plots to Identify Extreme Values
• EXAMPLE: Model from Mukherjee et al. of crude birth rate as a function of:
– GNP per capita (logged, per Feb 18 notes and general practice for such variables)
– IM: infant mortality
– URBAN: percent of population urban
– HDI: human development index (from WB Human Development Report 1993)
regress birthr lngnp hdi infmor urbanpop
Source | SS df MS Number of obs = 110
-------------+------------------------------ F( 4, 105) = 129.19
Model | 16552.2585 4 4138.06462 Prob > F = 0.0000
Residual | 3363.19755 105 32.0304528 R-squared = 0.8311
-------------+------------------------------ Adj R-squared = 0.8247
Total | 19915.456 109 182.710606 Root MSE = 5.6595
------------------------------------------------------------------------------
birthr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lngnp | -.2138487 .7960166 -0.27 0.789 -1.792203 1.364505
hdi | -24.50566 7.495152 -3.27 0.001 -39.36716 -9.644157
infmor | .111157 .0396176 2.81 0.006 .0326026 .1897115
urbanpop | .0111358 .0396627 0.28 0.779 -.0675081 .0897797
_cons | 39.56958 6.599771 6.00 0.000 26.48346 52.65571
------------------------------------------------------------------------------

$$BIRTHrt_i = \beta_0 + \beta_{\ln GNPC}\,\ln GNPC_i + \beta_{HDI}\,HDI_i + \beta_{IM}\,IM_i + \beta_{URB}\,URBAN_i + \varepsilon_i$$
51. III. Plots to Identify Extreme Values:
A. Leverage vs. Normalized Squared Residual Plots
lvr2plot, mcolor(green) msize(vsmall) mlabel(cid) mlabcolor(black)
• This plot squares the NORMALIZED residuals (as standard deviation of each
residual from mean residual) to make them more conspicuous in the plot
(these are not the same as the externally studentized residuals).
• Remember that we are worried about observations with HIGH LEVERAGE but
LOW RESIDUALS, which indicates potential influence.
• What we would like to see: A ball of points evenly spread around the
intersection of the two means with no points discernibly far out in any
direction, and no leverage point above 0.2 with a low residual (to the left of
the mean normalized squared residual line).
• The vertical line represents the average squared normalized residual and the
horizontal line represents the average hat (leverage) value.
• Points with high leverage and low residuals will lie below the mean of the
squared residual (X), and above the mean of hat, which we should worry
about if hat is above .2, and really worry if it is above .5.
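The quantities behind lvr2plot can be sketched directly. The Python snippet below uses made-up data (not the birth-rate model), taking the normalized residual as e_i divided by the root sum of squared residuals, one common convention:

```python
import numpy as np

# Compute the two lvr2plot axes -- leverage h_i and the squared normalized
# residual -- and flag the worrying region: above mean(h), left of the
# mean squared normalized residual.
rng = np.random.default_rng(5)
n = 110
x = rng.normal(size=n)
x[0] = 6.0                           # plant one high-leverage case
y = 1 + x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                       # leverage (y-axis of lvr2plot)
e = y - H @ y
e_norm2 = (e / np.sqrt(e @ e)) ** 2  # squared normalized residuals (x-axis)

worry = (h > h.mean()) & (e_norm2 < e_norm2.mean())
print(int(h.argmax()), int(worry.sum()))
```

The planted case has by far the largest leverage; whether it lands in the worrying (low-residual) half depends on its draw, which is exactly why the plot is inspected rather than thresholded mechanically.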
52. Leverage vs. Normalized Squared Residual Plots

[Plot annotations: “Outlier but low leverage”; “High leverage, high residual (might be reducing our standard errors, but not above the risky .2 level; may want to look at COVRATIO)”; “Examine further if above 0.2 in this region”.]

Based on this plot, the potential for points with high influence on our coefficients is low. There are no points that meet the high-leverage, low-residual criterion, individually or as a group.
53. III. Plots to Identify Extreme Values:
B. Partial Regression Leverage Plots (also known as adjusted partial residual plots, adjusted variable plots, individual coefficient plots, and added variable plots)

avplots, rlopts(lcolor(red)) mlabel(cid) mlabsize(tiny) msize(small)
• Each plot graphs the residuals from regressing y on all of the X variables EXCEPT one (e | Xk-1, shown on the y-axis) against the ordinary residuals from regressing the EXCLUDED xk on the remaining Xk-1 independent variables (shown on the x-axis).
• Helps to uncover observations exerting a disproportionate influence on the regression model by
showing how each coefficient has been influenced by particular observations.
• The regression slopes we see in the plots are the same as the original multiple regression coefficients for
the regression y=Xkb.
• What we would like to see: Scatter of points even around the line in each plot – the “noise” or size of
the cloud and spacing around the line need not concern us, but points very far from the rest should be
examined.
• Cause for concern: Recall the bivariate examples from the first part of the notes – you are looking for
values extreme in X (horizontal axis) with unusual/out-of-trend y-values. Pay most attention to the
theoretical variable(s) of interest and whether your conclusions and/or statistical significance would
change without the observation.
• Utility of the graph: DFBETA will give a much more precise assessment of the change in magnitude of
the coefficient in the absence of an influence point, but the graph can identify clusters of points that
might be jointly influential.
• Cautions: Pay attention to the SCALE of the axes reported in your computer output—a point may look like an outlier but in reality be part of a cloud of points on which we are “zoomed in” rather close. If you have doubts about the reliability of “eyeballing” the plot, you can re-run the regression leaving out the influential point and compare the change in slope, but be sure to use commands that retain the original scale of the output so you can compare the changes (see slides 54-55). Some books recommend this plot for deciding whether to include or discard variables; it is BETTER TO BASE THIS DECISION ON THEORY and the techniques discussed previously.
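The claim above that the regression slope in each added-variable plot equals the corresponding multiple-regression coefficient (the Frisch–Waugh–Lovell result) can be checked numerically; the Python sketch below uses made-up data rather than the birth-rate model:

```python
import numpy as np

# Build an added-variable plot's two axes by hand and verify its slope
# equals the full multiple-regression coefficient on x1.
rng = np.random.default_rng(4)
n = 110
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 5 + 2 * x1 - 3 * x2 + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x1, x2])
b_full = np.linalg.lstsq(X_full, y, rcond=None)[0]

X_other = np.column_stack([np.ones(n), x2])      # every regressor except x1
P = X_other @ np.linalg.inv(X_other.T @ X_other) @ X_other.T
ey = y - P @ y              # residuals of y on the other regressors (y-axis)
ex = x1 - P @ x1            # residuals of x1 on the other regressors (x-axis)

slope_av = (ex @ ey) / (ex @ ex)   # slope of the added-variable plot
print(slope_av, b_full[1])
```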
54. Partial Regression Leverage Plots

[Four added-variable plots of e(birthr|X) against each regressor’s partial residuals, labeled by country code:
e( lngnp | X ): coef = -.21384874, se = .79601659, t = -.27
e( hdi | X ): coef = -24.505659, se = 7.4951522, t = -3.27
e( infmor | X ): coef = .11115703, se = .03961763, t = 2.81
e( urbanpop | X ): coef = .01113578, se = .03966274, t = .28]
55. Partial Regression Leverage Plots, Continued

[The slide-54 added-variable plots repeated alongside two re-estimated versions, one apparently omitting Oman and one apparently omitting Senegal:
Omitting Oman — e( lngnp | X ): coef = -.91084738, se = .7842531, t = -1.16; e( hdi | X ): coef = -22.281675, se = 7.1637796, t = -3.11; e( infmor | X ): coef = .12391559, se = .0378934, t = 3.27; e( urbanpop | X ): coef = .05291099, se = .03965302, t = 1.33.
Omitting Senegal — e( lngnp | X ): coef = -.24870177, se = .80570718, t = -.31; e( hdi | X ): coef = -23.625004, se = 7.9461429, t = -2.97; e( infmor | X ): coef = .11481757, se = .04116964, t = 2.79; e( urbanpop | X ): coef = .00935766, se = .04016079, t = .23.]
Note that Senegal looked like a
possible outlier but it was of the
good sort and it wasn’t particularly
extreme relative to the scale of
values shown. The coefficient
changes little and the SE increases
slightly without it (indicating it was
contributing to the fit somewhat).
56. [Full-size added-variable plots for the re-estimated model, labeled by country code:
e( lngnp | X ): coef = -.91084738, se = .7842531, t = -1.16
e( hdi | X ): coef = -22.281675, se = 7.1637796, t = -3.11
e( infmor | X ): coef = .12391559, se = .0378934, t = 3.27
e( urbanpop | X ): coef = .05291099, se = .03965302, t = 1.33]
57. III.Plots to Identify Extreme Values:
C. Star Plots for outliers, leverage, and model generalized influence
display invFtail(5,105, .5)
.87591656
(use above command to get cut-off for Cook’s
D), then use with the other rules of
thumb to choose observations to display:
graph7 estu h cooksd if abs(estu) > 2 & h >
.2 & cooksd > .87591656, star
graph7 estu h cooksd, star label(cid)
select(88, 108)
NOTES: This is an old but still working Stata 7 command; search “graph7” for its help file. The variable (and thus the direction) associated with each line depends on the order listed in the command.
• The scaling of a star chart is a function of all the stars. Selecting just a few to display still maintains the scaling based on all the observations and variables.
• In our example model, no observations meet all three criteria for influence, so instead I will tell Stata to select some observations that include Senegal and Oman to show what the plot looks like (do this by selecting observations 88-108).
• What we want to see: a dot, OR a line in the outlier direction &/or the leverage direction, with no or only a tiny line in the influence direction.
• Look for longer lines in the influence direction (pointing lower LEFT) and the leverage direction (lower RIGHT).
58. III.C. Star Plots for DFBETAS (Individual Coefficient Influence)
display 5/sqrt(110)
.47673129
(use above command to get cut-off for
dfbeta)
graph7 _dfbeta_1 _dfbeta_2 _dfbeta_3
_dfbeta_4 if abs(_dfbeta_1) >
.4767 | abs(_dfbeta_2) >.4767 |
abs(_dfbeta_3) >.4767 |
abs(_dfbeta_4) >.4767, star
label(cid)
NOTE: we have to create new variables for the next command to ensure graphing of absolute values, so we do not know from the star plot whether a point increases or decreases the coefficient.
gen dflngnp=abs(_dfbeta_1)
gen dfhdi=abs(_dfbeta_2)
gen dfinform=abs(_dfbeta_3)
gen dfurban=abs(_dfbeta_4)
graph7 dflngnp dfhdi dfinform dfurban, star label(cid) select(88, 108)
• The scaling of a star chart is a function of all the stars. Selecting just a few to be displayed still
maintains the scaling based on all the observations and variables.
• In our example model, only OMAN meets ANY of the criteria for influence, so let’s select some
observations to show what the plot looks like (a good reminder to use the statistics and rules of
thumb in addition to eyeballing). Only OMAN is influential on all the coefficients at a level
above the cut-off point for DFBETAS. OMAN is an oddity—lots of oil, relatively small Omani
population, high birth rates, and a great deal of social development spending, raising HDI
despite a largely rural population. How would you model this without deleting Oman?
• What we want to see: Dot, tiny lines in ALL directions.
• Look for longer lines ANY direction.
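The DFBETAS behind these plots can be computed by brute force: refit the model with each observation deleted and scale the change in every coefficient. A hedged Python/NumPy sketch on simulated data (the planted outlier and all names here are illustrative, not the course data), using the slide's 5/sqrt(n) cut-off:

```python
import numpy as np

def dfbetas(X, y):
    """Scaled change in each coefficient when observation i is deleted.
    X must include the constant column."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    out = np.empty((n, k))
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        bi = np.linalg.inv(Xi.T @ Xi) @ Xi.T @ yi
        ei = yi - Xi @ bi
        s_i = np.sqrt(ei @ ei / (n - 1 - k))      # sigma estimated without obs i
        out[i] = (b - bi) / (s_i * np.sqrt(np.diag(XtX_inv)))
    return out

rng = np.random.default_rng(0)
n = 110
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
y = X @ np.array([40.0, -0.2, -24.5, 0.11, 0.01]) + rng.normal(scale=5.7, size=n)
y[0] += 60                                        # plant one influential point
cutoff = 5 / np.sqrt(n)                           # the slide's rule of thumb, ~0.4767
flags = np.any(np.abs(dfbetas(X, y)) > cutoff, axis=1)
print(flags.sum(), "observation(s) exceed the cut-off")
```

The planted point shifts the intercept enough to cross the cut-off, which is what the star plot would show as a long line in that coefficient's direction.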
59. A Summary of COMMON DIAGNOSTIC PLOTS to identify potential extreme values
Plot Type/Command | Preferred Appearance | Use | Description/Interpretation

Leverage (h, y-axis) vs. Squared Normalized Residual (x-axis): lvr2plot
• Preferred appearance: scatter evenly spread around the intersection of the two means; no
points to the left of the mean normalized squared residual line and above the mean leverage
line (upper-LEFT quadrant).
• Use: potential influence on (1) ALL coefficients and (2) standard errors.
• Interpretation: the vertical line marks the average squared normalized residual and the
horizontal line the average hat value. 1. IDENTIFY POINTS in the upper-LEFT quadrant:
high leverage AND low residual, when leverage is greater than 0.2. 2. Points in the
upper-RIGHT quadrant have high leverage (>0.2) AND high residual; not influential on b,
but they may diminish SEs and overstate certainty.

Partial-Regression Leverage Plots (also called Added-Variable Plots): avplots
• Preferred appearance: a scatter (loose or tight) of points spread evenly around the line in each plot.
• Use: potential influence on EACH coefficient.
• Interpretation: y-axis: residuals from regressing y on all the Xs EXCEPT one; x-axis:
residuals from regressing the excluded x on the remaining Xs. Look for points extreme in x
with unusual residual values. CAUTIONS: (a) verify points identified through “eyeballing”
with DFBETAS; (b) pay attention to the scale of the plots; stretched or compacted displays mislead.

Star Plots: (a) outliers, leverage, & model influence (Cook’s D): gr7 estu h cooksd, star;
(b) coefficient influence (DFBETAs): gr7 dfx1 dfx2 dfxn, star
• Preferred appearance: (a) a dot, OR a line in the outlier direction and/or the leverage
direction, with no or only a tiny line in the influence direction; (b) a dot (lines in one or
more directions = possible influence on one or more coefficients).
• Use: (a) multivariate outliers, leverage, and/or influence points; (b) potential influence on
EACH coefficient.
• Interpretation: (a) look for longer lines in the influence direction (pointing lower left) and
the leverage direction (lower right); (b) look for a longer line in ANY direction; each
direction corresponds to one coefficient. NOTES: 1. graph7 is a working old Stata 7
command; search “graph7” for its help file. 2. The variable associated with each line
depends on the order listed in the command.
60. Cautions about Extreme Value Procedures
• One weakness of DFFITS and similar statistics is that they will not always detect cases
where there are two similar outliers: either point would count as influential by itself, but
when both are included, each masks the other and neither appears influential.
• Cluster of outliers may indicate that model was wrongly applied to set of points. Partial
regression plots and other methods may be better for finding such clusters than individual
diagnostic statistics such as DFBETA. Both types of postestimation should be conducted.
• A single outlier may indicate a typing error or an ignored special missing-data code, such as 999,
or suggest that the model does not account for important variation in the data. Only delete or
change an observation if it is an obvious error, like a person being 10 feet tall, or a negative
geographical distance.
• Should not be abused to remove points to effect a desired change in a coefficient or its
standard error! “An observation should only be removed if it is shown to be uncorrectably in
error. Often no action is warranted, and when it is, the action should be more subtle than
deletion….the benefits obtained from information on influential points far outweigh any
potential danger” (Belsley et al., 16).
• Think about non-linear or other specifications that might model the outliers directly. Outliers
may present a research opportunity—do the outliers have anything in common?
• Often the most that can be done is to report the results both with and without the outlier
(maybe with one of the results in an appendix). The exception to this is the case of extreme
x-values. It is possible to reduce the range over which your predictions will be valid (e.g., only
OECD countries, only EU, only low-income, etc.)--it is ok to say your height and weight
relationship is only usable for those between 5’5” and 6’5” for example, or that your model
only applies to advanced industrialized democracies.
61. RESOURCES
• UCLA Stata Regression Diagnostic Steps (good examples of data with problems)
– http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htm
– http://www.ats.ucla.edu/stat/stata/examples/alsm/alsm9.htm
– http://www.ats.ucla.edu/stat/stata/examples/ara/arastata11.htm
• Belsley, D. A., E. Kuh, and R. E. Welsch. (1980). Regression Diagnostics: Identifying
Influential Data and Sources of Collinearity. New York: Wiley.
• Cook, R. D., and S. Weisberg (1982). Residuals and Influence in Regression. New
York: Chapman and Hall.
• Fox, J. (1991). Regression Diagnostics. Newbury Park: Sage Publications.
• Hamilton, L. C. (1992). Regression with Graphics: A Second Course in Applied
Statistics. Pacific Grove, CA: Brooks/Cole Publishing Company.
– Also has excellent chapter on pre-estimation graphical inspection of data
– Includes section on post-estimation diagnostics for logit
• For regression diagnostics for survey data (weighting for surveys requires adjusted
methods), see Li, J., and R. Valliant, “Influence Analysis in Linear Regression with
Sampling Weights,” 3330-3337, and Valliant, R., J. Li, et al. (2009). “Regression
Diagnostics for Survey Data.” Stata Conference. Washington, DC: Stata Users Group.
• Temple, J. (2000). "Growth Regressions and What the Textbooks Don't Tell You."
Bulletin of Economic Research 52(3): 181-205. The paper discusses three
econometric problems that are rarely given adequate discussion in textbooks:
model uncertainty, parameter heterogeneity, and outliers.
62. PS 699 Section March 25, 2010
Megan Reif
Graduate Student Instructor, Political Science
Professor Rob Franzese
University of Michigan
Regression Diagnostics
1. Diagnostics for Assessing (assessable) CLRM
Assumptions
2. Diagnostics for Assessing Data Problems (e.g.,
Multicollinearity)
63. Step One: Histogram and Box-Plot of the Ordinary Residuals (the former is useful in detecting multi-modal distribution of residuals,
which suggests omitted qualitative variable that divides data into groups)
Step Two: Graphical methods – tests exist for error normality, but visual methods are generally preferred
Step Three: Q-Q Plot of Residuals vs. Normal Distribution, and Normal Probability Plot
Background: What is a Q-Q Plot – Quantile-Quantile Plot?
– A Q-Q plot is a scatterplot that graphs the quantiles of one variable against the quantiles of a second variable
– The quantiles are the data values in ascending order: the first point pairs the lowest x1 value
with the lowest x2 value, the second point pairs the next-lowest values of x1 and x2, and so on (we
graph a set of points with coordinates (X1i, X2i), where X1i is the ith-lowest value of X1 and X2i is the
ith-lowest value of X2).
– What we can learn from a Q-Q Plot of Two Variables:
1. If the distributions of the two variables are similar in center, spread, and shape, then the points
will lie on the 45-degree diagonal line from the origin.
2. If the distributions have the same SPREAD and SHAPE but different center (mean, median…), the
points will follow a straight line parallel to the 45-degree diagonal but not crossing the origin.
3. If distributions have different spreads/variances and centers, but similar in shape, the points will
follow a straight line NOT parallel to the diagonal.
4. If the points do not follow a straight line, the distributions are different shapes entirely.
Two uses for Q-Q Plots:
1. Compare two empirical distributions (useful to assess whether subsets of the data, such as
different time periods or groups, share the same distribution or come from different
populations).
2. Compare an empirical distribution against a theoretical distribution (such as the Normal).
I. Normal Distribution of Disturbances, ε,
Can only be Evaluated using Estimate e.
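Pattern 2 above can be checked numerically: pairing the sorted values of two samples gives the Q-Q coordinates by hand. A small sketch in plain Python (sample sizes and seeds are arbitrary), comparing a standard normal to a normal with the same spread but a shifted center:

```python
import random
import statistics

# Build Q-Q coordinates by hand: pair the ith-lowest value of each sample.
random.seed(1)
x1 = [random.gauss(0, 1) for _ in range(500)]
x2 = [random.gauss(5, 1) for _ in range(500)]   # same shape/spread, center shifted by 5
pairs = list(zip(sorted(x1), sorted(x2)))       # the Q-Q scatter

# Same spread & shape but different center: points lie near a straight line
# of slope 1 (parallel to the 45-degree diagonal), offset by the shift.
shift = statistics.median(q2 - q1 for q1, q2 in pairs)
slope = statistics.median(
    (b2 - a2) / (b1 - a1)
    for (a1, a2), (b1, b2) in zip(pairs, pairs[100:])
)
print(round(shift, 1), round(slope, 1))
```

The median vertical gap recovers the shift in center (about 5), while the slope near 1 confirms the line is parallel to the 45-degree diagonal.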
64. A. Residual Quantile-Normal Plot (also known as probit plot,
normal-quantile comparison plot of residuals)
1. Quantile-Normal Plot (qnorm): emphasizes the tails of the
distribution
2. Normal Probability Plot (pnorm): puts the focus on the center of the
distribution
• What we expect to see: if the empirical distribution is
identical to a normal distribution, all points will lie
on the diagonal line.
I.A. Q-Q Plot of Residuals vs. Normal Distribution
66. Quantile-Quantile Plot Diagnostic Patterns
Description of Point Pattern | Possible Interpretation
• Points on the 45-degree diagonal line from the origin: distributions similar in center, spread, and shape.
• Points on a straight line parallel to the 45-degree diagonal: same SPREAD and SHAPE but different
center (mean, median, ...); we should never see e with a non-zero mean!
• Points follow a straight line NOT parallel to the diagonal: different spreads/variances and centers,
but similar in shape.
• Points do not follow a straight line: distributions have different shapes.
• Vertically steep (closer to parallel to the y-axis) at top and bottom: heavy tails, outliers at low
and high data values.
• Horizontal (closer to parallel to the x-axis) at top and bottom: light tails, fewer outliers.
• Two or more less-steep areas (horizontal, parallel to the x-axis, indicating higher-than-normal
density), separated by a gap or steep climb (an area of lower density): distribution is bi- or
multi-modal (subgroups, different populations).
• All but a few points fall on a line; some points are vertically separated from the rest of the data:
outliers in the data.
• Left end of pattern below the line; right end above the line: long tails at both ends of the distribution.
• Left end of pattern above the line; right end below the line: short tails at both ends of the distribution.
• Curved pattern with slope increasing from left to right: data distribution is skewed to the right.
• Curved pattern with slope decreasing from left to right: data distribution is skewed to the left.
• Granularity: staircase pattern (plateaus and gaps): data values have been rounded or are discrete.
67. • CONTINUING EXAMPLE (from March 18 Notes): Model from
Mukherjee et al. of crude birth rate as a function of:
– GNP per capita (logged, per Feb 18 notes and general practice for such variables)
– IM: infant mortality
– URBAN: percent of population urban
– HDI: human development index (from the World Bank Human Development Report 1993)
regress birthr lngnp hdi infmor urbanpop
Source | SS df MS Number of obs = 110
-------------+------------------------------ F( 4, 105) = 129.19
Model | 16552.2585 4 4138.06462 Prob > F = 0.0000
Residual | 3363.19755 105 32.0304528 R-squared = 0.8311
-------------+------------------------------ Adj R-squared = 0.8247
Total | 19915.456 109 182.710606 Root MSE = 5.6595
------------------------------------------------------------------------------
birthr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lngnp | -.2138487 .7960166 -0.27 0.789 -1.792203 1.364505
hdi | -24.50566 7.495152 -3.27 0.001 -39.36716 -9.644157
infmor | .111157 .0396176 2.81 0.006 .0326026 .1897115
urbanpop | .0111358 .0396627 0.28 0.779 -.0675081 .0897797
_cons | 39.56958 6.599771 6.00 0.000 26.48346 52.65571
------------------------------------------------------------------------------
BIRTHRTi = β0 + β1 lnGNPCi + β2 HDIi + β3 IMi + β4 URBANi + εi
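Every number in the Stata table above follows from the standard formulas b = (X'X)⁻¹X'y, Var(b) = s²(X'X)⁻¹, and t = b/se. Since the course dataset is not reproduced in these notes, here is a hedged NumPy sketch on simulated data with coefficients loosely mimicking the output:

```python
import numpy as np

def ols_table(X, y):
    """Coefficients, standard errors, and t-statistics, as in Stata's regress."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (n - k)              # Root MSE is sqrt(s2)
    se = np.sqrt(s2 * np.diag(XtX_inv))
    return b, se, b / se

# Simulated stand-in for (lngnp, hdi, infmor, urbanpop); true values chosen
# to echo the table, but the data themselves are artificial.
rng = np.random.default_rng(5)
n = 110
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
true_b = np.array([39.6, -0.2, -24.5, 0.11, 0.01])
y = X @ true_b + rng.normal(scale=5.7, size=n)
b, se, t = ols_table(X, y)
print(np.round(b, 2))
```

With n = 110 and Root MSE near 5.7, the estimates land close to the true coefficients, which is why the table's standard errors are the right yardstick for the t-statistics.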
68. qnorm estu, grid mcolor(green) msize(small) msymbol(circle) mlabel(cid) mlabsize(tiny)
mlabcolor(gs4) mlabangle(forty_five) yline(-.1535893, lpattern(longdash) lcolor(cranberry))
caption(Red Dashed Line Shows Median of Studentized Residuals, size(vsmall)) legend(on)
Quantile-
Normal
Plot
69. pnorm estu, grid mcolor(green) msize(small) msymbol(circle) mlabel(cid)
mlabsize(tiny) mlabcolor(gs4) mlabangle(forty_five) legend(on)
What does this granularity suggest?
Normal
Probability
Plot
70. What Non-Normal Residuals Do to Your OLS
Estimates and What to Do
• If errors not normally distributed:
– Efficiency decreases, and inference based on the t- and F-distributions is
not justified, especially as sample size decreases
– Heavy-tailed error distributions (more outliers) will result in greater
sample-to-sample variation (less generalizability)
– Normality is not required in order to obtain unbiased estimates of the
regression coefficients.
• If you have not already transformed skewed variables, doing so may
help, as non-normal distribution of e may be caused by skewed X
and/or Y distributions.
• Model re-specification may be required if evidence of granularity,
multi-modality
• Robust methods provide alternatives to OLS for dealing with non-
normal errors.
71. (Ordinary) Residual vs. Fitted Plot
CLRM:
•Heteroskedasticity (leads to inefficiency and biased
standard error estimates)
•Residual Non-Normality (compounds inefficiency
and undermines the rationale for t- and F-tests, casting
doubt on p-values reported in output)
SPECIFICATION:
•Non-linearity in X-Y relationship(s)
72. rvfplot, mcolor(green) msize(small) msymbol(circle) mlabel(cid) mlabsize(tiny)
mlabcolor(gs4) mlabangle(forty_five) legend(on)
Heteroskedasticity
Variance for smaller
fitted values larger than
for medium fitted
values?
73. Absolute Value of Residual v. Fitted (easier to see heteroskedasticity)
predict yhat
predict resid, resid
gen absresid=abs(resid)
graph twoway scatter absresid yhat, mcolor(green) msize(small) msymbol(circle)
mlabel(cid) mlabsize(tiny) mlabcolor(gs4) mlabangle(forty_five) legend(on)
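A numeric companion to this plot, sketched in NumPy on simulated data (illustrative only, not the course data): when error spread grows with the fitted values, the absolute residuals correlate positively with ŷ, which is exactly what the plot makes visible:

```python
import numpy as np

# Simulate a regression whose error variance grows with x (heteroskedastic).
rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 300)
y = 2 + 3 * x + rng.normal(scale=x)             # sd of the error equals x
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
yhat = X @ b
absresid = np.abs(y - yhat)                     # Stata: gen absresid = abs(resid)
r = np.corrcoef(absresid, yhat)[0, 1]
print(round(r, 2))                              # clearly positive here
```

Under homoskedasticity this correlation would hover near zero; a sizable positive value is the numeric analogue of the fanning-out pattern in the plot.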
74. Note: Fox Recommends Using Studentized Residuals vs. Fitted Values
(in example there is little difference)
graph twoway scatter estu yhat, mcolor(green) msize(small) msymbol(circle) mlabel(cid)
mlabsize(tiny) mlabcolor(gs4) mlabangle(forty_five) legend(on)
75. Residual v. Predictor Plot
• Heteroskedasticity: e varies with values of one
or more Xs.
rvpplot lngnp, mcolor(green) msize(small) msymbol(circle) mlabel(cid) mlabsize(tiny)
mlabcolor(gs4) mlabangle(forty_five) legend(on) name(lngnp)
rvpplot hdi, mcolor(red) msize(small) msymbol(circle) mlabel(cid) mlabsize(tiny)
mlabcolor(gs4) mlabangle(forty_five) legend(on) name(hdi, replace)
rvpplot infmor, mcolor(blue) msize(small) msymbol(circle) mlabel(cid) mlabsize(tiny)
mlabcolor(gs4) mlabangle(forty_five) legend(on) name(infmor, replace)
rvpplot urbanpop, mcolor(orange) msize(small) msymbol(circle) mlabel(cid)
mlabsize(tiny) mlabcolor(gs4) mlabangle(forty_five) legend(on) name(urbanpop,
replace)
graph combine lngnp hdi infmor urbanpop
77. Component-Plus-Residual Plot
• The component-plus-residual plot is also known as a partial
residual plot or adjusted variable plot. (Despite the similar name,
it is distinct from the partial-regression leverage plot, i.e., the
added-variable plot.)
• This plot shows the expectation of the dependent variable
given a single independent variable, holding all else constant,
PLUS the residual for that observation from the FULL model.
• Looks at one of the explained parts of Y, plus the unexplained
part (e), plotted against an independent variable.
• CLRM: Heteroskedasticity
• Functional form / non-linearity
cprplot lngnp, mcolor(green) msize(small) msymbol(circle) mlabel(cid) mlabsize(tiny)
mlabcolor(gs4) mlabangle(forty_five) legend(on) name(lngnpcp, replace)
cprplot hdi, mcolor(red) msize(small) msymbol(circle) mlabel(cid) mlabsize(tiny)
mlabcolor(gs4) mlabangle(forty_five) legend(on) name(hdicp, replace)
cprplot infmor, mcolor(blue) msize(small) msymbol(circle) mlabel(cid) mlabsize(tiny)
mlabcolor(gs4) mlabangle(forty_five) legend(on) name(infcp, replace)
cprplot urbanpop, mcolor(orange) msize(small) msymbol(circle) mlabel(cid)
mlabsize(tiny) mlabcolor(gs4) mlabangle(forty_five) legend(on) name(urbcp, replace)
graph combine lngnpcp hdicp infcp urbcp
79. Durbin Watson Test Statistic
Correlograms
Semi-Variograms
Time Plot
I. Autocorrelation
80. Variance Inflation Factor
High collinearity inflates standard errors and reduces significance on
important variables.
VIF = 1/(1-R2j), where R2j is from the regression of variable j
on the other independent variables. If variable j is
completely uncorrelated with the other variables, then R2j
will be zero and the VIF will be one. If the fit is perfect, R2j will
be one and the VIF infinite. Larger VIF = more collinearity.
I. Multicollinearity
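The VIF formula above translates directly into code. A minimal NumPy sketch (the data are simulated to make two regressors nearly collinear; the function name and all variables are illustrative):

```python
import numpy as np

def vif(X):
    """VIF_j = 1/(1 - R2_j), with R2_j from regressing column j on the others.
    X: regressor matrix WITHOUT the constant column."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b = np.linalg.lstsq(others, X[:, j], rcond=None)[0]
        resid = X[:, j] - others @ b
        tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1 - resid @ resid / tss
        out.append(1 / (1 - r2))
    return out

rng = np.random.default_rng(3)
z = rng.normal(size=200)
x1 = z + 0.1 * rng.normal(size=200)   # x1 and x2 nearly collinear
x2 = z + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)             # independent regressor
v = vif(np.column_stack([x1, x2, x3]))
print([round(t, 1) for t in v])       # large VIFs for x1, x2; near 1 for x3
```

A common rule of thumb flags VIF > 10 (i.e., R2j > 0.9) as collinearity worth worrying about.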
81. Summary of COMMON DIAGNOSTIC PLOTS to assess CLRM assumptions & Data Problems (post-estimation)
Plot Type/Command | Preferred Appearance | Use | Description/Interpretation

Quantile-Normal Plot (Residuals vs. Normal): qnorm estu; & Normal Probability Plot
(Studentized Residuals vs. Standard Normal): pnorm estu
• Preferred appearance: if the empirical distribution of the residuals is identical to a
normal distribution, expect all points to lie on the 45-degree diagonal line through the origin.
• Use: normally distributed stochastic component.
• Interpretation: Q-Normal plot: inspect the tails; Probability plot: inspect the middle.
1. Look for multi-modality and granularity (possible misspecification). 2. Right or left
skewness (bowed up, bowed down). 3. Heavy tails. 4. Vertical separation of values (outliers).

Ordinary or Studentized Residual vs. Fitted Values: rvfplot; & |Residual| vs. Fitted:
graph twoway scatter absresid yhat
• Preferred appearance: no discernible pattern; an even band with constant variance above
and below zero, at both high and low values of y.
• Use: CLRM: heteroskedasticity (e varies with y); residual normality. SPECIFICATION:
non-linearity in X-Y relationship(s).
• Interpretation: plots the unexplained part (e) against the sum total of what the regression
has explained. 1. Look for systematic variation in the distance of residuals from their mean
of zero. 2. The Q-N plot is better for assessing normality. 3. This plot helps assess whether
error variance increases or decreases at smaller or larger values of y. 4. Look for clusters
of residuals above or below zero.

(Ordinary) Residual vs. Predictor Plot (each X): rvpplot x1varname
• Preferred appearance: no discernible pattern; an even band with constant variance above
and below zero, at both high and low values of each X.
• Use: CLRM: heteroskedasticity (e varies with values of one or more Xs). SPECIFICATION:
non-linearity.
• Interpretation: 1. Look for systematic variation in the distance of residuals from their
mean. 2. Whether error variance increases or decreases at smaller or larger values of each X.
3. Clusters of residuals above or below zero.
Component Plus
Residual Plot
cprplot x1varname