  1. Week 1 Intro & Stats I Review Applied Statistical Analysis II Jeffrey Ziegler, PhD Assistant Professor in Political Science & Data Science Trinity College Dublin Spring 2023
  2. Road map for today Welcome and introduction - About module, structure - Review of last term - Bridging terms: alternative derivations of least squares; frequentist model of likelihood; intro to GLMs & MLE Next time: Framework of generalised linear models (GLMs) By next week, please... - Fork GitHub repository - Read assigned chapters
  3. General Info About Course Instructor: Jeffrey Ziegler, PhD Email: zieglerj@tcd.ie In-Person Sessions: 16:00-18:00 Mondays, AP2.05 Office Hours: W/Th 13:00-14:00
  4. Review of Tools: Necessity of R R is the statistical programming language Perform stats analysis, data manipulation, plotting, etc. Notes on R code: - ALWAYS comment your code (worry more about saying too little than too much) - Use indentation to visually clarify blocks of code, such as multiple lines for one command or multiple commands that produce one logical step - Plan for your code to be run from source files Interactive analysis is great for exploration, terrible for (re)analysis, on which your job and/or reputation may depend - Create command source files with all your analysis (.R file) - Final plots go to files, not screen (dev.off())
  5. Review of Tools: GitHub & LaTeX LaTeX is the word processor Input output, figures, and code from R Using Word will result in a deduction GitHub is how we’ll share our work with each other Fork repository Keep up-to-date Keep organised
  6. As a reminder, you can access the syllabus here... Direct link to syllabus, or find on course website
  7. Required Materials Texts: all readings are provided, don’t buy books! R and LaTeX: should have installed RStudio and TeXstudio
  8. Course Evaluation/Assessment Problem Sets (3/4): 50% Exam: 25% Replication: 25%
  9. Problem Sets (50%) Typically assigned every other week(ish), and you will generally have two weeks to do the assignment (this will vary) Problem sets require R and should be written in LaTeX (must include tables, figures, and code within text) All problem sets will be posted on GitHub Evaluated by me or Martyn I will publish correct answers each week, but look at others’ GitHubs so we can learn from each other The lowest PS grade will be dropped, so “I have been so busy with other classes” is not a legitimate reason for not turning in a problem set!
  10. Exam (25%) In class: February 27 Exam is cumulative Exam is multiple choice and open-response questions Exam is graded by me, the instructor You will be allowed a formula sheet Make-up exams are only allowed by written approval
  11. Replication (25%): Example Does Having Daughters Cause Judges to Rule for Women’s Issues? Table: Number of Children and Girls for U.S. Courts of Appeals Judges Participating in Gender-Related Cases, 1996-2002 (from Adam Glynn & Maya Sen, “Does Having Daughters Cause Judges to Rule for Women’s Issues?”, AJPS, 2015)

      Number of children:  0   1   2   3   4   5   6   7   8   9
      Democrat            12  13  33  24  15   4   0   1   0   1
      Republican          13   8  44  30  15   7   3   0   1   0

      Number of girls:     0   1   2   3   4   5   6   7   8   9
      Democrat            26  35  29  10   1   2   -   -   -   -
      Republican          36  43  31   9   2   0   -   -   -   -
  12. Data: Daughters [Figure: Number of Girls by Partisan Leaning; x-axis: quantity (0-5), y-axis: conditional percent (0.0-1.0), bars for Democrat vs. Republican]
  13. Data: Judge Demography Table: Demographics of U.S. Courts of Appeals Judges who voted on gender-related cases (1996-2002)

                                All  Democrats  Republicans    Women      Men
      Mean No. Children        2.47       2.40         2.54     1.58     2.66
      Mean No. Girls           1.24       1.33         1.16     0.71     1.34
      0 children               0.11       0.12         0.11     0.29     0.08
      1 child                  0.09       0.13         0.07     0.21     0.07
      2 children               0.34       0.32         0.36     0.26     0.36
      3 children               0.24       0.23         0.25     0.13     0.26
      4 children               0.13       0.15         0.12     0.08     0.15
      5 children or more       0.08       0.06         0.09     0.03     0.05
      Proportion Female        0.17       0.26         0.09     1.00     0.00
      Proportion Republican    0.54       0.00         1.00     0.29     0.59
      Proportion White         0.91       0.78         0.99     0.93     0.91
      Mean Year Born        1932.55    1931.23      1933.43  1938.57  1931.49
  14. Data: Judge Demography [Figure: Percent of Female Judges by Party Affiliation; x-axis: Democrat vs. Republican, y-axis: percent of party (0.00-1.00)]
  15. Data: Cases Table: Distribution of the number of gender-related cases heard per judge, 1996-2002

                    Min.  1st Qu.  Median   Mean  3rd Qu.   Max.
      All Judges    1.00     5.00    8.00  11.10    14.00  46.00
      Democrats     1.00     5.00    7.00  10.12    13.00  39.00
      Republicans   1.00     5.00    9.00  11.94    14.00  46.00
  16. Data: Cases [Figure: Proportion of cases decided in a feminist direction (0.0-1.0, less to more feminist), shown for All, Democrats, and Republicans]
  17. Model Predict the probability that a judge will vote in a feminist direction in any given gender-related case: Pr(y_i = 1) = logit^{-1}(β_0 + X_i β) y_i: judge-level votes in individual cases X_i: vector of individual-level predictors Main covariate of interest is # of biological daughters, conditioned on total # of children (categorical variable)
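As a quick sketch of how the inverse logit above maps a linear predictor onto a probability: the intercept below is made up purely for illustration; only the daughters coefficient (0.384) comes from the base-model table on the next slide.

```r
# Inverse logit: maps any real-valued linear predictor into (0, 1)
inv_logit <- function(eta) 1 / (1 + exp(-eta))

b0 <- -0.5           # hypothetical intercept, for illustration only
b_girls <- 0.384     # coefficient on one daughter, from the base model table
eta <- b0 + b_girls  # linear predictor for a judge with one daughter
p <- inv_logit(eta)

# inv_logit() is the same function R ships as plogis()
stopifnot(all.equal(p, plogis(eta)))
```

Note that inv_logit(0) = 0.5, so positive linear predictors map above one half and negative ones below.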
  18. Model: Base comparison

      # R code presented in AJPS online replication files
      base_model <- zelig(progressive.vote ~ as.factor(girls)
          + as.factor(child), model = "logit",
          data = subset(women.cases, child < 5 & child > 0))
      summary(base_model) # number of observations = 1974

      Dependent variable: progressive.vote
      as.factor(girls)1   0.384*** (0.128)
      Observations        1,974
      Note: *p<0.1; **p<0.05; ***p<0.01
  19. Model: Casewise delete

      ## case-wise delete to get same n as in primary model ##
      # subset for judges with at least 1 child,
      # but less than 5 (n=2448)
      women.subset <- women.cases[women.cases$child < 5
          & women.cases$child > 0, ]
      # subset for those judges who have girls (n=1975)
      women.subset <- women.subset[complete.cases(women.subset$girls), ]
      # subset for rows with a progressive.vote value (minus NAs)
      # (n=1974)
      women.subset <- women.subset[complete.cases(
          women.subset$progressive.vote), ]
  20. Model: Casewise delete (same results)

      # re-run with subsetted data
      case_delete <- glm(progressive.vote ~ as.factor(girls)
          + as.factor(child), family = binomial(link = "logit"),
          data = women.subset)
      summary(case_delete) # number of observations = 1974

      Table: Re-run with subset
      Dependent variable: progressive.vote
      as.factor(girls)1   0.384*** (0.128)
      Observations        1,974
      Note: *p<0.1; **p<0.05; ***p<0.01
  21. Diagnostics: Basic model

      ### check Pearson residuals ###
      sum(residuals(base_model, type = "pearson")^2)
      # check deviance
      pchisq(deviance(base_model), df.residual(base_model),
          lower = F)
      # resulting p-value = 0, not a good model fit
  22. Diagnostics: Basic model

      # get predictions for all subjects and take inverse logit
      # (ilogit() is from the faraway package; equivalently plogis())
      predValues <- ilogit(predict(base_model))
      # mean likelihood of judge voting for plaintiff
      mean(predValues) # (original mean lib vote = 0.433)
      # observed and predicted binary outcomes, split at the mean
      observeBinary <- women.subset$progressive.vote
      predBinary <- ifelse(predValues > 0.433, 1, 0)
      # create table to show predictive error
      table(observeBinary, predBinary)
      # 1109/1974 (56.2%) accurately predicted

      Table: Estimation of judge below or above mean lib vote
                     predBinary
      observeBinary    0    1
                 0   858  333
                 1   532  251
  23. Missingness: Multiple imputation
  24. Missingness: Multiple imputation Reported in article: number of observations = 1507, β̂ = 0.42 (0.15)

      Table: Estimated coefficient with multiple imputation, model 4

                           iterations (m)
      seed       50             75             100
      1234       0.257 (0.131)  0.225 (0.128)  0.241 (0.130)
      1          0.220 (0.129)  0.221 (0.131)  0.232 (0.133)
      555        0.213 (0.130)  0.218 (0.129)  0.233 (0.131)
      1989       0.227 (0.131)  0.229 (0.135)  0.249 (0.132)
  25. Other course policies to consider Absences for religious holidays are excused Talk to me ASAP if you have any illness or family emergencies All students with special accommodations should notify me as soon as possible - Documentation from the Trinity Office of Disability Services is required The schedule posted on the syllabus is tentative and subject to change
  26. Reminder: Approach Toward Learning Preparation + synthesis + practice = learning Individual preparedness: Read and review lectures before class In class: - Discussion and Q&A on important concepts - Tutorial: Advanced theoretical problems Office hours: Review and correct mistakes Problem sets: Individual homework assignments Exam and replication: Showcase knowledge
  27. Review: Last Term Final
  28. This term: Extending Modelling & Estimation What is a model? How do we estimate its parameters? What are the properties of the estimator? Use what we learned, extend to non-continuous outcomes
  29. Social Science and Parametric Models Goal of social science is parsimonious explanation of social phenomena - Parsimonious because we can never explain every detail - Explanation because we want more than mere description Compare with non-parametric approach
  30. Non-parametric regression smoother [Figure: 2021-2022 Sinn Fein polls with a lowess nonparametric fit; x-axis: date (2021-2022), y-axis: percent of respondents that support party (0.26-0.32)]
  31. How did I do that in R?

      # load data
      polling_data <- read.csv("https://raw.githubusercontent.com/ASDS-TCD/StatsII_Spring2023/main/datasets/long_IPI.csv")
      attach(polling_data)
      Date <- as.Date(Date, "%d/%m/%Y")
      # open up non-parametric plot
      pdf("../graphics/nonparameter_example.pdf", width = 9.25)
      plot(Date, SF, type = "n",
          main = "2021-2022 Sinn Fein polls\nwith a lowess nonparametric fit",
          xlab = "Date", ylab = "Percent of Respondents that Support Party")
      points(Date, jitter(SF), pch = 1, cex = .6, col = "red")
      lines(lowess(SF ~ Date, f = 1/10), col = "blue")
      abline(0, 0)
      dev.off()
      # Open an empty plot: type = "n" suppresses points and lines but
      # scales axes correctly for the x and y variables.
      # The "\n" in the main title is the line-break command that splits
      # the title across two lines.
      # Plot the points, adding a little noise (jitter) to reduce
      # overprinting of data; use plot character 1 (open circle), set the
      # size to .6 of normal, and colour the points red.
  32. Non-parametric Models: Virtues and Vices Benefits: Very flexible, can fit any pattern of data Makes minimal (virtually no) assumptions about data Can reveal unexpected patterns and departures from linear assumptions Drawbacks: Too flexible, sensitive to overfitting Without parameters there is no simple interpretation of effects Hard to incorporate substantive theory and tests
  33. A non-parametric future? A great deal of research on modern nonparametric methods is going on, lots of new developments But for social scientists, perhaps not the wave of the future The reason is that parametric models can do a lot for us
  34. What is a parametric model? We begin with specification of a specific distribution describing behaviour under study Specification requires theoretical understanding Specification also requires making assumptions explicit While this places a considerable burden on our theory, it forces us to confront limits of our knowledge and helps avoid making implicit and unwarranted assumptions Specification should make our assumptions clear to all, including ourselves
  35. Specification We are concerned with the estimation of parametric models of the form: y_i ∼ f(θ, x_i) where: θ is a vector of parameters x_i is a vector of exogenous characteristics of the ith observation The specific functional form, f, provides an almost unlimited choice of specific models
  36. Examples of specific models Poisson: y_i ∼ f(k; λ): Pr(X = k) = λ^k e^{-λ} / k! where: k is # of occurrences (k = 0, 1, 2, ...) e is Euler’s number (e = 2.71828...) ! is the factorial function λ: positive real number λ is equal to the expected value of X and also to its variance: λ = E(X) = Var(X)
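A minimal check in R that the built-in dpois() matches the mass function above, and that the mean and variance of the distribution both equal λ (summing over enough of the support for the tail to be negligible):

```r
lambda <- 3
k <- 0:10
# Pr(X = k) = lambda^k * exp(-lambda) / k!, computed by hand
by_hand <- lambda^k * exp(-lambda) / factorial(k)
stopifnot(all.equal(by_hand, dpois(k, lambda)))

# E(X) = Var(X) = lambda
x <- 0:100
stopifnot(all.equal(sum(x * dpois(x, lambda)), lambda))
stopifnot(all.equal(sum((x - lambda)^2 * dpois(x, lambda)), lambda))
```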
  37. Estimation: Maximum Likelihood Likelihood: proportional to probability of observing data, treating parameters of distribution as variables and data as fixed (and assuming independent observations): L(θ|Y) ∝ ∏_{i=1}^{N} p(Y_i|θ) Maximum likelihood estimate is that value of parameter θ for which likelihood of observed sample is a maximum Alternatively, ML estimate is mode of likelihood function ML estimator turns out to have several useful properties, as we shall see
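To make this concrete, a small sketch: simulate Poisson data, write the log-likelihood as a function of λ, and maximise it numerically. For the Poisson, the analytic MLE of λ is just the sample mean, so the numerical maximiser should land there.

```r
set.seed(1234)
y <- rpois(500, lambda = 4)  # simulated data with known lambda

# log-likelihood of lambda given the observed sample
loglik <- function(lambda) sum(dpois(y, lambda, log = TRUE))

# numerical maximisation over a plausible range
mle <- optimize(loglik, interval = c(0.01, 20), maximum = TRUE)$maximum

# analytic result: the Poisson MLE is the sample mean
stopifnot(abs(mle - mean(y)) < 1e-3)
```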
  38. We aren’t doing Bayesian inference! Big difference to what we’re doing with MLE! Given that y ∼ p(y|θ), how can we make inferences about value of θ? Bayes approach is the reverse of the probability problem (given θ, what can we say about the distribution of y?) - Sometimes called inverse probability Instead, Bayesians seek the distribution p(θ|y), the distribution of the unknown parameter conditional on observed data
  39. Minimising least squares, assumptions? We required no assumptions about the distribution of y, x or u in order to compute least squares coefficients - If all we care about is fitting data, then we can stop here If we want to make inferences about θ, however, we need some more assumptions For example, what is relationship between θ̂ and θ in the population model? To this point, none whatsoever! - If we want to talk about θ, as opposed to θ̂, we need some more assumptions
  40. Reminder: Gauss-Markov Assumptions 1. y_i = x_i θ + u_i 2. x is fixed and full rank (linear independence) 3. E(u_i) = E(ū) = µ_u = 0 4. E(u_i²) = σ² 5. E(u_i u_j) = 0, ∀ i ≠ j 6. u ∼ normal
  41. Why make G-M assumptions? In order to do inference, we must say how data are generated (Assumptions 1 and 6) Must specify the parameterization of data generating process (Assumptions 1, 3-6) Must prove that the estimator has desirable properties (Assumption 1 is crucial, while 3-6 are necessary for hypothesis testing)
  42. What to notice about OLS Most of assumptions are about inherently unobservable term, u_i Only assumption explicitly about y_i is the first Specification strongly encourages us to think of u_i as “error”, rather than intrinsic variability in outcomes, y_i Key idea: Minimize sum of squared errors - This only indirectly considers the data generating process that creates the observed y_i Properties of LS estimators come as an after-thought - We must derive them for each case as assumptions differ (think GLS vs. OLS, for example)
  43. New way of thinking: ML models Specification of the distribution of the outcome variable; this shift in focus is conceptual, but powerful ML requires an explicit choice of distribution - While some may be ruled out easily, final choice is inherently subjective and uncertain In defending our choices, we are forced to think through nature of data generating process, which is at core of our substantive theory
  44. Outcome Variable For least squares application, we wrote y_i = x_i θ + u_i and u ∼ N(0, σ²I) but we never said anything about distribution of y_i! Seems odd, we have more substantive knowledge about y_i than we can possibly have about unobservable u_i
  45. Outcome Variable Implicit, however, is that we have said something about distribution of y_i Because u_i is normally distributed, and because y_i is a linear combination of u_i (and xθ, and since x is fixed), we can conclude that y_i is also normally distributed Recall a theorem from intro stats: If u ∼ N(µ, σ²) and a, b are constants, then a linear function of u, v = a + bu, is normal also: v ∼ N(a + bµ, b²σ²)
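The theorem can be checked in R without simulation by comparing quantiles. Note the mean of v = a + bu is a + bµ, and a negative b flips the tails, so the 90th percentile of u maps to the 10th percentile of v. The constants below are arbitrary illustrations.

```r
mu <- 2; sigma <- 3   # distribution of u: N(mu, sigma^2)
a <- 1; b <- -0.5     # constants of the linear transform v = a + b*u

q90_u <- qnorm(0.9, mean = mu, sd = sigma)  # 90th percentile of u
v_at_q <- a + b * q90_u                     # since b < 0, this is v's 10th percentile

# v ~ N(a + b*mu, b^2 * sigma^2)
stopifnot(all.equal(v_at_q, qnorm(0.1, mean = a + b * mu, sd = abs(b) * sigma)))
```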
  46. Outcome Variable We could write the usual linear model as y_i ∼ N(µ_i, σ²) with µ_i = x_i θ Now notice that the least squares model and this model are exactly the same thing Hence we can express usual OLS model as an equivalent ML model by focusing on distribution of y_i and data generating process, rather than on minimizing squared error
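This equivalence is easy to verify in R: lm() (least squares) and glm() with a Gaussian family and identity link (maximum likelihood) return identical coefficients. The simulated data below are purely illustrative.

```r
set.seed(42)
x <- rnorm(200)
y <- 1 + 2 * x + rnorm(200)  # linear model with normal errors

ols <- lm(y ~ x)                                         # least squares
ml  <- glm(y ~ x, family = gaussian(link = "identity"))  # maximum likelihood

stopifnot(all.equal(coef(ols), coef(ml)))
```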
  47. Specifying Data Generating Process Specification of distribution of y is most crucial and controversial step in maximum likelihood modeling First, it is a decision which is to some extent subjective Second, it matters a lot for results (especially predicted values)
  48. Ex: Parliamentary Committees A reasonable question to ask about parliamentary committees is how ’productive’ they are # of bills voted out of a congressional committee gives some hints as to an appropriate distribution Do committees vary in how much legislation they process? - How do we model this?
  49. Ex: Parliamentary Committees # of bills is discrete and non-negative - This means we can rule out any distribution which is either continuous or which allows negative values Are there any systematic features that account for this variation? How do historical changes in committee rules or structure affect productivity within and between committees? Thus, normal distribution cannot be a candidate for describing this process, since normal is defined over real number line from −∞ to +∞
  50. Ex: Parliamentary Committees Binomial, for example, might be one candidate The committee considers N bills each session From these it reports y bills out If probability of reporting each bill is p, then probability model is Pr(y_i = k) = C(N, k) p^k (1 − p)^(N−k) where - k successes - N independent Bernoulli trials - C(N, k) = N! / (k!(N − k)!)
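R’s dbinom() implements exactly this mass function, so we can confirm it against the formula computed by hand (arbitrary illustrative values of n and p):

```r
n <- 20; p <- 0.3
k <- 0:n
# choose(n, k) = n! / (k! (n - k)!)
by_hand <- choose(n, k) * p^k * (1 - p)^(n - k)
stopifnot(all.equal(by_hand, dbinom(k, size = n, prob = p)))
stopifnot(all.equal(sum(by_hand), 1))  # probabilities sum to one
```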
  51. Ex: Parliamentary Committees But are bills really limited to N? If we think supply of bills is effectively unlimited, because MEPs will find bills to sponsor if there is slack in system, then we might wish to model the process as... a Poisson distribution: Pr(y_i = k) = λ^k e^{−λ} / k! This distribution is also discrete and non-negative
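The “effectively unlimited supply” intuition can be sketched numerically: holding the expected count λ = Np fixed while letting N grow, the binomial converges to the Poisson, so the Poisson is the natural model when there is no meaningful ceiling on the number of bills.

```r
lambda <- 5
k <- 0:15
pois <- dpois(k, lambda)

# binomial with small N vs. very large N, same expected count lambda = N * p
err_small <- max(abs(dbinom(k, size = 20,  prob = lambda / 20)  - pois))
err_large <- max(abs(dbinom(k, size = 1e5, prob = lambda / 1e5) - pois))

stopifnot(err_large < err_small)  # approximation improves as N grows
stopifnot(err_large < 1e-3)       # and is very close for large N
```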
  52. Issues with MLE Choice of a particular distribution is not always clear - Yet that choice must be made, for without it there is no model Some criticize ML for this - They point to the subjective and somewhat arbitrary choice of distribution, and to fact that if you pick wrong distribution you are estimating a misspecified model This is a lot of assumptions, and perhaps social science theory is not up to the challenge - This is a valid concern - In an ideal world, we would have better knowledge of appropriate distribution and would not have so much discretion
  53. Necessary Choices in Applied Stats Don’t delude ourselves into thinking that there’s an escape from these dilemmas of statistical modeling Any statistical model must specify both structure and distribution of its variables Those who rely on OLS are actually doing ML, but are assuming that every model is a continuous, normal model Surely it is preferable to adopt most persuasive ML specification, even if it is subjective, than to always adopt this particular ML regression model regardless of substance of problem!
  54. Class business Read required (and suggested) online materials Fork GitHub repository These slides are available on the course website Next time, we’ll talk about GLMs!