# Frontiers of Computational Journalism week 4 - Statistical Inference

Taught at Columbia Journalism School, Fall 2018
Full syllabus and lecture videos at http://www.compjournalism.com/?p=218



1. Frontiers of Computational Journalism, Columbia Journalism School. Week 4: Quantification and Statistical Inference. October 3, 2018
2. This class
    - Quantification
    - Data Quality
    - Risk ratios
    - Regression
    - Causation
    - Interpretation
3. Quantification
4. Quantification: $\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_N \end{bmatrix}$
5. Different types of counting
    - Numeric
        - Continuous or discrete
        - Units of measurement?
        - Non-linear scales?
    - Categorical
        - finite, e.g. {true, false}
        - infinite, e.g. {red, yellow, blue, ... chartreuse…}
        - ordered?
6. Choices about what to count
7. GDP = C + I + G + (X - M)
8. 1940 U.S. census enumerator instructions
9. 2010 U.S. census race and ethnicity questions
10. Some things that are tricky to quantify, but usefully quantified anyway
    - Intelligence
    - Academic performance
    - Race, ethnicity, nationality, gender
    - Number of incidents of some type
    - Income
    - Political ideology
11. Data Quality
12. Intentional or unintentional problems
13. It looks like Lucknow and Kanpur have few traffic accidents, but the data on deaths suggests that accidents are not being counted. Source: Lies and Statistics: How India’s Most-Populous State Fudges Crime Data, IndiaSpend
14. Evaluating Data Quality
    - Internal validity: check the data against itself
        - row counts (e.g. all 50 states?)
        - related data
        - histograms
        - do the numbers add up?
    - External validity: compare the data to something else
        - alternate data sources
        - expert knowledge
        - previous versions
        - common sense!
15. Interview the Data
    - Who created this data?
    - What is this data supposed to count?
    - How was this data actually collected?
    - Does it really count what it’s supposed to?
    - For what purpose was this data collected?
    - How do we know it is complete?
    - If the data was collected from people, who was asked and how?
16. Interview the Data
    - Who is going to look bad or lose money because of this data?
    - Is the data consistent with other sources?
    - Is the data consistent from day to day, or when collected by different people?
    - Who has already analyzed it?
    - Are there multiple versions?
    - Does this data have known problems?
17. Risk ratios
18. Deadly Force in Black and White, ProPublica 10/10/2014
19. AP Clinton Foundation Story

    "WASHINGTON (AP) — More than half the people outside the government who met with Hillary Clinton while she was secretary of state gave money — either personally or through companies or groups — to the Clinton Foundation. It’s an extraordinary proportion indicating her possible ethics challenges if elected president.

    At least 85 of 154 people from private interests who met or had phone conversations scheduled with Clinton while she led the State Department donated to her family charity or pledged commitments to its international programs, according to a review of State Department calendars released so far to The Associated Press.

    Combined, the 85 donors contributed as much as \$156 million. At least 40 donated more than \$100,000 each, and 20 gave more than \$1 million."

    Source: Many donors to Clinton Foundation met with her at State, AP, 8/24/2016
20. 2×2 table (cell values not in transcript): rows Blue and Yellow, columns Accident and No Accident
21. Relative risk (risk ratio)
22. AP Clinton Foundation Story (odds): “At least 85 of 154 people from private interests who met or had phone conversations scheduled with Clinton while she led the State Department donated to her family charity or pledged commitments to its international programs, according to a review of State Department calendars.”
23. AP Clinton Foundation Story (odds): Not enough information to compute the odds ratio... which you can tell immediately because four values are required.
24. Regression
25. Speed Trap: Who gets a ticket, who gets a break? Boston Globe, 2004
26. Speed Trap: Who gets a ticket, who gets a break? Boston Globe, 2004
27. Speed Trap: Who gets a ticket, who gets a break? Boston Globe, 2004
28. Nike Says Its \$250 Running Shoes Will Make You Run Much Faster, New York Times
29. Surgeon Scorecard, ProPublica 2015
30. ACR = adjusted complication rate (reported in story). Source: Surgeon Scorecard methodology paper, ProPublica 2015
31. Causal Models
32. Does chocolate make you smarter?
33. Smoking and mortality by occupational group:

    | Occupational Group | Smoking | Mortality |
    |---|---|---|
    | Farmers, foresters, and fishermen | 77 | 84 |
    | Miners and quarrymen | 137 | 116 |
    | Gas, coke and chemical makers | 117 | 123 |
    | Glass and ceramics makers | 94 | 128 |
    | Furnace, forge, foundry, and rolling mill | 116 | 155 |
    | Electrical and electronics workers | 102 | 101 |
    | Engineering and allied trades | 111 | 118 |
    | Woodworkers | 93 | 113 |
    | Leather workers | 88 | 104 |
    | Textile workers | 102 | 88 |
    | Clothing workers | 91 | 104 |
    | Food, drink, and tobacco workers | 104 | 129 |
    | Paper and printing workers | 107 | 86 |
    | Makers of other products | 112 | 96 |
34. Does marriage make women safer?
35. How correlation happens
    - X → Y: X causes Y
    - X ← Y: Y causes X
    - X, Y (no arrow): random chance!
    - X ← ? → Y: hidden variable causes X and Y
    - X ← Z → Y: Z causes X and Y
36. Guns and firearm homicides?
    - X → Y: if you have a gun, you're going to use it
    - X ← Y: if it's a dangerous neighborhood, you'll buy a gun
    - X, Y (no arrow): the correlation is due to chance
37. Beauty and responses
    - X → Y: telling a woman she's beautiful makes her respond less
    - X ← Z → Y: if a woman is beautiful, 1) she'll respond less 2) people will tell her that
    - Beauty (Z) is a "confounding variable." The correlation is real, but you've misunderstood the causal structure.
38. A causal network. From Statistical Modeling: A Fresh Approach
39. What an experiment is: intervene in a network of causes
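
The risk-ratio slides (20 to 23) can be made concrete with a short Python sketch. It shows why the AP figures give a proportion and an odds but not an odds ratio: the story reports only one row of the 2×2 table. The numbers 85 and 154 come from the AP story quoted on slide 19. The Blue/Yellow counts are hypothetical, since the slide's cell values are not in the transcript, and the function names are illustrative choices rather than anything from the lecture.

```python
def risk(events, total):
    """Proportion of a group in which the event occurred."""
    return events / total


def odds(events, total):
    """Events divided by non-events within one group."""
    return events / (total - events)


def relative_risk(a, b, c, d):
    """Risk ratio for a 2x2 table.

    Group 1: a events, b non-events. Group 2: c events, d non-events.
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds ratio for the same 2x2 layout."""
    return (a / b) / (c / d)


# From the AP story: 85 of 154 people who met with Clinton were donors.
donors, people_met = 85, 154
ap_proportion = risk(donors, people_met)  # 85 / 154
ap_odds = odds(donors, people_met)        # 85 / 69

# A ratio needs a comparison group (for example, donors among people who
# did NOT get a meeting). The story gives no such numbers, so only two of
# the four required cells are known.

# HYPOTHETICAL counts for the Blue/Yellow accident table, for illustration only.
blue_accident, blue_no_accident = 20, 80
yellow_accident, yellow_no_accident = 10, 90

rr = relative_risk(blue_accident, blue_no_accident,
                   yellow_accident, yellow_no_accident)
oratio = odds_ratio(blue_accident, blue_no_accident,
                    yellow_accident, yellow_no_accident)

print(f"AP proportion: {ap_proportion:.3f}, AP odds: {ap_odds:.3f}")
print(f"Hypothetical relative risk: {rr:.2f}, odds ratio: {oratio:.2f}")
```

With the hypothetical counts, the relative risk compares 20/100 against 10/100, while the odds ratio compares 20/80 against 10/90, so the two measures differ even on the same table.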
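
The occupational table on slide 33 invites the question of how strongly the two columns move together. One way to check is a Pearson correlation over the 14 rows in the transcript. This is a sketch using only those rows. Pearson's r is an assumption about the measure; the slides do not say which statistic, if any, the lecture applied to this table.

```python
from math import sqrt

# Values copied from the slide 33 table, in row order.
smoking = [77, 137, 117, 94, 116, 102, 111, 93, 88, 102, 91, 104, 107, 112]
mortality = [84, 116, 123, 128, 155, 101, 118, 113, 104, 88, 104, 129, 86, 96]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(smoking, mortality)
print(f"Pearson r between smoking and mortality: {r:.2f}")
```

A positive r here says only that the two columns tend to rise together across occupations. As the causal slides stress, it says nothing by itself about why.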
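
The confounding case on slides 35 and 37 (Z causes both X and Y) can be simulated. In this sketch X and Y never influence each other, yet they end up correlated because both depend on Z. The variable names follow the slides. The sample size, noise level, and seed are arbitrary assumptions chosen for illustration.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Z is the confounder. X and Y each equal Z plus independent noise;
# there is no direct causal link between X and Y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = pearson_r(x, y)

# Remove Z's contribution. Subtracting Z directly works only because this
# simulation gives Z a coefficient of exactly 1 in both X and Y.
x_resid = [xi - zi for xi, zi in zip(x, z)]
y_resid = [yi - zi for yi, zi in zip(y, z)]
r_resid = pearson_r(x_resid, y_resid)

print(f"Correlation of X and Y: {r_xy:.2f}")
print(f"Correlation after removing Z: {r_resid:.2f}")
```

The first correlation is strong and the second is close to zero, which matches the slide's point: the correlation is real, but the causal structure behind it is not "X causes Y".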