These slides are from a session given at the 2015 American Sociological Association annual conference in Chicago. The topic was how to critically think about the numbers we see in everyday life.
Teaching Students How (Not) to Lie with Statistics
1. Teaching Students How (Not) to Lie with Statistics
Lynette Hoelter
American Sociological Association
August 23, 2015
2. Presentation Outline:
• Statistics as social construction
• Questioning evidence
• Practice, practice, practice
• Ways stats can “catch” us
• Sources of “numbers” for practice
3. Numbers lend “authority”
• Make arguments seem more “scientific”
• Make arguments appear definitive
but, sometimes…
• Sources are given more credibility than they
should be (e.g., “Univ. of Michigan data suggest”
referring to results from a study of UM students)
• Key information needed to evaluate is missing
and/or numbers are taken out of context
4. Numbers as social construction
• Evidence is evidence, right?
• Numbers/statistics do not exist apart from
people
– Who counted?
– What exactly did they count?
– Why did they count it?
• Quantitative literacy is first step, then add
sociology (or vice versa)
5. Questions to ask upon sighting data¹
• What is the source of the statement and/or
data?
• How is the information reported?
• Is the sample of adequate size and
representative?
¹ Adapted from Healey, Joseph F. 2013. The Essentials of Statistics: A Tool for Social Research (3rd Ed). Belmont, CA: Wadsworth, Cengage Learning.
6. We ALL need practice
• Using data in (any) class:
– Start class with data
– Tie survey data to topic of lecture
– Use real data as examples for problems or
exams
– Require evidence-based arguments
7. Easy Example:
EXTRA CREDIT: The charts below were part of a blog post by the Federal Reserve Bank of New York (9/2/2014)
and demonstrate two ways of looking at the value of a college degree. Net Present Value represents the
additional income earned by someone with a Bachelor’s degree compared to someone without, added over a
40+ year working life. In a couple of sentences, describe the trends in each chart and then answer the question:
Is a college degree worth it? Why or why not? (5 points)
8. Ways stats can “catch” us
• Definition issues
• Big numbers
• Proper measure of central tendency
• Percent/percent change
• Risks/Rates
• Correlation & causation
• Trends over time
• Statistical vs substantive significance
• Funky graphics
• Reducing complexity of social patterns
9. Definition Issues
• What was included, what was excluded?
• How was a “positive” defined?
• If looking at cost/benefits – really measuring
all costs/benefits? (Compare apples to apples)
• From whom were data collected (sampling)?
11. Definitions (cont’d)
• Rates = fairly straightforward:
(# of people to whom event happened) / (# for whom event was possible)
• US Divorce Rate – commonly reported ~ 50%
• Numerator is easy (formal divorces?)
• Denominator??
– All current marriages
– All first marriages
– All marriages in one year
• Large differences by age at first marriage, number of previous
marriages, etc.
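A minimal sketch of how much the choice of denominator moves a rate. Every count below is made up for illustration (none of them are real divorce or marriage statistics), and `rate_percent` is a hypothetical helper name:

```python
# Hypothetical counts for one year; none of these are real statistics.
divorces_this_year = 1_000
denominators = {
    "all current marriages": 50_000,
    "all first marriages": 35_000,
    "all marriages in one year": 2_000,
}


def rate_percent(events, at_risk):
    """Rate = events / number for whom the event was possible, as a percent."""
    return 100 * events / at_risk


# Same numerator, three different denominators.
rates = {name: rate_percent(divorces_this_year, n) for name, n in denominators.items()}
for name, pct in rates.items():
    print(f"Divorces per 100 of {name}: {pct:.1f}")
```

With these invented counts the same number of divorces gives a "rate" of anywhere from 2 to 50 per 100, depending only on which group is counted as at risk.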
12. Definition of credit card fraud given on site: Credit card fraud is a theft
committed using a credit card or debit card, as a fraudulent source of funds in a
transaction. The purpose may be to obtain goods without paying, or to obtain
unauthorized funds from an account. According to the United States Federal
Trade Commission, while identity theft had been holding steady for the last few
years, it saw a 21% increase in 2008.
No hint as to whether denominator includes all Americans,
Americans with credit cards, etc.
Source: www.statisticbrain.com/credit-card-fraud-statistics/
13. Big Numbers
• Shock value
• No context
• More memorable
– Deaths from flu 1976-2006 range from 3,000
to 49,000
– 49,000 is a lot, isn’t it?!
– 1,715,434 deaths in US in 2015 so far
14. Providing Context for Big Numbers
• Using seconds¹:
– One million seconds ~ 11.6 days (86,400 seconds = 1 day)
– One billion seconds ~ 31.7 years
• Using $$: $17 Trillion US Debt
• Population sizes²:
– 100,000 people ~ South Bend, IN
– 1,000,000 people ~ San Jose, CA or Austin, TX; Montana or Rhode Island
– 10,000,000 people ~ North Carolina or Georgia
– US Pop. = 320,145,187 (320 million)
– China Pop. = 1,393,783,836 (1.39 billion)
– World Pop. = 7,361,779,045 (7.36 billion)
¹ Paulos, 2001  ² US Census and Worldometers.com
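The seconds conversion is easy to have students check for themselves. A small sketch, assuming an average year of 365.25 days (that year length and the function names are choices made here, not taken from the slides):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
DAYS_PER_YEAR = 365.25          # assumption: average year length, leap years included


def seconds_to_days(seconds):
    """Convert a count of seconds to days."""
    return seconds / SECONDS_PER_DAY


def seconds_to_years(seconds):
    """Convert a count of seconds to years, using the assumed year length."""
    return seconds_to_days(seconds) / DAYS_PER_YEAR


million_days = seconds_to_days(1_000_000)
billion_years = seconds_to_years(1_000_000_000)
print(f"One million seconds is about {million_days:.1f} days")
print(f"One billion seconds is about {billion_years:.1f} years")
```

The point for students is the jump in scale: a million is days, a billion is decades.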
16. Central Tendency
• Plays on our understanding of “average”
• Distributions that are skewed should use
median
– E.g., “Average” household income in US, 2011
• Median: $50,502
• Mean: $69,821
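A quick sketch of why a skewed distribution pulls the mean away from the median. The incomes below are invented for illustration and are not the 2011 household figures:

```python
from statistics import mean, median

# Hypothetical household incomes (not real data): most are modest, one is very large.
incomes = [28_000, 35_000, 42_000, 50_000, 58_000, 66_000, 400_000]

avg = mean(incomes)    # pulled upward by the single large value
mid = median(incomes)  # the middle household, unaffected by the extreme

print(f"Mean:   ${avg:,.0f}")
print(f"Median: ${mid:,.0f}")
```

One very high income is enough to put the mean well above what the typical household in the list earns, which is why the median is the safer "average" here.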
17. Percent/Percent Change
• Beware of percentages in tables
– Make sure they add to 100% for the
independent variable
• Percent change
– Each calculation changes the base
– Why 50% Off sales are not the same as 20% off
and additional 30% off
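The changing base can be shown with a few lines of arithmetic. A sketch with a hypothetical $100 sticker price (`apply_discounts` is an illustrative helper, not from the slides):

```python
def apply_discounts(price, *percents_off):
    """Apply each percent-off in sequence; every step uses the already-reduced price as its base."""
    for pct in percents_off:
        price = price * (1 - pct / 100)
    return price


original = 100.00  # hypothetical sticker price
single = apply_discounts(original, 50)       # one 50% off sale
stacked = apply_discounts(original, 20, 30)  # 20% off, then an additional 30% off

print(f"50% off: ${single:.2f}")
print(f"20% off, then 30% off: ${stacked:.2f}")
print(f"Effective stacked discount: {100 * (1 - stacked / original):.0f}%")
```

The second discount is taken from the reduced price, not the original, so 20% plus an additional 30% works out to 44% off rather than 50%.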
21. Risks & Rates
• Risk of developing breast cancer in next 10 years goes up by 230% from age 30 to 40; 58% from age 40-50.
From: http://www.cdc.gov/cancer/breast/statistics/age.htm
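A large relative increase can still mean a small absolute risk. The sketch below applies the two percentages from the slide to an assumed starting risk; the 0.4% baseline is a made-up value for illustration, not a CDC figure:

```python
def apply_relative_increase(risk_percent, increase_percent):
    """Return the new absolute risk (in percent) after a relative increase."""
    return risk_percent * (1 + increase_percent / 100)


# Assumption for illustration only: a 0.4% ten-year risk at age 30 (not a CDC figure).
risk_at_30 = 0.4
risk_at_40 = apply_relative_increase(risk_at_30, 230)  # "up by 230%"
risk_at_50 = apply_relative_increase(risk_at_40, 58)   # "58% from age 40-50"

print(f"Age 30: {risk_at_30:.2f}%  Age 40: {risk_at_40:.2f}%  Age 50: {risk_at_50:.2f}%")
print(f"Absolute change 30 to 40: {risk_at_40 - risk_at_30:.2f} percentage points")
print(f"Absolute change 40 to 50: {risk_at_50 - risk_at_40:.2f} percentage points")
```

Under this assumed baseline, the "230%" jump and the "58%" jump add a similar number of percentage points, because the second increase starts from a larger base.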
23. Trends (or “Trends”) over Time
• Legends of charts
• Time frame presented
can change
interpretation
• Changes in
defining/reporting
• Be wary of trends that
suddenly change
direction (life doesn’t
move that quickly)
24. Incidents were classified as school shootings
when a firearm was discharged inside a school
building or on school or campus grounds, as
documented by the press or confirmed through
further inquiries with law enforcement. Incidents
in which guns were brought into schools but not
fired, or were fired off school grounds after
having been possessed in schools, were not
included.
27. Simplifying Complex Processes
• Identifying one event/process/change as
affecting change in complex process
– E.g., “Broken Windows” theory of crime
28. In Short:
• Get students thinking about numbers and
their context as early and often as possible
29. Websites to Start Your Search
• ABCNews Who’s Counting (Paulos’ column)
• Association of Religion Data Archives Learning
Center
• Choosing a Good Chart (decision table)
• Data360
• Gapminder
• ICPSR: Resources for Instructors
– Data-driven Learning Guides
• Pew Research Center: Fact Tank, Reports,
Datasets, Interactives
• Population Pyramids of the World
• Social Explorer: US mapping
• Social Science Data Analysis Network
• Spurious Correlations
• Statistic Brain
• Stats.org
• Survival Curve
• TeachingWithData.org
• Worldometers, USA Live Stats
• Public Opinion:
– Gallup Organization
– National Opinion Research Center (GSS
Explorer)
– Roper Center (iPoll)
• Government Centers such as the Census
(American FactFinder), NCES, or NCHS
• Professional Development:
– Science Education Resource Center
(Carleton College)
– TeachQR.org (Lehman College)
– Making Data Meaningful (United Nations
Economic Commission for Europe)
• International:
– UK Data Service Teaching with Data
– European Social Survey EduNet
31. (A Few) Interesting Reads:
Best, Joel. 2012. Damned Lies and Statistics: Untangling Numbers from
the Media, Politicians, and Activists (2nd Ed). Berkeley: University of
California Press.
Best, Joel. 2004. More Damned Lies and Statistics: How Numbers Confuse
Public Issues. Berkeley: University of California Press.
Huff, Darrell. 1993. How to Lie With Statistics (2nd Ed). New York: W.W.
Norton & Company.
Klass, Gary. 2012. Just Plain Data Analysis: Finding, Presenting, and
Interpreting Social Science Data (2nd Ed). New York: Rowman & Littlefield
Publishers, Inc.
Paulos, John Allen. 2001. Innumeracy: Mathematical Illiteracy and Its
Consequences (2nd Ed). New York: Hill & Wang.
Silver, Nate. 2012. The Signal and the Noise: Why So Many Predictions
Fail – But Some Don’t. New York: Penguin Group (USA).
From Exam 1, SOC250L: Quantitative Applications in Sociology, Eastern Michigan University, Hoelter, Fall 2014.
Fox Graphic Claimed Government Spending Increased From 3.2 Percent Under Bush To An Average Of 23.8 Percent Under Obama. In a graphic labeled "Growth of Government Spending (As A Share Of GDP)," Fox & Friends suggested that government spending increased from 3.2 percent of the economy at the end of the Bush administration to an average of 23.8 percent under Obama.
[Fox News, Fox & Friends, 9/26/12, via Media Matters]
In Fact, Graphic Compared Two Completely Different Measures Of Government Spending. The figure for "government spending" during the Obama administration is in line with historical data for overall spending as a percentage of the economy, a figure that does not take into account federal revenue. By contrast, the 3.2 percent figure used to illustrate "government spending" under Bush and the figures for the 1940s are in line with historical data for deficits, which do take into account revenues. [Media Matters, 9/26/12]
Government Spending Under Obama Increased Only Slightly Since 2008 And Dropped Since 2009. The actual figures for government spending ("outlays") as a percentage of the economy would indicate that the number has increased only slightly since 2008 and actually dropped since 2009. They were 20.8 percent in 2008 but 25.2 percent in 2009. In 2010 and 2011, they dropped to 24.1 percent and are expected to be 24.3 percent in 2012 and 23.3 percent in 2013.
[Media Matters, 9/26/12]
A Few Days Later, Fox & Friends Admitted: "We Mixed Up The Numbers." On September 28, Fox & Friends addressed the dishonest chart. Guest co-host Eric Bolling stated: "We mixed up the numbers on Wednesday, so we wanted to clear things up." But Bolling did not explain how Fox made such an error or note that government spending as a percentage of the economy has actually increased only slightly since 2008. [Media Matters, 9/26/12]
Beware of tables percentaged the wrong way (always make sure “independent” variable adds to 100%)
Percent change: each change makes base number different from initial base
While true that 83.33% of the police force quit, it was 5 out of 6 people.
Large increases can be made on small N’s, so that the resulting N is still small.
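A short sketch of the small-N point. The first call uses the 5-of-6 example from the note above; the second uses hypothetical counts (`percent_change` is an illustrative helper):

```python
def percent_change(before, after):
    """Percent change relative to the starting count."""
    return 100 * (after - before) / before


# From the note above: 5 of 6 officers quit, leaving 1.
force_change = percent_change(6, 1)
print(f"{force_change:.2f}% change in a force of 6")

# Hypothetical counts: a 200% increase that still leaves a small N.
small_increase = percent_change(2, 6)
print(f"{small_increase:.0f}% increase, from 2 cases to 6")
```

Reporting the counts alongside the percentage lets readers see that a dramatic percent change may describe only a handful of cases.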
Just because one thing comes first does not mean it causes the other – need to identify theory/mechanism by which it might (in addition to relationship and time order).
Washington Post – reporting results from study published in Demography. Good commentary: https://scatter.wordpress.com/2015/08/13/is-parenthood-really-worse-than-divorce-demographic-clickbait-in-the-washington-post/#more-9288
Each of these websites has a number of different kinds of materials and might warrant some investigation. Brief descriptions are as follows:
Association of Religion Data Archives (ARDA) has a great collection of learning activities that include “compare yourself” surveys, map-based activities, and other exercises based on the religion surveys they archive.
ICPSR’s Data-driven Learning Guides (DDLGs) are self-contained exercises on a variety of topics ranging from attitudes about the environment to family relationships to political behaviors in China. The Resources for Instructors page also includes a tool for creating crosstabulation tables for student use and longer modules containing multiple exercises.
Social Science Data Analysis Network (SSDAN) is the umbrella for a number of sites that include mapping activities (CensusScope) and analysis of subsets of variables (DataCounts!) all based on the US Census and American Community Survey. The exercises are created by faculty and are good examples of the kinds of things that are easily used in class.
TeachingWithData.org is a repository of materials (lesson plans, exercises, datasets, etc.) from many sources tagged with metadata to simplify searching and locating appropriate materials.
Pew Research Center reports on surveys they’ve conducted as well as other data presented in popular media. Their site contains exercises, quick facts, datasets, and summary reports. Topic coverage is very broad.
Data360 is a blog that includes all kinds of fun items as well as more typical data-based reports, such as a chart of wealth distribution within a variety of countries.
Worldometers is a fun site that gives facts related to government, demography, and things related to social environment and culture, broadly defined. It’s a great way to get students thinking about the world around them “by the numbers” and also is a way for them to gain a sense of large numbers. USA Live Stats does the same thing for the U.S. Numbers are updated in real time.
Population Pyramids is a good site for teaching international demography and demographic trends.
Social Explorer uses Census data to create interactive maps and tables. Some features are freely available, others require membership, but UM is a member. (Log in through the library page.)
Gapminder is good for demonstrating global changes over time in things like population size and wealth distribution.
Survival Curve is an interactive exercise that shows the chance of death before one’s next birthday based on a variety of demographic characteristics.
Spurious Correlations is a collection of relationships (strong correlations) discovered as part of a computer science project in which a computer trawled through data.
Who’s Counting is written by John Allen Paulos – doesn’t seem to be currently updated, but still lots of great examples.
Gallup Organization is good for data and reports related to public opinion issues.
Roper Center Public Opinion Archives: Some free content, others member only, but UM is a member. Access through UM library page for iPoll, with search capabilities for 600,000 public opinion survey questions.
Government offices – Both NCES and NCHS have “quick stats” or “fast facts,” and American FactFinder is good for having students compare their hometown to the nation or to other places on a range of characteristics.
Statistic Brain has all kinds of statistics broken into topic areas such as food, geographic, sports, crime, etc.
Stats.org is out of George Mason Univ and their goal is for people to understand the numbers behind the news, so variety of topics, current…
TeachQR.org – the Numeracy Infusion Course for Higher Education; a group at Lehman College has been working on quantitative reasoning instruction through projects funded by NSF and elsewhere. This is a great site for examples across different disciplines. More for professional development than classroom use.
The Science Education Resource Center is aimed primarily at faculty for professional development, but also includes example exercises with extensive data about the context of their use.
UK Data Service has teaching datasets and online analysis using NESSTAR, as well as exercises and other resources for instructors.
European Social Survey site has resources about both substantive topics and methodological issues such as weighting and regression. Great for exploring data comparatively or over time.