Mind the Gap
How holes in your data can
lead to stories

Thomas Hargrove, Scripps News Washington Bureau
Jennifer LaFleur, Center for Investigative Reporting
NICAR Baltimore: 2 p.m. March 1, 2014 Salon DEF
• Never assume data are whole: check!
• Simple techniques like sorting
• Many of these are techniques we already use for integrity checks
• Graphing over time
• Matching to other data sets
• Statistical tools
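As a rough illustration of the sorting and graphing-over-time checks, the sketch below counts records per agency per year and lists the years in which an agency has no records at all. The field names `agency` and `year`, the function name and the sample rows are assumptions made up for this example, not part of any real data set.

```python
from collections import Counter

def missing_years(records, first_year, last_year):
    """Return {agency: [years with no records]} for agencies seen in the data."""
    counts = Counter((r["agency"], r["year"]) for r in records)
    agencies = sorted({r["agency"] for r in records})
    gaps = {}
    for agency in agencies:
        # A Counter returns 0 for keys it has never seen.
        absent = [y for y in range(first_year, last_year + 1)
                  if counts[(agency, y)] == 0]
        if absent:
            gaps[agency] = absent
    return gaps

# Made-up rows for illustration only.
sample = [
    {"agency": "Dept A", "year": 2006},
    {"agency": "Dept A", "year": 2007},
    {"agency": "Dept A", "year": 2008},
    {"agency": "Dept B", "year": 2006},
    {"agency": "Dept B", "year": 2008},
]
gaps = missing_years(sample, 2006, 2008)
print(gaps)
```

A year with zero records is only a lead. It may mean the agency did not report, or that it did not exist or was renamed that year.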
• Look for research already done on the topic
• Find experts
• Talk to reporters who have done similar stories
• If possible, talk to the records personnel who assembled the data
• Follow data to their source – usually people
• Finding stories in the holes
– Agencies' failure to report
– Varying reporting rules across geography or agency
– Government computer system failures
– Find patterns among missing records
– Find the reasons behind missing records
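One possible way to look for patterns among missing records is to compare the data against a master list of agencies that should have reported, then summarize the absentees by state. This is a sketch only: the master list, the `id` and `state` fields and the sample values are hypothetical.

```python
from collections import Counter

def non_reporting(master, reported_ids):
    """Agencies on the master list that have no records in the data."""
    return [a for a in master if a["id"] not in reported_ids]

def missing_share_by_state(master, reported_ids):
    """Share of each state's agencies that never appear in the data."""
    total = Counter(a["state"] for a in master)
    missing = Counter(a["state"] for a in non_reporting(master, reported_ids))
    return {state: missing[state] / total[state] for state in total}

# Made-up master list and reporting set for illustration only.
master = [
    {"id": "A1", "state": "OH"},
    {"id": "A2", "state": "OH"},
    {"id": "B1", "state": "TX"},
    {"id": "B2", "state": "TX"},
]
reported = {"A1", "A2", "B1"}
shares = missing_share_by_state(master, reported)
print(shares)
```

A state with a much higher share of missing agencies than its neighbors may point to different reporting rules or a system failure, which is a question for the people who keep the records.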
How This Project Started

Dr. David Icove
Researcher, University of Tennessee
Retired member of FBI Behavioral Science Unit
For many years, NFIRS reported that only 5% of building fires in the U.S. are intentionally set.
The Impossible Variance of America’s Rate of Arson: 2006 to 2011
Department          State   Fires    Arson Rate
Indianapolis        IN       1,207    0%
San Diego           CA       1,022    0%
New York City       NY      18,988    1%
Gwinnett County     GA       1,678    2%
Houston             TX       7,740    2%
Arlington           TX       1,511    3%
Chicago             IL       5,075    4%
Los Angeles City    CA       7,975   10%
Phoenix             AZ       5,359   12%
Memphis             TN       5,331   16%
Tulsa               OK       3,076   22%
Gary                IN         424   28%
Cleveland           OH       5,742   28%
Toledo              OH       2,544   28%
Saginaw             MI       1,377   32%
Dayton              OH       1,930   33%
Buffalo             NY       1,606   33%
Youngstown          OH       2,125   36%
Highland Park       MI         748   45%
North Las Vegas     NV         435   49%
How Rare is Arson?
But They Should Have Reported:
“Arson is grossly underreported. The true rate, I believe, is 40% to 50% -- in that range.”
--Bill Degnan, President, National Association of State Fire Marshals
“There isn’t a day that goes by that I don’t think: ‘Man, I was a monster.’ I’m just thankful no one was hurt.”
--Kenneth Allen, Muncie, Indiana
The Allen Conspiracy:
46 people set 73 home and vehicle fires
to collect $3.8 million from insurance
Lessons Learned from 1 million fires:
• 54,860 fires at ‘unlucky’ buildings that, like
Allen’s home, experienced multiple fires, none
of which were reported as arson.
• 42,434 fires at buildings that experienced
foreclosure, according to the national
mortgage monitoring firm RealtyTrac.
• 3,561 fires that had multiple points of ignition,
suggesting someone set several fires at once.
• 77,596 fires in unoccupied or vacant buildings.
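The "unlucky building" test can be sketched as a grouping problem: collect fires by address and keep the addresses that have two or more fires, none of them coded as arson. The field names `address` and `arson` and the sample rows are assumptions for illustration. The project's own syntax files may do this differently.

```python
from collections import defaultdict

def unlucky_buildings(fires):
    """Addresses with two or more fires, none of them coded as arson."""
    by_address = defaultdict(list)
    for fire in fires:
        by_address[fire["address"]].append(fire)
    return sorted(
        address for address, group in by_address.items()
        if len(group) >= 2 and not any(f["arson"] for f in group)
    )

# Made-up rows for illustration only.
fires = [
    {"address": "12 Elm St", "arson": False},
    {"address": "12 Elm St", "arson": False},
    {"address": "40 Oak Ave", "arson": False},
    {"address": "40 Oak Ave", "arson": True},
    {"address": "7 Pine Rd", "arson": False},
]
flagged = unlucky_buildings(fires)
print(flagged)
```

In real data the address field would need cleaning first, since the same building is often written several ways.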
What’s Next?
• Collect data on 4.8 million fires
• Calculate geographic rates by merging
aggregated fire counts with Census Bureau
tract data
• Correlate rates of suspicious fires to tracts
with unusually high occurrences of fire
• Contact local fire/police authorities to
determine if serial arson is suspected or
should be investigated
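A minimal sketch of the rate calculation, assuming fire counts have already been aggregated by tract and that tract populations are available as a simple lookup. The tract IDs and numbers are made up, and the "three times the median" cutoff is an arbitrary assumption for illustration, not the project's definition of unusually high.

```python
from statistics import median

def fire_rates(fire_counts, tract_population, per=1000):
    """Fires per `per` residents for every tract with a nonzero population."""
    rates = {}
    for tract, population in tract_population.items():
        if population > 0:
            # Tracts absent from the fire counts are treated as zero fires.
            rates[tract] = fire_counts.get(tract, 0) * per / population
    return rates

def unusually_high(rates, multiple=3.0):
    """Tracts whose rate exceeds `multiple` times the median rate."""
    cutoff = multiple * median(rates.values())
    return sorted(t for t, r in rates.items() if r > cutoff)

# Made-up counts and populations for illustration only.
counts = {"T1": 4, "T2": 5, "T3": 40}
population = {"T1": 2000, "T2": 2500, "T3": 2000, "T4": 1000}
rates = fire_rates(counts, population)
high = unusually_high(rates)
print(rates, high)
```

Note that a tract with no fires in the data may have had none, or may sit in an area whose department did not report.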
Local gap-mining stories
Here’s FBI data you were never supposed to see
Truck accidents by year and agency
Sometimes you find piles
Statistical tools
• Time series correlation – are your ups and
downs real?
• Project/predict data and compare to actual
results. What causes differences?
• Population counts are pretty accurate. Use
them to determine reporting rates
• Regression with dummy variables
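The "project and compare" idea can be sketched with a straight-line trend: fit a line to earlier years, project it forward, and look at how far the actual counts fall from the projection. This uses ordinary least squares written out by hand. The yearly counts are made up, and a straight line is only one of many possible models.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def actual_minus_projected(history, actual):
    """history, actual: {year: count}. Returns {year: actual - projected}."""
    years = sorted(history)
    slope, intercept = fit_line(years, [history[y] for y in years])
    return {y: actual[y] - (slope * y + intercept) for y in sorted(actual)}

# Made-up yearly counts for illustration only.
history = {2006: 100, 2007: 110, 2008: 120, 2009: 130}
actual = {2010: 140, 2011: 90}
diffs = actual_minus_projected(history, actual)
print(diffs)
```

A large negative difference is a prompt to ask what changed that year: the underlying events, the reporting rules, or the computer system.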
Make sure the holes are real
EE000132 might actually be
the same as EE-000-132
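One way to guard against false holes of this kind is to normalize identifiers before matching. The sketch below uppercases each ID and strips everything except letters and digits. Whether that rule is safe depends on the data set, so treat it as an assumption to verify with spot checks.

```python
import re

def normalize_id(raw):
    """Uppercase the ID and drop everything except letters and digits."""
    return re.sub(r"[^A-Za-z0-9]", "", raw).upper()

def truly_missing(expected_ids, found_ids):
    """Expected IDs with no match in found_ids after normalization."""
    found = {normalize_id(i) for i in found_ids}
    return sorted(i for i in expected_ids if normalize_id(i) not in found)

# EE000132 and EE-000-132 come from the slide; EE000133 is made up.
missing = truly_missing(["EE000132", "EE000133"], ["EE-000-132"])
print(missing)
```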
A word of caution
• Do spot checks to make sure what you found
is real
• Run your findings by experts
• If possible, engage government sources of
data early. They may not be the enemy.
• Challenge your assumptions. Data are only a
clue, never an end result
Questions?
Jennifer LaFleur jlafleur@cironline.org @j_la28
Thomas Hargrove hargrovet@scripps.com
202-408-2703
Arson Project syntax files:
https://www.dropbox.com/l/LPB7l3kpz7wxvGsHSdTOy9
A copy of this presentation will be at www.jenster.com/2014
