An overview of data science in the social sector, and how to use your data science skills for good.
Includes tips and tricks from winning data science competitions on www.drivendata.org.
5. When it comes to data, nonprofits don’t know what they don’t know.
6. THE DATA CAPACITY GAP
• McKinsey predicts a 140k – 180k shortage of data scientists
• Average salary of a data scientist: $118,709
• Average salary of the Executive Director of a nonprofit (budget $500k – $5m): $133,000
9. “Finding ways to make big data useful to humanitarian decision makers is one of the great challenges and opportunities of the network age.”
- UN Office for the Coordination of Humanitarian Affairs
11. The Eric & Wendy Schmidt Data Science for Social Good Summer Fellowship 2014
Brew, Loewi, Majumdar, Reece, Rozier.
Lead Paint Inspections: Prediction Saves Time & Money

             No Prediction   Current Model   Model Forecast
Buildings    197,157         42,695          378
Time         76 years        16.4 years      2 months
Money        $98 million     $21.3 million   $189,000
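A quick check of the arithmetic behind this slide, written as a short Python sketch that uses only the figures shown: dividing each cost by its building count gives roughly $500 per inspection in every scenario, which suggests the savings come from inspecting fewer buildings rather than from cheaper inspections. The dictionary layout and variable names are illustrative choices, not part of the fellowship's analysis.

```python
# Figures from the lead paint slide: (buildings inspected, cost in dollars)
scenarios = {
    "No Prediction": (197_157, 98_000_000),
    "Current Model": (42_695, 21_300_000),
    "Model Forecast": (378, 189_000),
}

baseline_buildings = scenarios["No Prediction"][0]

# name -> (cost per building, share of inspections avoided vs. inspecting everything)
summary = {}
for name, (buildings, cost) in scenarios.items():
    cost_per_building = cost / buildings
    avoided = 1 - buildings / baseline_buildings
    summary[name] = (cost_per_building, avoided)
    print(f"{name}: ${cost_per_building:,.0f} per building, {avoided:.1%} fewer inspections")
```

All three cost-per-building figures land within a few dollars of $500, and the forecast scenario avoids more than 99 percent of the inspections required without prediction.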
23. “Using the winning algorithm, Boston could catch the same number of health violations with 40 percent fewer inspections, simply by better targeting city resources at what appear to be dirty-kitchen hotspots.”
- Mike Luca, Harvard Business School
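Read as arithmetic, catching the same number of violations with 40 percent fewer inspections means each inspection finds a violation 1 / (1 - 0.4) ≈ 1.67 times as often. The Python sketch below computes that factor and illustrates the kind of targeting the quote describes: rank establishments by a model's predicted risk and inspect the top ones within a fixed budget. The function names, establishment names, and scores are hypothetical; they are not taken from the Boston competition data or the winning algorithm.

```python
def hit_rate_gain(inspection_reduction: float) -> float:
    """Factor by which violations found per inspection rises when the same
    number of violations is caught with a given fraction fewer inspections."""
    return 1 / (1 - inspection_reduction)


def prioritize(risk_scores: dict[str, float], budget: int) -> list[str]:
    """Return the `budget` highest-risk establishments, highest risk first.

    `risk_scores` is a hypothetical mapping from establishment name to a
    model's predicted probability of a health violation.
    """
    ranked = sorted(risk_scores, key=risk_scores.get, reverse=True)
    return ranked[:budget]


gain = hit_rate_gain(0.40)
print(f"Violations found per inspection rise by a factor of {gain:.2f}")

# Hypothetical predicted risks for four establishments
scores = {"cafe_a": 0.12, "diner_b": 0.71, "grill_c": 0.45, "deli_d": 0.08}
print(prioritize(scores, budget=2))
```

With a budget of two inspections, the sketch sends inspectors to the two highest-scoring establishments first; a real system would also need a well-calibrated model behind those scores.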
25. Lots of Labels!
PETRO-VEND FUEL AND FLUIDS
MAINT MATERIALS
SATELLITE COOK
UPPER EARLY INTERVENTION PROGRAM
Regional Playoff Hosts
Supp.- Materials
ITEMGH EXTENDED DAY
FURNITURE AND FIXTURES
NON-CAPITALIZED AV
Water and Sewage *
Instructional Materials
Food Services - Other Costs
Capital Assets - Locally Defined Groupings
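28. Text features in scikit-learn
from sklearn.feature_extraction.text import CountVectorizer
text_data = ["PETRO-VEND FUEL AND FLUIDS", "MAINT MATERIALS", "SATELLITE COOK"]  # sample budget labels
vec = CountVectorizer()
# Learn the vocabulary and build a sparse matrix of token counts (one row per label)
vectorized_data = vec.fit_transform(text_data)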
29. Processing: Tokenize on Punctuation
PETRO-VEND FUEL AND FLUIDS → PETRO VEND FUEL AND FLUIDS