HunchLab 2.0 Predictive Missions: Under the Hood

340 N 12th St, Suite 402
Philadelphia, PA 19107
215.925.2600
info@azavea.com
www.hunchlab.com

Jeremy Heffner
HunchLab Product Manager
jheffner@azavea.com
215.701.7712

Amelia Longo
Business Development Associate
alongo@azavea.com
215.701.7715

Places
People
Patterns


Prioritization

It’s the fourth Tuesday in January and
school is in session. There were 3
burglaries and 2 robberies yesterday.
Six bars, three take-out stores, and a
school are in the neighborhood. The
forecast is 17° with cloudy skies.

Where do you focus your 2 vehicles?

Analyst Process
•  Identify relevant factors
–  Training / Literature
–  Experience

•  Use heuristics
–  high concentration of past crime → higher risk
–  near a bar on a Friday night → higher risk
–  near the police station → lower risk
–  concentration of ex-offenders → higher risk
–  near transit stops → higher risk

term: machine learning
A computer system designed to learn how to
accomplish a task by using historical data sets.
There are different ways (algorithms) to
accomplish this training process.

term: algorithm
The step-by-step procedure to accomplish a
given calculation. Different algorithms have
different qualities. Algorithms are used to train
a machine learning model.

Overall Process
1.  Generate training examples of outcomes
2.  Enrich with relevant variables
3.  Build models
4.  Evaluate accuracy
5.  Select best performing model

~500 ft cells and time slices of 1+ hours

Data Volume
•  Space
–  Lincoln, NE is 90 sq miles
–  500 ft cell size creates 12,000 cells

•  Time
–  3 years of data
–  1 hour resolution
–  26,000 hour blocks

•  Space x Time
–  312,000,000 hour block cells (examples)

•  Sampling FTW!
–  Outcomes are sparse (small % of examples have crimes)
–  Sampling strategy preserves crime events
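The example count is cells multiplied by hour blocks (3 years × 8,760 hours = 26,280, which the slide rounds to 26,000). The sketch below reproduces that arithmetic, then shows one common way to sample sparse outcomes while preserving every crime event: keep all examples with a crime, keep a random fraction of the empty ones, and weight each kept empty example by the inverse of that fraction. The keep rate, the weighting scheme, and the `sample_examples` helper are assumptions for illustration; the slides do not specify HunchLab's exact sampling strategy.

```python
import random

CELLS = 12_000        # 500 ft cells covering Lincoln, NE (from the slide)
HOUR_BLOCKS = 26_000  # ~3 years of 1-hour blocks (3 * 8,760 = 26,280)
print(f"{CELLS * HOUR_BLOCKS:,} cell-hour examples")

def sample_examples(examples, negative_rate, seed=0):
    """Keep every example with a crime; keep each empty example with
    probability `negative_rate` and up-weight it by 1 / negative_rate so
    weighted totals still approximate the full data set."""
    rng = random.Random(seed)
    sampled = []
    for features, crimes in examples:
        if crimes > 0:
            sampled.append((features, crimes, 1.0))
        elif rng.random() < negative_rate:
            sampled.append((features, crimes, 1.0 / negative_rate))
    return sampled

# Toy data: 1,000 examples, 10 of which contain a crime.
toy = [({"id": i}, 1 if i % 100 == 0 else 0) for i in range(1_000)]
kept = sample_examples(toy, negative_rate=0.05)
positives = sum(1 for _, c, _ in kept if c > 0)
print(positives, len(kept))
```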

Predictive Missions
•  Crime predictions based on:
–  Baseline crime levels
•  Similar to traditional hotspot maps

–  Near repeat patterns
•  Event recency (contagion)

–  Risk Terrain Modeling
•  Proximity and density of geographic features
•  Points, Lines, Polygons (bars, bus stops, etc.)

–  Collective Efficacy
•  Socioeconomic indicators (poverty, unemployment, etc.)

–  Routine Activity Theory
•  Offender: proximity and concentration of known offenders
•  Guardianship: police presence (AVL / GPS)
•  Targets: measures of exposure (population, parcels, vehicles)

–  Temporal cycles
•  Seasonality, time of month, day of week, time of day

–  Recurring temporal events
•  Holidays, sporting events, etc.

–  Weather
•  Temperature, precipitation
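Temporal cycles and recurring events can be derived directly from each hour block's start time. A minimal sketch using Python's `datetime`. The field names and the hard-coded `HOLIDAYS` set are hypothetical; a real deployment would load holidays and event schedules from a calendar source.

```python
from datetime import date, datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical recurring-event calendar.
HOLIDAYS = {date(2015, 1, 1), date(2015, 7, 4), date(2015, 12, 25)}

def temporal_features(start: datetime) -> dict:
    """Place one hour block within several temporal cycles."""
    return {
        "month": start.month,                    # seasonality
        "day_of_month": start.day,               # time of month
        "dow": DAY_NAMES[start.weekday()],       # day of week
        "hour": start.hour,                      # time of day
        "is_holiday": start.date() in HOLIDAYS,  # recurring temporal event
    }

# The fourth Tuesday in January 2015, 9 p.m.
print(temporal_features(datetime(2015, 1, 27, 21)))
```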

Representing Crime Theories
Risk Terrain Modeling

Figure: Gun shootings example (source: Rutgers, http://www.rutgerscps.org/rtm/irvrtmgoogearth.htm)

crimes  prior7  prior364  dayssincelast  bardist  dow
0       0       0         365            >2000ft  Monday
0       0       1         234            >2000ft  Monday
1       1       3         3              750ft    Tuesday
0       0       2         43             500ft    Wednesday
2       0       2         74             500ft    Friday
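Rows like these can be built from a cell's crime history and its distance to the nearest bar. A minimal sketch, assuming `prior7` and `prior364` count the cell's crimes in the preceding 7 and 364 days, `dayssincelast` is capped at 365, and `bardist` is rounded up to 250 ft bands with everything beyond 2,000 ft reported as ">2000ft". These definitions, the bin edges, and the helper names are inferred from the column names for illustration, not documented HunchLab behavior.

```python
import math
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def history_features(crime_dates, day, cap_days=365):
    """Counts and recency of a cell's earlier crimes, looking back from `day`."""
    ago = [(day - d).days for d in crime_dates if d < day]
    return {
        "prior7": sum(1 for a in ago if a <= 7),       # near-repeat signal
        "prior364": sum(1 for a in ago if a <= 364),   # baseline level
        "dayssincelast": min(ago + [cap_days]),        # recency, capped
        "dow": DAY_NAMES[day.weekday()],
    }

def bar_distance_band(cell_xy, bars_xy, band_ft=250, cap_ft=2000):
    """Distance (feet) from the cell center to the nearest bar, rounded up to a band."""
    nearest = min((math.dist(cell_xy, b) for b in bars_xy), default=math.inf)
    if nearest > cap_ft:
        return f">{cap_ft}ft"
    return f"{math.ceil(nearest / band_ft) * band_ft}ft"

cell_crimes = [date(2015, 1, 24), date(2014, 10, 1)]
row = history_features(cell_crimes, date(2015, 1, 27))
row["bardist"] = bar_distance_band((0, 0), [(600, 400), (3000, 0)])
print(row)
```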

Representing Crime Theories
Aoristic Analysis

crimes  probability
0       0
1       a
2       b
3       c
4       d

crimes  weights  prior7  prior364  dayssincelast  bardist  dow
0       1        0       0         365            >2000ft  Monday
0       1        0       1         234            >2000ft  Monday
0       0.5      1       3         3              750ft    Tuesday
1       0.5      1       3         3              750ft    Tuesday
0       0        0       2         43             500ft    Wednesday
0       0.13     0       2         74             500ft    Friday
1       0.32     0       2         74             500ft    Friday
2       0.55     0       2         74             500ft    Friday
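In the weighted table, an example whose crime count is uncertain (because the exact time of the crime is unknown) appears once per possible count, weighted by that count's probability; the Friday example becomes three rows weighted 0.13, 0.32, and 0.55. A minimal sketch of that expansion. It uses classic aoristic weighting (a crime spread evenly over every hour in its possible time window) to produce the Tuesday-style 0.5 / 0.5 split; for multi-crime cases like Friday, `count_probs` is treated as a given input because the slide does not say how HunchLab derives it. Both helper names are hypothetical.

```python
def aoristic_hour_weights(first_hour, last_hour):
    """Spread one crime evenly over every hour block in its possible window,
    e.g. a window of hours 20..21 gives each hour a probability of 0.5."""
    hours = range(first_hour, last_hour + 1)
    return {h: 1.0 / len(hours) for h in hours}

def expand_row(features, count_probs):
    """Turn one example with an uncertain crime count into weighted rows,
    one per possible count, dropping counts with zero probability."""
    return [
        {"crimes": count, "weights": p, **features}
        for count, p in sorted(count_probs.items())
        if p > 0
    ]

# A crime reported as occurring sometime between 20:00 and 21:59.
weights = aoristic_hour_weights(20, 21)
p = weights[20]
tuesday = {"prior7": 1, "prior364": 3, "dayssincelast": 3, "bardist": "750ft", "dow": "Tuesday"}
for row in expand_row(tuesday, {0: 1 - p, 1: p}):
    print(row)

friday = {"prior7": 0, "prior364": 2, "dayssincelast": 74, "bardist": "500ft", "dow": "Friday"}
rows = expand_row(friday, {0: 0.13, 1: 0.32, 2: 0.55})
```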

Models
•  Baseline
–  Baseline models (6)
•  Counts
–  28 day
–  56 day
–  364 day

•  Kernel Densities
–  28 day
–  56 day
–  364 day

–  HunchLab models
•  Variations of a stacked ensemble:
–  examples è gradient boosting machine (gbm) è y/n probabilities
–  y/n probabilities è generalized additive model (gam) è counts

term: decision tree
A machine learning algorithm that recursively
partitions a data set based upon variable
values forming a tree-like structure.

term: gradient boosting machine (GBM)
A machine learning algorithm that uses a series
of weaker models (typically decision trees) that
are trained upon the residuals of prior iterations
(boosting) to form one stronger model.

1.  Build decision tree 1 → predict with tree 1 → calculate errors
2.  Build decision tree 2 → predict with trees 1 & 2 → calculate errors
3.  Build decision tree 3 → predict with trees 1-3 → calculate errors
…
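The loop above (build a tree, predict with all trees so far, calculate errors, repeat) can be sketched in a few lines. This toy version boosts one-split regression trees ("stumps") on squared-error residuals for a single numeric input. It illustrates the technique only: it omits the stochastic subsampling, the y/n (classification) loss, and the multi-variable trees described elsewhere in these slides, and `fit_stump` / `gradient_boost` are hypothetical names.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    _, threshold, lv, rv = best
    return lambda x: lv if x <= threshold else rv

def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Each new stump is trained on the errors left by the stumps before it."""
    base = sum(ys) / len(ys)
    trees = []

    def predict(x, k=None):
        """Predict with the first k trees (all trees when k is None)."""
        used = trees if k is None else trees[:k]
        return base + learning_rate * sum(t(x) for t in used)

    for _ in range(n_trees):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]  # calculate errors
        trees.append(fit_stump(xs, residuals))                 # build the next tree
    return predict

xs = list(range(20))
ys = [0.0 if x < 10 else 1.0 for x in xs]
model = gradient_boost(xs, ys)
print(model(3), model(15))
```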

term: generalized additive model (GAM)
A regression model that fits smoothed functions to the
input variables. Compare with a generalized linear model,
which fits a single coefficient to each variable.
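To see the difference between a smoothed function and a single coefficient, the sketch below fits one input two ways: a straight line (one slope, as in a linear model) and a piecewise-linear spline with hinges at a few knots, which can bend. It uses squared error and an identity link for brevity, so it is a simplified additive model rather than a full GAM with a count-appropriate link and penalized smoothing; the knots, the synthetic data, and all function names are assumptions for illustration.

```python
def solve(a, b):
    """Solve a small linear system a·w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w

def least_squares(rows, ys):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

def linear_basis(x):
    """Intercept plus one slope: a single coefficient for the variable."""
    return [1.0, x]

def spline_basis(x, knots=(0.25, 0.5, 0.75)):
    """Intercept, slope, and one hinge max(0, x - k) per knot, so the fit can bend."""
    return [1.0, x] + [max(0.0, x - k) for k in knots]

def sse(basis, xs, ys):
    """Sum of squared errors after fitting the given basis by least squares."""
    w = least_squares([basis(x) for x in xs], ys)
    return sum((y - sum(wi * bi for wi, bi in zip(w, basis(x)))) ** 2
               for x, y in zip(xs, ys))

# Synthetic input (think: GBM probabilities) and counts with a bent relationship.
xs = [i / 40 for i in range(41)]
ys = [0.2 * x if x < 0.5 else 0.1 + 3.0 * (x - 0.5) for x in xs]
print(sse(linear_basis, xs, ys), sse(spline_basis, xs, ys))
```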

HunchLab Model Building

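Figure: 312 million examples → sampling → 4 million examples → 4 folds of 1 million each; a GBM is trained on the other folds and evaluated on the held-out 1 million (values shown: 43 and 200). Final model: 312 million → sampling → 4 million → GBM (43).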

1.  Build a GBM
–  examples è gradient boosting machine è y/n probabilities
•  Segment examples into several folds
–  For each fold build a GBM model on the rest of the data
–  For each iteration in the GBMs:
»  Randomly sample a portion of the data (stochastic)
»  Adjust weights of observations (adaptive boosting)

•  Determine how many iterations result in the most accurate model
•  Build a GBM on all of the data for that many iterations

2.  Build a GAM
–  y/n probabilities è generalized additive model è counts
•  Transforms (“bends”) GBM output into counts
•  Calibrates count levels with other key variables
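The fold-based search for the number of GBM iterations can be sketched independently of any GBM library: assign examples to folds, record each held-out fold's error after every iteration, average the curves, and keep the iteration count with the lowest mean error. The `curves` values below are made-up numbers; in practice each curve would come from scoring a held-out fold after each boosting iteration of a GBM trained on the other folds. Function names are hypothetical.

```python
import random

def assign_folds(n_examples, n_folds=4, seed=0):
    """Randomly assign each example index to one of `n_folds` near-equal folds."""
    order = list(range(n_examples))
    random.Random(seed).shuffle(order)
    folds = [[] for _ in range(n_folds)]
    for position, idx in enumerate(order):
        folds[position % n_folds].append(idx)
    return folds

def best_iteration_count(errors_by_fold):
    """errors_by_fold[f][i] is fold f's held-out error after i + 1 iterations.
    Average across folds and return the iteration count with the lowest mean error."""
    n_iters = len(errors_by_fold[0])
    mean = [sum(curve[i] for curve in errors_by_fold) / len(errors_by_fold)
            for i in range(n_iters)]
    return min(range(n_iters), key=mean.__getitem__) + 1

folds = assign_folds(1_000)
print([len(f) for f in folds])

# Illustrative validation curves: error falls, then rises as the model over-fits.
curves = [[1.0 / (i + 1) + 0.01 * i for i in range(20)] for _ in range(4)]
print(best_iteration_count(curves))
```

The final GBM is then built on all of the sampled data for the selected number of iterations.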

Selecting Models
1.  Build models holding out the last 28 days of data
2.  Score each model
–  Combine different metrics into a selection score

3.  Select the model with the best score
4.  Rebuild the best model (including the last 28 days of data)
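The four steps can be sketched as a blind evaluation loop: hold out the most recent 28 days, fit each candidate on the earlier data, score it on the held-out days, then rebuild the winner on everything. The candidate models (`mean_model`, `recent_model`), the `combined_score` blend with equal weights, and the one-number-per-day data are hypothetical stand-ins; the slides say only that several metrics are combined into one selection score.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List, Tuple

Example = Tuple[date, float]  # (day, observed count): a stand-in for a full example

def select_model(examples: List[Example], candidates: Dict[str, Callable],
                 score: Callable[[List[float], List[float]], float], holdout_days: int = 28):
    """Fit on data before the hold-out window, score on the window,
    then rebuild the best candidate on all of the data."""
    cutoff = max(d for d, _ in examples) - timedelta(days=holdout_days - 1)
    train = [e for e in examples if e[0] < cutoff]
    held_out = [e for e in examples if e[0] >= cutoff]
    scores = {}
    for name, fit in candidates.items():
        predict = fit(train)
        scores[name] = score([predict(d) for d, _ in held_out], [y for _, y in held_out])
    best = max(scores, key=scores.get)
    return best, candidates[best](examples), scores

def mean_model(train):
    """Hypothetical candidate: predict the long-run average."""
    level = sum(y for _, y in train) / len(train)
    return lambda day: level

def recent_model(train, window_days=7):
    """Hypothetical candidate: predict the average of the last `window_days` days."""
    last = max(d for d, _ in train)
    recent = [y for d, y in train if (last - d).days < window_days]
    level = sum(recent) / len(recent)
    return lambda day: level

def combined_score(predicted, observed):
    """Hypothetical equal-weight blend of two error metrics; higher is better."""
    n = len(observed)
    mae = sum(abs(p - o) for p, o in zip(predicted, observed)) / n
    bias = abs(sum(predicted) - sum(observed)) / n
    return -(0.5 * mae + 0.5 * bias)

start = date(2014, 1, 1)
history = [(start + timedelta(days=i), i / 10) for i in range(365)]  # steadily rising counts
best, final_model, scores = select_model(
    history, {"mean": mean_model, "recent": recent_model}, combined_score)
print(best, scores)
```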

Figure: A map represented as a grid of cells with crime locations; cells ranked highest to lowest.
Figures: Percent of Crimes Captured vs. Percent of Patrol Area, Percent of Patrol Area to Capture All Crimes, and Average Crime Rank, each by crime type (Assault, Burglary, MVT, Rape, Robbery).
Figure: Theft of Motor Vehicle, Percent of Crimes Captured vs. Percent of Land Area (0 to 0.16).
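These evaluation charts rank cells from highest to lowest predicted risk and ask where held-out crimes fall in that ranking. A minimal sketch of the three measures, under one plausible reading of the chart titles: one predicted score per cell, ties broken by list order, and crime counts per cell over the evaluation period. The function names are hypothetical.

```python
import math

def rank_cells(scores):
    """Cell indices ordered from highest to lowest predicted score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def percent_captured(scores, crimes, area_fraction):
    """Share of crimes that fall in the top `area_fraction` of cells."""
    order = rank_cells(scores)
    top = order[:math.ceil(area_fraction * len(order))]
    return sum(crimes[i] for i in top) / sum(crimes)

def area_to_capture_all(scores, crimes):
    """Smallest share of top-ranked cells that contains every crime."""
    order = rank_cells(scores)
    last = max(pos for pos, i in enumerate(order, start=1) if crimes[i] > 0)
    return last / len(order)

def average_crime_rank(scores, crimes):
    """Mean normalized rank (0 = top cell) of the cell each crime occurred in."""
    order = rank_cells(scores)
    n = len(order)
    total = sum(crimes[i] * pos / n for pos, i in enumerate(order))
    return total / sum(crimes)

scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.05, 0.2, 0.6, 0.4, 0.01]
crimes = [2, 0, 1, 0, 0, 0, 0, 1, 0, 0]
print(percent_captured(scores, crimes, 0.2),
      area_to_capture_all(scores, crimes),
      average_crime_rank(scores, crimes))
```

Lower values are better for the last two measures; a higher curve is better for percent captured.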

Our Solution
•  Learns from several years of your data
•  Automatically determines which theories apply
–  more than just crime data

•  Prevents over-fitting
•  Calibrates predictions
•  Selects a model based upon a blind evaluation
–  prioritization and count-based metrics

•  But it still cannot make your morning coffee

Additional Information
•  How did HunchLab originate?
•  How does HunchLab represent crime theories?
•  What data is needed?
•  How does the modeling work specifically?

Questions
