Keynote given at Carto Spatial Data Science conference (https://carto.com/spatial-data-conference/), December 1, 2017.
How and why is location intelligence important at WeWork? A global provider of flexible work space, WeWork is growing at an incredible pace. With 178 locations in 53 cities and 18 countries, a footprint that doubles each year, and leases that run a decade or more, it is crucial that we take site selection seriously. In this talk, we’ll cover our initial efforts to bring geospatial thinking, tooling, and predictive models to the organization. We’ll cover some use cases and models, as well as some of the challenges of tackling geocentric data science at global scale.
13. What’s a good city, neighborhood, or location?
What are the relevant metrics, trends?
What are the firmographics for Bangkok?
How do we determine and measure market potential and saturation?
When will it be too late to enter a given market?
How do we determine comparable markets?
Questions, Questions
23. Caveat: to attach a segment, you have to define an area, not a point location.
There is no definitive segment for a location, because the segment depends on the scale of the area being considered.
[Maps: Portland zoomed out (coarser-grained) and Portland zoomed in (finer-grained)]
A 500m radius around each location was the smallest radius at which Esri would assign a segment to every location (smaller radii led to missing data).
There is no correct scale. Nevertheless, we find some interesting results.
Segments Vary With Scale
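To make the scale dependence concrete, here is a toy sketch in Python. It is not Esri's method: the block coordinates, populations, segment labels, and the `dominant_segment` helper are all invented for illustration. It assigns the site the population-weighted majority segment among blocks inside a radius, and shows that the answer can change, or be missing, depending on the radius.

```python
from collections import Counter
from math import hypot

# Hypothetical census blocks: (x_m, y_m, segment_label, population).
# Coordinates are metres relative to the site; every value is invented.
BLOCKS = [
    (100, 0, "segment_A", 400),
    (-200, 150, "segment_A", 300),
    (600, 100, "segment_B", 900),
    (-700, -300, "segment_B", 800),
    (300, -850, "segment_C", 200),
]

def dominant_segment(blocks, radius_m):
    """Return the population-weighted majority segment within radius_m of
    the site, or None when no block falls inside the radius."""
    weights = Counter()
    for x, y, segment, population in blocks:
        if hypot(x, y) <= radius_m:
            weights[segment] += population
    if not weights:
        return None  # the "missing data" case for very small radii
    return weights.most_common(1)[0][0]

for radius in (50, 500, 1000):
    print(radius, dominant_segment(BLOCKS, radius))
```

With this toy data the 50m radius contains no block at all, the 500m radius is dominated by one segment, and the 1km radius by another, which is the effect the two Portland maps illustrate.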
24. Enriched 140 WeWork US locations at 500m and 1km radii.
Almost all fell into a handful of the 67 segments.
Goal:
understand the distribution of segments across the current fleet
Value:
● Filter
● Score
Esri Tapestry Segmentation
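One way the fleet's segment distribution could feed the "Filter" and "Score" uses is sketched below. The segment labels, the counts, the `min_share` cut-off, and the helper names are hypothetical; the talk does not say how the filter or score was actually defined.

```python
from collections import Counter

# Hypothetical enrichment output: one segment label per existing location.
fleet_segments = ["segment_A"] * 6 + ["segment_B"] * 3 + ["segment_C"]

def segment_shares(labels):
    """Fraction of the fleet that falls in each segment."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {segment: n / total for segment, n in counts.items()}

def score(candidate_segment, shares):
    """Score a candidate area by how common its segment is in the fleet.
    Segments never seen in the fleet score 0.0."""
    return shares.get(candidate_segment, 0.0)

def passes_filter(candidate_segment, shares, min_share=0.2):
    """Keep only candidates whose segment holds at least min_share of the fleet."""
    return score(candidate_segment, shares) >= min_share

shares = segment_shares(fleet_segments)
print(shares)
print(passes_filter("segment_B", shares), passes_filter("segment_C", shares))
```

A score like this only says "looks like where we already are", so it would need to be combined with outcome data before it says anything about where locations perform well.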
26. All Coworking spaces in US: 3,600 Coworking spaces
Aggregate by ZIP: ~36,000 ZIP codes
Filter by population density: ~16,000 Coworking spaces
Enrich data: Demographics, HHV, firmographics, daytime population...
Train on balanced dataset: Interpretable models (Decision trees, Logistic, NaiveBayes, kNN)
Test on unbalanced dataset: Optimize for precision, not accuracy or F1
Score all ZIPs: Opportunities:
Where is there no coworking but there should be?
Where is there coworking but there should not be?
Goal:
Predict which ZIP codes should have a coworking space, and why
Value:
● List of ZIPs we should investigate
● Relative importance of features
● Validation of current fleet
Coworking Model
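The train-balanced, test-unbalanced idea can be sketched as follows. This is an assumption-laden toy, not the model from the talk: the single density feature, the data, and the one-split "stump" (a depth-one decision tree) are invented, and undersampling is just one common way to balance classes.

```python
import random

def balance_by_undersampling(rows, seed=0):
    """Down-sample the majority class so both labels are equally frequent."""
    rng = random.Random(seed)
    positives = [row for row in rows if row[1] == 1]
    negatives = [row for row in rows if row[1] == 0]
    minority, majority = sorted((positives, negatives), key=len)
    return minority + rng.sample(majority, len(minority))

def fit_stump(rows):
    """Depth-one decision tree: predict 1 when the feature >= threshold.
    Keeps the lowest threshold that achieves the best training accuracy."""
    best_threshold, best_correct = None, -1
    for threshold in sorted({x for x, _ in rows}):
        correct = sum((x >= threshold) == (y == 1) for x, y in rows)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

def precision_and_accuracy(rows, threshold):
    """Precision and accuracy of the stump on labelled rows."""
    tp = sum(1 for x, y in rows if x >= threshold and y == 1)
    fp = sum(1 for x, y in rows if x >= threshold and y == 0)
    tn = sum(1 for x, y in rows if x < threshold and y == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return precision, (tp + tn) / len(rows)

# Invented data: (density_feature, has_coworking) per ZIP code.
train = [(d, 1) for d in (7, 9, 10, 12)] + \
        [(d, 0) for d in (1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 8)]
test = [(d, 1) for d in (8, 11)] + \
       [(d, 0) for d in (1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 7, 9)]

balanced = balance_by_undersampling(train)
threshold = fit_stump(balanced)
precision, accuracy = precision_and_accuracy(test, threshold)

# Predicted positive but no coworking today: "no coworking but there should be".
opportunities = [x for x, y in test if x >= threshold and y == 0]
print(threshold, precision, accuracy, opportunities)
```

On this toy test set, accuracy looks high mostly because negatives dominate (predicting "no coworking" everywhere would score the same), while precision shows that only half of the flagged ZIPs actually have coworking. That gap is one reason to report precision on an unbalanced test set. The flagged ZIPs without coworking are the candidates to investigate.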
27. Comps
Proforma: a document that sets out the case for a new location, including projected financial performance
Given a new building, which are the most similar “comps” in our fleet?
Predict “detrended occupancy”
All WeWork locations open >6 mo
Enrich data
Mine combinations of 1-10 features
kNN with leave one out cross-validation
Output ranked table for each WeWork
Goal:
Provide a ranked list of comparable WeWork locations given some non-WeWork location. Which features are important?
Value:
● Better predictions
● Insights into drivers
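The comps pipeline (enrich, mine feature combinations, kNN with leave-one-out cross-validation, output a ranked table) might look roughly like the sketch below. Everything concrete here is assumed: the three features, their pre-scaling to [0, 1], the occupancy values, k=1, Euclidean distance, and mean absolute error as the criterion. The talk mines combinations of 1-10 features; the toy uses at most 3.

```python
from itertools import combinations
from math import sqrt

# Invented fleet: features pre-scaled to [0, 1], plus detrended occupancy.
FLEET = {
    "loc_A": ({"walk_score": 0.1, "floors": 0.9, "income": 0.5}, 0.60),
    "loc_B": ({"walk_score": 0.2, "floors": 0.1, "income": 0.4}, 0.62),
    "loc_C": ({"walk_score": 0.5, "floors": 0.8, "income": 0.9}, 0.75),
    "loc_D": ({"walk_score": 0.6, "floors": 0.2, "income": 0.1}, 0.77),
    "loc_E": ({"walk_score": 0.9, "floors": 0.7, "income": 0.6}, 0.90),
    "loc_F": ({"walk_score": 1.0, "floors": 0.3, "income": 0.2}, 0.92),
}

def distance(a, b, features):
    """Euclidean distance restricted to the chosen features."""
    return sqrt(sum((a[f] - b[f]) ** 2 for f in features))

def rank_comps(target, fleet, features, exclude=None):
    """Return [(name, distance)] for fleet locations, nearest first."""
    ranked = [(name, distance(target, feats, features))
              for name, (feats, _) in fleet.items() if name != exclude]
    return sorted(ranked, key=lambda pair: pair[1])

def knn_predict(target, fleet, features, k, exclude=None):
    """Predict occupancy as the mean occupancy of the k nearest comps."""
    nearest = rank_comps(target, fleet, features, exclude)[:k]
    return sum(fleet[name][1] for name, _ in nearest) / len(nearest)

def loocv_error(fleet, features, k=1):
    """Leave-one-out mean absolute error: predict each location from the rest."""
    errors = [abs(knn_predict(feats, fleet, features, k, exclude=name) - occ)
              for name, (feats, occ) in fleet.items()]
    return sum(errors) / len(errors)

def mine_features(fleet, candidates, max_size, k=1):
    """Try every feature subset up to max_size; keep the lowest LOOCV error."""
    best = None
    for size in range(1, max_size + 1):
        for subset in combinations(candidates, size):
            error = loocv_error(fleet, subset, k)
            if best is None or error < best[1]:
                best = (subset, error)
    return best

best_features, best_error = mine_features(
    FLEET, ["walk_score", "floors", "income"], max_size=3)

# Ranked comps for a hypothetical non-WeWork building.
new_building = {"walk_score": 0.85, "floors": 0.5, "income": 0.5}
comps = rank_comps(new_building, FLEET, best_features)
print(best_features, round(best_error, 3), comps[:3])
```

The winning subset doubles as the "insights into drivers" output: features that survive the search are the ones that make neighbors predictive of occupancy. With 1-10 features over a wide candidate pool, the exhaustive search grows combinatorially, so a real run would likely need pruning or a cap on the candidate set.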
28. Comps
Building | Neighborhood | Market
Building # of floors | # of businesses within Xm | # of colleges within MSA
Building total sq ft | Distance to other WeWorks | Undergrad, grad enrollment
Year built | How far can one walk, bike, drive in X seconds | Population who commute by car, walk
Building Class | # businesses of different sizes | Household income
Building Rating | Per capita income | House values by ZIP
WW # floors, # offices, # desks | Walk, bike, transit score | Daytime population
Goal:
Provide a ranked list of comparable WeWork locations given some non-WeWork location. Which features are important?
Value:
● Better predictions
● Insights into drivers
29. Goal:
Provide a ranked list of comparable WeWork locations given some non-WeWork location. Which features are important?
[Figure: chord diagram of the nearest neighbor of each location. Labels: NY, LA, SF, DC]
Comps
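The data behind a chord diagram like this is a table of flows between groups. A minimal sketch, assuming the groups are cities and that each location contributes one flow from its own city to the city of its nearest neighbor in feature space (the location ids, feature vectors, and grouping are invented):

```python
from collections import Counter
from math import dist

# Invented locations: id -> (city, scaled feature vector).
LOCATIONS = {
    "ny_1": ("NY", (0.90, 0.90)),
    "ny_2": ("NY", (0.85, 0.95)),
    "la_1": ("LA", (0.20, 0.30)),
    "la_2": ("LA", (0.25, 0.20)),
    "sf_1": ("SF", (0.80, 0.85)),
    "dc_1": ("DC", (0.30, 0.40)),
}

def nearest_neighbor(name, locations):
    """Id of the closest other location in feature space."""
    _, vector = locations[name]
    others = (other for other in locations if other != name)
    return min(others, key=lambda other: dist(vector, locations[other][1]))

def city_flows(locations):
    """Count (city, nearest-neighbor city) pairs: the chord diagram's matrix."""
    flows = Counter()
    for name, (city, _) in locations.items():
        neighbor_city = locations[nearest_neighbor(name, locations)][0]
        flows[(city, neighbor_city)] += 1
    return flows

flows = city_flows(LOCATIONS)
print(dict(flows))
```

Flows that stay within a city suggest locations resemble their local peers; flows that cross cities point to comps that a purely market-by-market comparison would miss.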
30. Goal:
Determine whether the # of amenities in different classes predicts WeWork location success
Value:
● Location Score / Feature
● Key categories?
All WeWork locations open >6 mo
List of all US storefront businesses
For each business category & distance, how predictive is that category?
lm(occupancy ~ #chinese restaurants within 200m)
lm(occupancy ~ #pizza restaurants within 200m)
...
lm(occupancy ~ #gyms within 800m)
Amenities
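The `lm(...)` lines above are shorthand for one single-predictor regression per (category, distance) pair. A Python sketch of that loop, using R² as the measure of "how predictive" (an assumption; the talk does not name the metric) and invented occupancy and amenity counts:

```python
def r_squared(x, y):
    """R^2 of the simple linear regression y ~ x (with intercept).
    For one predictor this equals the squared Pearson correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant predictor or target explains nothing
    return sxy ** 2 / (sxx * syy)

# Invented data: occupancy per location, and amenity counts per location
# keyed by (business category, radius in metres).
occupancy = [0.70, 0.75, 0.80, 0.85, 0.90]
counts = {
    ("pizza", 200): [1, 2, 3, 4, 5],
    ("gym", 800): [3, 1, 4, 1, 5],
    ("chinese", 200): [2, 2, 2, 2, 2],
}

# One regression per (category, distance); rank by variance explained.
ranking = sorted(((r_squared(x, occupancy), key) for key, x in counts.items()),
                 reverse=True)
for score, (category, radius_m) in ranking:
    print(f"{category:8s} within {radius_m}m  R^2 = {score:.3f}")
```

Running many such regressions across categories and distances will turn up some strong fits by chance alone, so the top of the ranking is best treated as a list of hypotheses to validate on held-out locations.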
31. ● Score every reasonable location in the U.S.
○ Location attributes
■ Across street: Western Union, Blue Bottle?
■ Neighborhood: food, transport, river/parks, fitness...
● Output: simple thumbs up / down
● Value:
○ Heatmaps
○ Monitor locations (buildings, other businesses)
Goal:
score every urban intersection in the US with thumbs up / thumbs down, and understand why
Work In Progress
33. “I think the best technologies, and Twitter is included in this, disappear. They fade into the background, and they’re relevant when you want to use them, and they get out of the way when you don’t.”
JACK DORSEY
2012 Charlie Rose interview