Are you a Data Scientist working in the Real Estate industry? Are you trying to build a Data Science team with expertise in spatial?
We bring the London Real Estate Data Science community together to discuss use cases such as whitespace analysis, twin area analysis & indoor analytics - sharing best practices and experiences from both residential and commercial.
Jaime Sanchez walked through a specific case of Spatial Data Science applied to Shared Workspace investment analysis, with an interactive component, before we broke out into a discussion about the challenges and opportunities of building Data Science teams in the Real Estate sector.
Geolytix joined the conversation to speak about location planning. As trusted advisors, they help their customers decide how many stores, who to acquire, where to open, which format and how to
optimize home delivery and click & collect operations.
Visit our website for more information: https://carto.com/
4. Which use cases are you typically focused on?
● Investment Analysis
● Indoor Mapping
● Site Planning
● Market Analysis
● Trade Area Analysis
● Pricing Optimization
11. Where do SDS spend time?
● Discovering data useful for their analysis
● Evaluating and purchasing data
● ETLing the data into common structures
● Analyzing, doing feature extraction and modeling
30% 30% 20% 20%
12. An example:
A Data Scientist needed demographics and zip code data for Portugal to perform a particular market analysis:
13. 80% of participants believe that it is difficult or very difficult to hire Data Scientists with expertise in spatial analysis
1. Strong background in statistics
2. Extensive coding experience relating to Data Science (Spark, SQL, Python, R, TensorFlow, PyTorch)
3. Experience developing production-quality data products using the results of quantitative research
4. Extensive experience in data visualization (in Python and R or other applications)
5. Effective application of Data Science workflows to business problems, and the ability to tell a story around results
6. Familiarity with data pipelines and ETL practices (Airflow, scheduled notebooks, Google Dataflow, etc.)
7. Familiarity with neural networks and deep learning (e.g. TensorFlow, PyTorch)
8. Experience working with distributed computing systems like Spark or Google BigQuery
9. Experience working with GIS software such as CARTO, QGIS, or ArcGIS
14. How difficult is it to find the right software and data?
47% of participants do not find it challenging to identify the right software & data to support Spatial Data Science projects
15. How will investment in Spatial Data Science initiatives expand?
68% of organizations are likely to increase their investment in Spatial Data Science in the next 2 years
21. The Sum of Our Parts
The Complete Journey
As an organization, we have defined 5 steps that, together, create a
holistic Location Intelligence approach.
Our goal is to empower organizations as they traverse each of these 5
steps.
22. Spatial analysis in 5 key steps:
1. Data Ingestion & Management: Clean, geocode and visualize your data.
2. Data Enrichment: Using 3rd party datasets, ideally on standardized spatial aggregations, to reduce your time to insight.
3. Analysis: Clustering, outliers analysis, time series predictions, and geospatial weighted regression, change spatial support.
4. Solutions & Visualization: WebGL for big datasets, dashboards, widgets, apps.
5. Integration: Productize model into a web service API. Feed back into Data Warehouse, LoB systems, consumers, others.
23. 1. Data Ingestion & Management
● Spatial database with multiple ways to connect
and manipulate your data
● Dynamic data in the cloud and multiple data
sources: local and remote files, cloud storage,
other databases, and more
● Fully managed database with automatic backups
and regular upgrades
● Enterprise data sharing and access across CARTO
Wide support for geospatial formats (inc. Shapefiles, KML, KMZ, GeoJSON,
GPX, OSM, GeoPackage, GDB, CSV, Excel or OpenDocument).
Plug-ready database connectors (ArcGIS Server, DB Connectors via APIs
(MySQL, PostgreSQL, Microsoft SQL Server, Hive on request)).
24. 2. Data Enrichment
● Save time in gathering spatial data, augmenting
your existing data with new location data
streams from across the globe
● Create locations from addresses and
understand travel time all from within CARTO
● Develop robust ETL processes and update
mechanisms so your data is always enriched
● Premium data to understand and analyze
deeper trends and behavior
25. 3. Analysis
● Bring maps and data into your Data Science
workflows and the Python data science
ecosystem with CARTOframes
● Machine learning embedded in CARTO as
simple SQL calls for clustering, outliers analysis,
time series predictions, and geospatial
weighted regression
● Use the power of PostGIS and our APIs to
productionalize analysis workflows in your
CARTO platform
26. 4. Solutions & Visualization
● Develop and build custom applications with a
full suite of frontend libraries.
● Work with CARTO’s Professional Services and
Support team as and when you need it.
● Create lightweight, intuitive dashboards for
simple sharing of insights across your
organization.
27. 5. Integration
● Using CARTO’s APIs and SDKs, connect your
analysis into the places that matter most for
you and your team.
● Bring CARTO to other data destinations, such as
desktop GIS and BI tools.
● Embed CARTO inside other tools, such as
Salesforce Einstein Analytics or Qlik Sense.
● Work with our Professional Services team for
custom configurations or developments.
29. How can we analyze and
understand real estate sales
in Los Angeles?
30. Pains
1. “Disconnected experiences to consume data - it is broken into
separate tools, teams DBs, excels.”
2. “Limited developer time in our team.”
3. “Current data science workflow doesn’t have a geo focus, and spatial
modeling is cumbersome because I have to export results to XYZ
tool in order to visualize and test my model effectively.”
4. “Having trouble handling and visualizing big datasets.”
31. Outline the Process
1. Integrate spatial data of past home sales and property locations
in Los Angeles county
2. Enrich the data with a spatial context using a variety of relevant
resources (demographics, Mastercard transactions, OSM)
3. Clean and analyze the data, and create a predictive model for
homes that have not sold
4. Present the results in a Location Intelligence solution for users
5. Integrate and deploy the model into current workflows for day
to day use
33. Integrate LA Housing Data
The Los Angeles County Assessor's office provides two different datasets
which we can use for this analysis:
● All Property Parcels in Los Angeles County for Record Year 2018
● All Property Sales from 2017 to Present
38. Clean and join the data on a unique identifier using SQL
-- Attach parcel attributes to each sale; ain is the shared identifier,
-- cast to numeric on the sales side so the two columns compare cleanly.
CREATE TABLE la_join AS
SELECT s.*,
       p.zipcode AS zipcode_p,
       p.taxratearea_city,
       p.ain AS ain_p,
       p.rollyear,
       p.taxratearea,
       p.assessorid,
       p.propertylocation,
       p.propertytype,
       p.propertyusecode,
       p.generalusetype,
       p.specificusetype,
       p.specificusedetail1,
       p.specificusedetail2,
       p.totbuildingdatalines,
       p.yearbuilt AS yearbuilt_p,
       p.effectiveyearbuilt,
       p.sqftmain,
       p.bedrooms AS bedrooms_p,
       p.bathrooms AS bathrooms_p,
       p.units,
       p.recordingdate,
       p.landvalue,
       p.landbaseyear,
       p.improvementvalue,
       p.impbaseyear,
       p.the_geom AS centroid
FROM sales_parcels s
LEFT JOIN assessor_parcels_data_2018 p ON s.ain::numeric = p.ain;
40. Integrate LA Housing Data
Next we want to add spatial context to our housing data to
understand more about the surrounding areas:
● Demographics
● Mastercard (Scores and Merchants) (Nearest 5 Areas)
● Nearby Grocery Stores and Restaurants
● Proximity to Roads
42. Mastercard
Find the merchants and sales/growth scores in the five nearest block
groups to the home via Mastercard Retail Location Insights data
43.
-- Average the Mastercard scores of the five block groups nearest to each home;
-- <-> is the PostGIS nearest-neighbour (KNN) distance operator.
(
  SELECT AVG(sales_metro_score)
  FROM (
    SELECT sales_metro_score
    FROM mc_blocks
    ORDER BY la_eval_clean.the_geom <-> mc_blocks.the_geom
    LIMIT 5
  ) a
) AS sale_metro_score_knn,
(
  SELECT AVG(growth_metro_score)
  FROM (
    SELECT growth_metro_score
    FROM mc_blocks
    ORDER BY la_eval_clean.the_geom <-> mc_blocks.the_geom
    LIMIT 5
  ) a
) AS growth_metro_score_knn
45.
-- Count restaurants and grocery stores within one mile (1609 m) of each home.
-- Web Mercator stretches distances by 1 / cos(latitude), so the radius is scaled to match.
(
  SELECT count(restaurants_la.*)
  FROM restaurants_la
  WHERE ST_DWithin(
    ST_Centroid(la_eval_clean.the_geom_webmercator),
    restaurants_la.the_geom_webmercator,
    1609 / cos(radians(ST_y(ST_Centroid(la_eval_clean.the_geom)))))
) AS restaurants,
(
  SELECT count(grocery_la.*)
  FROM grocery_la
  WHERE ST_DWithin(
    ST_Centroid(la_eval_clean.the_geom_webmercator),
    grocery_la.the_geom_webmercator,
    1609 / cos(radians(ST_y(ST_Centroid(la_eval_clean.the_geom)))))
) AS grocery_stores
46. Roads
See if a home is within one mile of a major highway or trunk highway
using the SQL API and major roads from OpenStreetMap.
47.
-- Flag homes (1 or 0) that lie within one mile of a motorway or trunk road.
(
  SELECT CASE WHEN COUNT(la_roads.*) > 0 THEN 1 ELSE 0 END
  FROM la_roads
  WHERE ST_DWithin(
    la_eval_clean.the_geom_webmercator,
    la_roads.the_geom_webmercator,
    1609 / cos(radians(ST_y(ST_Centroid(la_eval_clean.the_geom)))))
  AND highway IN ('motorway', 'trunk')
) AS highways_in_1mile
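The distance queries above divide 1609 (roughly one mile in metres) by the cosine of the latitude. The reason is that lengths measured in Web Mercator coordinates are stretched by a factor of 1 / cos(latitude), so a radius given in ground metres has to be enlarged by the same factor. A minimal Python sketch of that arithmetic follows; the function name and the latitude of 34.05 degrees (roughly central Los Angeles) are assumptions for illustration, not values taken from the slides.

```python
import math

METERS_PER_MILE = 1609  # rounded value used in the queries above


def mercator_radius(ground_meters, latitude_deg):
    """Convert a ground distance to the equivalent Web Mercator distance.

    Web Mercator stretches lengths by 1 / cos(latitude), so a search radius
    given in real metres must be divided by cos(latitude) before it is
    compared with distances measured in projected units.
    """
    return ground_meters / math.cos(math.radians(latitude_deg))


# Assumed latitude for illustration (roughly central Los Angeles).
radius_la = mercator_radius(METERS_PER_MILE, 34.05)
# At the equator there is no stretch, so the radius is unchanged.
radius_equator = mercator_radius(METERS_PER_MILE, 0.0)
```

At Los Angeles latitudes the projected radius comes out around a fifth larger than the ground distance, which is why an uncorrected 1609 would search too small an area.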
49. Analysis
The analysis for this project followed these steps:
● Moran’s I Clusters & Outliers (Exploratory Data Analysis)
● Neighbor Homes Analysis (Spatial Feature Engineering)
● Predictive Modeling & Hyperparameter Tuning (using XGBoost)
50. Moran’s I
Using Moran’s I to evaluate spatial clusters and outliers via the PySAL
package, we can see these groupings and visualize them in
CARTOframes.
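To make the statistic concrete, here is a minimal plain-Python sketch of the global Moran's I. It is not PySAL's API, and the cluster and outlier maps in the talk would rely on the local variant of the statistic; the toy weights matrix and values below are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a dense weights matrix.

    weights[i][j] > 0 means observation j is a neighbour of observation i.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / s0) * cross / sum(d * d for d in dev)


# Hypothetical toy data: four homes along a street, each a neighbour of the
# homes next to it.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Values that rise steadily along the street: similar values sit together.
clustered = morans_i([1.0, 2.0, 3.0, 4.0], chain)
# Values that alternate: every neighbour is dissimilar.
alternating = morans_i([1.0, 4.0, 1.0, 4.0], chain)
```

A positive result indicates that similar values cluster in space, while a negative result indicates that neighbours tend to differ.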
51. Moran’s I
52. Neighbor Analysis
Evaluate the attributes of neighbor properties using k-nearest neighbor spatial weights in PySAL to perform spatial feature engineering.
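One way to picture this step is as a "spatial lag": for each home, average an attribute over its k nearest neighbours and use that average as a new model feature. The sketch below does this in plain Python rather than with PySAL; the coordinates, prices, and function name are hypothetical, and distances are plain Euclidean distances on assumed planar coordinates.

```python
import math


def knn_spatial_lag(points, values, k):
    """Average the attribute of each point's k nearest neighbours.

    Equivalent to multiplying a row-standardised k-nearest-neighbour
    weights matrix by the attribute vector (a "spatial lag").
    """
    lags = []
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to every other point, excluding itself.
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        lags.append(sum(values[j] for j in neighbours) / len(neighbours))
    return lags


# Hypothetical homes: planar coordinates and sale prices (in $1,000s).
homes = [(0, 0), (1, 0), (0, 2), (10, 10), (11, 10)]
prices = [500, 520, 510, 900, 950]
neighbour_price = knn_spatial_lag(homes, prices, k=2)
```

Each entry of `neighbour_price` can then sit alongside the home's own attributes as an input to the predictive model.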
53. How the attributes of your neighbors influence the price of your home and spatial
context…
55. Predictive Modeling
With XGBoost we can use this data to build a regression model that
predicts housing prices, then push the predictions back to CARTO using
CARTOframes without ever leaving the notebook environment.
56. [Workflow diagram. Labels: Sale Price; Past Sales; Spatial Data Enrichment; Spatial Feature Engineering; Spatial Modeling (analyze the values of nearest neighbor sales, clusters of high Mastercard areas, proximity to features); Train & Test Model; Predictions]
57. Predictive Modeling
After hyperparameter tuning the model, we can reduce the Mean
Absolute Error to $58,179.78.
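The error figure quoted above is normally computed as the mean absolute error (MAE): the average of the absolute differences between actual and predicted prices. A minimal sketch, using hypothetical prices rather than the presenters' data:

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical sale prices and model predictions, in dollars.
actual_prices = [650_000, 720_000, 480_000]
predicted_prices = [600_000, 750_000, 500_000]
mae = mean_absolute_error(actual_prices, predicted_prices)
```

Because MAE is expressed in the same units as the target, it reads directly as "the typical prediction is off by this many dollars".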
60. Solutions
To present the data and the predictions, both for homes with a
recorded sale price and for homes that have not sold, we can develop
a Location Intelligence application to showcase these results.
63. Application Development
Deploy the model via a Python-based
API and sync it to the data to
perform on-the-fly predictions
for specific properties.
64. Other Use Cases
● Predict revenue from different physical retail locations
● Identify clusters and groups of specific patterns to optimize
activities such as sales outreach or site selection
● Classify property types or buying patterns in a city
● Review spatial feature importance for site performance, and
modify models using different spatial components
● Find areas with similar behavioral patterns
65. Similarity Analysis
We built a model to identify areas with similar
behavior patterns based on footfall, socioeconomic
and financial data, and more. The similarity score is
modeled as follows:
● Distance between cells is calculated with an L2
norm on a Principal Component space.
● Uncertainty due to missing values and the
dimension of the PC space is tackled following an
ensemble probabilistic approach.
● Similarity Score = Continuous Ranked
Probability Skill Score.
By enriching the data with other sources, this model
can be used for Site Planning, Investment Analysis,
etc.
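The first ingredient listed above, an L2 distance in principal-component space, can be sketched as follows. This illustrates only the distance and ranking step: it assumes the component scores have already been computed, and it does not reproduce the ensemble approach or the skill score. The cell ids and scores are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(target, candidates):
    """Return candidate cell ids ordered from most to least similar."""
    return sorted(candidates, key=lambda cid: l2_distance(target, candidates[cid]))


# Hypothetical principal-component scores for a target cell and three
# candidate cells (the PCA step itself is assumed to have been run already).
target_cell = [0.0, 0.0]
cells = {"A": [3.0, 4.0], "B": [0.5, 0.5], "C": [1.0, 2.0]}
ranking = rank_by_similarity(target_cell, cells)
```

Cells at the front of `ranking` are the closest matches to the target, which is the kind of output a twin-area search would start from.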
74. INTRODUCTION
Network strategy, location planning, omnichannel analysis, spatial modelling
Our whole business is about location planning. As trusted
advisors we help our customers decide how many stores,
who to acquire, where to open, which format and how to
optimise home delivery and click & collect operations.
Team of 36 location specialists to work collaboratively with your business
Led in-house location planning for major global retailers.
Experts in spatial modelling, forecasting, web development
and systems.
Create innovative new datasets for local markets.
Growing to a global company
Offices in London, Leeds, Warsaw, Dortmund, Shanghai,
Tokyo and Melbourne
83. REAL ESTATE
FOUNDATIONAL FEATURES
• Decisions are complex and outcomes only become
clear over years
• Choices are multi-faceted and driven by dynamic
competing interests
• Key information is tightly held
• The amounts of money involved are vast
• Decisions are hard to undo
• “Retailers make few decisions that are as
permanent and unforgiving as selecting store
locations.”
84. SOME HISTORY
SPATIAL DATA SCIENCE
• William Playfair – 1780s
• Charles Minard – 1830s
• John Snow – 1850s
• Charles Booth – 1890s
• Arthur Samuel – 1950s
• Roger Tomlinson – 1960s
• David Huff – 1970s
85. SOME ISSUES
UNDERPINNING STATISTICAL ISSUES
Samples are not randomly drawn from the variable space
Items from within the sample influence each other
Hardly any variables are normally distributed
These three features fatally wound pretty much every standard statistical approach
WHAT DATA SCIENCE CAN DO
Describe things
Classify stuff
Predict responses
What we really want to do is to predict the future
86. THE GP ANALOGUE
Diagnose the business problem
Bring in the specialists if you need them (e.g. algorithm/model creation)
Communicate to business stakeholders
Support decision making
88. WHAT IS MACHINE LEARNING?
● “Machine Learning” refers to the field of study that gives computers the ability to learn without
being explicitly programmed (Samuel, 1959)
● In practical terms, a series of different algorithms can be applied to detect patterns in data
(including big data), which can lead to actionable insights
● Common machine learning applications include (not exhaustively):
● Regression Forecasting (e.g. sales forecasts)
● Classification or Clustering (e.g. segmentation or image classification)
● Association Rule Learning (interesting relations; e.g. which other products are you likely to
buy based on your other purchases?)
● Reinforcement Learning (e.g. chess AI)
89. IT’S NOT NEW
● Machine learning is not a new concept… In 1952
Arthur Samuel wrote the first computer program
which learned as it ran
● The first neural network to solve a real-world problem
was designed in 1959 (an adaptive filter to remove
echoes from phone lines)
● So if ML isn’t new, why is it becoming so popular
now?
91. COMMON TYPES OF MACHINE LEARNING ALGORITHMS
● Supervised Learning:
● The user (human) teaches the algorithm by providing it with input data and a sample of
result data (e.g. x = input features, y = actual sales)
● The algorithm then attempts to learn from the input data how best to predict a result (e.g.
predict sales)
● Unsupervised Learning:
● The computer is trained with unlabelled data; there is no teacher
● This family of machine learning algorithms is useful for pattern recognition and rule detection
● Semi-supervised Learning:
● A combination of supervised and unsupervised methods
● Reinforcement Learning:
● Maximises reward and minimises risk, iteratively learning from the environment
● Determines ideal behaviour within specific contexts
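As a concrete instance of the supervised case described above (x = input features, y = actual sales), the sketch below fits a straight line by ordinary least squares and then predicts sales for an unseen store. It is a deliberately small stand-in for the richer models a real project would use; the floor areas and sales figures are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return mean_y - slope * mean_x, slope


# Hypothetical training data: store floor area (x) and actual sales (y).
floor_area = [100.0, 200.0, 300.0, 400.0]
actual_sales = [25.0, 45.0, 65.0, 85.0]
intercept, slope = fit_line(floor_area, actual_sales)

# Predict sales for a store the model has not seen.
predicted_sales = intercept + slope * 250.0
```

The "teaching" is the pairing of each input with its known result; the fitted intercept and slope are what the algorithm has learned from those pairs.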
92. USE CASES WITHIN LOCATION PLANNING
● So what’s the catch!?
● How can we use this in location planning/real estate!?
94. EXAMPLES OF USE CASES
● Using K-nearest neighbour to create demographic segmentations, based on known customer
data
● Learning about key drivers of success by examining feature importance
● Building forecasting models to predict sales based on property location
● Using NLP for categorising customer comments
● Etc.
One of the more interesting solutions we’ve used recently combines traditional methods with
machine learning…
95. GROCERY GRAVITY MODEL
● Gravity models are common practice within the grocery retail location planning/real estate space
● It is important for grocers to understand which locations would be ideal for a new supermarket,
but also to understand the impact this might have on existing locations and competitors…
Gravity Model in a nutshell:
● Based on the theory of gravity
● More attractive destinations have a greater ‘pull’
● Attraction is linked to distance
● Using customer data we know how far people
actually travel to their chosen stores
● Fundamental concept is logical, and simple to
understand
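The points above can be sketched with the Huff formulation of a gravity model, in which each store's pull is its attractiveness divided by distance raised to a decay exponent, and a customer's probability of choosing a store is that store's share of the total pull. Whether the presenters use exactly this form is not stated; the function name, the exponent of 2, and the store figures below are all assumptions for illustration.

```python
def huff_probabilities(attractiveness, distances, beta=2.0):
    """Probability that a customer picks each store.

    attractiveness: store id -> attractiveness (e.g. floor space)
    distances: store id -> travel distance or time from the customer
    beta: distance-decay exponent (assumed; normally calibrated on data)
    """
    pull = {
        store: attractiveness[store] / distances[store] ** beta
        for store in attractiveness
    }
    total = sum(pull.values())
    return {store: p / total for store, p in pull.items()}


# Hypothetical customer location with two candidate supermarkets.
shares = huff_probabilities(
    attractiveness={"large_far": 4000.0, "small_near": 1000.0},
    distances={"large_far": 4.0, "small_near": 1.0},
)
```

In this toy case the smaller, nearer store wins most of the custom, because distance decay outweighs the larger store's size; recalculating the shares with a proposed new store added is how impact on existing locations would be estimated.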
96. GROCERY GRAVITY MODEL
● Gravity models are often very accurate at estimating
customer patterns and interactions at close range…
● However, this accuracy usually wanes as you try to
model sales from further afield:
● Consumers’ decisions are much harder to
understand
● Consumers have more choice
● Are they workers or residents?
● Decision is not as simple as “I’ll just pop
into my nearest, most attractive
supermarket”…
97. GROCERY GRAVITY MODEL
● The solution… To use machine learning to create an
estimate for ‘Sales beyond 30mins’
● Created a datamart for each property in the portfolio (see
opposite) and tested various machine learning algorithms
to see if we could more accurately predict sales than
previously
● Eventually settled on a neural network
It’s not as easily interpretable, but gives better results
on interactions which are inherently difficult to
understand anyway!
√ +20% of store sales beyond 30 minutes drivetime were
more accurately predicted
√ R² increased by 0.25 for beyond sales
100. EXAMPLE FASHION CLIENT
OBJECTIVE
With six stores operating in Hong
Kong, Dr. Martens wanted to
understand how high the achievable
turnover is at each location.
Additionally, an understanding of the
best locations for new stores was
required as part of a future store
investment roadmap.
RESULT
We created new datasets and a
bespoke model to calculate sales
potentials for the existing store
network. The model was then used in
a future opportunity scan to identify
the best locations for new stores in
Hong Kong.
STRATEGY
MODEL DEMAND: Calculate how
much people spend on footwear at the
lowest possible geography
MAP THE RETAIL LANDSCAPE:
Understand the locations where
retailers cluster in Hong Kong
SALES POTENTIAL: Calculate how
much turnover is achievable at a retail
venue (e.g. Mall) and individual store
level
OPPORTUNITY SCAN: Use the
developed model and data sets to find
ideal locations for the next Dr. Martens
stores in Hong Kong
101. EXAMPLE F&B CLIENT
OBJECTIVE
Support the Domino’s strategy to be the number one
pizza company in each neighbourhood with a focus on
franchisee profitability.
STRATEGY
• Understand the true drivers of store performance &
the impact on nearby stores of opening new sites.
• Predict new store sales and cannibalisation using a
consistent, transparent fact base and model.
• Improve the efficiency of the store forecasting
process to allow more time for the value-add.
• Deliver the ideal network blueprint and optimum
network strategy.
RESULT
“Our work with GEOLYTIX has enabled us to form a
consistent approach to new site forecasting, step
changing our understanding of customer catchments
and improving our ability to understand regional and
store performance. The collaborative approach has
resulted in us being able to make decisions around our
future location strategy and form ideal network
blueprints with significantly increased confidence.”
Craig Donnellan, Head of Location Planning,
Domino’s Pizza.
102. EXAMPLE RETAIL CLIENT: FOOD, FASHION & HOME
OBJECTIVE
Support a step change in the roll-out of the Food estate,
understand the drivers of performance for the Clothing &
Home estate and recommend the optimum network
blueprint.
RESULT
“GEOLYTIX have worked with us to create a bespoke
toolset enabling us to proactively set our strategy and
quickly answer any what-if scenarios. Their analysis and
recommendations have provided us with a consistent
evidence base from which to make our network
decisions.”
STRATEGY
• Create an efficient selection & sales forecasting
process, based on a rigorous, objective fact base and
a consistent approach.
• Understand the drivers and catchments of the
Clothing & Home estate, in order to build optimal
networks.
• Integrate custom models with existing data and
software to create the M&S modelling toolkit.
• Bulk run multiple national and regional scenarios to
guide network strategy and create future blueprints.
103. EXAMPLE REAL ESTATE ADVISOR PROJECT
OBJECTIVE
Data / analytical support in evaluating potential acquisition opportunities and ongoing asset
management of retail assets.
STRATEGY
• Creation of town centre & grocery gravity models to
assess:
• Catchment profiles and fit to various potential
new occupiers
• Impacts of new greenfield developments and
centre remodels
• Existing retailer chain performance and potential
‘best next’ opportunities
• Ad-hoc consultancy support
• Assisting with major M&A and liquidity event
support
• Detailed asset reports including site visits to
support redevelopment
RESULT
• We provide access to our data and models through
a desktop GIS reporting tool which allows for:
• Ad hoc area demographic reporting
• Retail presence and chain list reports
• Analogue tool to find similar locations
• Drive time reporting
• Bespoke ‘client ready’ site and area reports
104. WHY GEOLYTIX
World Class Modelling. We have delivered optimisation models for many of the most successful organisations
in the world, across multiple sectors.
Innovation. The Queen’s Award for Innovation reflects our passion for being on the leading edge of new data,
technology, and ideas.
Practical Senior-Level Experience. We are practical operators, with Director-level experience in property teams
of some of the UK’s largest companies.
Technical Expertise. We are experienced data scientists, sales forecasting modellers and spatial web
application developers, and will build a bespoke solution. Every element of our solution, from the analysis to the
platform, will be specifically designed to meet your specific requirements.
We are global. Accounting for the often vast differences in structure, maturity and data availability and quality, we
are able to apply a consistent approach across territories in order to support fact-based decisions.
Proven Track Record. We have delivered similar solutions many times before. We will deliver to spec, to time,
and to a fixed budget.
Genuine Partnership. Our commitment is to work closely with you through to deployment, and maintain the
support and relationship beyond.