This webinar discusses empowering data scientists with geospatial data at scale. It introduces four keys to geo-enrichment: location determination, feature selection, data extraction, and business insight. It demonstrates how to simplify spatial processing and reduce data munging for data scientists. The webinar also includes a Kubernetes demo and discusses using human mobility data to improve customer behavior understanding.
2. Housekeeping
Webinar Audio
• Today’s webinar audio is streamed through your computer speakers
• If you need technical assistance with the web interface or audio, please refresh your browser window – Chrome is recommended
Questions Welcome
• Submit your questions at any time during the presentation using the Q&A box
Recording and slides
• This webinar is being recorded. You will receive an email following the webinar with a link to the recording and slides
3. Webinar goals
Spatial processing
Reduce LI processing < 90%
Data science, AI, & ML
Reduce LI data munging & wrangling for Data Scientists <80%
Simplify Spatial Data
• Using catchments
• 4 geo-enrichment
keys
• IDs & data design
• Demo
4. The value of human mobility
Using human mobility dramatically improves understanding of customer behavior
How many driving minutes away are your customers from the fire station, urgent care center, coffee shop, and gym?
5. Data science success?
• “July 2019: VentureBeat AI reports:
87% of data science projects never make it into production.
• Jan 2019: NewVantage survey reports:
77% of businesses report that "business adoption" of big data and AI initiatives
continues to represent a big challenge for business.
That means 3/4 of the software being built is apparently collecting dust.
(Ouch.)
• Jan 2019: Gartner says:
80% of analytics insights will not deliver business outcomes through 2022 and
80% of AI projects will “remain alchemy, run by wizards” through 2020.”
Brian T. O'Neill July 23, 2019 designingforanalytics.com
6. Location intelligence help
• Quickly ingest curated location information to enrich millions of customer records
• Automate how to value location intelligence information (what’s good vs what’s not)
• Access valuable location information for hundreds of millions of customers with Big Data and Cloud Native technology
• Know the keys to big data geo-enrichment
• How to use the quality indicators provided in metadata
• Create a successful geo-location data integration strategy for the best data science value
• Increase the success of your data science projects while minimizing costs
7. Geo-enrichment provides business insight
• Operational
• Inserts location data into automated workflows
• Begins with geocoded addresses or positions (latitude/longitude)
• Enriches the input data with spatial information related to the location
• Helps production use map analytics and enables automation
8. Four geo-enrichment keys
Where?
Forward geocoding
Latitude/Longitude
Location ID
What’s nearby?
Points
Lines
Areas
What can be learned?
Attributes
Distances
Calculations
Comparisons
What Decisions?
Too risky?
How much?
Model input
AI & ML
Location Determination
Feature Selection
Data Extraction
Business Insight
9. Products in the four geo-enrichment keys
• Spectrum Geocoding for Big Data
• PreciselyID
• *Geohash
• Spectrum Location Intelligence for Big Data
• Spectrum Routing for Big Data
• **Geohash
• Property valuation
• Insurance underwriting
• Network availability
• All products
Location Determination
Feature Selection
Data Extraction
Business Insight
*Geohash from Spectrum Geocoding for Big Data planned for Q1 2021
**Geohash available in Location Intelligence for Big Data
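A geohash location ID can be sketched in a few lines. The block below is a minimal, illustrative implementation of the public geohash encoding (interleaved longitude/latitude bisection, base-32 output); it is not the Spectrum product's implementation, and the function name `geohash_encode` is an assumption made for this example.

```python
# Minimal geohash encoder (illustrative sketch, not a product API).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # accumulates 5 bits per output character
    bit_count = 0
    use_lon = True    # geohash interleaves bits, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(42.605, -5.603, 5))
```

Nearby points share a common prefix, which is what makes the string usable as a join key at a chosen cell size.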
10. Feature selection
• Points
• Lines
• Areas
Point in polygon
Choose geographic
features near
defined locations
• Administrative boundaries
• Neighborhoods
• Service areas
• Drivetime
• Drive-distance
Distance selection
• Edge of area
• Linear features
• Points of interest
Location ID
Join geospatial data by:
• PreciselyID
• Geohash
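Point in polygon is the basic test behind selecting boundaries, neighborhoods, and service areas for a location. The sketch below uses the textbook ray-casting method on a simple polygon in planar coordinates; it is an illustration only, and a production system would normally rely on a spatial library or the big data tooling named in this deck. The names `point_in_polygon` and `service_area` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), for example (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count how many polygon edges a ray to the right crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical square service area
service_area = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_in_polygon((2.0, 2.0), service_area))
print(point_in_polygon((5.0, 2.0), service_area))
```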
11. Data Extraction
Extract information from selected features
Calculate & Measure
• Distance to high wildfire risk area
• Count gas stations within a 10-minute drive
Attributes
• County name
• Flood zone type
• POI business types
Combine & Compare
• In flood zone & elevation change
• RF signal & POI
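The "Calculate & Measure" step can be illustrated with a distance calculation and a count. This is a rough sketch that uses straight-line (haversine) distance as a stand-in; the slide's 10-minute drive would require a routing engine and a street network, which are not modeled here. The function names and the sample coordinates are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_within(origin, pois, radius_km: float) -> int:
    """Count points of interest within radius_km of origin (straight-line)."""
    return sum(
        1 for lat, lon in pois
        if haversine_km(origin[0], origin[1], lat, lon) <= radius_km
    )


# Hypothetical customer location and gas station coordinates
customer = (0.0, 0.0)
gas_stations = [(0.0, 0.01), (0.0, 1.0)]
print(count_within(customer, gas_stations, radius_km=5.0))
```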
12. Pre-build pattern
Pre-build geo-enriched data (batch)
• Customer data
• Business data
• GIS/Map data
• Address lists
• Mobile trace
• Model data
• Four geo-enrichment keys
• Location ID key-value joins
• PreciselyID
• Geohash
Data lake or other data store
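The pre-build pattern amounts to a batch key-value join on a location ID. A minimal sketch, assuming each data layer has already been keyed by the same ID (PreciselyID or geohash); the record layout, field names, and the function `prebuild_enriched` are hypothetical.

```python
from typing import Dict, List


def prebuild_enriched(customers: List[dict], *layers: Dict[str, dict]) -> List[dict]:
    """Left-join every layer onto each customer row by 'location_id'."""
    rows = []
    for customer in customers:
        row = dict(customer)  # copy so the input is not mutated
        for layer in layers:
            # A missing key simply leaves the row without that layer's attributes.
            row.update(layer.get(customer["location_id"], {}))
        rows.append(row)
    return rows


# Hypothetical inputs, all keyed by the same location ID
customers = [
    {"customer": "A", "location_id": "P0001"},
    {"customer": "B", "location_id": "P0002"},
]
flood_layer = {"P0001": {"flood_zone": "AE"}}
county_layer = {"P0001": {"county": "X"}, "P0002": {"county": "Y"}}

enriched = prebuild_enriched(customers, flood_layer, county_layer)
print(enriched)
```

Because the join is a dictionary lookup rather than a geometric test, the expensive spatial work happens once, when each layer is assigned its location IDs.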
13. Pre-build view
• Spatial join organized by a location ID:
• PreciselyID
• Geohash
• Using LI and business data
• Customer data
• Third party data
• Transportation data
• Boundary data
• Points of Interest data
• Mobile trace data
Street
Network
Admin
Boundaries
Mobile
Trace
Points of
Interest
Sales History
Demographics
Parcel
Boundaries
Building
Footprints
School Districts
Modeled Data
Customer
Addresses
Crime
14. Transactional pattern
Request-based querying of the data (real-time)
Requests with
• Policy Addresses
• Lat/Long values
• Key-value assignment
• PreciselyID
• Geohash
• 10x performance using pre-build
Data lake or data storage
Is this address:
• Good for an MDU loan?
• In the tornado path?
• Impacted by > 1” hail?
• Able to use fixed wireless?
Geocoding
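The transactional pattern can be sketched as a single lookup against the pre-built store once a request has been geocoded to a location ID. The store contents, the field names (`in_tornado_path`, `max_hail_inches`), and the function `assess_location` are assumptions for illustration; the geocoding step itself is not shown.

```python
from typing import Dict, Optional


def assess_location(location_id: str, store: Dict[str, dict]) -> Optional[dict]:
    """Answer per-address questions from a pre-built, ID-keyed store."""
    record = store.get(location_id)
    if record is None:
        return None  # location was not part of the pre-built data
    return {
        "in_tornado_path": record.get("in_tornado_path", False),
        # The slide asks about hail larger than 1 inch.
        "hail_over_1_inch": record.get("max_hail_inches", 0.0) > 1.0,
    }


# Hypothetical pre-built records keyed by location ID
prebuilt_store = {
    "P0001": {"in_tornado_path": True, "max_hail_inches": 1.75},
    "P0002": {"in_tornado_path": False, "max_hail_inches": 0.5},
}
print(assess_location("P0001", prebuilt_store))
```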
15. Geo-enrichment from geocoding
• Address level decisions
• Standardized customer locations
• Duplicates
• Missing information
• Quality
• Spatial information from location IDs
• Group by administrative, demographic, and market areas
Parsed
Address
Latitude/Longitude
Address
Identifiers
Genealogy
PreciselyID
Address
Validation
Location
Quality
Street
Address
Range
Demographics
IDs
Geohash ID
Standardized
Address
Administrative
Boundaries IDs
16. Fire risk
• Determine property fire protection risk
• Join by Geohash or PreciselyID
• Fire stations
• Street networks
• Municipal boundaries
• Township boundaries
• National address location list
• Drivetime boundaries
• Drive distance boundaries
• Traffic load information
• More flexibility for better risk estimates that give better insurer profitability
Fire
Stations
5-Minute Drive
Distance
10-Minute
Drive
Distance
AM Peak
Drivetimes
20-Minute
Drive
Distance
County
Boundaries
National
Address
List
Township
Boundaries
City
Boundaries
Off Peak
Drivetimes
Street
Network
PM Peak
Drivetimes
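As a rough illustration of the fire risk example, a property can be assigned to a drive-time band around its nearest fire station. The 5, 10, and 20 minute thresholds mirror the band labels on the slide; the drive minutes are assumed to come from a routing engine, and the station names and function names are hypothetical.

```python
import bisect
from typing import Dict, Tuple

BAND_LIMITS = [5, 10, 20]  # minutes, taken from the slide's band labels
BAND_LABELS = ["within 5 min", "within 10 min", "within 20 min", "beyond 20 min"]


def drivetime_band(minutes: float) -> str:
    """Map a drive time to the smallest band that contains it."""
    return BAND_LABELS[bisect.bisect_left(BAND_LIMITS, minutes)]


def nearest_station_band(drive_minutes: Dict[str, float]) -> Tuple[str, str]:
    """Pick the station with the shortest drive time and report its band."""
    station = min(drive_minutes, key=drive_minutes.get)
    return station, drivetime_band(drive_minutes[station])


# Hypothetical AM-peak drive times from one property to nearby stations
am_peak = {"Station 12": 7.5, "Station 4": 14.0}
print(nearest_station_band(am_peak))
```

Running the same lookup against AM peak, PM peak, and off-peak drive times would show how the band for a property shifts with traffic load.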
17. Mortgage value
• Property valuations that consider neighborhood factors derived from human mobility
• Join by Geohash or PreciselyID
• Residential locations
• Business locations
• Population concentration
• Consumer behavior
• Administrative boundaries
• Drivetime boundaries
• Drive distance boundaries
• Human mobility provides detailed tenant behavior tendencies for better occupancy predictions
Single Family Homes
Business Locations
5-Mile Drive
Distance
10-Mile
Drive
Distance
Geo-Fences
County
Boundaries
Neighborhood
Boundaries
City
Boundaries
ZIP Code
Boundaries
Demographics
Multi-dwelling
Units
Geo-
Demographics
18. Human mobility
• Customer behavior data correlated to mobile trace
• Join by Geohash or PreciselyID
• Mobile trace data sources
• Consumer home locations
• Business locations
• Consumer behavior
• Administrative boundaries
• Drivetime boundaries
• Drive distance boundaries
• Tying together complex spatial data such as mobile trace history, drive distance, and drivetime creates an informed customer profile, which leads to better decisions
Single Family Homes
Business Locations
5-Mile Drive
Distance
10-Mile
Drive
Distance
Mobile
Trace
Admin
Boundaries
Demographics
Subscriber
History
Drivetime
Boundaries
20-Mile Drive
Distance
Multi-dwelling
Units
Geo-
Demographics
20. Code samples
Example code & configurations
• GitHub
• Spark
• Docker
• Kubernetes
Big data samples
• Point in polygon
• Find nearest
• Aggregating with geohash
• Multi-pass geocoding
Cloud native samples
• Forward geocoding
• Reverse geocoding
• Auto complete
• PreciselyID and G-NAF look up
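"Aggregating with geohash" can be illustrated without Spark: truncating each geohash to a shorter prefix assigns it to a larger cell, and counting per prefix gives a density by cell. This is a plain-Python sketch, not the sample code from the repository referenced on the slide; the function name and the sample geohash values are assumptions.

```python
from collections import Counter
from typing import Iterable


def aggregate_by_geohash(geohashes: Iterable[str], precision: int) -> Counter:
    """Count records per geohash cell at the given prefix length."""
    # Geohashes shorter than the requested precision cannot be placed in a cell.
    return Counter(g[:precision] for g in geohashes if len(g) >= precision)


# Hypothetical geohashes for mobile trace points
trace_points = ["ezs42a", "ezs42b", "ezs43c", "u4pru"]
print(aggregate_by_geohash(trace_points, precision=5))
```

In a Spark job the same idea would be a group-by on the truncated geohash column.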
21. Empowering Data Scientists to Utilize Geospatial Data at Scale Webinar Series
Topic 1 Topic 2 Topic 3
Register now! View on-demand Today
Why do 87% of data science projects never make it into production? - https://venturebeat.com/2019/07/19/why-do-87-of-data-science-projects-never-make-it-into-production/
Volume, Variety & Velocity
Access to the data
From: https://designingforanalytics.com/resources/failure-rates-for-analytics-bi-iot-and-big-data-projects-85-yikes/
July 23, 2019 by Brian T. O'Neill
Most large enterprises have made major investments in data environments over a period of many years
These environments contain the data that these businesses run on and that today power the strategic initiatives driving the business forward – machine learning, AI and predictive analytics
- Legacy platforms (mainframe and IBM i) continue to adapt with each new wave of technology and are not going away anytime soon
Highly responsive real time service that responds to load - elasticity