SlideShare ist ein Scribd-Unternehmen logo
1 von 26
Spatial Search with Geohashes

David Smiley, MITRE, dsmiley@mitre.org, October 2010




                                                       1
What I Will Cover
•   My requirements
•   What is a geohash?
•   Geohash prefix boundaries
•   My Solr geohash prefix indexing strategy
•   Analysis, conclusion, and the future
My Background
       • Solr book author
       • Solr instructor
         – (at MITRE)
       • Working at MITRE
         for 10+ years
         – Supporting internal
           apps and US DOD
           sponsors
My Geospatial Requirements
• Documents have multiple points
• Filter search results
  – No relevancy (i.e. ranking, sort) needed
     • Using proprietary MetaCarta plugin separately
       for geo-relevancy ranking
• Lat-Lon bounding box

• Environment Considerations:
  – Large distributed Solr index
  – Low point cardinality
  – Point detail is course – a few miles
Spatial Search in Solr
• Nothing officially available yet
• SOLR-773, SOLR-1568
   “Incorporate Local Lucene/Solr”
   – Only point-radius, no bounding lat-lon box
   – Geohash to/from lat-lon (but that’s it)
• JTeam Spatial Solr (fork of SOLR-773)
• MetaCarta GeoSearch Toolkit for Solr (alpha)
   – Steep RAM requirements when only geo filtering
• Brad Giaccio (attached to SOLR-773)
   – A good start; could be much faster
Geohashes
• What is a Geohash?
  – A lat/lon geocode system
  – Has a hierarchical spatial structure
  – Gradual precision degradation
  – In the public domain
  http://en.wikipedia.org/wiki/Geohash

• Example: (Boston) DRT2Y
Geohash Decoding
DRT2Y
        • Translate base-32 dictionary to binary
              Decimal 0   1   2   3   4   5   6   7   8   9   10 11 12 13 14 15
              Base 32 0   1   2   3   4   5   6   7   8   9   b   c   d   e   f   g
              Decimal 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

              Base 32 h   j   k   m   n   p   q   r   s   t   u   v   w   x   y   z



          01100-10111-11001-00010-11110
          5 bits needed for each original geohash character
        • Separate even (longitude) & odd (latitude)
          bits
          0100110101110 (13 bits, longitude)
          101111000011 (12 bits, latitude)
          Depending on geohash length, the latitude bit count
            is either one less or equal to that of longitude
Geohash Decoding (cont.d)
DRT2Y   Latitude example
           bit       min       mid      max      val      err
            1        -90.000    0.000   90.000   45.000   45.000
            0          0.000   45.000   90.000   22.500   22.500
            1          0.000   22.500   45.000   33.750   11.250
            1        22.500    33.750   45.000   39.375    5.625
            1        33.750    39.375   45.000   42.188    2.813
            1        39.375    42.188   45.000   43.594    1.406
            0        42.188    43.594   45.000   42.891    0.703
            0        42.188    42.891   43.594   42.539    0.352
            1        42.188    42.539   42.891   42.363    0.176
            0        42.188    42.363   42.539   42.275    0.088
            0        42.188    42.275   42.363   42.319    0.044
            1        42.275    42.319   42.363   42.341    0.022
Demo
http://openlocation.org/geohash/geohash-js/
Zooming In: D




                10
Zooming In: DR




                 11
Zooming In: DRT




                  12
Zooming In: DRT2




                   13
Zooming In: DRT2Y




                    14
Geohash Grids
Internal coordinates of an odd length geohash…
DRT2Y                                …and an even length geohash
                                     DRT2
Proximity with Geohashes
• Points with a long common prefix are near
  each other
  – But the converse is not always true!
     • i.e.: Points near each other share a long
       common prefix                             T
     • Edges cases exist at every level:
         – D R T….
         – D R M…
         – (only share 2 letters)
                                                M
  – However geohashes form a hierarchical grid
Filtering by Lat-Lon Box


                        User Query




                        Geohash
                       Resolution: 5




                                   17
Indexing Strategy Overview
• Index geohashes
• Develop query support to specify lat-lon box
• Choose a good geohash resolution for the
  box
• Lookup set of overlapping geohashes
• Develop a Lucene Filter:
  – Seek to each overlapping geohash in termsEnum
  – Skip term (geohash) if not in desired box
  – Accumulate matching documents
Indexing: Index Geohashes
• Provided by SOLR-1586
• solr.GeoField
  – Input/output is latitude-comma-longitude, indexed
    internally as a geohash


• See also: GeoHashUtils.java
  – Only bidirectional lat-lon conversion routines
  – Needed code to navigate to adjacent geohashes.
    (35 L.O.C.)
Indexing: Query Support
• Modified SOLR-1586:
  {!sfilt fl=body
      ll=42.358,-71.059 ur=43.000,-70.000}
  (~15 L.O.C.)
Indexing: Determine Resolution
double[] lookupDegreesSizeForHashLen(int hashLen)
    (14 L.O.C.)
int estimateGoodPrefixLen(lat or lon, degrees)
    (9 L.O.C.)
Indexing: Lookup Overlapping
                Geohashes
String[] calcPrefixHashesToFilter(double north, south, east, west)
    (45 L.O.C.)
Indexing: Develop Lucene Filter
• Implement Lucene Filter getDocIdSet():
  – Call calcGeohashPrefixesToFilter()
  – Seek to each overlapping geohash in termsEnum
  – Skip term (geohash) if not in desired box
  – Accumulate matching documents
  (45 L.O.C.)
• Alter solr.GeoHashField createSpatialQuery()
  to use the filter
  (5 L.O.C.)
Future Improvements
• Lookup table based geohash adjacency
• Avoid isInBox() calculation for geohashes
  known to be completely within the query
• Index multi-precision geohashes
  – DRT2Y: DRT2Y, DRT2, DRT, DR, D
  – Reduces / avoids term enumerating, isInBox()
• Use different geohash-like encoding
  – See javageomodel project
References
      Pre-Existing Solr Spatial Search
• SOLR-773, SOLR-1568
• JTeam Spatial Solr Plugin
  http://www.jteam.nl/news/spatialsolr.html

• Brad Giaccio, of ManTech
  https://issues.apache.org/jira/browse/SOLR-773 (solrGeoQuery.tar)

• MetaCarta GeoSearch Toolkit for Solr
  http://berlinbuzzwords.wdfiles.com/local--files/links-to-slides/goodwin_bbuzz2010.pdf
Know more about Spacial search at
    www.lucidimagination.com

Weitere ähnliche Inhalte

Was ist angesagt?

Geopy Module in Python
Geopy Module in PythonGeopy Module in Python
Geopy Module in PythonRabinaTwayana
 
Beginning direct3d gameprogramming10_shaderdetail_20160506_jintaeks
Beginning direct3d gameprogramming10_shaderdetail_20160506_jintaeksBeginning direct3d gameprogramming10_shaderdetail_20160506_jintaeks
Beginning direct3d gameprogramming10_shaderdetail_20160506_jintaeksJinTaek Seo
 
Graph Regularised Hashing
Graph Regularised HashingGraph Regularised Hashing
Graph Regularised HashingSean Moran
 
Marble Virtual Globe 1.4 Factsheet (English)
Marble Virtual Globe 1.4 Factsheet (English)Marble Virtual Globe 1.4 Factsheet (English)
Marble Virtual Globe 1.4 Factsheet (English)Marble Virtual Globe
 
Opensource gis development - part 4
Opensource gis development - part 4Opensource gis development - part 4
Opensource gis development - part 4Andrea Antonello
 
Geospatial Advancements in Elasticsearch
Geospatial Advancements in ElasticsearchGeospatial Advancements in Elasticsearch
Geospatial Advancements in ElasticsearchElasticsearch
 
02 Geographic scripting in uDig - halfway between user and developer
02 Geographic scripting in uDig - halfway between user and developer02 Geographic scripting in uDig - halfway between user and developer
02 Geographic scripting in uDig - halfway between user and developerAndrea Antonello
 
Marble Virtual Globe 1.6 Factsheet (English)
Marble Virtual Globe 1.6 Factsheet (English)Marble Virtual Globe 1.6 Factsheet (English)
Marble Virtual Globe 1.6 Factsheet (English)Marble Virtual Globe
 
Marble - ein Schweizer Taschenmesser für Karten
Marble - ein Schweizer Taschenmesser für KartenMarble - ein Schweizer Taschenmesser für Karten
Marble - ein Schweizer Taschenmesser für KartenMarble Virtual Globe
 
(Paper Review)3D shape reconstruction from sketches via multi view convolutio...
(Paper Review)3D shape reconstruction from sketches via multi view convolutio...(Paper Review)3D shape reconstruction from sketches via multi view convolutio...
(Paper Review)3D shape reconstruction from sketches via multi view convolutio...MYEONGGYU LEE
 
Massive Point Light Soft Shadows
Massive Point Light Soft ShadowsMassive Point Light Soft Shadows
Massive Point Light Soft ShadowsWolfgang Engel
 
Introduction to spatial data analysis in r
Introduction to spatial data analysis in rIntroduction to spatial data analysis in r
Introduction to spatial data analysis in rRichard Wamalwa
 
Better Visualization of Trips through Agglomerative Clustering
Better Visualization of  Trips through     Agglomerative ClusteringBetter Visualization of  Trips through     Agglomerative Clustering
Better Visualization of Trips through Agglomerative ClusteringAnbarasan S
 

Was ist angesagt? (17)

Why is postgis awesome?
Why is postgis awesome?Why is postgis awesome?
Why is postgis awesome?
 
Geopy Module in Python
Geopy Module in PythonGeopy Module in Python
Geopy Module in Python
 
Beginning direct3d gameprogramming10_shaderdetail_20160506_jintaeks
Beginning direct3d gameprogramming10_shaderdetail_20160506_jintaeksBeginning direct3d gameprogramming10_shaderdetail_20160506_jintaeks
Beginning direct3d gameprogramming10_shaderdetail_20160506_jintaeks
 
Graph Regularised Hashing
Graph Regularised HashingGraph Regularised Hashing
Graph Regularised Hashing
 
Marble Virtual Globe 1.4 Factsheet (English)
Marble Virtual Globe 1.4 Factsheet (English)Marble Virtual Globe 1.4 Factsheet (English)
Marble Virtual Globe 1.4 Factsheet (English)
 
Raster package jacob
Raster package jacobRaster package jacob
Raster package jacob
 
Opensource gis development - part 4
Opensource gis development - part 4Opensource gis development - part 4
Opensource gis development - part 4
 
Geospatial Advancements in Elasticsearch
Geospatial Advancements in ElasticsearchGeospatial Advancements in Elasticsearch
Geospatial Advancements in Elasticsearch
 
Vgg
VggVgg
Vgg
 
Post gispguk
Post gispgukPost gispguk
Post gispguk
 
02 Geographic scripting in uDig - halfway between user and developer
02 Geographic scripting in uDig - halfway between user and developer02 Geographic scripting in uDig - halfway between user and developer
02 Geographic scripting in uDig - halfway between user and developer
 
Marble Virtual Globe 1.6 Factsheet (English)
Marble Virtual Globe 1.6 Factsheet (English)Marble Virtual Globe 1.6 Factsheet (English)
Marble Virtual Globe 1.6 Factsheet (English)
 
Marble - ein Schweizer Taschenmesser für Karten
Marble - ein Schweizer Taschenmesser für KartenMarble - ein Schweizer Taschenmesser für Karten
Marble - ein Schweizer Taschenmesser für Karten
 
(Paper Review)3D shape reconstruction from sketches via multi view convolutio...
(Paper Review)3D shape reconstruction from sketches via multi view convolutio...(Paper Review)3D shape reconstruction from sketches via multi view convolutio...
(Paper Review)3D shape reconstruction from sketches via multi view convolutio...
 
Massive Point Light Soft Shadows
Massive Point Light Soft ShadowsMassive Point Light Soft Shadows
Massive Point Light Soft Shadows
 
Introduction to spatial data analysis in r
Introduction to spatial data analysis in rIntroduction to spatial data analysis in r
Introduction to spatial data analysis in r
 
Better Visualization of Trips through Agglomerative Clustering
Better Visualization of  Trips through     Agglomerative ClusteringBetter Visualization of  Trips through     Agglomerative Clustering
Better Visualization of Trips through Agglomerative Clustering
 

Andere mochten auch

Geohash in mapping applications
Geohash in mapping applicationsGeohash in mapping applications
Geohash in mapping applicationsAlex Tumanoff
 
Geohash: Integration of Disparate Geospatial Data
Geohash: Integration of Disparate Geospatial DataGeohash: Integration of Disparate Geospatial Data
Geohash: Integration of Disparate Geospatial DataDataCards
 
Data Modeling with Cassandra
Data Modeling with CassandraData Modeling with Cassandra
Data Modeling with CassandraPatricia Gorla
 
Public Health Surveillance Through Collaboration
Public Health Surveillance Through CollaborationPublic Health Surveillance Through Collaboration
Public Health Surveillance Through CollaborationTaha Kass-Hout, MD, MS
 
BioSense Program Going Forward: HIMSS10 Conference
BioSense Program Going Forward: HIMSS10 ConferenceBioSense Program Going Forward: HIMSS10 Conference
BioSense Program Going Forward: HIMSS10 ConferenceTaha Kass-Hout, MD, MS
 
Evolve: InSTEDD's Global Early Warning and Response System
Evolve: InSTEDD's Global Early Warning and Response SystemEvolve: InSTEDD's Global Early Warning and Response System
Evolve: InSTEDD's Global Early Warning and Response SystemTaha Kass-Hout, MD, MS
 
Latest Advances in Megapixel Surveillance
Latest Advances in Megapixel SurveillanceLatest Advances in Megapixel Surveillance
Latest Advances in Megapixel SurveillanceSteve Ma
 
GeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTechGeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTechRob Emanuele
 
Matchinguu droidcon presentation
Matchinguu droidcon presentationMatchinguu droidcon presentation
Matchinguu droidcon presentationDroidcon Berlin
 
Riff: A Social Network and Collaborative Platform for Public Health Disease S...
Riff: A Social Network and Collaborative Platform for Public Health Disease S...Riff: A Social Network and Collaborative Platform for Public Health Disease S...
Riff: A Social Network and Collaborative Platform for Public Health Disease S...Taha Kass-Hout, MD, MS
 
[FOSS4G Korea 2016] GeoHash를 이용한 지형도 변화탐지와 시계열 관리
[FOSS4G Korea 2016] GeoHash를 이용한 지형도 변화탐지와 시계열 관리[FOSS4G Korea 2016] GeoHash를 이용한 지형도 변화탐지와 시계열 관리
[FOSS4G Korea 2016] GeoHash를 이용한 지형도 변화탐지와 시계열 관리BJ Jang
 
Using Qualtrics to Create Automated Online Trainings
Using Qualtrics to Create Automated Online TrainingsUsing Qualtrics to Create Automated Online Trainings
Using Qualtrics to Create Automated Online TrainingsShalin Hai-Jew
 
Understanding Public Sentiment: Conducting a Related-Tags Content Network Ext...
Understanding Public Sentiment: Conducting a Related-Tags Content Network Ext...Understanding Public Sentiment: Conducting a Related-Tags Content Network Ext...
Understanding Public Sentiment: Conducting a Related-Tags Content Network Ext...Shalin Hai-Jew
 
Writing and Publishing about Applied Technologies in Tech Journals and Books
Writing and Publishing about Applied Technologies in Tech Journals and BooksWriting and Publishing about Applied Technologies in Tech Journals and Books
Writing and Publishing about Applied Technologies in Tech Journals and BooksShalin Hai-Jew
 

Andere mochten auch (20)

Geohash
GeohashGeohash
Geohash
 
Geohash in mapping applications
Geohash in mapping applicationsGeohash in mapping applications
Geohash in mapping applications
 
Geohash: Integration of Disparate Geospatial Data
Geohash: Integration of Disparate Geospatial DataGeohash: Integration of Disparate Geospatial Data
Geohash: Integration of Disparate Geospatial Data
 
Data Modeling with Cassandra
Data Modeling with CassandraData Modeling with Cassandra
Data Modeling with Cassandra
 
Public Health Surveillance Through Collaboration
Public Health Surveillance Through CollaborationPublic Health Surveillance Through Collaboration
Public Health Surveillance Through Collaboration
 
BioSense Program Going Forward: HIMSS10 Conference
BioSense Program Going Forward: HIMSS10 ConferenceBioSense Program Going Forward: HIMSS10 Conference
BioSense Program Going Forward: HIMSS10 Conference
 
BioSense 2.0
BioSense 2.0BioSense 2.0
BioSense 2.0
 
Evolve: InSTEDD's Global Early Warning and Response System
Evolve: InSTEDD's Global Early Warning and Response SystemEvolve: InSTEDD's Global Early Warning and Response System
Evolve: InSTEDD's Global Early Warning and Response System
 
Big Data in Public Health
Big Data in Public HealthBig Data in Public Health
Big Data in Public Health
 
Social Media for the Meta-Leader
Social Media for the Meta-LeaderSocial Media for the Meta-Leader
Social Media for the Meta-Leader
 
precisionFDA
precisionFDAprecisionFDA
precisionFDA
 
Latest Advances in Megapixel Surveillance
Latest Advances in Megapixel SurveillanceLatest Advances in Megapixel Surveillance
Latest Advances in Megapixel Surveillance
 
GeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTechGeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTech
 
Matchinguu droidcon presentation
Matchinguu droidcon presentationMatchinguu droidcon presentation
Matchinguu droidcon presentation
 
Riff: A Social Network and Collaborative Platform for Public Health Disease S...
Riff: A Social Network and Collaborative Platform for Public Health Disease S...Riff: A Social Network and Collaborative Platform for Public Health Disease S...
Riff: A Social Network and Collaborative Platform for Public Health Disease S...
 
[FOSS4G Korea 2016] GeoHash를 이용한 지형도 변화탐지와 시계열 관리
[FOSS4G Korea 2016] GeoHash를 이용한 지형도 변화탐지와 시계열 관리[FOSS4G Korea 2016] GeoHash를 이용한 지형도 변화탐지와 시계열 관리
[FOSS4G Korea 2016] GeoHash를 이용한 지형도 변화탐지와 시계열 관리
 
Using Qualtrics to Create Automated Online Trainings
Using Qualtrics to Create Automated Online TrainingsUsing Qualtrics to Create Automated Online Trainings
Using Qualtrics to Create Automated Online Trainings
 
Understanding Public Sentiment: Conducting a Related-Tags Content Network Ext...
Understanding Public Sentiment: Conducting a Related-Tags Content Network Ext...Understanding Public Sentiment: Conducting a Related-Tags Content Network Ext...
Understanding Public Sentiment: Conducting a Related-Tags Content Network Ext...
 
Epi Info™ Mesh4x
Epi Info™ Mesh4xEpi Info™ Mesh4x
Epi Info™ Mesh4x
 
Writing and Publishing about Applied Technologies in Tech Journals and Books
Writing and Publishing about Applied Technologies in Tech Journals and BooksWriting and Publishing about Applied Technologies in Tech Journals and Books
Writing and Publishing about Applied Technologies in Tech Journals and Books
 

Ähnlich wie Spatial search with geohashes

Building Scalable Semantic Geospatial RDF Stores
Building Scalable Semantic Geospatial RDF StoresBuilding Scalable Semantic Geospatial RDF Stores
Building Scalable Semantic Geospatial RDF StoresKostis Kyzirakos
 
Magellan FOSS4G Talk, Boston 2017
Magellan FOSS4G Talk, Boston 2017Magellan FOSS4G Talk, Boston 2017
Magellan FOSS4G Talk, Boston 2017Ram Sriharsha
 
Representing and Querying Geospatial Information in the Semantic Web
Representing and Querying Geospatial Information in the Semantic WebRepresenting and Querying Geospatial Information in the Semantic Web
Representing and Querying Geospatial Information in the Semantic WebKostis Kyzirakos
 
Toward Next Generation of Gazetteer: Utilizing GeoSPARQL For Developing Link...
Toward Next Generation of Gazetteer:  Utilizing GeoSPARQL For Developing Link...Toward Next Generation of Gazetteer:  Utilizing GeoSPARQL For Developing Link...
Toward Next Generation of Gazetteer: Utilizing GeoSPARQL For Developing Link...Dongpo Deng
 
2016-01 Lucene Solr spatial in 2015, NYC Meetup
2016-01 Lucene Solr spatial in 2015, NYC Meetup2016-01 Lucene Solr spatial in 2015, NYC Meetup
2016-01 Lucene Solr spatial in 2015, NYC MeetupDavid Smiley
 
Paris data-geeks-2013-03-28
Paris data-geeks-2013-03-28Paris data-geeks-2013-03-28
Paris data-geeks-2013-03-28Ted Dunning
 
An overview of Peer-to-Peer technology new
An overview of Peer-to-Peer technology newAn overview of Peer-to-Peer technology new
An overview of Peer-to-Peer technology newchizhangufl
 
Oxford 05-oct-2012
Oxford 05-oct-2012Oxford 05-oct-2012
Oxford 05-oct-2012Ted Dunning
 
Geoindexing with MongoDB
Geoindexing with MongoDBGeoindexing with MongoDB
Geoindexing with MongoDBleafnode
 
Geographical Data Management for Web Applications
Geographical Data Management for Web ApplicationsGeographical Data Management for Web Applications
Geographical Data Management for Web ApplicationsSymeon Papadopoulos
 
Lucene/Solr Spatial in 2015: Presented by David Smiley
Lucene/Solr Spatial in 2015: Presented by David SmileyLucene/Solr Spatial in 2015: Presented by David Smiley
Lucene/Solr Spatial in 2015: Presented by David SmileyLucidworks
 
Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford MapR Technologies
 
Covering the earth and the cloud the next generation of spatial in sql server...
Covering the earth and the cloud the next generation of spatial in sql server...Covering the earth and the cloud the next generation of spatial in sql server...
Covering the earth and the cloud the next generation of spatial in sql server...Texas Natural Resources Information System
 
RasterFrames - FOSS4G NA 2018
RasterFrames - FOSS4G NA 2018RasterFrames - FOSS4G NA 2018
RasterFrames - FOSS4G NA 2018Simeon Fitch
 
RasterFrames: Enabling Global-Scale Geospatial Machine Learning
RasterFrames: Enabling Global-Scale Geospatial Machine LearningRasterFrames: Enabling Global-Scale Geospatial Machine Learning
RasterFrames: Enabling Global-Scale Geospatial Machine LearningAstraea, Inc.
 
Drupal mapping modules
Drupal mapping modulesDrupal mapping modules
Drupal mapping modulesPatrick Hayes
 
Better DSL Support for Groovy-Eclipse
Better DSL Support for Groovy-EclipseBetter DSL Support for Groovy-Eclipse
Better DSL Support for Groovy-EclipseAndrew Eisenberg
 

Ähnlich wie Spatial search with geohashes (20)

Building Scalable Semantic Geospatial RDF Stores
Building Scalable Semantic Geospatial RDF StoresBuilding Scalable Semantic Geospatial RDF Stores
Building Scalable Semantic Geospatial RDF Stores
 
Magellan FOSS4G Talk, Boston 2017
Magellan FOSS4G Talk, Boston 2017Magellan FOSS4G Talk, Boston 2017
Magellan FOSS4G Talk, Boston 2017
 
Representing and Querying Geospatial Information in the Semantic Web
Representing and Querying Geospatial Information in the Semantic WebRepresenting and Querying Geospatial Information in the Semantic Web
Representing and Querying Geospatial Information in the Semantic Web
 
Toward Next Generation of Gazetteer: Utilizing GeoSPARQL For Developing Link...
Toward Next Generation of Gazetteer:  Utilizing GeoSPARQL For Developing Link...Toward Next Generation of Gazetteer:  Utilizing GeoSPARQL For Developing Link...
Toward Next Generation of Gazetteer: Utilizing GeoSPARQL For Developing Link...
 
2016-01 Lucene Solr spatial in 2015, NYC Meetup
2016-01 Lucene Solr spatial in 2015, NYC Meetup2016-01 Lucene Solr spatial in 2015, NYC Meetup
2016-01 Lucene Solr spatial in 2015, NYC Meetup
 
9553477.ppt
9553477.ppt9553477.ppt
9553477.ppt
 
Paris data-geeks-2013-03-28
Paris data-geeks-2013-03-28Paris data-geeks-2013-03-28
Paris data-geeks-2013-03-28
 
ACM 2013-02-25
ACM 2013-02-25ACM 2013-02-25
ACM 2013-02-25
 
An overview of Peer-to-Peer technology new
An overview of Peer-to-Peer technology newAn overview of Peer-to-Peer technology new
An overview of Peer-to-Peer technology new
 
Oxford 05-oct-2012
Oxford 05-oct-2012Oxford 05-oct-2012
Oxford 05-oct-2012
 
Geoindexing with MongoDB
Geoindexing with MongoDBGeoindexing with MongoDB
Geoindexing with MongoDB
 
Geographical Data Management for Web Applications
Geographical Data Management for Web ApplicationsGeographical Data Management for Web Applications
Geographical Data Management for Web Applications
 
Lucene/Solr Spatial in 2015: Presented by David Smiley
Lucene/Solr Spatial in 2015: Presented by David SmileyLucene/Solr Spatial in 2015: Presented by David Smiley
Lucene/Solr Spatial in 2015: Presented by David Smiley
 
Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford
 
Covering the earth and the cloud the next generation of spatial in sql server...
Covering the earth and the cloud the next generation of spatial in sql server...Covering the earth and the cloud the next generation of spatial in sql server...
Covering the earth and the cloud the next generation of spatial in sql server...
 
RasterFrames - FOSS4G NA 2018
RasterFrames - FOSS4G NA 2018RasterFrames - FOSS4G NA 2018
RasterFrames - FOSS4G NA 2018
 
RasterFrames: Enabling Global-Scale Geospatial Machine Learning
RasterFrames: Enabling Global-Scale Geospatial Machine LearningRasterFrames: Enabling Global-Scale Geospatial Machine Learning
RasterFrames: Enabling Global-Scale Geospatial Machine Learning
 
Geo data analytics
Geo data analyticsGeo data analytics
Geo data analytics
 
Drupal mapping modules
Drupal mapping modulesDrupal mapping modules
Drupal mapping modules
 
Better DSL Support for Groovy-Eclipse
Better DSL Support for Groovy-EclipseBetter DSL Support for Groovy-Eclipse
Better DSL Support for Groovy-Eclipse
 

Mehr von Lucidworks (Archived)

Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...Lucidworks (Archived)
 
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and SolrLucidworks (Archived)
 
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for BusinessSFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for BusinessLucidworks (Archived)
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceLucidworks (Archived)
 
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineChicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineLucidworks (Archived)
 
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchChicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchLucidworks (Archived)
 
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache SolrMinneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache SolrLucidworks (Archived)
 
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchMinneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchLucidworks (Archived)
 
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Lucidworks (Archived)
 
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...Lucidworks (Archived)
 
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Lucidworks (Archived)
 
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCBig Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCLucidworks (Archived)
 
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCWhat's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCLucidworks (Archived)
 
Solr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCSolr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCLucidworks (Archived)
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCLucidworks (Archived)
 
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCTest Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCLucidworks (Archived)
 
Building a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKBuilding a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKLucidworks (Archived)
 

Mehr von Lucidworks (Archived) (20)

Integrating Hadoop & Solr
Integrating Hadoop & SolrIntegrating Hadoop & Solr
Integrating Hadoop & Solr
 
The Data-Driven Paradigm
The Data-Driven ParadigmThe Data-Driven Paradigm
The Data-Driven Paradigm
 
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
 
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for BusinessSFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineChicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
 
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchChicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
 
What's new in solr june 2014
What's new in solr june 2014What's new in solr june 2014
What's new in solr june 2014
 
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache SolrMinneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
 
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchMinneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
 
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
 
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
 
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
 
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCBig Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
 
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCWhat's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
 
Solr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCSolr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DC
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
 
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCTest Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
 
Building a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKBuilding a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLK
 

Spatial search with geohashes

  • 1. Spatial Search with Geohashes David Smiley, MITRE, dsmiley@mitre.org, October 2010 1
  • 2. What I Will Cover • My requirements • What is a geohash? • Geohash prefix boundaries • My Solr geohash prefix indexing strategy • Analysis, conclusion, and the future
  • 3. My Background • Solr book author • Solr instructor – (at MITRE) • Working at MITRE for 10+ years – Supporting internal apps and US DOD sponsors
  • 4. My Geospatial Requirements • Documents have multiple points • Filter search results – No relevancy (i.e. ranking, sort) needed • Using proprietary MetaCarta plugin separately for geo-relevancy ranking • Lat-Lon bounding box • Environment Considerations: – Large distributed Solr index – Low point cardinality – Point detail is course – a few miles
  • 5. Spatial Search in Solr • Nothing officially available yet • SOLR-773, SOLR-1568 “Incorporate Local Lucene/Solr” – Only point-radius, no bounding lat-lon box – Geohash to/from lat-lon (but that’s it) • JTeam Spatial Solr (fork of SOLR-773) • MetaCarta GeoSearch Toolkit for Solr (alpha) – Steep RAM requirements when only geo filtering • Brad Giaccio (attached to SOLR-773) – A good start; could be much faster
  • 6. Geohashes • What is a Geohash? – A lat/lon geocode system – Has a hierarchical spatial structure – Gradual precision degradation – In the public domain http://en.wikipedia.org/wiki/Geohash • Example: (Boston) DRT2Y
  • 7. Geohash Decoding DRT2Y • Translate base-32 dictionary to binary Decimal 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Base 32 0 1 2 3 4 5 6 7 8 9 b c d e f g Decimal 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Base 32 h j k m n p q r s t u v w x y z 01100-10111-11001-00010-11110 5 bits needed for each original geohash character • Separate even (longitude) & odd (latitude) bits 0100110101110 (13 bits, longitude) 101111000011 (12 bits, latitude) Depending on geohash length, the latitude bit count is either one less or equal to that of longitude
  • 8. Geohash Decoding (cont.d) DRT2Y Latitude example bit min mid max val err 1 -90.000 0.000 90.000 45.000 45.000 0 0.000 45.000 90.000 22.500 22.500 1 0.000 22.500 45.000 33.750 11.250 1 22.500 33.750 45.000 39.375 5.625 1 33.750 39.375 45.000 42.188 2.813 1 39.375 42.188 45.000 43.594 1.406 0 42.188 43.594 45.000 42.891 0.703 0 42.188 42.891 43.594 42.539 0.352 1 42.188 42.539 42.891 42.363 0.176 0 42.188 42.363 42.539 42.275 0.088 0 42.188 42.275 42.363 42.319 0.044 1 42.275 42.319 42.363 42.341 0.022
  • 15. Geohash Grids Internal coordinates of an odd length geohash… DRT2Y …and an even length geohash DRT2
  • 16. Proximity with Geohashes • Points with a long common prefix are near each other – But the converse is not always true! • i.e.: Points near each other share a long common prefix T • Edges cases exist at every level: – D R T…. – D R M… – (only share 2 letters) M – However geohashes form a hierarchical grid
  • 17. Filtering by Lat-Lon Box User Query Geohash Resolution: 5 17
  • 18. Indexing Strategy Overview • Index geohashes • Develop query support to specify lat-lon box • Choose a good geohash resolution for the box • Lookup set of overlapping geohashes • Develop a Lucene Filter: – Seek to each overlapping geohash in termsEnum – Skip term (geohash) if not in desired box – Accumulate matching documents
  • 19. Indexing: Index Geohashes • Provided by SOLR-1586 • solr.GeoField – Input/output is latitude-comma-longitude, indexed internally as a geohash • See also: GeoHashUtils.java – Only bidirectional lat-lon conversion routines – Needed code to navigate to adjacent geohashes. (35 L.O.C.)
  • 20. Indexing: Query Support • Modified SOLR-1586: {!sfilt fl=body ll=42.358,-71.059 ur=43.000,-70.000} (~15 L.O.C.)
  • 21. Indexing: Determine Resolution double[] lookupDegreesSizeForHashLen(int hashLen) (14 L.O.C.) int estimateGoodPrefixLen(lat or lon, degrees) (9 L.O.C.)
  • 22. Indexing: Lookup Overlapping Geohashes String[] calcPrefixHashesToFilter(double north, south, east, west) (45 L.O.C.)
  • 23. Indexing: Develop Lucene Filter • Implement Lucene Filter getDocIdSet(): – Call calcGeohashPrefixesToFilter() – Seek to each overlapping geohash in termsEnum – Skip term (geohash) if not in desired box – Accumulate matching documents (45 L.O.C.) • Alter solr.GeoHashField createSpatialQuery() to use the filter (5 L.O.C.)
  • 24. Future Improvements • Lookup table based geohash adjacency • Avoid isInBox() calculation for geohashes known to be completely within the query • Index multi-precision geohashes – DRT2Y: DRT2Y, DRT2, DRT, DR, D – Reduces / avoids term enumerating, isInBox() • Use different geohash-like encoding – See javageomodel project
  • 25. References Pre-Existing Solr Spatial Search • SOLR-773, SOLR-1568 • JTeam Spatial Solr Plugin http://www.jteam.nl/news/spatialsolr.html • Brad Giaccio, of ManTech https://issues.apache.org/jira/browse/SOLR-773 (solrGeoQuery.tar) • MetaCarta GeoSearch Toolkit for Solr http://berlinbuzzwords.wdfiles.com/local--files/links-to-slides/goodwin_bbuzz2010.pdf
  • 26. Know more about Spacial search at www.lucidimagination.com