This document describes a new approach to geospatial search with polygons in Solr. The approach indexes points and polygons as geohash-style codes stored as 64-bit integers in a trie structure, which makes polygon searches possible. It supports point, polygon, latitude-longitude box, and point-radius geometries. The capabilities are accessed through custom FieldTypes and a QParserPlugin. Although the approach is promising as a replacement for an expensive proprietary database, it has not yet been released publicly because of unresolved intellectual property issues.
2. What is the “Burning Platform”?
§ Need to break dependency on expensive licenses for proprietary database
– Major cost driver
– Unsustainable in current economic environment
§ Solr identified as promising replacement candidate
– Excellent cost
– Excellent performance
– Excellent access to source code
– Major weakness in required Geospatial Search capability
– Is Geospatial Search weakness mitigation possible?
§ Must index points for search by polygons
§ Should index polygons for search by polygons
Solr promising, Polygon Geospatial Search needed
5/16/12 2
3. What Has Been Produced?
§ A single add-in JAR file plus Schema enhancements
– Older variant requires a GPL library for point-in-polygon support
– Newer variant requires no external libraries
§ Internals use inherent three-dimensional mathematics
– LUCENE-3795/“Lucene Spatial Playground” geospatial search capability uses JTS library for polygon support
– JTS uses two-dimensional mathematics
– JTS has greater vulnerability to special points
§ North and South Poles
§ 180° meridian
§ Potential problem for customer applications
– JTS supports complex polygons
§ Alternative approach only supports simple polygons at this time
Single JAR file using 3-D internal mathematics
4. What is the Magic?
§ Variant geohash coding
– 64-bit long integers instead of strings
– Three most significant bits for octants of Earth’s surface
§ Dividing at equator, prime meridian, 90° E/W meridians, 180° meridian
– Followed by three-bit groups
§ One stop/continue bit
§ One north/south split bit
§ One east/west split bit
– Allows precision down to 10 cm × 10 cm squares at equator
– Produces various-size “tiles” representing parts of Earth’s surface
§ Points indexed by the smallest tile which contains the point
[Figure: bit layout of the 64-bit code. Octant bits E/W, N/S, E/W, followed by repeating three-bit groups C/S, N/S, E/W (C/S = continue/stop).]
Indexing using 64-bit integers for trie-driven search
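The coding scheme above can be sketched as follows. This is only an illustration under stated assumptions: the slides do not give the exact bit packing, so the class name `TileCoder`, the method names, the choice of which side of a split maps to 1, and the decision to leave the lowest bit unused are all hypothetical. With this assumed packing only 20 three-bit groups fit after the octant bits, so the sketch does not necessarily reproduce the precision quoted above; the actual implementation may pack bits differently.

```java
// Hypothetical sketch of a geohash-like 64-bit tile code.
// Assumed layout: bits 63..61 = octant (E/W, N/S, E/W), then up to 20 groups
// of (continue, N/S, E/W); bit 0 is unused. A zero continue bit means "stop".
final class TileCoder {
    static final int MAX_LEVELS = 20; // 3 + 20 * 3 = 63 bits

    /** Halves the range around value; returns 1 for the upper half, 0 for the lower. */
    private static long split(double value, double[] range) {
        double mid = (range[0] + range[1]) / 2.0;
        if (value >= mid) {
            range[0] = mid;
            return 1L;
        }
        range[1] = mid;
        return 0L;
    }

    /** Encodes the tile at the given subdivision level that contains the point. */
    static long encode(double lat, double lon, int levels) {
        if (levels < 0 || levels > MAX_LEVELS) {
            throw new IllegalArgumentException("levels must be in 0.." + MAX_LEVELS);
        }
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        long code = 0L;
        int shift = 64;
        // Octant bits: hemisphere east/west, north/south, then the 90-degree meridian split.
        code |= split(lon, lonRange) << --shift;
        code |= split(lat, latRange) << --shift;
        code |= split(lon, lonRange) << --shift;
        for (int i = 0; i < levels; i++) {
            code |= 1L << --shift;                   // continue bit
            code |= split(lat, latRange) << --shift; // north/south split
            code |= split(lon, lonRange) << --shift; // east/west split
        }
        return code; // remaining bits stay 0, so the next continue bit reads "stop"
    }

    /** Counts the leading groups whose continue bit is set. */
    static int levelOf(long code) {
        int level = 0;
        while (level < MAX_LEVELS && ((code >>> (60 - 3 * level)) & 1L) == 1L) {
            level++;
        }
        return level;
    }
}
```

Because a coarser tile's code is a bit prefix of every code beneath it, all descendants of a tile fall in one contiguous numeric range, which is what makes a numeric trie index a natural fit.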
5. What About Polygons?
§ Polygons indexed as collection of tiles inside polygon
– Larger tiles completely contained in indexed polygon are not subdivided
– Smallest indexed tiles may extend outside indexed polygon
Polygons indexed with series of hash codes
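The tiling rule above (keep fully contained tiles whole, subdivide partially covered ones, and accept overhang at the smallest size) can be illustrated with a recursive cover. This sketch deliberately swaps in planar geometry from `java.awt.geom.Path2D` instead of the three-dimensional mathematics the slides describe, so it only shows the subdivision logic; the class `PolygonTiler` and the tile representation are hypothetical.

```java
import java.awt.geom.Path2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical planar sketch of polygon tiling by recursive subdivision.
// Tiles are {x, y, width, height}, with x as longitude and y as latitude.
final class PolygonTiler {
    static List<double[]> cover(Path2D polygon, double x, double y,
                                double w, double h, int depth, List<double[]> out) {
        if (!polygon.intersects(x, y, w, h)) {
            return out; // tile lies entirely outside the polygon: drop it
        }
        if (depth == 0 || polygon.contains(x, y, w, h)) {
            // Fully contained tiles are kept whole; at the finest level a
            // partially covered tile is kept even though it overhangs.
            out.add(new double[] {x, y, w, h});
            return out;
        }
        double hw = w / 2.0;
        double hh = h / 2.0;
        cover(polygon, x, y, hw, hh, depth - 1, out);
        cover(polygon, x + hw, y, hw, hh, depth - 1, out);
        cover(polygon, x, y + hh, hw, hh, depth - 1, out);
        cover(polygon, x + hw, y + hh, hw, hh, depth - 1, out);
        return out;
    }

    public static void main(String[] args) {
        Path2D.Double square = new Path2D.Double();
        square.moveTo(1.5, 1.5);
        square.lineTo(14.5, 1.5);
        square.lineTo(14.5, 14.5);
        square.lineTo(1.5, 14.5);
        square.closePath();
        List<double[]> tiles = cover(square, 0, 0, 16, 16, 4, new ArrayList<>());
        System.out.println(tiles.size() + " tiles");
    }
}
```

Each emitted tile would then be converted to its hash code for indexing. The `depth` parameter bounds the number of tiles at the cost of more overhang along the polygon boundary.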
6. What About Polygon Search?
§ Search polygon converted to tiles using indexing conversion process
– Possible to get too many tile indices to search
§ Risks Lucene complaints about too many BooleanClauses
§ Consolidate adjacent indices into ranges
§ Reduce tiling precision
– Reduce number of ranges
– Produce acceptable number of BooleanClauses
§ Results filtered by original search polygon
– Requires storage of original geometry data in addition to index
– No filter query required
§ Index always accessed with NumericRangeQuery
§ Insert custom logic wrapping NumericRangeQuery
Search similar to indexing with additional filtering
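The consolidation step above can be sketched as a merge of sorted numeric ranges. The class `RangeMerger` and the `long[]{min, max}` representation are hypothetical; the slides do not show how the actual implementation represents or merges its tile indices.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: merge overlapping or directly adjacent inclusive ranges
// so that fewer range clauses are needed in the final query.
final class RangeMerger {
    /** Each range is {min, max}, both inclusive. The input list is not modified. */
    static List<long[]> merge(List<long[]> ranges) {
        List<long[]> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong(r -> r[0]));
        List<long[]> out = new ArrayList<>();
        for (long[] r : sorted) {
            long[] last = out.isEmpty() ? null : out.get(out.size() - 1);
            // The overlap test runs first, so last[1] + 1 is only evaluated
            // when it cannot matter that it overflows.
            if (last != null && (r[0] <= last[1] || r[0] == last[1] + 1)) {
                last[1] = Math.max(last[1], r[1]);
            } else {
                out.add(new long[] {r[0], r[1]});
            }
        }
        return out;
    }
}
```

Each merged range would then correspond to one range clause, for example one built with Lucene's `NumericRangeQuery.newLongRange`. If the merged list is still too long, the search polygon can be re-tiled at a coarser precision and merged again.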
7. How Is This Capability Used?
§ Indexing accessed using custom FieldTypes in schema
– Specific types for each supported geometry type
– A general type to allow polymorphic geometry types
§ Trade-off is greater application coupling
– Specific type classes transform inputs and hand-off to general type class
– Indexing writes out two fields
§ Geospatial tile index
§ Original geometry storage
§ Search accessed using custom QParserPlugin
– Detects special suffixes on search field name to determine geometry type
– Converts input to geospatial tile index collection
– Builds Lucene query structure including custom and standard classes
New schema FieldTypes and new QParserPlugin
8. What Geometries Are Supported?
§ Points
– Specified by latitude and longitude
§ Polygons
– Specified by latitude-longitude pairs
§ Latitude-Longitude Boxes
– Specified by two latitude-longitude pairs specifying opposite corners
– Internally converted to polygons
§ Point-Radii
– Specified by latitude and longitude of center plus radius in meters, kilometers, statute miles, or nautical miles
– Assumes spherical Earth NOT WGS-84 ellipsoid
§ Errors accepted for search
– Internally converted to approximating polygons
Latitude-Longitude Boxes and Point-Radii supported
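The point-radius conversion above can be sketched with the standard spherical destination-point formula. Everything named here is an assumption for illustration: the class `PointRadius`, the unit abbreviations, the vertex count parameter, and the mean Earth radius constant are not taken from the slides, and the actual implementation may choose its vertices differently.

```java
// Hypothetical sketch: approximate a point-radius circle by a polygon
// on a spherical Earth (not the WGS-84 ellipsoid).
final class PointRadius {
    static final double EARTH_RADIUS_M = 6_371_008.8; // assumed mean radius

    /** Converts a radius to meters; the unit abbreviations are illustrative. */
    static double toMeters(double value, String unit) {
        switch (unit) {
            case "m":   return value;
            case "km":  return value * 1000.0;
            case "mi":  return value * 1609.344; // statute mile
            case "nmi": return value * 1852.0;   // nautical mile
            default: throw new IllegalArgumentException("unknown unit: " + unit);
        }
    }

    /** Returns vertices as {lat, lon} in degrees, evenly spaced by bearing. */
    static double[][] toPolygon(double latDeg, double lonDeg, double radiusM, int vertices) {
        double lat1 = Math.toRadians(latDeg);
        double lon1 = Math.toRadians(lonDeg);
        double d = radiusM / EARTH_RADIUS_M; // angular distance in radians
        double[][] ring = new double[vertices][];
        for (int i = 0; i < vertices; i++) {
            double brg = 2.0 * Math.PI * i / vertices;
            double lat2 = Math.asin(Math.sin(lat1) * Math.cos(d)
                    + Math.cos(lat1) * Math.sin(d) * Math.cos(brg));
            double lon2 = lon1 + Math.atan2(
                    Math.sin(brg) * Math.sin(d) * Math.cos(lat1),
                    Math.cos(d) - Math.sin(lat1) * Math.sin(lat2));
            // Wrap longitude into [-180, 180) so circles may cross the 180° meridian.
            double lonOut = (Math.toDegrees(lon2) + 540.0) % 360.0 - 180.0;
            ring[i] = new double[] {Math.toDegrees(lat2), lonOut};
        }
        return ring;
    }

    /** Great-circle distance in meters between two points given in degrees. */
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2.0), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2.0), 2);
        return 2.0 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }
}
```

Because the vertices lie on the circle, the polygon is inscribed and slightly underestimates the circle's area; more vertices reduce that error at the cost of a more complex search polygon.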
9. How Can the Public Get This?
§ Currently working Intellectual Property issues
– Employer required provisional patent application submission before Lucene Revolution abstract could be submitted
§ Could protect public use of license assuming public release
– Customer has Unrestricted Rights
§ Customer can release to public open source community
§ Customer may release to public open source community
– Customer dislikes proprietary solutions
§ Also need to work packaging issues such as a name
Not yet available to public, but that may change
10. Summary
§ Solr is an excellent choice to replace our expensive database
§ Geospatial Search with Polygons in Solr is possible and implemented
– Can be used with or without LUCENE-3795/“Lucene Spatial Playground” approach
– Inherent 3-dimensional mathematics not found in LUCENE-3795 polygon support
– Stores and uses both indices and original geometries
– No support for complex polygons at this time
§ Capabilities accessed with new FieldTypes and a new QParserPlugin
§ Not yet released to public