Search with Polygons


Another Approach to Solr Geospatial Search
Dr. Andrew L. Urquhart

May 10, 2012

                                           Copyright © 2012 Raytheon Company. All rights reserved.
                     Customer Success Is Our Mission is a registered trademark of Raytheon Company.
What is the “Burning Platform”?
§  Need to break dependency on expensive licenses for
    proprietary database
 –  Major cost driver
 –  Unsustainable in current economic environment
§  Solr identified as promising replacement candidate
 –    Excellent cost
 –    Excellent performance
 –    Excellent access to source code
 –    Major weakness in required Geospatial Search capability
 –    Can the Geospatial Search weakness be mitigated?
      §  Must index points for search by polygons
      §  Should index polygons for search by polygons




            Solr promising, Polygon Geospatial Search needed
                                                                5/16/12   2
What Has Been Produced?
§  A single add-in JAR file plus Schema enhancements
 –  Older variant requires a GPL library for point-in-polygon support
 –  Newer variant requires no external libraries
§  Internals use inherent three-dimensional mathematics
 –  LUCENE-3795/“Lucene Spatial Playground” geospatial search capability uses
    JTS library for polygon support
 –  JTS uses two-dimensional mathematics
 –  JTS has greater vulnerability to special points
    §  North and South Poles
    §  180° meridian
    §  Potential problem for customer applications
 –  JTS supports complex polygons
    §  Alternative approach only supports simple polygons at this time



             Single JAR file using 3-D internal mathematics
What is the Magic?
§  Variant geohash coding
 –  64-bit long integers instead of strings
 –  Three most significant bits for octants of Earth’s surface
    §  Dividing at equator, prime meridian, 90° E/W meridians, 180° meridian
 –  Followed by three-bit groups
    §  One stop/continue bit
    §  One north/south split bit
    §  One east/west split bit
 –  Allows precision down to 10 cm × 10 cm squares at equator
 –  Produces various-size “tiles” representing parts of Earth’s surface
§  Points indexed by the smallest tile which contains the point
   [Figure: bit layout, most significant bit first; C/S = continue/stop]
   E/W N/S E/W | C/S N/S E/W | C/S N/S E/W | C/S N/S E/W | C/S N/S E/W



          Indexing using 64-bit integers for trie-driven search
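
The bit layout described on this slide can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names, the choice of 20 three-bit groups, the meaning of each bit value, and leaving the top (sign) bit unused are all assumptions made here, and this assumed layout is not claimed to reach the precision quoted on the slide.

```python
MAX_LEVELS = 20  # assumed: 3 octant bits + 20 three-bit groups = 63 bits


def encode_tile(lat, lon, levels=MAX_LEVELS):
    """Return an integer code for the tile at depth `levels` containing the point."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude/longitude out of range")
    if not 0 <= levels <= MAX_LEVELS:
        raise ValueError("levels out of range")

    # Octant bits (E/W, N/S, E/W): split at the prime/180-degree meridians,
    # the equator, then the 90-degree E/W meridians.
    west = lon < 0.0
    south = lat < 0.0
    lat_lo, lat_hi = (-90.0, 0.0) if south else (0.0, 90.0)
    lon_lo, lon_hi = (-180.0, 0.0) if west else (0.0, 180.0)
    lon_mid = (lon_lo + lon_hi) / 2.0
    upper = lon >= lon_mid
    lon_lo, lon_hi = (lon_mid, lon_hi) if upper else (lon_lo, lon_mid)
    code = (int(west) << 2) | (int(south) << 1) | int(upper)

    # Each three-bit group: continue flag (1), N/S split bit, E/W split bit.
    for _ in range(levels):
        lat_mid = (lat_lo + lat_hi) / 2.0
        lon_mid = (lon_lo + lon_hi) / 2.0
        north = lat >= lat_mid
        east = lon >= lon_mid
        lat_lo, lat_hi = (lat_mid, lat_hi) if north else (lat_lo, lat_mid)
        lon_lo, lon_hi = (lon_mid, lon_hi) if east else (lon_lo, lon_mid)
        code = (code << 3) | 0b100 | (int(north) << 1) | int(east)

    # Remaining groups stay zero, i.e. "stop".
    return code << (3 * (MAX_LEVELS - levels))


def tile_range(code, levels):
    """Inclusive range of codes for a tile and everything nested inside it."""
    span = (1 << (3 * (MAX_LEVELS - levels))) - 1
    return code, code | span
```

Under these assumptions, every point or sub-tile inside a tile shares that tile's leading bits, so its code falls inside `tile_range(...)` of the enclosing tile. That prefix property is what makes a numeric range lookup over the codes possible.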
What About Polygons?
§  Polygons indexed as collection of tiles inside polygon
 –  Larger tiles completely contained in indexed polygon are not subdivided
 –  Smallest indexed tiles may extend outside indexed polygon




              Polygons indexed with series of hash codes
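
The tiling rule above (keep a tile whole when it lies completely inside the polygon, subdivide only tiles that straddle the boundary, and stop at a smallest tile size) can be sketched with a recursive quadtree cover. Note that this sketch deliberately swaps in plain two-dimensional planar geometry for brevity, whereas the slides state that the real internals use three-dimensional mathematics. All function names are hypothetical, and collinear edge overlaps are not handled.

```python
def point_in_polygon(pt, poly):
    """Even-odd ray casting on planar (x, y) coordinates."""
    x, y = pt
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a)."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def segments_cross(p1, p2, q1, q2):
    """True when the segments cross or touch (collinear overlap is ignored)."""
    return (_orient(p1, p2, q1) != _orient(p1, p2, q2)
            and _orient(q1, q2, p1) != _orient(q1, q2, p2))


def classify(tile, poly):
    """Classify a tile (x0, y0, x1, y1) as 'inside', 'partial' or 'outside'."""
    x0, y0, x1, y1 = tile
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    for i in range(4):
        a, b = corners[i], corners[(i + 1) % 4]
        for j in range(len(poly)):
            if segments_cross(a, b, poly[j], poly[(j + 1) % len(poly)]):
                return "partial"
    if point_in_polygon(corners[0], poly):
        return "inside"   # no boundary crossing, so the whole tile is inside
    px, py = poly[0]
    if x0 <= px <= x1 and y0 <= py <= y1:
        return "partial"  # the polygon lies entirely within this tile
    return "outside"


def cover(poly, tile, max_depth, depth=0):
    """Return the tiles that cover a simple polygon."""
    state = classify(tile, poly)
    if state == "outside":
        return []
    if state == "inside" or depth == max_depth:
        return [tile]     # smallest boundary tiles may extend outside the polygon
    x0, y0, x1, y1 = tile
    xm, ym = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    tiles = []
    for child in [(x0, y0, xm, ym), (xm, y0, x1, ym),
                  (x0, ym, xm, y1), (xm, ym, x1, y1)]:
        tiles.extend(cover(poly, child, max_depth, depth + 1))
    return tiles
```

For example, `cover([(0.6, 0.6), (7.3, 0.6), (0.6, 7.3)], (0.0, 0.0, 8.0, 8.0), 4)` returns a mix of larger interior tiles and smallest-size boundary tiles. In an index, each returned tile would be stored as one hash code.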
What About Polygon Search?
§  Search polygon converted to tiles using indexing conversion
    process
 –  Possible to get too many tile indices to search
    §  Risks Lucene complaints about too many BooleanClauses
    §  Consolidate adjacent indices into ranges
    §  Reduce tiling precision
        –  Reduce number of ranges
        –  Produce acceptable number of BooleanClauses
§  Results filtered by original search polygon
 –  Requires storage of original geometry data in addition to index
 –  No filter query required
    §  Index always accessed with NumericRangeQuery
    §  Insert custom logic wrapping NumericRangeQuery




           Search similar to indexing with additional filtering
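
The two mitigations listed above, consolidating adjacent indices into ranges and reducing tiling precision until the number of ranges is acceptable, can be sketched as below. This is an illustration under simplifying assumptions: tile indices are treated as plain non-negative integers whose low-order bits carry the finest subdivisions, so dropping low bits widens each index to its enclosing coarser tile. The function names, `step`, and `max_bits` are hypothetical.

```python
def ranges_at_precision(codes, dropped_bits):
    """Widen each code by `dropped_bits` low bits, then merge adjacent spans."""
    mask = (1 << dropped_bits) - 1
    spans = sorted({(c & ~mask, c | mask) for c in codes})
    merged = []
    for lo, hi in spans:
        if merged and lo <= merged[-1][1] + 1:
            # Overlapping or directly adjacent: extend the previous range.
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(r) for r in merged]


def ranges_within_limit(codes, max_ranges, step=2, max_bits=62):
    """Coarsen precision until at most `max_ranges` ranges remain.

    Returns (ranges, dropped_bits). Coarser ranges match extra documents,
    which is why results still need filtering by the original geometry.
    """
    dropped = 0
    while True:
        ranges = ranges_at_precision(codes, dropped)
        if len(ranges) <= max_ranges or dropped >= max_bits:
            return ranges, dropped
        dropped += step
```

For instance, `ranges_at_precision([8, 9, 10, 11, 20, 33, 34], 0)` gives `[(8, 11), (20, 20), (33, 34)]`. Each resulting range would correspond to one numeric range clause, so capping the number of ranges caps the number of clauses at the cost of more false positives for the post-filter to remove.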
How Is This Capability Used?
§  Indexing accessed using custom FieldTypes in schema
 –  Specific types for each supported geometry type
 –  A general type to allow polymorphic geometry types
    §  Trade-off is greater application coupling
 –  Specific type classes transform inputs and hand-off to general type class
 –  Indexing writes out two fields
    §  Geospatial tile index
    §  Original geometry storage
§  Search accessed using custom QParserPlugin
 –  Detects special suffixes on search field name to determine geometry type
 –  Converts input to geospatial tile index collection
 –  Builds Lucene query structure including custom and standard classes




           New schema FieldTypes and new QParserPlugin
What Geometries Are Supported?
§  Points
  –  Specified by latitude and longitude
§  Polygons
  –  Specified by latitude-longitude pairs
§  Latitude-Longitude Boxes
  –  Specified by two latitude-longitude pairs specifying opposite corners
  –  Internally converted to polygons
§  Point-Radii
  –  Specified by latitude and longitude of center plus radius in meters, kilometers,
     statute miles, or nautical miles
  –  Assumes spherical Earth NOT WGS-84 ellipsoid
     §  Errors accepted for search
  –  Internally converted to approximating polygons



          Latitude-Longitude Boxes and Point-Radii supported
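
The conversions of boxes and point-radii to polygons can be sketched as below, using the standard spherical destination-point formula in line with the slide's spherical-Earth assumption. The Earth radius value, the unit keys, the vertex count `n`, and the function names are assumptions chosen for this illustration. The box helper also assumes the box does not cross the 180° meridian.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean radius of a spherical Earth
UNIT_TO_M = {"m": 1.0, "km": 1000.0, "mi": 1609.344, "nmi": 1852.0}


def circle_to_polygon(lat, lon, radius, unit="km", n=32):
    """Approximate a point-radius by n vertices (lat, lon) on the circle."""
    d = radius * UNIT_TO_M[unit] / EARTH_RADIUS_M  # angular radius in radians
    lat1, lon1 = math.radians(lat), math.radians(lon)
    vertices = []
    for k in range(n):
        theta = 2.0 * math.pi * k / n  # bearing, clockwise from north
        sin_lat2 = (math.sin(lat1) * math.cos(d)
                    + math.cos(lat1) * math.sin(d) * math.cos(theta))
        sin_lat2 = max(-1.0, min(1.0, sin_lat2))  # guard against rounding
        lat2 = math.asin(sin_lat2)
        lon2 = lon1 + math.atan2(
            math.sin(theta) * math.sin(d) * math.cos(lat1),
            math.cos(d) - math.sin(lat1) * sin_lat2,
        )
        lon2 = (lon2 + 3.0 * math.pi) % (2.0 * math.pi) - math.pi  # wrap to [-pi, pi)
        vertices.append((math.degrees(lat2), math.degrees(lon2)))
    return vertices


def box_to_polygon(lat_a, lon_a, lat_b, lon_b):
    """Turn two opposite corners into a four-vertex polygon of (lat, lon) pairs."""
    south, north = sorted((lat_a, lat_b))
    west, east = sorted((lon_a, lon_b))
    return [(south, west), (south, east), (north, east), (north, west)]
```

Because the vertices lie on the circle itself, the polygon is inscribed and slightly under-covers the true circle. A larger `n` or a slightly enlarged radius would reduce that error, which fits the slide's remark that small errors are accepted for search.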
How Can the Public Get This?
§  Currently working Intellectual Property issues
 –  Employer required provisional patent application submission before Lucene
    Revolution abstract could be submitted
    §  Could protect public use of license assuming public release
 –  Customer has Unrestricted Rights
    §  Customer can release to public open source community
    §  Customer may release to public open source community
        –  Customer dislikes proprietary solutions
§  Also need to work packaging issues such as a name




           Not yet available to public, but that may change
Summary
§  Solr is an excellent choice for our replacement of an expensive
    database
§  Geospatial Search with Polygons in Solr is possible and
    implemented
 –  Can be used with or without LUCENE-3795/“Lucene Spatial Playground”
    approach
 –  Inherent 3-dimensional mathematics not found in LUCENE-3795 polygon
    support
 –  Stores and uses both indices and original geometries
 –  No support for complex polygons at this time
§  Capabilities accessed with new FieldTypes and a new
    QParserPlugin
§  Not yet released to public



