5. “swans are crowding ducks out of the local lake”
A basic search engine should respond to queries as follows:
- “swan lake” returns both documents (inexact matching)
- “swan dive” returns both documents (boolean OR matching)
- “swan lake duck” returns document 2 first (ranking)
- “crowds” returns document 2 (stemming)
- “of the” returns neither (stopword removal)
And it should do all of this quickly.
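The five behaviours listed on this slide can be tried out with a minimal sketch. It is not the code of any engine discussed in this deck. The text of document 1, the stopword list, and the `stem`, `terms` and `search` helpers are all assumptions made for illustration, and the suffix stripper is only a crude stand-in for a real stemmer such as Porter2.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "are", "of", "on", "out", "the"}

DOCS = {
    1: "a swan swims on the lake",  # hypothetical stand-in for document 1
    2: "swans are crowding ducks out of the local lake",
}


def stem(word):
    """Naive suffix stripper standing in for a real stemmer such as Porter2."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def terms(text):
    """Lowercase, tokenize, drop stopwords, then stem what remains."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOPWORDS]


def search(query, docs=DOCS):
    """Boolean OR match, ranked by the number of distinct query terms matched."""
    query_terms = set(terms(query))
    scores = Counter()
    for doc_id, text in docs.items():
        matched = query_terms & set(terms(text))
        if matched:
            scores[doc_id] = len(matched)
    # Highest score first; ties are broken by document id.
    return [doc_id for doc_id, _ in
            sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]
```

With these assumptions, `search("swan lake duck")` ranks document 2 ahead of document 1, and `search("of the")` returns an empty list because every query term is a stopword. The sketch scans every document on each query, so it says nothing about the "quickly" requirement.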
13. The Contenders

                        Stemming & Stopword Removal   Boolean OR   Ranking
Datastore Query         -                             -            -
SearchableModel         stopword removal only         -            -
Bill Katz' Searchable   x                             -            -
gae-search              x                             -            -
BigTable Search         x                             x            x
14. What The Others Are Missing
- Boolean OR/Ranking: without them, multi-term queries are almost pointless
- Faceted Search: users are accustomed to this from sites like Amazon
- Scalability: no one uses inverted indexes!
24. “it is a banana”
To add the first document, we have to update 4 index entries, one per word. The bigger the documents get, the worse it gets. Worse still, a single index entry represents multiple documents, so concurrency becomes a problem too: try locking on the index entry for “the”, and your entire system becomes effectively single-threaded!
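The write cost described on this slide can be made concrete with a small in-memory inverted index. This is a sketch only: `add_document` is a hypothetical helper, the second document is invented for illustration, and stopwords are deliberately kept so that the count matches the slide.

```python
from collections import defaultdict


def add_document(index, doc_id, text):
    """Add doc_id to the posting set of every distinct term.

    Returns the number of index entries that had to be written.
    """
    touched = set(text.lower().split())
    for term in touched:
        index[term].add(doc_id)
    return len(touched)


index = defaultdict(set)  # term -> set of ids of documents containing it

first_writes = add_document(index, 1, "it is a banana")
# Hypothetical second document, to show entries shared between documents.
second_writes = add_document(index, 2, "it is the banana that is a fruit")
```

Here `first_writes` is 4, one write per distinct word, and `second_writes` is 7. After the second call the entries for "it", "is", "a" and "banana" each hold both document ids, which is exactly where two concurrent writers would contend for the same entry.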
25. The Solution to Updating: Asynchronous Updates
[Diagram. Labels: Data Store, doc, calc Δ, queue, merge Δ (the queue and merge Δ stages repeat)]
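One possible reading of this diagram is sketched below: the writer only calculates a delta (Δ) for the document and puts it on a queue, and a single merger applies queued deltas to the index, so writers never lock index entries. The slide shows several queue and merge stages; this sketch collapses them into one, and it uses Python's standard `queue` and `threading` modules rather than any App Engine API. The function names are hypothetical.

```python
import queue
import threading
from collections import defaultdict

index = defaultdict(set)  # term -> set of doc ids; only the merger writes to it
deltas = queue.Queue()    # pending index changes, one delta per document


def calc_delta(doc_id, text):
    """Compute the index changes for one document without touching the index."""
    return {term: {doc_id} for term in set(text.lower().split())}


def save_document(doc_id, text):
    """Writer path: enqueue the delta and return immediately.

    Storing the document itself is omitted from this sketch.
    """
    deltas.put(calc_delta(doc_id, text))


def merge_worker():
    """Single consumer: merges queued deltas into the index one at a time."""
    while True:
        delta = deltas.get()
        for term, doc_ids in delta.items():
            index[term] |= doc_ids
        deltas.task_done()


threading.Thread(target=merge_worker, daemon=True).start()

save_document(1, "it is a banana")
save_document(2, "it is the banana that is a fruit")
deltas.join()  # wait for the merger to catch up (for demonstration only)
```

The trade-off in this design is that a newly saved document is not searchable until its delta has been merged, in exchange for writers never blocking on a hot index entry.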
27. The Better Answer?
BigTable Search suffers from some significant limitations:
- Fast search engines use custom file storage formats for performance; BigTable Search does not have this option and is consequently not fast
- No phrase matching
- No synonym or semantic matching
Google is working on a full-text search solution (feature 217 on the Issues List: In Progress, no ETA, session scheduled for Google I/O in May).
28. Resources
- pyporter2 (used by BigTable Search and others for stemming): http://github.com/mdirolf/pyporter2
- SearchableModel: http://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/ext/search/__init__.py
- Bill Katz' Simple Full-text Search for App Engine: http://www.billkatz.com/2009/6/Simple-Full-Text-Search-for-App-Engine
- gae-search: http://gae-full-text-search.appspot.com/
- BigTable Search: http://code.google.com/p/bigtablesearch/
- Google's Upcoming Full-text Search (feature 217): http://code.google.com/p/googleappengine/issues/detail?id=217