Solr is a powerful open source search engine server which has become a popular choice for extending the search capabilities of Plone sites. The default configuration works well, but how do you answer the client's request to "Make my search just like Google's"?
In this talk we will look at the options available in Solr's schema and configuration files. We will discuss how to set up stop words, spell checking, n-grams and alternate query handlers. We will see what effect these settings have on the search results and find out how to debug problems when they arise.
16. Query Handlers PLONE CONFERENCE 2011
• Standard
• Disjunction Max (DisMax)
• Extended DisMax (experimental)
17. DisMax
• Multiple index searches
• Boosting
• Friendlier to end users
18. DisMax
qf=SearchableText^1.0 substring^0.2
Each qf entry is an index name followed by ^ and its weight.
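A minimal sketch of how a client might assemble these parameters for a DisMax request. The helper name dismax_params and the sample query are made up for illustration; defType, q and qf are the standard Solr request parameters.

```python
from urllib.parse import urlencode


def dismax_params(user_query, weights):
    """Build a URL-encoded query string for a DisMax search.

    weights maps index names to boosts, e.g. {"SearchableText": 1.0}.
    """
    # qf is a space-separated list of "index^weight" entries.
    qf = " ".join(f"{field}^{boost}" for field, boost in weights.items())
    return urlencode({"defType": "dismax", "q": user_query, "qf": qf})


# Hypothetical user query; the index names are the ones from the slide.
params = dismax_params("narrow escape", {"SearchableText": 1.0, "substring": 0.2})
print(params)
```

The resulting string would be appended to the select URL of whatever Solr instance the site uses.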
19. MinShouldMatch
mm=100%
All terms required
mm=50%
Half of the terms required
mm=-2
All but two terms required
20. MinShouldMatch
mm=2<-25% 9<-3
2 or fewer terms: all are required
3-9 terms: all but 25% are required
More than 9 terms: all but three are required
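The mm rules above can be worked through by hand with a small function. This is a sketch of the documented behaviour written for illustration, not Solr's own code; the function name min_should_match is hypothetical, and percentages are assumed to round toward zero.

```python
def min_should_match(term_count, spec):
    """Return how many of term_count optional terms must match for an mm spec."""
    spec = spec.strip()
    if "<" in spec:
        # Conditional form: "2<-25% 9<-3" means each rule applies
        # only when there are MORE terms than the number before "<".
        required = term_count
        for clause in spec.split():
            bound, sub_spec = clause.split("<", 1)
            if term_count <= int(bound):
                return required
            required = min_should_match(term_count, sub_spec)
        return required
    if spec.endswith("%"):
        percent = int(spec[:-1])
        amount = term_count * abs(percent) // 100  # rounds toward zero
        required = term_count - amount if percent < 0 else amount
    else:
        value = int(spec)
        required = term_count + value if value < 0 else value
    # Never require fewer than zero or more than all of the terms.
    return max(0, min(term_count, required))


for n in (2, 4, 10):
    print(n, "terms ->", min_should_match(n, "2<-25% 9<-3"), "required")
```

With the spec from the slide, a 2-term query requires both terms, a 4-term query requires 3, and a 10-term query requires 7.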
21. Spelling Component
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="buildOnCommit">true</str>
    <str name="spellcheckIndexDir">path/to/spellcheck</str>
    <!-- The field that will contain the dynamic spelling data -->
    <str name="field">spell</str>
    <str name="accuracy">0.5</str>
  </lst>
  <!-- Control indexing and query of spelling data -->
  <str name="queryAnalyzerFieldType">spell-text</str>
</searchComponent>
33. Pattern Replace
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="[^a-zA-Z0-9_\s-]" replacement="" replace="all"/>
'That WAS a narrow escape!' said Alice, a good deal frightened
That WAS a narrow escape said Alice a good deal frightened
37. Whitespace Tokenizer
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
'That WAS a narrow escape!' said Alice
'That
WAS
a
narrow
escape!'
said
Alice
38. ICU Tokenizer
<tokenizer class="solr.ICUTokenizerFactory"/>
'That WAS a narrow escape!' said Alice
That
WAS
a
narrow
escape
said
Alice
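The difference between the two tokenizers can be approximated in a few lines of Python. Splitting on whitespace mirrors the whitespace tokenizer closely; the \w+ regular expression is only a rough stand-in for the ICU tokenizer, which follows the Unicode word-break rules and handles many cases this sketch does not.

```python
import re

text = "'That WAS a narrow escape!' said Alice"

# Whitespace tokenizer: split on runs of whitespace, punctuation stays attached.
whitespace_tokens = text.split()

# Rough approximation of word-boundary tokenizing: keep runs of word characters.
word_tokens = re.findall(r"\w+", text)

print(whitespace_tokens)
print(word_tokens)
```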
39. Pattern Tokenizer
<tokenizer class="solr.PatternTokenizerFactory" pattern=";\s*" />
one; two; three
one
two
three
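The same split can be tried in Python, assuming the intended pattern is a semicolon followed by optional whitespace (;\s*). The second call shows what happens if the backslash is lost: ;s* matches a semicolon followed by the letter "s", so the leading spaces stay on the tokens.

```python
import re

text = "one; two; three"

# Semicolon plus any following whitespace acts as the delimiter.
tokens = re.split(r";\s*", text)

# Without the backslash only the semicolon is consumed.
tokens_without_backslash = re.split(r";s*", text)

print(tokens)
print(tokens_without_backslash)
```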
42. Lower Case
<filter class="solr.LowerCaseFilterFactory"/>
Foo -> foo
bAr -> bar
BAZ -> baz
43. ASCII Folding
<filter class="solr.ASCIIFoldingFilterFactory"/>
idée -> idee
bête -> bete
grüßen -> grussen
44. ICU Folding
<filter class="solr.ICUFoldingFilterFactory"/>
Idée -> idee
BÊTE -> bete
GrüßeN -> grussen
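Both folding filters can be imitated with the standard library for the examples on these slides. This is an approximation only: the real filters cover far more characters than a Unicode decomposition does, and the function names here are invented for the sketch.

```python
import unicodedata


def ascii_fold(text):
    """Strip accents while keeping case, roughly like ASCII folding."""
    # The sharp s has no Unicode decomposition, so map it by hand.
    text = text.replace("ß", "ss")
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks left behind by the decomposition.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def icu_like_fold(text):
    """Fold case as well as accents, roughly like ICU folding."""
    return ascii_fold(text.casefold())


for word in ("Idée", "BÊTE", "GrüßeN"):
    print(word, "->", ascii_fold(word), "/", icu_like_fold(word))
```

The case-preserving version explains why ASCII folding is usually paired with a lower case filter, while ICU folding handles both steps at once.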
45. Pattern Replace
<filter class="solr.PatternReplaceFilterFactory"
pattern="[^a-zA-Z0-9_-]" replacement="" replace="all"/>
'That -> That
WAS -> WAS
a -> a
narrow -> narrow
escape!' -> escape
said -> said
Alice -> Alice
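The per-token replacement is easy to try out with Python's re module, whose character classes behave the same way for this pattern. The sketch also shows why the range matters: written as A-z it spans every character between "A" and "z", which includes [ \ ] ^ _ and the backtick, so those would slip through. The names STRICT, LOOSE and strip_chars are made up for the example.

```python
import re

STRICT = re.compile(r"[^a-zA-Z0-9_-]")  # letters, digits, underscore, hyphen
LOOSE = re.compile(r"[^a-zA-z0-9_-]")   # A-z also lets [ \ ] ^ ` through


def strip_chars(tokens, pattern=STRICT):
    """Remove every character matched by pattern from each token."""
    return [pattern.sub("", token) for token in tokens]


tokens = ["'That", "WAS", "a", "narrow", "escape!'", "said", "Alice"]
print(strip_chars(tokens))
print(strip_chars(["a^b[c]"], LOOSE), strip_chars(["a^b[c]"], STRICT))
```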
48. Stop Words
<filter class="solr.StopFilterFactory"
ignoreCase="true" words="stopwords.txt"/>
Before: That, WAS, a, narrow, escape, said, Alice, a, good, deal, frightened
After: narrow, escape, said, Alice, good, deal, frightened
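A small sketch of what the stop filter does to the token stream. The stop list here is hypothetical and holds only the three words needed for this example; a real stopwords.txt would be much longer.

```python
# Hypothetical stand-in for the contents of stopwords.txt.
STOP_WORDS = {"that", "was", "a"}


def remove_stop_words(tokens, stop_words=STOP_WORDS, ignore_case=True):
    """Drop tokens found in stop_words, optionally ignoring case."""
    kept = []
    for token in tokens:
        key = token.lower() if ignore_case else token
        if key not in stop_words:
            kept.append(token)
    return kept


tokens = ["That", "WAS", "a", "narrow", "escape", "said",
          "Alice", "a", "good", "deal", "frightened"]
print(remove_stop_words(tokens))
print(remove_stop_words(tokens, ignore_case=False))
```

With ignore_case turned off, "That" and "WAS" survive because the list only holds lower case entries, which is the reason for ignoreCase="true" in the filter definition.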
49. Synonyms
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
# synonyms.txt
# add multiple terms
foozball, foosball, baby-foot
# merge into one
tv, t.v., tele => television

foosball -> foozball, foosball, baby-foot
tele -> television
t.v. -> television
tv -> television
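The two rule styles in synonyms.txt can be modelled with a small parser. This is an illustrative sketch of single-token rules with expand="true" and ignoreCase="true", not the filter's real implementation; parse_synonyms and expand are invented names, and multi-word synonyms are not handled.

```python
def parse_synonyms(lines):
    """Turn synonyms.txt style rules into a {token: [replacements]} mapping."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if "=>" in line:
            # Explicit mapping: everything on the left becomes the right side.
            left, right = line.split("=>", 1)
            sources = [term.strip() for term in left.split(",")]
            targets = [term.strip() for term in right.split(",")]
        else:
            # Equivalent terms: each one expands to the whole group.
            sources = targets = [term.strip() for term in line.split(",")]
        for source in sources:
            mapping[source.lower()] = targets
    return mapping


def expand(token, mapping):
    """Return the replacement tokens, or the token itself if no rule applies."""
    return mapping.get(token.lower(), [token])


rules = parse_synonyms([
    "# add multiple terms",
    "foozball, foosball, baby-foot",
    "# merge into one",
    "tv, t.v., tele => television",
])
print(expand("foosball", rules))
print(expand("tv", rules))
```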
51. Language Stemming
<filter class="solr.ElisionFilterFactory" articles="stopwordarticles.txt"/>
qu'il -> il
ne -> ne
comprend -> comprend
pas -> pas
l'anglais -> anglais
<filter class="solr.SnowballPorterFilterFactory" language="French"/>
considere -> consider
consideres -> consider
considerent -> consider
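The elision step on this slide can be approximated in a few lines. The article list below is an assumed stand-in for stopwordarticles.txt, whose actual contents are not shown in the talk, and only the plain ASCII apostrophe is handled. Stemming is left out because it needs a stemmer library.

```python
# Hypothetical contents of stopwordarticles.txt (elided French articles).
ARTICLES = {"l", "qu", "d", "j", "n", "m", "t", "s", "c"}


def elide(token, articles=ARTICLES):
    """Remove a leading elided article such as l' or qu' from a token."""
    prefix, apostrophe, rest = token.partition("'")
    if apostrophe and rest and prefix.lower() in articles:
        return rest
    return token


tokens = ["qu'il", "ne", "comprend", "pas", "l'anglais"]
print([elide(token) for token in tokens])
```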