Solr has been integrated with OpenCms 9.5 more tightly than ever before. With 9.5, all content items in the OpenCms repository can be indexed by Solr, in all available languages. This deep integration makes it possible to use Solr not only for basic full-text searches, but also as an API extension for creating advanced queries on all kinds of content.
In this workshop, Sören shows how to use SOLR for advanced content retrieval in OpenCms. He combines attributes, properties and XML field values in a query that generates an editable list of elements with a content collector. He also explains how to use advanced features such as individual content field mappings to make your custom content types easily findable.
2. Agenda
1. Brief Introduction to Solr
2. Common Mistakes Using OpenCms & Solr
3. Using the Solr Collector (DEMO)
4. Spellchecking in OpenCms Using Solr
3. Brief Solr Introduction
●Solr is a very versatile and powerful search engine that supports a wide range of features
●This flexibility comes at the price of increased complexity when working with Solr
●Many customization options are available
●All fields that make up a single document are typed
4. Defining Solr's Data Structure
●The data structure of Solr's documents is defined in the file schema.xml
●Changes to this file require reindexing
●Dynamic fields mitigate this limitation
●They can be used without being explicitly defined in the schema, because their names are matched against wildcard patterns
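The lookup behind dynamic fields can be sketched in Java. This is a simplified model, not Solr's code: an explicitly defined field takes precedence, otherwise the longest matching wildcard pattern decides the type, and equally long patterns are resolved in declaration order. The patterns "*_s", "*_t" and "*_dt" are hypothetical examples, not OpenCms's actual schema.xml.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Simplified model of resolving a field name to a type. Explicit fields win; otherwise the
// longest matching dynamic-field pattern is used (on a tie, the one declared first).
// The fields and patterns used in main() are hypothetical, not OpenCms's schema.xml.
class DynamicFieldResolver {
    private final Map<String, String> explicitFields = new LinkedHashMap<>();
    private final Map<String, String> dynamicPatterns = new LinkedHashMap<>();

    void addField(String name, String type) {
        explicitFields.put(name, type);
    }

    // A pattern carries a single "*" either at the start ("*_s") or at the end ("attr_*").
    void addDynamicField(String pattern, String type) {
        dynamicPatterns.put(pattern, type);
    }

    private static boolean matches(String pattern, String fieldName) {
        if (pattern.startsWith("*")) {
            return fieldName.endsWith(pattern.substring(1));
        }
        return pattern.endsWith("*")
                && fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
    }

    Optional<String> typeOf(String fieldName) {
        if (explicitFields.containsKey(fieldName)) {
            return Optional.of(explicitFields.get(fieldName));
        }
        String best = null;
        for (String pattern : dynamicPatterns.keySet()) {
            // Strictly longer patterns replace the current best, so ties keep the first one.
            if (matches(pattern, fieldName) && (best == null || pattern.length() > best.length())) {
                best = pattern;
            }
        }
        return best == null ? Optional.empty() : Optional.of(dynamicPatterns.get(best));
    }

    public static void main(String[] args) {
        DynamicFieldResolver schema = new DynamicFieldResolver();
        schema.addField("id", "string");
        schema.addDynamicField("*_s", "string");
        schema.addDynamicField("*_t", "text");
        schema.addDynamicField("*_dt", "date");
        System.out.println(schema.typeOf("city_s"));     // Optional[string]
        System.out.println(schema.typeOf("release_dt")); // Optional[date]
        System.out.println(schema.typeOf("unknown"));    // Optional.empty
    }
}
```

Because the type is derived from the name alone, a new field such as "teaser_t" needs no change to schema.xml.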
5. Solr: Indexing Content
[Diagram: the document fields a: date, b: string, c: string pass through Solr processing (analyzers, filters and tokenizers) and are indexed as a: date, b: text, c: string]
6. OpenCms & Solr
●Using OpenCms & Solr "directly" requires a basic understanding of Solr
●Use data types that fit the individual use case, and learn how filters work
●Know the query syntax (for the respective data types)
●Most common mistakes by OpenCms users result from insufficient knowledge of Solr basics
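How a query has to be written depends on the field type. The following is a minimal sketch, assuming the hypothetical field names city_s (string) and content_en (text): on a string field the whole value is a single term, so spaces and syntax characters must be escaped; on a text field each word is a separate term. The escaped character set is modeled on the one handled by SolrJ's ClientUtils.escapeQueryChars.

```java
// Builds query clauses for the two field types. Field names like "city_s" and
// "content_en" are hypothetical. The escaped characters mirror those handled by
// SolrJ's ClientUtils.escapeQueryChars.
class QueryBuilder {
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\'); // prefix syntax characters and whitespace with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** String field: the complete value is a single term, so it is escaped as a whole. */
    static String exact(String field, String value) {
        return field + ":" + escape(value);
    }

    /** Text field: each word is its own term; AND requires all of them to occur. */
    static String allTerms(String field, String text) {
        StringBuilder sb = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (sb.length() > 0) {
                sb.append(" AND ");
            }
            sb.append(field).append(':').append(escape(word));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(exact("city_s", "New York"));        // city_s:New\ York
        System.out.println(allTerms("content_en", "New York")); // content_en:New AND content_en:York
    }
}
```

A string query is case-sensitive and must match the stored value exactly, whereas terms sent to a text field are run through that field's query analyzer (typically lowercasing them), so "New" and "new" match alike there.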
7. Common Mistakes Using Solr & OpenCms
1. Using improper types
●"text" vs "string"
●Formulating correct queries
2. Issues regarding the mapping OpenCms <-> Solr
3. (Encoding problems)
8. "text" vs "string"
●String
●Stores its content as an exact string
●No tokenization or processing is performed
●Useful when searching for an exact value
●Text
●Tokenization and processing are performed
●Useful when searching for part of the content
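A toy model in Java makes the difference concrete. It is a simplification: real analysis is configured per field type in schema.xml, whereas here a "text" value is merely lowercased and split at non-alphanumeric characters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Toy model of the two field types: a "string" field indexes the value as one exact token,
// a "text" field is tokenized and lowercased first. The analyzer here is a simplification
// of what a Solr text field type configured in schema.xml would do.
class StringVsText {
    static List<String> indexAsString(String value) {
        return List.of(value); // the whole value is a single token
    }

    static List<String> indexAsText(String value) {
        return Arrays.stream(value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    static boolean matchesString(String indexedValue, String query) {
        return indexAsString(indexedValue).contains(query);
    }

    static boolean matchesText(String indexedValue, String query) {
        // The query is analyzed the same way; every query token must occur in the field.
        return indexAsText(indexedValue).containsAll(indexAsText(query));
    }

    public static void main(String[] args) {
        String city = "Frankfurt am Main";
        System.out.println(matchesString(city, "Frankfurt am Main")); // true
        System.out.println(matchesString(city, "Frankfurt"));         // false
        System.out.println(matchesText(city, "frankfurt"));           // true
        System.out.println(indexAsText(city));                        // [frankfurt, am, main]
    }
}
```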
9. Making Your Content Searchable
●For each locale, OpenCms copies the entire XML content into a single(!) locale-aware Solr field of type "text"
●Specific information of a resource can be made searchable in OpenCms in two ways
●Automatic mapping of properties to Solr fields
●Manual definition of mappings
10. Indexing Content without Search Settings
[Diagram: the content fields a: date, b: string, c: string all pass through Solr processing (analyzers, filters and tokenizers) into a single indexed field x: text]
11. Indexing Content with Search Settings
[Diagram: the content fields a: date, b: string, c: string pass through Solr processing (analyzers, filters and tokenizers) into separately mapped indexed fields a: date, b: text, c: string]
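The two indexing variants from the diagrams can be sketched as follows. Assumptions: element values arrive as plain strings, and the field names content_en and city_s are hypothetical stand-ins, not the names OpenCms actually generates.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch of the two variants. Without search settings, all element values of one locale end
// up in a single text field; with a mapping, selected elements are additionally written to
// their own typed fields. Field names ("content_en", "city_s") are hypothetical.
class ContentIndexingSketch {
    static Map<String, String> index(Map<String, String> elements, String locale,
                                     Map<String, String> mappings) {
        Map<String, String> doc = new LinkedHashMap<>();
        StringJoiner allText = new StringJoiner(" ");
        elements.values().forEach(allText::add);
        doc.put("content_" + locale, allText.toString()); // always: one locale-aware text field
        mappings.forEach((element, targetField) -> {
            String value = elements.get(element);
            if (value != null) {
                doc.put(targetField, value); // only with search settings
            }
        });
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> elements = new LinkedHashMap<>();
        elements.put("Title", "OpenCms Days");
        elements.put("City", "Cologne");
        System.out.println(index(elements, "en", Map.of()));
        // {content_en=OpenCms Days Cologne}
        System.out.println(index(elements, "en", Map.of("City", "city_s")));
        // {content_en=OpenCms Days Cologne, city_s=Cologne}
    }
}
```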
12. Solr – OpenCms Interaction: Mapping
●Mappings are defined in the XSD schema of the corresponding resource type
●Excerpt:
<xsd:schema …>
  …
  <xsd:annotation>
    <xsd:appinfo>
      <searchsettings>
        <searchsetting element="City" searchcontent="true">
          <solrfield targetfield="city" sourcefield="_s" />
        </searchsetting>
        …
      </searchsettings>
    </xsd:appinfo>
  </xsd:annotation>
</xsd:schema>
●The element attribute holds the name of the element in the resource type
13. Element Mapping Attributes
Attribute name – effect on the Solr field:
●targetfield* – The name of the resulting field
●locale – Writes the content only for the specified locale
●sourcefield – Defines the type of the resulting field
●copyfields – Copies the value to additional fields
●default – Sets a default value
●boost – Sets a boost for the field
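To illustrate how these attributes interact, here is a hypothetical Java model of a single solrfield mapping. It assumes that the resulting field name is targetfield followed by the sourcefield suffix ("city" + "_s") and that default is used when the element is empty; neither rule is taken from the OpenCms code. boost is left out because it only weights the field for scoring and does not change the indexed value.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a <solrfield> mapping (targetfield, locale, sourcefield, copyfields,
// default). ASSUMPTIONS: field name = targetfield + sourcefield suffix, and the default value
// is used for empty elements. boost is omitted since it affects scoring only.
record SolrFieldMapping(String targetField, String locale, String sourceField,
                        List<String> copyFields, String defaultValue) {

    void apply(Map<String, String> doc, String contentLocale, String elementValue) {
        if (locale != null && !locale.equals(contentLocale)) {
            return; // locale: only content of this locale is written
        }
        boolean empty = elementValue == null || elementValue.isBlank();
        String value = empty ? defaultValue : elementValue; // default: fallback for empty elements
        if (value == null) {
            return;
        }
        String fieldName = sourceField == null ? targetField : targetField + sourceField;
        doc.put(fieldName, value);
        for (String copy : copyFields) {
            doc.put(copy, value); // copyfields: the same value in additional fields
        }
    }

    public static void main(String[] args) {
        SolrFieldMapping city = new SolrFieldMapping("city", "en", "_s", List.of("city_t"), "unknown");
        Map<String, String> doc = new LinkedHashMap<>();
        city.apply(doc, "de", "Köln"); // ignored: the mapping is restricted to "en"
        city.apply(doc, "en", "");     // empty element, so the default value is used
        System.out.println(doc);       // {city_s=unknown, city_t=unknown}
    }
}
```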
14. Using UTF-8 in Solr
●Users report problems with certain characters (mostly German umlauts) in Solr results
●In nearly all cases, the sole cause is that the servlet container running Solr is not configured to use UTF-8
●Extra note for Tomcat users: please check whether you added the required attributes (e.g. URIEncoding="UTF-8") to all relevant "<Connector>" elements ;-)
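The typical symptom is easy to reproduce in plain Java (10 or later). A browser sends "München" percent-encoded as UTF-8; a connector that decodes the request URI as ISO-8859-1 turns each of the two UTF-8 bytes of "ü" into a separate character.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Decodes the same percent-encoded query value with two charsets to show the umlaut problem.
// URLDecoder.decode(String, Charset) requires Java 10 or later.
class UmlautDecoding {
    static String decode(String encodedValue, Charset charset) {
        return URLDecoder.decode(encodedValue, charset);
    }

    public static void main(String[] args) {
        String fromBrowser = "M%C3%BCnchen"; // "München" encoded as UTF-8
        System.out.println(decode(fromBrowser, StandardCharsets.UTF_8));      // München
        System.out.println(decode(fromBrowser, StandardCharsets.ISO_8859_1)); // MÃ¼nchen
    }
}
```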
17. WYSIWYG Spellchecker
●The spellchecker has been implemented using Solr
●Solr already provides a flexible component named "SpellCheckComponent"
●This component supports inline spellchecking of Solr queries
●Suggestions can be sourced from Solr fields or from text files
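Since the component is controlled through request parameters, a spellcheck request can be assembled as a plain query string. The parameter names (spellcheck, spellcheck.dictionary, spellcheck.count) are standard SpellCheckComponent parameters; the handler URL, the core name and the dictionary name "en" are hypothetical, and the handler must have the component configured in solrconfig.xml.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Builds a spellcheck request URL with standard SpellCheckComponent parameters.
// The handler URL and dictionary name are hypothetical placeholders.
class SpellcheckRequest {
    static String build(String handlerUrl, String text, String dictionary, int count) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", text);
        params.put("spellcheck", "true");                      // enable the component
        params.put("spellcheck.dictionary", dictionary);       // which configured spellchecker
        params.put("spellcheck.count", String.valueOf(count)); // max number of suggestions
        params.put("wt", "json");
        return handlerUrl + "?" + params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr/spellcheck/select", "helo wrld", "en", 5));
        // http://localhost:8983/solr/spellcheck/select?q=helo+wrld&spellcheck=true&spellcheck.dictionary=en&spellcheck.count=5&wt=json
    }
}
```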
18. Why Use Solr as a Spellchecker?
●The "SpellCheckComponent" is widely used to implement the "Did you mean?" feature known from popular search engines
●The component is
●Reliable and mature
●Fast
●Plus, Solr is already available in OpenCms
19. Differences Between the Use Cases in Terms of Implementation
●If both use cases rely on the same component, how do the implementations actually differ?
●"Did you mean?" builds its source of suggested words from the entire data the search runs on. Usually only a single hit is returned.
●The WYSIWYG spellchecker builds its source of suggestions from data that contains only the dictionary for a single language
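The difference can be illustrated with a toy Java model. Plain Levenshtein distance with a cut-off of two edits is a stand-in chosen for brevity, not the algorithm the SpellCheckComponent uses, and the index and dictionary contents are made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Toy comparison of the two use cases. Levenshtein distance and the cut-off of 2 edits are
// illustrative stand-ins for Solr's suggestion logic.
class SpellcheckUseCases {
    static final int MAX_EDITS = 2;

    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    /** "Did you mean?": one best term from the indexed content; frequent terms win ties. */
    static Optional<String> didYouMean(String word, Map<String, Integer> indexedTermCounts) {
        if (indexedTermCounts.containsKey(word)) {
            return Optional.empty(); // the word occurs in the data, nothing to suggest
        }
        return indexedTermCounts.keySet().stream()
                .filter(term -> distance(word, term) <= MAX_EDITS)
                .min(Comparator.comparingInt((String term) -> distance(word, term))
                        .thenComparingInt(term -> -indexedTermCounts.get(term))
                        .thenComparing(Comparator.naturalOrder()));
    }

    /** WYSIWYG: every word missing from the dictionary, each with all close entries. */
    static Map<String, List<String>> checkText(String text, Set<String> dictionary) {
        Map<String, List<String>> misspelled = new LinkedHashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (word.isEmpty() || dictionary.contains(word)) {
                continue;
            }
            List<String> suggestions = new ArrayList<>();
            for (String entry : dictionary) {
                if (distance(word, entry) <= MAX_EDITS) {
                    suggestions.add(entry);
                }
            }
            Collections.sort(suggestions);
            misspelled.put(word, suggestions);
        }
        return misspelled;
    }

    public static void main(String[] args) {
        Map<String, Integer> index = Map.of("house", 10, "horse", 3, "garden", 5);
        System.out.println(didYouMean("hous", index)); // Optional[house]
        Set<String> dictionary = Set.of("house", "mouse", "horse", "garden");
        System.out.println(checkText("hous gardn", dictionary));
        // {hous=[horse, house, mouse], gardn=[garden]}
    }
}
```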
20. How to Model This Scenario Using Solr?
●Spellchecking has been implemented using a separate Solr core that resides in WEB-INF/spellcheck
●As the only purpose of this core is to hold spellcheck information, its schema.xml file is as simple as it gets
●Why use a separate Solr core instead of the default core used by OpenCms?
●Dictionaries are stored as one Solr index per language
21. Problems Regarding tinyMCE and Solr
●Sadly, the spellchecking interfaces of tinyMCE and Solr are incompatible
[Diagram: tinyMCE and Solr]
23. Gluing the Pieces Together
●A new component had to be implemented in OpenCms that
●Accepts spellcheck requests from tinyMCE
●Handles the communication and message conversion between tinyMCE and Solr
●Checks and (re-)builds spellcheck indices
●The corresponding code can be found in the package org.opencms.search.solr.spellcheck
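One of these responsibilities, deciding when a spellcheck index must be (re-)built, can be sketched as a timestamp comparison. The class and its criterion (rebuild when the dictionary was modified after the last build) are hypothetical illustrations, not the implementation in org.opencms.search.solr.spellcheck.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// HYPOTHETICAL "check and (re-)build" logic: the index for a language is rebuilt when it was
// never built or when its dictionary changed after the last build.
class SpellcheckIndexGuard {
    private final Map<String, Long> lastBuildMillis = new HashMap<>();

    boolean needsRebuild(String language, long dictionaryModifiedMillis) {
        Long built = lastBuildMillis.get(language);
        return built == null || dictionaryModifiedMillis > built;
    }

    void markBuilt(String language, long dictionaryModifiedMillis) {
        lastBuildMillis.put(language, dictionaryModifiedMillis);
    }

    /** Convenience variant that reads the modification time of a dictionary file. */
    boolean needsRebuild(String language, Path dictionary) throws IOException {
        return needsRebuild(language, Files.getLastModifiedTime(dictionary).toMillis());
    }

    public static void main(String[] args) {
        SpellcheckIndexGuard guard = new SpellcheckIndexGuard();
        System.out.println(guard.needsRebuild("en", 1_000L)); // true: never built
        guard.markBuilt("en", 1_000L);
        System.out.println(guard.needsRebuild("en", 1_000L)); // false: unchanged
        System.out.println(guard.needsRebuild("en", 2_000L)); // true: dictionary edited
    }
}
```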
24. Spellchecker in OpenCms
●Dictionaries can be edited easily in OpenCms
●The indices are filled automatically from flat text files, one word per line
●Support for multiple languages
●To access the dictionaries, have a look at the directory org.opencms.workplace.spellcheck/resources/
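Reading such a flat dictionary file takes only a few lines of Java. This sketch additionally trims whitespace, skips blank lines and drops duplicates; those details are assumptions, not documented OpenCms behavior.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

// Reads a dictionary with one word per line. ASSUMPTIONS: whitespace is trimmed, blank lines
// are skipped and duplicates dropped. For a real file, pass
// Files.newBufferedReader(path, StandardCharsets.UTF_8).
class DictionaryLoader {
    static Set<String> load(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            Set<String> words = new LinkedHashSet<>(); // keeps file order, ignores repeats
            reader.lines()
                    .map(String::trim)
                    .filter(line -> !line.isEmpty())
                    .forEach(words::add);
            return words;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(load(new StringReader("Haus\n\n  Garten \nHaus\n"))); // [Haus, Garten]
    }
}
```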
25. Extending the Spellchecker
●Adding a new language
1. Create a new Solr field in schema.xml
2. Create a new dictionary file in the VFS
3. Restart OpenCms
●Adding words to the custom dictionary
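Adding a word to the custom dictionary amounts to appending a line to its one-word-per-line file. The sketch below works on a local file purely for illustration (in OpenCms the dictionary files live in the VFS), skips words that are already listed, and does not touch any Solr index.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Adds a word to a one-word-per-line dictionary file unless it is already listed.
// The local file is a HYPOTHETICAL stand-in for a dictionary stored in the OpenCms VFS.
// Files.readString / writeString use UTF-8 and require Java 11 or later.
class CustomDictionary {
    static boolean addWord(Path dictionary, String word) throws IOException {
        String trimmed = word.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        String content = Files.exists(dictionary) ? Files.readString(dictionary) : "";
        if (content.lines().map(String::trim).anyMatch(trimmed::equals)) {
            return false; // already present
        }
        // Make sure the new word starts on its own line.
        String prefix = content.isEmpty() || content.endsWith("\n") ? "" : "\n";
        Files.writeString(dictionary, content + prefix + trimmed + "\n");
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path dictionary = Files.createTempFile("custom-dict", ".txt");
        System.out.println(addWord(dictionary, "OpenCms")); // true
        System.out.println(addWord(dictionary, "OpenCms")); // false
        System.out.print(Files.readString(dictionary));
    }
}
```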