Tutorial on developing a Solr search component plugin

andrew.janowczyk@searchbox.com

Solr is
◦ Blazing fast open source enterprise search platform
◦ Lucene-based search server
◦ Written in Java
◦ Has REST-like HTTP/XML and JSON APIs
◦ Extensive plugin architecture
http://lucene.apache.org/solr/

 Allows for the development of plugins which
provide advanced operations
 Types of plugins:
◦ RequestHandlers
 Uses url parameters and returns own response
◦ SearchComponents
 Responses are embedded in other responses (such as
/select)
◦ UpdateRequestProcessorFactory
 The result is stored in a field of the
document at index time

 A quick tutorial on how to program a
SearchComponent to
◦ Be initialized
◦ Parse configuration file arguments
◦ Do something useful on search request (counts
some words in indexed documents)
◦ Format and return response
 We’ll name our plugin
“DemoSearchComponent” and show how to
stick it into the solrconfig.xml for loading

 In the next slide, we’ll specify a list called
“words” in which each entry is a string named
“word”
 We want to load these specific words and then
count them in all result sets of queries.
 Ex: config file has “body”, “fish”, “dog”
◦ Indexed Document has: dog body body body fish fish
fish fish orange
◦ Result should be:
 body=3.0
 fish=4.0
 dog=1.0
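The expected result above can be reproduced with a few lines of plain Java. This is only a minimal sketch of the counting idea, independent of Solr; the class name WordCountSketch and its count method are made up for illustration and are not part of the plugin.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the plugin itself.
class WordCountSketch {
    // Counts how often each configured word occurs in a space-separated text.
    static Map<String, Double> count(List<String> words, String text) {
        Map<String, Double> counts = new LinkedHashMap<>();
        for (String w : words) {
            counts.put(w, 0.0); // every configured word shows up in the result
        }
        for (String token : text.split(" ")) {
            if (counts.containsKey(token)) {
                counts.merge(token, 1.0, Double::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Double> counts = count(
                List.of("body", "fish", "dog"),
                "dog body body body fish fish fish fish orange");
        System.out.println(counts); // prints {body=3.0, fish=4.0, dog=1.0}
    }
}
```

Words that are not in the configured list, such as "orange", are simply ignored.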

<searchComponent
class="com.searchbox.DemoSearchComponent"
name="democomponent">
<str name="field">myfield</str>
<lst name="words">
<str name="word">body</str>
<str name="word">fish</str>
<str name="word">dog</str>
</lst>
</searchComponent>
• We tell Solr the name of the
class which has our
component
• Variables will be loaded
from this section during
the init method
• We set a default field for
analyzing the documents
• We specify a list of words
we’d like to have counts of

 We can see that we’re asking for Solr to load
com.searchbox.DemoSearchComponent.
 This will be the output of our project in .jar
file format
 Copy the .jar file to the lib directory in the
Solr installation so that Solr can find it.
 That’s it!

package com.searchbox;
import java.io.IOException;
import java.util.Date;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.logging.Level;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexableField;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.SimpleOrderedMap;
import org.apache.solr.core.SolrCore;
import org.apache.solr.core.SolrEventListener;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.plugin.SolrCoreAware;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
Just some of the
common packages we’ll
need to import to get
things rolling!

public class DemoSearchComponent
extends SearchComponent {
private static final Logger LOGGER =
LoggerFactory.getLogger(DemoSearchComponent.class);
volatile long numRequests;
volatile long numErrors;
volatile long totalRequestsTime;
volatile String lastnewSearcher;
volatile String lastOptimizeEvent;
protected String defaultField;
private List<String> words;
• We specify that our class
extends SearchComponent, so
we know we’re in business!
• We decide that we’ll keep track
of some basic statistics for
future usage
• Number of requests/errors
• Total time
• Make a variable to store our
defaultField and our words.

 Initialization is called when the plugin is first
loaded
 This most commonly occurs when Solr is
started up
 At this point we can load things from file
(models, serialized objects, etc)
 Have access to the variables set in
solrconfig.xml

 We have selected to pass a list called “words”
and have also provided the words “body”, “fish”
and “dog” that we’d like to count.
 During initialization we need to load this list
from solrconfig.xml and store it locally

@Override
public void init(NamedList args) {
super.init(args);
defaultField = (String) args.get("field");
if (defaultField == null) {
throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Need to specify the default field for analysis");
}
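NamedList wordList = (NamedList) args.get("words");
// getAll returns every value stored under the repeated name "word"
words = (wordList == null) ? null : wordList.getAll("word");
if (words == null || words.isEmpty()) {
throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Need to specify at least one word in searchComponent config!");
}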
}
Notice that we’ve loaded the list “words” and
then all of its attributes called “word” and put
them into the class level variable words.
Also we’ve identified our
defaultField

 There are 2 phases in a searchComponent
◦ Prepare
◦ Process
 During a query the prepare method is called
on all components before any work is done.
 This allows modifying, adding or removing
variables or components in the stack
 Afterwards, the process methods are called
for the components in the exact order
specified by the solrconfig
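The ordering described above can be pictured with a tiny model: every component's prepare runs first, and only then does each process run, in chain order. This is a simplified sketch with made-up names (TwoPhaseSketch, Component, handle); it is not how Solr's SearchHandler is actually written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of the two-phase flow; names are illustrative, not Solr's API.
class TwoPhaseSketch {
    interface Component {
        void prepare(List<String> log);
        void process(List<String> log);
    }

    // Builds a component that only records which phase it was called in.
    static Component named(String name) {
        return new Component() {
            public void prepare(List<String> log) { log.add("prepare:" + name); }
            public void process(List<String> log) { log.add("process:" + name); }
        };
    }

    // Runs all prepare calls before any process call, both in chain order.
    static List<String> handle(List<Component> chain) {
        List<String> log = new ArrayList<>();
        for (Component c : chain) {
            c.prepare(log);
        }
        for (Component c : chain) {
            c.process(log);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(handle(List.of(named("query"), named("democomponent"))));
        // prints [prepare:query, prepare:democomponent, process:query, process:democomponent]
    }
}
```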

@Override
public void prepare(ResponseBuilder rb)
throws IOException {
//none necessary
}
Nothing going on here, but prepare is
abstract in SearchComponent, so we
must implement it for the class to
compile

@Override
public void process(ResponseBuilder rb) throws IOException {
numRequests++;
SolrParams params = rb.req.getParams();
long lstartTime = System.currentTimeMillis();
SolrIndexSearcher searcher = rb.req.getSearcher();
NamedList response = new SimpleOrderedMap();
String queryField = params.get("field");
String field = null;
if (defaultField != null) {
field = defaultField;
}
if (queryField != null) {
field = queryField;
}
if (field == null) {
LOGGER.error("Fields aren't defined, not performing counting.");
return;
}
• We start off by keeping track in a volatile
variable the number of requests we’ve
seen (for use later in statistics), and we’d
like to know how long the process takes
so we note the time.
• We create a new NamedList which will
hold this component’s response
• We look at the URL parameters to see if
there is a “field” variable present. We
have set this up to override the default
we loaded from the config file

DocList docs = rb.getResults().docList;
if (docs == null || docs.size() == 0) {
LOGGER.debug("No results");
return; // nothing to count; also avoids a NullPointerException below
}
LOGGER.debug("Doing this many docs:\t" + docs.size());
Set<String> fieldSet = new HashSet<String>();
SchemaField keyField =
rb.req.getCore().getSchema().getUniqueKeyField();
if (null != keyField) {
fieldSet.add(keyField.getName());
}
fieldSet.add(field);
• Since the search has
already been completed,
we get a list of documents
which will be returned.
• We also need to pull from
the schema the field which
contains the unique id.
This will let us correlate
our results with the rest of
the response

DocIterator iterator = docs.iterator();
for (int i = 0; i < docs.size(); i++) {
try {
int docId = iterator.nextDoc();
HashMap<String, Double> counts = new HashMap<String, Double>();
Document doc = searcher.doc(docId, fieldSet);
IndexableField[] multifield = doc.getFields(field);
for (IndexableField singlefield : multifield) {
for (String string : singlefield.stringValue().split(" ")) {
if (words.contains(string)) {
Double oldcount = counts.containsKey(string) ? counts.get(string) : 0;
counts.put(string, oldcount + 1);
}
}
}
String id = doc.getField(keyField.getName()).stringValue();
NamedList<Double> docresults = new NamedList<Double>();
for (String word : words) {
docresults.add(word, counts.get(word));
}
response.add(id, docresults);
} catch (IOException ex) {
numErrors++; // feeds the "errors" statistic
LOGGER.error("Could not load a document for word counting", ex);
}
}
• Get a document iterator to look
through all docs
• Set up a count map for this doc
• Load the document through the
searcher
• Get the value of the field
• BEWARE: if the field is multivalued,
getField returns only the first
value, not ALL values; use getFields
• Do our basic word counting
• Get the document unique id from
the keyfield
• Add each word to the results for
the doc
• Add the doc result to the overall
response, using its id value

rb.rsp.add("demoSearchComponent", response);
totalRequestsTime += System.currentTimeMillis() - lstartTime;
}
• Add all results to the final
response
• The name we pick here will
show up in the Solr output
• Note down how long it took
for the entire process

@Override
public String getDescription() {
return "Searchbox DemoSearchComponent";
}
@Override
public String getVersion() {
return "1.0";
}
@Override
public String getSource() {
return "http://www.searchbox.com";
}
@Override
public NamedList<Object> getStatistics() {
NamedList<Object> all = new SimpleOrderedMap<Object>();
all.add("requests", "" + numRequests);
all.add("errors", "" + numErrors);
all.add("totalTime(ms)", "" + totalRequestsTime);
return all;
}
• In order to have a production
grade plugin, users expect to see
certain pieces of information
available in their Solr admin
panel
• Description, version and source
are just Strings
• We see getStatistics() actually
uses the volatile variables we
were keeping track of before,
sticks them into another named
list and returns them. These
appear under the statistics panel
in Solr.
That’s it!
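A note on the counters: volatile makes the latest value visible to all threads, but an increment such as numRequests++ is still a separate read and write, so concurrent requests can lose updates. If exact numbers matter, atomic counters are one option. The following is a sketch of that alternative, not the tutorial's code; RequestStatsSketch and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative thread-safe variant of the statistics fields; not the tutorial's code.
class RequestStatsSketch {
    private final AtomicLong numRequests = new AtomicLong();
    private final AtomicLong numErrors = new AtomicLong();
    private final AtomicLong totalRequestsTime = new AtomicLong();

    // Records one finished request and the time it took.
    void recordRequest(long elapsedMillis) {
        numRequests.incrementAndGet();
        totalRequestsTime.addAndGet(elapsedMillis);
    }

    void recordError() {
        numErrors.incrementAndGet();
    }

    // Same keys as the getStatistics() method in the tutorial.
    Map<String, String> snapshot() {
        Map<String, String> all = new LinkedHashMap<>();
        all.put("requests", Long.toString(numRequests.get()));
        all.put("errors", Long.toString(numErrors.get()));
        all.put("totalTime(ms)", Long.toString(totalRequestsTime.get()));
        return all;
    }

    public static void main(String[] args) throws InterruptedException {
        RequestStatsSketch stats = new RequestStatsSketch();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    stats.recordRequest(2);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println(stats.snapshot()); // prints {requests=4000, errors=0, totalTime(ms)=8000}
    }
}
```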

<requestHandler name="/demoendpoint" class="solr.SearchHandler">
<arr name="last-components">
<str>democomponent</str>
</arr>
</requestHandler>
We need some way to run our searchComponent, so we’ll add a quick
requestHandler to test it. This is done simply by declaring a new handler that uses the
standard SearchHandler class and telling it to run the component we defined on an earlier
slide after the default components (last-components). Of course you could use your component directly in the select handler
and/or add it to a chain of other components! Solr is super versatile!

http://192.168.56.101:8983/solr/corename/demoendpoint?q=*%3A*&wt=xml&rows=2&fl=id,myfield
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">79</int>
</lst>
<result name="response" numFound="13262" start="0">
<doc>
<str name="id">f73ca075-3826-45d5-85df-64b33c760efc</str>
<arr name="myfield">
<str>dog body body body fish fish fish fish orange</str>
</arr>
</doc>
<doc>
<str name="id">bc72dbef-87d1-4c39-b388-ec67babe6f05</str>
<arr name="myfield">
<str>the fish had a small body. the dog likes to eat fish</str>
</arr>
</doc>
</result>
<lst name="demoSearchComponent">
<lst name="f73ca075-3826-45d5-85df-64b33c760efc">
<double name="body">3.0</double>
<double name="fish">4.0</double>
<double name="dog">1.0</double>
</lst>
<lst name="bc72dbef-87d1-4c39-b388-ec67babe6f05">
<double name="body">1.0</double>
<double name="fish">2.0</double>
<double name="dog">1.0</double>
</lst>
</lst>
</response>
• The <result> block holds the query results
• The demoSearchComponent list holds our results, in the
same order and keyed by the same ids for correlation
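If you want to issue the test request from code rather than a browser, the URL can be assembled with the JDK alone. The sketch below only builds the string and sends nothing; the host, core name and parameters are the ones from the example request above, while DemoQueryUrlSketch and build are invented names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that assembles the query URL; it does not contact Solr.
class DemoQueryUrlSketch {
    // Appends URL-encoded parameters to the handler URL in insertion order.
    static String build(String handlerUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return handlerUrl + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("wt", "xml");
        params.put("rows", "2");
        params.put("fl", "id,myfield");
        System.out.println(build("http://192.168.56.101:8983/solr/corename/demoendpoint", params));
        // prints http://192.168.56.101:8983/solr/corename/demoendpoint?q=*%3A*&wt=xml&rows=2&fl=id%2Cmyfield
    }
}
```

The only difference from the hand-written URL is that the comma in fl is percent-encoded as %2C, which decodes to the same value on the server side.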

• Because we’ve overridden the getStatistics() method, we
can get real-time stats from the admin panel!
• In this case since it’s a component of the SearchHandler,
our fields are concatenated with the other statistics

Happy Developing!
Full Source Code available at:
http://www.searchbox.com/developing-a-solr-plugin/
