In this set of slides we give a step-by-step tutorial on how to develop a fully functional Solr search component plugin. We also provide links to the full source code, which can be used as a template for quickly creating your own search components.
2. Solr is
◦ Blazing fast open source enterprise search platform
◦ Lucene-based search server
◦ Written in Java
◦ Has REST-like HTTP/XML and JSON APIs
◦ Extensive plugin architecture
http://lucene.apache.org/solr/
3. Allows for the development of plugins which
provide advanced operations
Types of plugins:
◦ RequestHandlers
Use URL parameters and return their own response
◦ SearchComponents
Responses are embedded in other responses (such as
/select)
◦ UpdateRequestProcessorFactory
Result is stored in a field along with the
document at index time
4. A quick tutorial on how to program a
SearchComponent to
◦ Be initialized
◦ Parse configuration file arguments
◦ Do something useful on search request (counts
some words in indexed documents)
◦ Format and return response
We’ll name our plugin
“DemoSearchComponent” and show how to
stick it into the solrconfig.xml for loading
5. In the next slide, we’ll specify a list called
“words”; each entry in the list is a string named
“word”
We want to load these specific words and then
count them in every document of a query’s result set.
Ex: config file has “body”, “fish”, “dog”
◦ Indexed Document has: dog body body body fish fish
fish fish orange
◦ Result should be:
body=3.0
fish=4.0
dog=1.0
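The counting idea in this example can be sketched in plain Java, without any Solr classes. This is only an illustration of the expected result: the class name WordCountSketch and its count method are made up for this sketch and are not part of the plugin.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the plugin: counts configured words in one text.
class WordCountSketch {

    static Map<String, Double> count(List<String> words, String text) {
        Map<String, Double> counts = new LinkedHashMap<>();
        for (String word : words) {
            counts.put(word, 0.0); // every configured word starts at zero
        }
        for (String token : text.split(" ")) {
            if (counts.containsKey(token)) {
                counts.put(token, counts.get(token) + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("body", "fish", "dog");
        String text = "dog body body body fish fish fish fish orange";
        System.out.println(count(words, text)); // {body=3.0, fish=4.0, dog=1.0}
    }
}
```

"orange" is ignored because it is not in the configured list.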
7. We can see that we’re asking for Solr to load
com.searchbox.DemoSearchComponent.
This will be the output of our project in .jar
file format
Copy the .jar file to the lib directory in the
Solr installation so that Solr can find it.
That’s it!
8. package com.searchbox;
import java.io.IOException;
import java.util.Date;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.logging.Level;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexableField;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.SimpleOrderedMap;
import org.apache.solr.core.SolrCore;
import org.apache.solr.core.SolrEventListener;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.plugin.SolrCoreAware;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
Just some of the
common packages we’ll
need to import to get
things rolling!
9. public class DemoSearchComponent extends SearchComponent {
    private static final Logger LOGGER =
            LoggerFactory.getLogger(DemoSearchComponent.class);
    // Basic statistics, reported later through getStatistics()
    volatile long numRequests;
    volatile long numErrors;
    volatile long totalRequestsTime;
    volatile String lastnewSearcher;
    volatile String lastOptimizeEvent;
    // Values loaded from solrconfig.xml during init()
    protected String defaultField;
    private List<String> words;
• We specify that our class
extends SearchComponent, so
we know we’re in business!
• We decide that we’ll keep track
of some basic statistics for
future usage
• Number of requests/errors
• Total time
• Make a variable to store our
defaultField and our words.
10. Initialization is called when the plugin is first
loaded
This most commonly occurs when Solr is
started up
At this point we can load things from file
(models, serialized objects, etc.)
We also have access to the variables set in
solrconfig.xml
11. We have chosen to pass a list called “words”
and have provided the words “fish”, “body” and
“dog” that we’d like to count.
During initialization we need to load this list
from solrconfig.xml and store it locally
12. @Override
public void init(NamedList args) {
    super.init(args);
    // Default field to analyze; a request can override it later
    defaultField = (String) args.get("field");
    if (defaultField == null) {
        throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Need to specify the default field for analysis");
    }
    // Load every entry named "word" from the "words" list
    NamedList wordList = (NamedList) args.get("words");
    words = (wordList == null) ? null : wordList.getAll("word");
    if (words == null || words.isEmpty()) {
        throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Need to specify at least one word in searchComponent config!");
    }
}
Notice that we’ve loaded the list “words” and
then all of its attributes called “word” and put
them into the class level variable words.
Also we’ve identified our
defaultField
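The same validate-then-store pattern can be tried outside Solr. In the sketch below a plain java.util.Map stands in for the NamedList that Solr passes to init(); that substitution, the class name ConfigSketch and the use of IllegalArgumentException are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the init() logic; uses a Map instead of Solr's NamedList.
class ConfigSketch {
    final String defaultField;
    final List<String> words;

    ConfigSketch(Map<String, Object> args) {
        Object field = args.get("field");
        if (!(field instanceof String)) {
            throw new IllegalArgumentException("Need to specify the default field for analysis");
        }
        // Copy the configured words, tolerating a missing "words" entry
        List<String> loaded = new ArrayList<>();
        Object list = args.get("words");
        if (list instanceof List) {
            for (Object o : (List<?>) list) {
                loaded.add(String.valueOf(o));
            }
        }
        if (loaded.isEmpty()) {
            throw new IllegalArgumentException("Need to specify at least one word");
        }
        this.defaultField = (String) field;
        this.words = loaded;
    }
}
```

Failing fast in the constructor mirrors what the slide does with SolrException: a misconfigured component is reported at load time instead of at the first query.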
13. There are 2 phases in a SearchComponent
◦ Prepare
◦ Process
During a query, the prepare method is called
on all components before any work is done.
This allows modifying, adding or removing
variables or components in the stack
Afterwards, the process methods are called
for the components in the exact order
specified in solrconfig.xml
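The ordering of the two phases can be illustrated with a small stand-alone sketch. The Component class and handle method below are hypothetical stand-ins written for this example; they are not Solr's SearchComponent or SearchHandler.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the two-phase call order; not the Solr classes.
class TwoPhaseSketch {

    static class Component {
        final String name;
        Component(String name) { this.name = name; }
        void prepare(List<String> log) { log.add("prepare:" + name); }
        void process(List<String> log) { log.add("process:" + name); }
    }

    // Calls prepare() on every component first, then process() in the same order.
    static List<String> handle(List<Component> components) {
        List<String> log = new ArrayList<>();
        for (Component c : components) {
            c.prepare(log);
        }
        for (Component c : components) {
            c.process(log);
        }
        return log;
    }

    public static void main(String[] args) {
        List<Component> chain = Arrays.asList(new Component("query"), new Component("demo"));
        System.out.println(handle(chain));
        // [prepare:query, prepare:demo, process:query, process:demo]
    }
}
```

Because every prepare call finishes before the first process call, a later component in the chain still gets a chance to adjust the request before any component does its work.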
14. @Override
public void prepare(ResponseBuilder rb) throws IOException {
    // none necessary
}
Nothing going on here, but we
need to override it; otherwise
we can’t extend
SearchComponent
15. @Override
public void process(ResponseBuilder rb) throws IOException {
    numRequests++;
    SolrParams params = rb.req.getParams();
    long lstartTime = System.currentTimeMillis();
    SolrIndexSearcher searcher = rb.req.getSearcher();
    NamedList response = new SimpleOrderedMap();
    // A "field" URL parameter overrides the default from solrconfig.xml
    String queryField = params.get("field");
    String field = null;
    if (defaultField != null) {
        field = defaultField;
    }
    if (queryField != null) {
        field = queryField;
    }
    if (field == null) {
        LOGGER.error("Fields aren't defined, not performing counting.");
        return;
    }
• We start off by counting the request in a
volatile variable (for use later in statistics),
and we note the start time so we know how
long the process takes.
• We create a new NamedList which will
hold this component’s response
• We look at the URL parameters to see if
there is a “field” parameter present. If so,
it overrides the default we loaded from
the config file
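The override rule described above fits in one expression. The helper below is a hypothetical restatement of that rule for illustration; the plugin itself uses the if-chain shown on the slide.

```java
// Hypothetical helper restating the "request parameter beats config default" rule.
class FieldResolutionSketch {

    // Returns the request-level field if present, else the configured default, else null.
    static String resolveField(String defaultField, String queryField) {
        return queryField != null ? queryField : defaultField;
    }

    public static void main(String[] args) {
        System.out.println(resolveField("myfield", null));    // myfield
        System.out.println(resolveField("myfield", "title")); // title
        System.out.println(resolveField(null, null));         // null
    }
}
```

A null result corresponds to the case where the component logs an error and returns without counting.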
16. DocList docs = rb.getResults().docList;
    if (docs == null || docs.size() == 0) {
        LOGGER.debug("No results");
        return; // nothing to count, and docs may be null
    }
    LOGGER.debug("Doing this many docs:\t" + docs.size());
    // Only load the unique key and the field we count in
    Set<String> fieldSet = new HashSet<String>();
    SchemaField keyField = rb.req.getCore().getSchema().getUniqueKeyField();
    if (null != keyField) {
        fieldSet.add(keyField.getName());
    }
    fieldSet.add(field);
• Since the search has
already been completed,
we get a list of documents
which will be returned.
• We also need to pull from
the schema the field which
contains the unique id.
This will let us correlate
our results with the rest of
the response
17. DocIterator iterator = docs.iterator();
    for (int i = 0; i < docs.size(); i++) {
        try {
            int docId = iterator.nextDoc();
            HashMap<String, Double> counts = new HashMap<String, Double>();
            Document doc = searcher.doc(docId, fieldSet);
            // getFields returns every value of a multi-valued field
            IndexableField[] multifield = doc.getFields(field);
            for (IndexableField singlefield : multifield) {
                for (String string : singlefield.stringValue().split(" ")) {
                    if (words.contains(string)) {
                        Double oldcount = counts.containsKey(string) ? counts.get(string) : 0.0;
                        counts.put(string, oldcount + 1);
                    }
                }
            }
            String id = doc.getField(keyField.getName()).stringValue();
            NamedList<Double> docresults = new NamedList<Double>();
            for (String word : words) {
                docresults.add(word, counts.get(word));
            }
            response.add(id, docresults);
        } catch (IOException ex) {
            numErrors++;
            LOGGER.error("Could not load document for word counting", ex);
        }
    }
• Get a document iterator to look
through all docs
• Set up the count map for this doc
• Load the document through the
searcher
• Get the values of the field
• BEWARE if it is a multi-valued field, using
getField will only return the first
instance, not ALL instances
• Do our basic word counting
• Get the document unique id from
the keyfield
• Add each word to the results for
the doc
• Add the doc result to the overall
response, using its id value
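One detail worth trying out separately is tokenization. A plain split(" ") leaves punctuation attached, so a token such as "body." would not match the configured word "body". The sketch below is a variation, not the plugin's code: it splits on any run of characters that are not letters or digits and sums over all values of a multi-valued field. The class name and the regular expression are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical variation on the slide's loop: punctuation-tolerant, multi-value counting.
class MultiValueCountSketch {

    static Map<String, Double> count(List<String> words, List<String> fieldValues) {
        Map<String, Double> counts = new LinkedHashMap<>();
        for (String word : words) {
            counts.put(word, 0.0); // report 0.0 instead of null for absent words
        }
        for (String value : fieldValues) {
            // Split on runs of non-letter, non-digit characters so "body." yields "body"
            for (String token : value.split("[^\\p{L}\\p{N}]+")) {
                Double old = counts.get(token);
                if (old != null) {
                    counts.put(token, old + 1);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("body", "fish", "dog");
        List<String> values = Arrays.asList("the fish had a small body. the dog likes to eat fish");
        System.out.println(count(words, values)); // {body=1.0, fish=2.0, dog=1.0}
    }
}
```

In a real deployment the field's own analyzer would normally decide what counts as a token; the regular expression here is only a quick approximation.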
19. @Override
public String getDescription() {
    return "Searchbox DemoSearchComponent";
}
@Override
public String getVersion() {
    return "1.0";
}
@Override
public String getSource() {
    return "http://www.searchbox.com";
}
@Override
public NamedList<Object> getStatistics() {
    NamedList<Object> all = new SimpleOrderedMap<Object>();
    all.add("requests", "" + numRequests);
    all.add("errors", "" + numErrors);
    // totalRequestsTime is the field declared with the other counters
    all.add("totalTime(ms)", "" + totalRequestsTime);
    return all;
}
• For a production-grade plugin,
users expect to see certain
pieces of information in their
Solr admin panel
• Description, version and source
are just Strings
• We see getStatistics() actually
uses the volatile variables we
were keeping track of before,
sticks them into another named
list and returns them. These
appear under the statistics panel
in Solr.
That’s it!
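A note on the counters: incrementing a volatile long with ++ is a read followed by a write, so two concurrent requests can lose an update. One possible alternative, sketched here with only java.util.concurrent classes, is to keep the counters in AtomicLong fields. The class StatsSketch and its method names are hypothetical and not part of the plugin.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical thread-safe variant of the request/error/time counters.
class StatsSketch {
    private final AtomicLong numRequests = new AtomicLong();
    private final AtomicLong numErrors = new AtomicLong();
    private final AtomicLong totalRequestsTime = new AtomicLong();

    // Records one request and how long it took, in milliseconds.
    void recordRequest(long elapsedMs, boolean failed) {
        numRequests.incrementAndGet();
        totalRequestsTime.addAndGet(elapsedMs);
        if (failed) {
            numErrors.incrementAndGet();
        }
    }

    // Snapshot using the same keys as the slide's getStatistics().
    Map<String, String> snapshot() {
        Map<String, String> all = new LinkedHashMap<>();
        all.put("requests", "" + numRequests.get());
        all.put("errors", "" + numErrors.get());
        all.put("totalTime(ms)", "" + totalRequestsTime.get());
        return all;
    }
}
```

Calling recordRequest at the end of process(), with the elapsed time computed from the start timestamp, would keep the three values up to date.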
20. <requestHandler name="/demoendpoint" class="solr.SearchHandler">
      <arr name="last-components">
        <str>democomponent</str>
      </arr>
    </requestHandler>
We need some way to run our SearchComponent, so we’ll add a quick
requestHandler to test it. This is done simply by declaring a handler based on
the standard SearchHandler and telling it to run the component we defined on an
earlier slide. Of course you could use your component directly in the select
handler and/or add it to a chain of other components! Solr is super versatile!
21. http://192.168.56.101:8983/solr/corename/demoendpoint?q=*%3A*&wt=xml&rows=2&fl=id,myfield
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">79</int>
</lst>
<result name="response" numFound="13262" start="0">
<doc>
<str name="id">f73ca075-3826-45d5-85df-64b33c760efc</str>
<arr name="myfield">
<str>dog body body body fish fish fish fish orange</str>
</arr>
</doc>
<doc>
<str name="id">bc72dbef-87d1-4c39-b388-ec67babe6f05</str>
<arr name="myfield">
<str>the fish had a small body. the dog likes to eat fish</str>
</arr>
</doc>
</result>
<lst name="demoSearchComponent">
<lst name="f73ca075-3826-45d5-85df-64b33c760efc">
<double name="body">3.0</double>
<double name="fish">4.0</double>
<double name="dog">1.0</double>
</lst>
<lst name="bc72dbef-87d1-4c39-b388-ec67babe6f05">
<double name="body">1.0</double>
<double name="fish">2.0</double>
<double name="dog">1.0</double>
</lst>
</lst>
</response>
Query results
Our results
Same order + ids
for correlation
22. • Because we’ve overridden the getStatistics() method, we
can get real-time stats from the admin panel!
• In this case since it’s a component of the SearchHandler,
our fields are concatenated with the other statistics