Slides used in a 2-hour long hands-on tutorial on Apache Solr at Dev8D UK: http://wiki.2011.dev8d.org/w/Session-WK16
"This is an introductory tutorial on Apache Solr, an open source enterprise search engine with a restful web interface."
2. Who am I?
My (Buddhist) name is Upayavira
Consultant with Sourcesense, specialising in
search and operational technologies
A member of the Apache Software Foundation
3. Who are Sourcesense?
Open Source integrator, specialising in:
Search
Business Intelligence
Content Management
Application Lifecycle Management
Offices in London, Amsterdam, Milan and Rome
5. What is Lucene?
Lucene is a Java information retrieval library
Provides free text search facilities
Started in 2000, by Doug Cutting
A project of the Apache Software Foundation
It is designed to be embedded in Java apps
6. What is Solr?
Solr is an enterprise search server based on
Lucene
Wraps Lucene with a RESTful web interface
Provides configurable schema
Provides replication functionality
7. Solr Design
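[Diagram: user queries reach the Solr instance through the SearchHandler, a content application sends documents through the UpdateRequestHandler, and both handlers work against the Lucene index.]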
9. Prerequisites
Extract your Solr distribution
At a command prompt:
– cd into the unzipped distribution directory
– cd into the example directory
– Enter: java -jar start.jar
Visit http://localhost:8983/solr/ in a browser. If you see a
welcome message, your Solr works
Unpack your dev8d-solr.zip file
At another command prompt, cd into your dev8d-solr
directory
10. Checking Solr Works
Visit http://localhost:8983/solr/admin/
You should see the Solr admin page.
Click the statistics link
You'll see NumDocs: 0
There's nothing in the index, so searches won't show
much
So we need to index some sample content
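The same check can be scripted instead of read off the admin page. The following is a minimal sketch, assuming the instance accepts `wt=json` on the `/solr/select` URL and that a match-all query (`*:*`) with `rows=0` returns only the hit count. The function names are illustrative and not part of Solr.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Same host and port as used in the slides.
SOLR_SELECT = "http://localhost:8983/solr/select"


def count_url(base=SOLR_SELECT):
    """Build a match-all query that asks for no rows, only the hit count."""
    params = {"q": "*:*", "rows": 0, "wt": "json"}
    return base + "?" + urlencode(params)


def num_found(response_text):
    """Extract the total number of matching documents from a JSON response."""
    return json.loads(response_text)["response"]["numFound"]


def fetch_count():
    """Ask the running Solr instance how many documents it holds."""
    with urlopen(count_url()) as resp:
        return num_found(resp.read().decode("utf-8"))
```

With Solr running, `fetch_count()` should report the same figure as NumDocs on the statistics page.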
11. Indexing Sample Content
In your dev8d-solr directory (extracted from the zip), at
a command prompt:
java -jar post.jar wikipedia-basic.xml
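For readers curious about what happens on the wire, here is a rough sketch of the kind of requests post.jar sends: an HTTP POST of the XML file to Solr's update handler, followed by a commit. The URL `http://localhost:8983/solr/update` is assumed to be the default update endpoint for the example setup, and the helper names are hypothetical.

```python
from urllib.request import Request, urlopen

# Assumed default update endpoint of the example Solr instance.
UPDATE_URL = "http://localhost:8983/solr/update"


def update_request(xml_bytes, url=UPDATE_URL):
    """Wrap an XML update message in an HTTP POST request."""
    return Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def post_file(path):
    """Send one XML file to Solr, then commit so the documents become searchable."""
    with open(path, "rb") as fh:
        urlopen(update_request(fh.read())).close()
    urlopen(update_request(b"<commit/>")).close()
```

Calling `post_file("wikipedia-basic.xml")` from the dev8d-solr directory is intended to have the same effect as the command above.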
36. Indexing
Load wikipedia-basic.xml into a text editor or web browser
Load wikipedia-enhanced.xml into a text editor or browser
Load example/solr/conf/schema.xml into a text editor
37. Indexing
schema.xml defines field types and fields used in Solr
Equivalent to your database schema in an RDBMS
38. Indexing
Change these two fields in schema.xml to be of type "string" and add multiValued="true" for each.
<field name="links" type="string" indexed="true"
stored="true" multiValued="true"/>
<field name="category" type="string" indexed="true"
stored="true" multiValued="true"/>
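To see what a multiValued field looks like on the input side, the sketch below renders a document as a Solr XML update message, repeating the `<field>` element once per value. The document contents and the `id` field are made up for illustration; the real wikipedia-enhanced.xml may be laid out differently.

```python
import xml.etree.ElementTree as ET


def solr_add_xml(doc):
    """Render a dict as a Solr <add><doc> message.

    List values become repeated <field> elements, which is how a
    multiValued field is expressed in the XML update format.
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document, not taken from the tutorial data.
example = {
    "id": "Charles_Babbage",
    "category": ["Mathematicians", "Inventors"],
}
print(solr_add_xml(example))
```

Without multiValued="true" on the field definition, Solr would reject a document that supplies more than one value for `category`.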
39. Indexing
Now add this to the <fields> section of schema.xml:
<field name="source" type="string" indexed="true"
stored="true" multiValued="false"/>
<field name="textgen" type="textgen" indexed="true"
stored="true" multiValued="true"/>
Now search for the “textgen” field type definition, further up
in the file.
40. Indexing
At the bottom of schema.xml, before the closing </schema> tag, add the following:
<copyField source="text" dest="textgen"/>
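As a rough mental model of the copyField rule, the sketch below treats a document as a mapping from field names to lists of values and appends the source field's raw values to the destination field. This is a simplified illustration in plain Python, not Solr's implementation; `apply_copy_fields` is a hypothetical helper and the sample text is invented.

```python
def apply_copy_fields(doc, rules):
    """Return a copy of doc with each (source, dest) rule applied.

    doc maps field names to lists of values. The source values are
    appended to the destination field and the source is left unchanged.
    """
    result = {name: list(values) for name, values in doc.items()}
    for source, dest in rules:
        if source in doc:
            # Copy from the original document, so rules do not chain.
            result.setdefault(dest, []).extend(doc[source])
    return result


# Invented sample document for illustration only.
doc = {"text": ["Charles Babbage designed the Analytical Engine."]}
indexed = apply_copy_fields(doc, [("text", "textgen")])
print(indexed["textgen"])
```

The point of the copy is that the same text can then be analysed in two ways, once with the `text` field type and once with `textgen`.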
41. Indexing
At your command prompt, in the dev8d-solr directory, execute:
java -jar post.jar wikipedia-enhanced.xml
42. More Advanced Searching
http://localhost:8983/solr/select?q=computers%20AND%20babbage&facet=true&facet.field=category&facet.mincount=1
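The faceted query can also be built and read programmatically. This is a minimal sketch, assuming the response is requested with `wt=json` and that facet counts come back in Solr's default flat layout, a list alternating value and count. The function names and the sample response are illustrative only.

```python
import json
from urllib.parse import urlencode

# Same host and port as used in the slides.
SOLR_SELECT = "http://localhost:8983/solr/select"


def facet_query_url(query, facet_field, mincount=1, base=SOLR_SELECT):
    """Build a select URL with faceting switched on for one field."""
    params = [
        ("q", query),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.mincount", mincount),
        ("wt", "json"),
    ]
    # urlencode writes spaces as '+', which is equivalent to %20 in a query string.
    return base + "?" + urlencode(params)


def facet_counts(response_text, facet_field):
    """Turn the flat [value, count, value, count, ...] list into a dict."""
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"][facet_field]
    return dict(zip(flat[0::2], flat[1::2]))


# Made-up response fragment, not real output from the tutorial index.
sample = '{"facet_counts": {"facet_fields": {"category": ["Computing", 3, "History", 1]}}}'
print(facet_query_url("computers AND babbage", "category"))
print(facet_counts(sample, "category"))
```

Because facet.mincount is 1, categories with no matching documents are left out of the returned list.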