Shalin Shekhar Mangar gave an introduction to Apache Lucene and Solr at the 4th Bangalore Lucene/Solr Meetup on April 19th, 2014. He provided an overview of Lucene as a Java-based search library for adding search and indexing to applications. He then discussed Solr, which is based on Lucene and allows accessing Lucene via HTTP with features like faceting, replication, and distributed search. He also demonstrated indexing and searching with Solr using its Java client SolrJ.
4th Bangalore Lucene/Solr Meetup, 19th April 2014
Who am I?
● Apache Lucene/Solr Committer and PMC member
● Contributor since January 2008
● Currently: Engineer at LucidWorks
● Formerly with AOL
● Email: shalin@apache.org
● Twitter: shalinmangar
● Blog: http://shal.in
Apache Lucene
● http://lucene.apache.org/java
● Java-based API for adding search and indexing to your applications
● High performance indexing – over 150GB/hour on modern hardware
● Fast and efficient scoring and indexing algorithms
● Support for multiple query types, hit highlighting, faceting, joins, grouping, typo-tolerant suggestions and multiple languages
● Most widely deployed search library on the planet
Lucene Query Syntax
● +red +shoes = red AND shoes
● +shoes -red = shoes NOT red
● "android phone"
● "android phone" -samsung = "android phone" NOT samsung
● "android samsung"~4
● merced*
● createDate:[201301 TO 201401]
● author:shalin
● author:"shalin mangar"
● author:"shalin mangar" AND project:(lucene OR solr)
● title:samsung^5 category:phone
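To make the meaning of the + (required) and - (prohibited) operators concrete, here is a minimal sketch of a toy inverted index in plain Java. It is not Lucene's API or implementation: the class ToyIndex and its methods are hypothetical names invented for this illustration, and it handles only single-term +term and -term clauses (no phrases, wildcards, ranges, fields or boosts).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration only; NOT Lucene's API or its actual implementation.
class ToyIndex {
    // term -> ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int docCount = 0;

    // Lowercase the text, split on non-alphanumerics, and index each term.
    int add(String text) {
        int id = docCount++;
        for (String term : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Evaluates whitespace-separated "+term" and "-term" clauses.
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        Set<Integer> excluded = new TreeSet<>();
        for (String clause : query.trim().split("\\s+")) {
            if (clause.length() < 2) continue; // skip empty or bare-operator clauses
            String term = clause.substring(1).toLowerCase();
            Set<Integer> docs = postings.getOrDefault(term, Collections.emptySet());
            if (clause.charAt(0) == '+') {
                // Required terms are intersected (AND).
                if (result == null) result = new TreeSet<>(docs);
                else result.retainAll(docs);
            } else if (clause.charAt(0) == '-') {
                // Prohibited terms are collected and subtracted at the end (NOT).
                excluded.addAll(docs);
            }
        }
        // Simplification: a query without any required clause matches nothing.
        if (result == null) result = new TreeSet<>();
        result.removeAll(excluded);
        return result;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("Red running shoes");  // doc 0
        index.add("Blue suede shoes");   // doc 1
        index.add("Red winter jacket");  // doc 2
        System.out.println(index.search("+red +shoes")); // prints [0]
        System.out.println(index.search("+shoes -red")); // prints [1]
    }
}
```

A real Lucene index also scores and ranks the matches; this sketch only shows which documents the boolean operators select.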
Apache Solr
● http://lucene.apache.org/solr
● Lucene-based search server + other features
● Access Lucene over HTTP:
– Java, Ruby, Python, .NET, PHP over XML/JSON and other formats
● Most programming tasks in Lucene are configuration tasks in Solr
● Faceting (guided navigation, filters, etc.)
● Replication and distributed search
● Lucene best practices
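Because Solr is accessed over HTTP, a search is just a URL with query parameters. The sketch below builds such a URL in plain Java without sending it. The base URL http://localhost:8983/solr/collection1 and the facet field name category are assumptions made for illustration; q, wt, facet and facet.field are standard Solr request parameters. The class SolrUrlSketch is a hypothetical helper, not part of Solr or SolrJ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Solr or SolrJ.
class SolrUrlSketch {
    // Percent-encode one query-string component as UTF-8.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Append "/select" and the encoded parameters to the base URL.
    static String selectUrl(String baseUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(baseUrl).append("/select");
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(separator)
               .append(encode(p.getKey()))
               .append('=')
               .append(encode(p.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "title:samsung^5 category:phone"); // Lucene query syntax
        params.put("wt", "json");                          // response format
        params.put("facet", "true");                       // enable faceting
        params.put("facet.field", "category");             // assumed field name
        // Assumed local Solr instance and core name.
        System.out.println(selectUrl("http://localhost:8983/solr/collection1", params));
    }
}
```

Any HTTP client in any language can then fetch the resulting URL and parse the JSON response, which is why the client list above is so broad.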
Other features
● Data Import Handler
– Index databases, emails, RSS feeds, XML files, etc.
● Rich document support
– PDF, MS Office, images, etc.
● Replication for high query volume
● Distributed search for large indexes
– Production systems with 1B+ documents
● Very extensible and customizable
– Embedded in commercial search products from LucidWorks, DataStax, Cloudera, Hortonworks, Amazon CloudSearch and Riak
Bangalore Baby Apache Solr Meetup Group
● http://www.meetup.com/Bangalore-Baby-Apache-Solr-Group/
● Already had one successful meetup
● Great tutorial + hands-on workshop
● A must-join for all newcomers
● Planning to have another meetup next month