Guidelines for Managers:
What Lucene and Solr
Open Source Search
can do for
Enterprise Search
A Lucid Imagination White Paper
Abstract
Lucene/Solr is an open-source search development environment ideally suited for large-
scale, enterprise search applications. This paper provides some ways to think about your
enterprise search requirements from both technological and economic perspectives,
explains why a Lucene/Solr-based approach can be optimal, and describes how Lucid
Imagination can help you to design, develop, and deploy the necessary search solution.




What Lucene and Solr Open Source Search can do for Enterprise Search
A Lucid Imagination White Paper • April 2009                                         Page i
Table of Contents
Introduction
Preliminary Considerations
   Know Your Business Requirements
   Know Your Data
   Know Your Users
Advantages of a Lucene/Solr-Based Solution
   Technological Advantages
   Lower Cost, Greater Flexibility
How Lucid Imagination Can Help
Conclusion




Introduction
Markets are conversations. Today, the increasingly communications-rich interactions
within companies, and between companies and their stakeholders, are typically preserved
and stored, creating ever larger reserves of documents and data.
Effective access to a company’s data can be a strategic advantage of potentially enormous
value. Email, office documents, databases, customer service chat logs, and content
management systems, which together represent all forms of communication within the
company and with its marketplace, continue to grow in electronic form. It’s not just that the proverbial haystack
is growing larger; it also has more types of hay, with many different types of needles to be
found.
At some point, every function in the company needs access to such data, and these needs
can vary significantly across organizations. Search technology can be a standalone system
designed to provide a single point of access to the entirety of a company’s data, irrespective
of location, container format, or owner. Or, it may provide search functionality as a
component within another application. But enabling employees, customers, partners,
investors, and other stakeholders to find the information they need when they need it is the
goal of any enterprise search solution, no matter where it will be deployed or how it will be
used.
This white paper provides some ways to approach choosing and building enterprise search
solutions, and discusses why Lucene/Solr open source search solutions supported by Lucid
Imagination present key advantages. It starts with what must be considered when
presented with an enterprise search problem, discusses some attributes of a Lucene/Solr-
based solution that could be of special significance in selecting a solution strategy, and
concludes by describing how Lucid Imagination can help to design, implement, and support
a solution that meets your organization’s needs.




Lucene and Solr are state-of-the-art search technologies available for free as open source
from The Apache Software Foundation. Lucene is a powerful search library; Solr provides a
platform built on top of Lucene that makes it easy to build Lucene-based applications [1].
Both incarnations are full-featured and have excellent performance, relevancy ranking and
scalability. These technologies are used today by thousands of organizations. They power
substantial and diverse search applications at AOL, CNET, Comcast Interactive Media, IBM,
Netflix, LinkedIn, MySpace and many others. In many instances, Lucene/Solr solutions
regularly index and search tens or hundreds of millions of documents with sub-second
response time.



Lucid Imagination is exclusively dedicated to providing robust commercial support for
Apache Lucene/Solr open source search technology. Our products and services are
designed for enterprises currently using or evaluating Lucene/Solr for their search
solutions.


Preliminary Considerations
It is not unusual to think of the Web the minute search is mentioned, and with good reason:
nowadays, even small companies can have a large Web presence, and most workers and
consumers use the Web every day.




[1] Most organizations use Solr today as their search development platform. As Lucene is the older of the two
technologies, and serves as the core of Solr’s search capabilities, we’ll refer to them together as Lucene/Solr. For
more on the technologies, see http://www.lucidimagination.com.

But even in small companies, web pages typically represent only a fraction of the
important, text-based data to which stakeholders need access. Spreadsheets, slide decks,
PDFs, project management files, electronic design documents, chat logs and e-mail may all
contain information that will be critical in any number of business situations. Similarly,
within even small companies there can be a need to support search usage models not
typically found on Web-centric systems. For example, the ability to conduct collaborative
searches may be critical to productivity in some contexts.
To create an optimal enterprise search solution, it is essential to know your:
   •   Business requirements: What needs must be met to create competitive advantage
       for your enterprise, and how will you know when they are met?
   •   Available data: What and where is the content you have to work on, and how is it
       structured (e.g., does it form a natural “cascade” of sub-classes)?
   •   Users: What do they need to search and how will they prefer to search for it?
Along any of these dimensions, there is potentially huge variance, from one case to another.
The goal of the discussion here is not to provide an exhaustive checklist of issues. It is
intended rather to suggest the sorts of questions that should be considered at the earliest
possible stage of development.

Know Your Business Requirements
Applications for enterprise search and their associated requirements are as diverse as the
organizations that need them and the data they need to search. However, there are two
characteristics by which any search solution will ultimately be judged:
   •   Performance. Does the system return results quickly enough to fulfill the
       expectations of the critical mass of users? How does it perform under peak loads?
       Will the performance scale adequately as usage increases? Is enough known about
       probable evolution to build the system in such a way that it will sustain projected
       growth with minimum enhancement, let alone wholesale re-structuring?
       Additionally, what is the cost associated with obtaining that scale?
   •   Relevance. How well will the system find the data that the user needs, and how
       good a job will it do presenting query results in an appropriate way and in the best
       order? What techniques, implicit or explicit, are required to get user assessments of
       relevance?


In some cases, additional criteria of success will focus on areas such as system security or
legal and regulatory compliance. However, a focus on optimizing for both performance and
relevance is essential to designing and building an effective search solution.

Know Your Data
As noted above, your search solution can include input data of any type stored in any
format or container type — ranging from project-specific program management files to
sets of database records with relevant unstructured text fields. The better you understand
the data domain of the search system, the more effective your resulting searches will be,
and the higher the probability that your system’s success metrics (starting with
performance and relevance) will be achieved.
By beginning with a data audit, you can gain an understanding of:
   •   Number and types of documents. How many documents does your system need to
       support, and how big are they, both individually and in aggregate? Answers to these
       questions will have implications for performance design and planning. Similarly,
       knowing about document types, formats, etc., is essential to ensure adequate access
       by means of file filtering or other data preparation or pre-processing steps, and so is
       crucial both for performance and relevance.
   •   Key fields. For structured or partially structured data, certain fields may carry more
       weight than others in determining relevance. For example, a document’s title may
       be assigned a higher relevancy weight than its size.
   •   Internal information structure. Even less formal documents can have key
       structural attributes. Let’s illustrate by example: Imagine that the data domain of a
       search query includes the unstructured text of consumer-electronics blogs.
       Although the text itself is unstructured, the information within it may have a fair
       amount of structure, including, for example, names of manufacturers and their
       products, product capabilities such as storage or resolution, etc. The structure has a
       “shape” to it, from more general to more specific. Thus, a manufacturer’s name can
       be associated with many product names, product names with attributes or
       capabilities, etc.
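
A data audit of the kind described above can start very simply. The following is a minimal sketch in Python, assuming the content to be searched lives in a file tree; the function name audit_documents and the idea of grouping by file extension are illustrative assumptions, not part of Lucene/Solr or any Lucid Imagination tool. It only answers the first question in the list: how many documents of each type exist, and how large they are in aggregate.

```python
import os
from collections import defaultdict


def audit_documents(root):
    """Tally document count and total bytes per file extension under root.

    Hypothetical helper for illustration; grouping by extension is only a
    rough proxy for "document type".
    """
    summary = defaultdict(lambda: {"count": 0, "bytes": 0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Group by lowercased extension; files without one go under "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            path = os.path.join(dirpath, name)
            summary[ext]["count"] += 1
            summary[ext]["bytes"] += os.path.getsize(path)
    return dict(summary)
```

Calling audit_documents on a shared drive or an export directory returns a dictionary keyed by extension, which gives a first estimate of index size and of which file filters will be needed. Key fields and internal information structure still require inspection of the content itself.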




Know Your Users
The human dimension of the search solution of course presents the most variables of all.
Who will your users be, and in what roles are they most likely to use the search application,
e.g., consumer, research scientist, salesperson, manager, all of the above? Of equal if not
greater importance than the “who” and “what” of your users is the “how.” Is there a need
for collaborative search or to structure a “work flow” into the search process itself? Is there
a need to provide different levels or types of access to different classes of users? Or will
your application be extracting search results to feed to another application, without
presenting it to users at all?


Advantages of a Lucene/Solr-Based Solution
We discussed above some key success criteria for enterprise search solutions primarily in
terms of the performance and results relevance of the search application. As a business
decision-maker, you may find that useful, but still a little too abstract. You are likely to ask
yourself at some point: How will my enterprise search solution help me either to make or
save money? While it is true that the Lucene and Solr software are free, there’s much more
to it than the attractive price.
Let’s take a closer look at the question of making and saving money using Lucene and Solr,
from the vantage points of both technology and economics.

Technological Advantages
Lucene is the core search library; Solr is the logical starting point for most developers
building search applications with Lucene/Solr technology for their web site, product, or
internal organizational use. Let’s look at how Solr helps you build search, and then how
Lucene executes it.
Solr is a layer of code on top of Lucene that transforms Lucene into an enterprise search
platform and simplifies programming by extending Lucene’s reach to a broad variety of
common, easier-to-use development environments. Key features include:
   •   Web services. Solr places Lucene over HTTP, allowing programs written in any
       language to invoke Lucene Search. It provides access via REST-like interfaces, or
       from a full array of open-standards based development environments, languages,
       and tools, including, for example, Python, PHP, Ruby, Ruby-on-Rails, etc.
   •   Faceting, which is the grouping of items or search results into categories that let
       users drill into search results (or even skip searching entirely) by any value in any
       field (for example, choosing different attributes of shoes at Zappos.com, or searching
       Wikipedia by sub-articles, or navigating news articles at cnet.com ).
   •   Easy configuration for managing which fields are indexed, and their
       characteristics.
   •   System administration tools for data loading, index replication, monitoring,
       logging and cache management.
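
To make the Web services and faceting features above more concrete, here is a minimal sketch in Python of how a client program might compose a search request to Solr over HTTP. The parameters q, rows, wt, facet and facet.field are standard Solr request parameters; the helper name build_select_url, the server address, and the field names brand and color are assumptions chosen for illustration and will differ in any real deployment.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, facet_fields=(), rows=10):
    """Return a Solr /select URL for a keyword query with optional field facets.

    base_url and the facet field names are deployment-specific assumptions.
    """
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        # facet.field may be repeated, once per field to facet on.
        params.extend(("facet.field", field) for field in facet_fields)
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical server address and field names, for illustration only.
url = build_select_url("http://localhost:8983/solr", "running shoes", ["brand", "color"])
print(url)
```

Because the request is a plain URL, the same query can be issued from any language or tool that speaks HTTP, which is the point of the Web services bullet above. When faceting is enabled, Solr returns per-value counts for each requested field alongside the matching documents, and an application can present those counts as drill-down categories.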



Lucene, the core search engine, is a Java-based search library available for free as open
source under the Apache License, Version 2.0. At the heart of the application’s “search engine,”
Lucene exhibits attributes that enable applications employing it to deliver world-class user
satisfaction. These include:
   •   Outstanding speed. Supports sub-second performance for most queries.
   •   Strong relevancy ranking and full results processing. Great out-of-the-box precision
       returns the information (documents) that users need without including a lot that
       they don’t. These results are presented clearly by relevancy, date, field, or any
       document property—and can be sorted by these attributes. Additional supported
       features, like highlighting and spell checking, let you extend search interactivity,
       making the refinement process easier and more conversational.
   •   Complete query capabilities. Offers a full array of query methods: keyword,
       Boolean and +/- queries, proximity operators, wildcards, fielded searching,
       term/field/document weights, find-similar, spell-checking, multi-lingual search, and
       more. This means that your search solution can be flexible enough to accommodate
       an enormous range of user preferences and data types, from the simplest to the
       most complex. And because Lucene/Solr are open source, users can readily tailor
       queries to very specific needs.
   •   Unsurpassed portability. Runs on any platform supporting Java, and indexes are
       portable across platforms. You can build an index on Linux and copy it to a Microsoft
       Windows machine and search it there. This makes it easy to leverage advances in
       hardware and Operating Systems while minimizing additional development costs
       for faster and better search functionality. There are also open source ports of
       Lucene for many languages besides Java, including .NET, C, Python and others.
   •   Excellent scalability. Scales from document sets of hundreds to hundreds of
       millions and beyond.
   •   Easily manageable, highly flexible deployment options. Enables “shrink-to-fit”
       deployments, ranging from single-server to fully distributed, multi-server systems,
       with its low overhead indexes and rapid incremental indexing (especially with
       versions 2.3 and later).
While no single search technology is best on each of these dimensions for every application,
Lucene is among the best out-of-the-box on all of them.
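
Several of the query methods listed above map directly onto Lucene’s classic query syntax. The sketch below, in Python, composes a few such query strings; the helper functions phrase, fielded, require and exclude are hypothetical conveniences written for this illustration, and the field name title is an assumption. Only the resulting query strings are Lucene syntax.

```python
def phrase(text, slop=None):
    """Quote a phrase; an optional slop turns it into a proximity query."""
    quoted = '"' + text.replace('"', '\\"') + '"'
    return quoted if slop is None else f"{quoted}~{slop}"


def fielded(field, clause, boost=None):
    """Restrict a clause to one field and optionally boost its weight."""
    result = f"{field}:{clause}"
    return result if boost is None else f"{result}^{boost}"


def require(clause):
    """Mark a clause as mandatory (the + operator)."""
    return "+" + clause


def exclude(clause):
    """Mark a clause as prohibited (the - operator)."""
    return "-" + clause


examples = {
    "keyword": "solr",
    "boolean": "(lucene OR solr) AND search",
    "plus/minus": " ".join([require("search"), exclude("proprietary")]),
    "proximity": phrase("enterprise search", slop=5),
    "wildcard": "luc*",
    "fielded, weighted": fielded("title", phrase("open source"), boost=2),
}

for kind, query in examples.items():
    print(f"{kind:18} {query}")
```

The last example combines fielded searching with a term weight, so that a match in a title counts for more than a match elsewhere, which is the kind of key-field weighting discussed under Know Your Data.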
Together, Lucene and Solr provide the foundations for a search solution that is fully
capable and functionally complete. When the capabilities and attributes listed above are
essential requirements for your enterprise search needs, Lucene/Solr is a prime candidate
for fulfilling them.

Lower Cost, Greater Flexibility
When evaluating the economic advantages of a Lucene/Solr-based enterprise search
solution, it is useful to consider competing solutions from the perspective of non-recurring
and recurring costs:
   •   Non-recurring costs: Requirements gathering, system design and specification,
       system development (implementation), and testing are all more-or-less non-
       recurring costs. Another important element in this set is the cost of software
       acquisition (licensing or purchase).


   •   Recurring costs: The largest contributors here will be on-going technical and
       customer support and system administration, management, and maintenance. These
       are dependent on many factors including, for example, system size and complexity
       and number of users and their level of sophistication.


In both sets, almost all costs are associated with labor, and despite possible assertions to
the contrary, those costs are going to be approximately the same across the competing
products and technologies. All search systems, no matter how they work or who provides
them, require design and specification, development, configuration, deployment, testing,
and on-going support and maintenance.
The only inarguably clear differentiator is the cost of software acquisition. As Lucene and
Solr are open-source software solutions based on open standards and community-driven
development processes, they are free. Assuming all other costs are about equal, therefore,
the open source solution is almost certain to be highly cost efficient.
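
The argument above is simple arithmetic, and a short sketch in Python makes it explicit. The function total_cost and every figure below are placeholders in arbitrary units, chosen only to show the shape of the comparison; they are not estimates for any real product, and the model deliberately ignores anything the text does not mention, such as annual license maintenance fees.

```python
def total_cost(software_acquisition, nonrecurring_labor, recurring_per_year, years):
    """Total cost over the period: one-time costs plus recurring costs."""
    return software_acquisition + nonrecurring_labor + recurring_per_year * years


# Placeholder figures in arbitrary units, chosen only to make the arithmetic visible.
labor = dict(nonrecurring_labor=50, recurring_per_year=20, years=3)
licensed = total_cost(software_acquisition=100, **labor)
open_source = total_cost(software_acquisition=0, **labor)

# With labor held equal, the gap is exactly the acquisition cost.
print(licensed, open_source, licensed - open_source)
```

If the labor terms are in fact not equal across the candidate solutions, the same function can be used with different inputs for each; the conclusion in the text depends on the assumption that they are about the same.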
That, however, is still not the whole story. A Lucene/Solr-based solution can be the most
cost effective as well. With its strong out-of-the-box performance and relevancy ranking;
complete query capabilities; portability, scalability, and manageability characteristics; and
easy-to-use, highly standard programming interfaces, Lucene/Solr enables you to deploy
exactly the enterprise search functionality required to meet your customers’ needs in
full.




How Lucid Imagination Can Help
Lucid Imagination has the expertise, resources and services you need to drive development
of Lucene/Solr-based enterprise search solutions. We offer a full portfolio of software and
services including:
Certified Distributions. Because Lucene/Solr distributions certified by Lucid Imagination
are tested and commercially supported, they speed up implementation time, reduce the
risk of “gotchas”, and eliminate the need for familiarity with the fine points of the
community release process. Tested bug fixes are incorporated in an organized fashion,
reducing both the time needed to comb through nightly open source community releases and
the risk of code forks between release cycles. The Get Started program helps users who
download our Certified Distributions with first-time installation, configuration, and basic
usage of Lucene/Solr and included utilities.



Technical Support. Although contemporary open-source solutions are typically at least as
robust and reliable as their commercial counterparts, problems can still arise. Because the
community may not focus on maintenance in a timely fashion, there is no substitute for the
“industrial strength” support provided by Lucid Imagination to ensure your enterprise IT
operation gets timely responses, so it can both meet market-driven development schedules
and maintain stringent service level commitments. Designed for customers with
Lucene/Solr installations, the support subscriptions we offer include:
          o Regular updates and upgrades for Lucid-certified versions of Lucene/Solr


          o Problem isolation and diagnosis of errors in Lucene/Solr software
          o Bug patches and workarounds
          o Troubleshooting of use-case issues that may arise.
Support subscriptions are available in a variety of packages to fit different maintenance
profiles:
   •   Basic, a fit for stable deployments that can rely on minimal intervention and can
       wait a day to hear back.
   •   Professional, for deployments with quicker response time requirements, featuring
       both phone and email support.
   •   Enterprise, for mission-critical deployments requiring initial response within four
       (4) business hours on the same business day, plus an annual Search Health Check
       program.
   •   Advanced Support. Designed for customers with more demanding needs for expert
       advice and guidance on an ongoing basis, Advanced Support subscriptions include
       the services delivered under Enterprise Technical Support, plus consultative
       support to help optimize development and/or deployment efforts. Two options are
       available:
       o Development Support: As noted above, enterprise search requirements often
         are designed for deployment with enormous data domains and stringent user
         requirements. Although it may be relatively easy to construct a solution that
         works to a first-order of sophistication, when the requirements exceed more
         straightforward design goals, we can help you get to a solution that is potentially
         many times more capable for a relatively small amount of additional investment.
         We help you optimize development of Lucene/Solr enterprise search
         applications with reviews of architecture, design, code and configuration, along
         with best-of-breed methodologies and powerful tools. Includes one annual
         Search Health Check.
       o Production Support: For large, relatively complex systems that have data
         domains with continuously increasing size and complexity, on-going tuning and
         performance enhancement may be critical to ensuring sustained customer
         satisfaction. We help you achieve optimal performance and availability for
         Lucene/Solr in your production environment. We provide advice on best
         practices for configuration, operations, scaling, tuning, and tools, as well as two
         annual Search Health Checks.
   •   Training. Our hands-on training programs in Lucene and Solr technologies help
       your staff acquire skills and develop expertise. Training programs are offered as
       classroom-based courses, and can be customized for on-site delivery.
   •   Consulting. Our consulting practice offers flexible-term engagements to assist you
       with high value activities such as architecture and design reviews, training,
       enablement, and best practices. As our consultants work on a broad variety of
       implementations, they are well positioned to recommend optimal approaches to
       your business and technical challenges. Their deep domain expertise can be
       retained on a project basis, over several months, ad-hoc, or as a subscription.
       Consultants are available on a remote basis or for short-term onsite work.
Our customers benefit from the years of collective expertise found in our technical staff,
who are themselves widely recognized leaders in the Lucene/Solr community. By
providing predictable, reliable resources, Lucid Imagination helps you meet your project
feature, function, and schedule requirements. We can help you reduce the risks and capture
the benefits of open source for your enterprise search solution.
We invite you to visit our Website (http://www.lucidimagination.com) for additional
details.


Conclusion
Lucene/Solr-based enterprise search solutions are among the most comprehensive,
complete, robust, and flexible in the world today. Whether you are merely contemplating
an open-source enterprise search solution or already have one deployed, Lucid Imagination
is the one company that is uniquely situated to help ensure that your customers are not
merely satisfied, but delighted in the fulfillment of their enterprise search needs.




What Lucene and Solr Open Source Search can do for Enterprise Search
A Lucid Imagination White Paper • April 2009                                           Page 11

Weitere ähnliche Inhalte

Was ist angesagt?

2011 ComputerWorld Honors Program - Award Case Study
2011 ComputerWorld Honors Program - Award Case Study2011 ComputerWorld Honors Program - Award Case Study
2011 ComputerWorld Honors Program - Award Case StudyKarthik Chakkarapani
 
2008 web-managers-hwilfert-final
2008 web-managers-hwilfert-final2008 web-managers-hwilfert-final
2008 web-managers-hwilfert-finalHallie Wilfert
 
The evolution of Search spscinci
The evolution of Search spscinciThe evolution of Search spscinci
The evolution of Search spscinciJohnny Lopez
 
HPE IDOL 10 (Intelligent Data Operating Layer)
HPE IDOL 10 (Intelligent Data Operating Layer)HPE IDOL 10 (Intelligent Data Operating Layer)
HPE IDOL 10 (Intelligent Data Operating Layer)Andrey Karpov
 
Content Management, Metadata and Semantic Web
Content Management, Metadata and Semantic WebContent Management, Metadata and Semantic Web
Content Management, Metadata and Semantic WebAmit Sheth
 
Content Management, Metadata and Semantic Web
Content Management, Metadata and Semantic WebContent Management, Metadata and Semantic Web
Content Management, Metadata and Semantic WebAmit Sheth
 
Joining It All Up - KIM Legal
Joining It All Up - KIM LegalJoining It All Up - KIM Legal
Joining It All Up - KIM LegalKate Simpson
 
How to Get Enterprise Search Right Webinar
How to Get Enterprise Search Right WebinarHow to Get Enterprise Search Right Webinar
How to Get Enterprise Search Right WebinarConcept Searching, Inc
 

Was ist angesagt? (11)

2011 ComputerWorld Honors Program - Award Case Study
2011 ComputerWorld Honors Program - Award Case Study2011 ComputerWorld Honors Program - Award Case Study
2011 ComputerWorld Honors Program - Award Case Study
 
2008 web-managers-hwilfert-final
2008 web-managers-hwilfert-final2008 web-managers-hwilfert-final
2008 web-managers-hwilfert-final
 
The evolution of Search spscinci
The evolution of Search spscinciThe evolution of Search spscinci
The evolution of Search spscinci
 
NYC Sem Web Meetup 20090219
NYC Sem Web Meetup 20090219NYC Sem Web Meetup 20090219
NYC Sem Web Meetup 20090219
 
HPE IDOL 10 (Intelligent Data Operating Layer)
HPE IDOL 10 (Intelligent Data Operating Layer)HPE IDOL 10 (Intelligent Data Operating Layer)
HPE IDOL 10 (Intelligent Data Operating Layer)
 
What is big data
What is big dataWhat is big data
What is big data
 
BPC10 BuckleyMetadata-share
BPC10 BuckleyMetadata-shareBPC10 BuckleyMetadata-share
BPC10 BuckleyMetadata-share
 
Content Management, Metadata and Semantic Web
Content Management, Metadata and Semantic WebContent Management, Metadata and Semantic Web
Content Management, Metadata and Semantic Web
 
Content Management, Metadata and Semantic Web
Content Management, Metadata and Semantic WebContent Management, Metadata and Semantic Web
Content Management, Metadata and Semantic Web
 
Joining It All Up - KIM Legal
Joining It All Up - KIM LegalJoining It All Up - KIM Legal
Joining It All Up - KIM Legal
 
How to Get Enterprise Search Right Webinar
How to Get Enterprise Search Right WebinarHow to Get Enterprise Search Right Webinar
How to Get Enterprise Search Right Webinar
 

Guidelines for Managers: What Lucene and Solr Open Source Search can do for Enterprise Search

  • 1. Guidelines for Managers: What Lucene and Solr Open Source Search can do for Enterprise Search A Lucid Imagination White Paper
  • 2. Abstract
    Lucene/Solr is an open-source search development environment ideally suited for large-scale, enterprise search applications. This paper provides some ways to think about your enterprise search requirements from both technological and economic perspectives, explains why a Lucene/Solr-based approach can be optimal, and describes how Lucid Imagination can help you to design, develop, and deploy the necessary search solution.
    What Lucene and Solr Open Source Search can do for Enterprise Search. A Lucid Imagination White Paper • April 2009 Page i
  • 3. Table of Contents
    Introduction
    Preliminary Considerations
      Know Your Business Requirements
      Know Your Data
      Know Your Users
    Advantages of a Lucene/Solr-Based Solution
      Technological Advantages
      Lower Cost, Greater Flexibility
    How Lucid Imagination Can Help
    Conclusion
  • 4. Introduction
    Markets are conversations. And today, increasingly communications-rich interactions within companies, and between companies and their stakeholders, are typically preserved and stored, creating ever larger reserves of documents and data. Effective access to a company’s data can be a strategic advantage of potentially enormous value.
    Email, office documents, databases, customer service chat logs, content management systems: data of every type, representing all forms of communication within the company and with its marketplace, continues to grow in electronic form. It’s not just that the proverbial haystack is growing larger; it also has more types of hay, with many different types of needles to be found.
    At some point, every function in the company needs access to such data, and these needs can vary significantly across organizations. Search technology can be a standalone system designed to provide a single point of access to the entirety of a company’s data, irrespective of location, container format, or owner. Or, it may provide search functionality as a component within another application. But enabling employees, customers, partners, investors, and other stakeholders to find the information they need when they need it is the goal of any enterprise search solution, no matter where it will be deployed or how it will be used.
    This white paper provides some ways to approach choosing and building enterprise search solutions, and discusses why Lucene/Solr open source search solutions supported by Lucid Imagination present key advantages. It starts with what must be considered when presented with an enterprise search problem, discusses some attributes of a Lucene/Solr-based solution that could be of special significance in selecting a solution strategy, and concludes by describing how Lucid Imagination can help to design, implement, and support a solution that meets your organization’s needs.
  • 5. Lucene and Solr are state-of-the-art search technologies available for free as open source from The Apache Software Foundation. Lucene is a powerful search library; Solr provides a platform built on top of Lucene that makes it easy to build Lucene-based applications¹. Both are full-featured and have excellent performance, relevancy ranking and scalability.
    These technologies are used today by thousands of organizations. They power substantial and diverse search applications at AOL, CNET, Comcast Interactive Media, IBM, Netflix, LinkedIn, MySpace and many others. In many instances, Lucene/Solr solutions regularly index and search tens or hundreds of millions of documents with sub-second response time.
    Lucid Imagination is exclusively dedicated to providing robust commercial support for Apache Lucene/Solr open source search technology. Our products and services are designed for enterprises currently using or evaluating Lucene/Solr for their search solutions.
    ¹ Most organizations use Solr today as their search development platform. As Lucene is the older of the two technologies, and serves as the core of Solr’s search capabilities, we’ll refer to them together, as Lucene/Solr. For more on the technologies, see http://www.lucidimagination.com.
    Preliminary Considerations
    It is not unusual to think of the Web the minute search is mentioned, and with good reason: nowadays, even small companies can have a large Web presence, and most workers and consumers use the Web every day.
  • 6. But even in small companies, web pages typically represent only a fraction of the important, text-based data to which stakeholders need access. Spreadsheets, slide decks, PDFs, project management files, electronic design documents, chat logs and e-mail may all contain information that will be critical in any number of business situations.
    Similarly, within even small companies there can be a need to support search usage models not typically found on Web-centric systems. For example, the ability to conduct collaborative searches may be critical to productivity in some contexts.
    To create an optimal enterprise search solution, it is essential to know your:
    • Business requirements: What needs must be met to create competitive advantage for your enterprise, and how will you know when they are met?
    • Available data: What and where is the content you have to work on, and how is it structured (e.g., does it form a natural “cascade” of sub-classes)?
    • Users: What do they need to search and how will they prefer to search for it?
    Along any of these dimensions, there is potentially huge variance from one case to another. The goal of the discussion here is not to provide an exhaustive checklist of issues. It is intended rather to suggest the sorts of questions that should be considered at the earliest possible stage of development.
    Know Your Business Requirements
    Applications for enterprise search and their associated requirements are as diverse as the organizations that need them and the data they need to search. However, there are two characteristics by which any search solution will ultimately be judged:
    • Performance. Does the system return results quickly enough to fulfill the expectations of the critical mass of users? How does it perform under peak loads? Will the performance scale adequately as usage increases? Is enough known about the system’s probable evolution to build it in such a way that it will sustain projected growth with minimal enhancement and without wholesale restructuring? Additionally, what is the cost associated with obtaining that scale?
    • Relevance. How well will the system find the data that the user needs, and how good a job will it do presenting query results in an appropriate way and in the best order? What techniques, implicit or explicit, are required to get user assessments of relevance?
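One common way to turn user assessments of relevance into a number is precision at k: the share of the top k results that users judged relevant. The following is a minimal Python sketch, not part of the original paper; the function name, document IDs, and relevance judgments are all hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Return the fraction of the top-k ranked results judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    # Count how many of the top-k results appear in the judged-relevant set.
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


# Hypothetical example: results returned for one query, in ranked order,
# and the set of documents that reviewers marked as relevant.
ranked = ["d1", "d2", "d3", "d4"]
judged_relevant = {"d1", "d3", "d9"}
score = precision_at_k(ranked, judged_relevant, k=4)
```

Tracking a figure like this for a fixed set of representative queries gives a simple way to tell whether a configuration change helped or hurt relevance.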
  • 7. In some cases, additional criteria of success will focus on areas such as system security or legal and regulatory compliance. However, a focus on optimizing for both performance and relevance is essential to designing and building an effective search solution. Know Your Data As noted above, your search solution can include input data of any type stored in any format or container type — ranging from project-specific program management files to sets of database records with relevant unstructured text fields. The better you understand the data domain of the search system, the more effective your resulting searches will be, and the higher the probability that your system’s success metrics (starting with performance and relevance) will be achieved. By beginning with a data audit, you can gain an understanding of: • Number and types of documents. How many documents does your system need to support, and how big are they, both individually and in aggregate? Answers to these questions will have implications for performance design and planning. Similarly, knowing about document types, formats, etc., is essential to ensure adequate access by means of file filtering or other data preparation or pre-processing steps, and so is crucial both for performance and relevance. • Key fields. For structured or partially structured data, certain fields may carry more weight than others in determining relevance. For example, a document’s title may be assigned a higher relevancy weight than its size. • Internal information structure. Even less formal documents can have key structural attributes Let’s illustrate by example: Imagine that the data domain of a search query includes the unstructured text of consumer-electronics blogs. Although the text itself is unstructured, the information within it may have a fair amount of structure, including, for example, names of manufacturers and their products, product capabilities such as storage or resolution, etc. 
The structure has a “shape” to it, from more general to more specific. Thus, a manufacturer’s name can be associated with many product names, product names with attributes or capabilities, etc.
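The “key fields” point above can be made concrete with a small sketch. The Python below is a toy illustration only: it is not Lucene’s scoring formula, and the field names (`title`, `body`), the weights, and the sample documents are all hypothetical. It simply counts query-term matches per field and multiplies each count by that field’s weight, so a match in a title counts for more than a match in the body.

```python
# Toy illustration of field weighting; NOT Lucene's actual scoring formula.
# Field names, weights, and sample documents are hypothetical.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}


def score(document, query_terms, weights=FIELD_WEIGHTS):
    """Sum (field weight x number of query-term occurrences) over weighted fields."""
    total = 0.0
    for field, weight in weights.items():
        tokens = document.get(field, "").lower().split()
        matches = sum(tokens.count(term.lower()) for term in query_terms)
        total += weight * matches
    return total


docs = [
    {"id": "a", "title": "Camera review",
     "body": "A compact model with good resolution"},
    {"id": "b", "title": "Weekly roundup",
     "body": "This camera has more storage than the last camera"},
]

# Rank documents for the query term "camera", highest score first.
ranked = sorted(docs, key=lambda d: score(d, ["camera"]), reverse=True)
for doc in ranked:
    print(doc["id"], score(doc, ["camera"]))
```

In this sketch the document with one title match outranks the document with two body matches, which is the kind of behavior a data audit helps you decide whether you want.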
Know Your Users
The human dimension of the search solution of course presents the most variables of all. Who will your users be, and in what roles are they most likely to use the search application, e.g., consumer, research scientist, salesperson, manager, all of the above? Of equal if not greater importance than the “who” and “what” of your users is the “how.” Is there a need for collaborative search or to structure a “work flow” into the search process itself? Is there a need to provide different levels or types of access to different classes of users? Or will your application be extracting search results to feed to another application, without presenting them to users at all?
Advantages of a Lucene/Solr-Based Solution
We discussed above some key success criteria for enterprise search solutions, primarily in terms of the performance and results relevance of the search application. As a business decision-maker, you may find that useful, but still a little too abstract. You are likely to ask yourself at some point: How will my enterprise search solution help me either to make or save money? While it is true that the Lucene and Solr software are free, there’s much more to it than the attractive price. Let’s take a closer look at the question of making and saving money using Lucene and Solr, from the vantage points of both technology and economics.
Technological Advantages
Lucene is the core search library; Solr is the logical starting point for most developers building search applications with Lucene/Solr technology for their web site, product, or internal organizational use. Let’s look at how Solr helps you build search, and then how Lucene executes it.
Solr is a layer of code on top of Lucene that transforms Lucene into an enterprise search platform, and simplifies programming by extending it to a broad variety of common, easier-to-use development environments. Key features include:
• Web services.
Solr places Lucene over HTTP, allowing programs written in any language to invoke Lucene search. It provides access via REST-like interfaces, or from a full array of open-standards-based development environments, languages, and tools, including, for example, Python, PHP, Ruby, Ruby on Rails, etc.
• Faceting, which is the grouping of items or search results into categories that let users drill into search results (or even skip searching entirely) by any value in any field (for example, choosing different attributes of shoes at Zappos.com, or searching Wikipedia by sub-articles, or navigating news articles at cnet.com).
• Easy configuration for managing which fields are indexed, and their characteristics.
• System administration tools for data loading, index replication, monitoring, logging and cache management.
Lucene, the core search engine, is a Java-based search library available for free as open source under the Apache Software License. At the heart of the application’s “search engine,” Lucene exhibits attributes that enable applications employing it to deliver world-class user satisfaction. These include:
• Outstanding speed. Supports sub-second performance for most queries.
• Strong relevancy ranking and full results processing. Great out-of-the-box precision returns the information (documents) that users need without including a lot that they don’t. These results are presented clearly by relevancy, date, field, or any document property, and can be sorted by these attributes. Additional supported features, like highlighting and spell checking, let you extend search interactivity, making the refinement process easier and more conversational.
• Complete query capabilities.
Offers a full array of query methods: keyword, Boolean and +/- queries, proximity operators, wildcards, fielded searching, term/field/document weights, find-similar, spell-checking, multi-lingual search, and more. This means that your search solution can be flexible enough to accommodate an enormous range of user preferences and data types, from the simplest to the most complex. And because Lucene and Solr are open source, users can readily tailor queries to very specific needs.
• Unsurpassed portability. Runs on any platform supporting Java, and indexes are portable across platforms. You can build an index on Linux and copy it to a Microsoft Windows machine and search it there. This makes it easy to leverage advances in hardware and operating systems while minimizing additional development costs for faster and better search functionality. There are also open source ports of Lucene for many languages besides Java, including .NET, C, Python and others.
• Excellent scalability. Scales from document sets of hundreds to hundreds of millions and beyond.
• Easily manageable, highly flexible deployment options. Enables “shrink-to-fit” deployments, ranging from single-server to fully distributed, multi-server systems, with its low-overhead indexes and rapid incremental indexing (especially with versions 2.3 and later).
While no single search technology is best on each of these dimensions for every application, Lucene is among the best out-of-the-box on all of them. Together, Lucene and Solr provide the foundations for a search solution that is fully capable and functionally complete. When the capabilities and attributes listed above are essential requirements for your enterprise search needs, Lucene/Solr is a prime candidate for fulfilling them.
Lower Cost, Greater Flexibility
When evaluating the economic advantages of a Lucene/Solr-based enterprise search solution, it is useful to consider competing solutions from the perspective of non-recurring and recurring costs:
• Non-recurring costs: Requirements gathering, system design and specification, system development (implementation), and testing are all more-or-less non-recurring costs.
Another important element in this set is the cost of software acquisition (licensing or purchase).
• Recurring costs: The largest contributors here will be on-going technical and customer support and system administration, management, and maintenance. These are dependent on many factors including, for example, system size and complexity and number of users and their level of sophistication.
In both sets, almost all costs are associated with labor, and despite possible assertions to the contrary, those costs are going to be approximately the same across the competing products and technologies. All search systems, no matter how they work or who provides them, require design and specification, development, configuration, deployment, testing, and on-going support and maintenance.
The only inarguably clear differentiator is the cost of software acquisition. As Lucene and Solr are open-source software solutions based on open standards and community-driven development processes, they are free. Assuming all other costs are about equal, therefore, the open source solution is almost certain to be highly cost efficient.
That, however, is still not the whole story. A Lucene/Solr-based solution can be the most cost effective as well. With its strong out-of-the-box performance and relevancy ranking; complete query capabilities; portability, scalability, and manageability characteristics; and easy-to-use, highly standard programming interfaces, Lucene/Solr enables you to deploy exactly the enterprise search functionality required to completely fulfill your customers’ needs.
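As a rough illustration of the standard programming interfaces mentioned above, and of the web-services and faceting features described under Technological Advantages, the Python sketch below builds an HTTP request URL for a faceted Solr query. The host, port, and handler path (`http://localhost:8983/solr/select`), the field names (`title`, `manufacturer`, `resolution`), and the helper `build_facet_query` are assumptions made for this example; your installation’s URL and schema will differ. The query parameters `q`, `rows`, `wt`, `facet`, and `facet.field` are standard Solr request parameters. The sketch only constructs the URL and does not contact a server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed for illustration: a Solr instance on localhost exposing a select
# handler at this path. Adjust to match your own deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_facet_query(text, facet_fields, rows=10):
    """Return a Solr select URL that asks for results plus facet counts.

    `build_facet_query` is a hypothetical helper, not part of Solr.
    """
    params = [
        ("q", text),        # the query, here using fielded-search syntax
        ("rows", rows),     # number of results to return
        ("wt", "json"),     # ask for a JSON response
        ("facet", "true"),  # turn faceting on
    ]
    # One facet.field parameter per field to be grouped on.
    params.extend(("facet.field", field) for field in facet_fields)
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_facet_query("title:camera", ["manufacturer", "resolution"])
print(url)
```

Because the request is an ordinary HTTP GET, the same URL could be issued from any language or tool with an HTTP client, which is the point of placing Lucene behind a web-service layer.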
How Lucid Imagination Can Help
Lucid Imagination has the expertise, resources and services you need to drive development of Lucene/Solr-based enterprise search solutions. We offer a full portfolio of software and services including:
Certified Distributions. Because Lucene/Solr distributions certified by Lucid Imagination are tested and commercially supported, they speed up implementation time, reduce the risk of “gotchas”, and eliminate the need for familiarity with the fine points of the community release process. Tested bug fixes are incorporated in an organized fashion, reducing both the time needed to comb through nightly open source community releases and the risk of code forks between release cycles. The Get Started program helps users who download our Certified Distributions with first-time installation, configuration, and basic usage of Lucene/Solr and included utilities.
Technical Support. Although contemporary open-source solutions are typically at least as robust and reliable as their commercial counterparts, problems can still arise. Because the community may not focus on maintenance in a timely fashion, there is no substitute for the “industrial strength” support provided by Lucid Imagination to ensure your enterprise IT operation gets timely responses, so it can both meet market-driven development schedules and maintain stringent service level commitments. Designed for customers with Lucene/Solr installations, the support subscriptions we offer include:
  o Regular updates and upgrades for Lucid-certified versions of Lucene/Solr
  o Problem isolation and diagnosis of errors in Lucene/Solr software
  o Bug patches and workarounds
  o Troubleshooting of use-case issues that may arise.
Support subscriptions are available in a variety of packages to fit different maintenance profiles:
• Basic, a fit for stable deployments that can rely on minimal intervention and can wait a day to hear back.
• Professional, for deployments with quicker response time requirements, featuring both phone and email support.
• Enterprise, for mission-critical deployments requiring initial response within four (4) business hours on the same business day, plus an annual Search Health Check program.
• Advanced Support. Designed for customers with more demanding needs for expert advice and guidance on an ongoing basis, Advanced Support subscriptions include the services delivered under Enterprise Technical Support, plus consultative support to help optimize development and/or deployment efforts. Two options are available:
  o Development Support: As noted above, enterprise search solutions are often designed for deployment with enormous data domains and stringent user requirements. Although it may be relatively easy to construct a solution that works to a first order of sophistication, when the requirements exceed more straightforward design goals, we can help you get to a solution that is potentially many times more capable for a relatively small amount of additional investment. We help you optimize development of Lucene/Solr enterprise search applications with reviews of architecture, design, code and configuration, along with best-of-breed methodologies and powerful tools. Includes one annual Search Health Check.
  o Production Support: For large, relatively complex systems that have data domains with continuously increasing size and complexity, on-going tuning and performance enhancement may be critical to ensuring sustained customer satisfaction. We help you achieve optimal performance and availability for Lucene/Solr in your production environment. We provide advice on best practices for configuration, operations, scaling, tuning, and tools, as well as two annual Search Health Checks.
• Training. Our hands-on training programs in Lucene and Solr technologies help your staff acquire skills and develop expertise. Training programs are offered as classroom-based courses, and can be customized for on-site delivery.
• Consulting. Our consulting practice offers flexible-term engagements to assist you with high-value activities such as architecture and design reviews, training, enablement, and best practices. As our consultants work on a broad variety of implementations, they are well positioned to recommend optimal approaches to your business and technical challenges. Their deep domain expertise can be retained on a project basis, over several months, ad hoc, or as a subscription. Consultants are available on a remote basis or for short-term onsite work.
Our customers benefit from the years of collective expertise found in our technical staff, who are themselves widely recognized leaders in the Lucene/Solr community. By providing predictable, reliable resources, Lucid Imagination helps you meet your project feature, function, and schedule requirements. We can help you reduce the risks and capture the benefits of open source for your enterprise search solution. We invite you to visit our website (http://www.lucidimagination.com) for additional details.
Conclusion
Lucene/Solr-based enterprise search solutions are among the most comprehensive, complete, robust, and flexible in the world today. Whether you are merely contemplating an open-source enterprise search solution or already have one deployed, Lucid Imagination is the one company that is uniquely situated to help ensure that your customers are not merely satisfied, but delighted in the fulfillment of their enterprise search needs.