Search Readiness Checklist: Moving to Solr/Lucene Open Source Search
A Lucid Imagination White Paper
Abstract
Search was once considered a black-box application that ingested content and delivered results to users
opaquely. However, driven by the opportunities and demands of the growing universe of content and by
the versatility of Solr/Lucene open source search technology, search applications are evolving from a
standalone facility to an enabling framework.
Good search is hard. While the basics of search technology can be deceptively simple, the art and science
of applying that technology to relevant business and content processing problems are daunting. By its very
nature, search can span an almost infinite variety of content, formats, subject matter, relevancy criteria,
and more.
This Open Source Search Readiness Checklist is organized into four broad categories:
       • Why do you need a search application?
       • What are the key technical characteristics of your search application?
       • What is your search application’s technology environment?
       • How can you ensure the best fit between Solr/Lucene and your ongoing business needs?
Each category details key issues to consider in moving to open source search. Whether you are
undertaking a new search application or have a working search application running on a platform you
are considering leaving behind, this checklist provides a working foundation to help you make the
transition smoothly.
Working with Lucid Imagination, the commercial company for Solr/Lucene open source search
technology, offers you packaged solutions that simplify and streamline search application development;
lower the cost of growth through flexible, adaptable architecture; and deliver reliable backing of
unmatched expertise in enterprise search and open source.




Lucene/Solr Open Source Search Readiness Checklist
A Lucid Imagination White Paper • September 2010                                               Page i
Contents
Introduction
I.     Why Do You Need a Search Application?
II.    What Are the Key Technical Characteristics of Your Search Application?
III.   What Is the Technology Environment in Which You Are Building Your Search Application?
IV.    How Can You Ensure Fit Between Solr/Lucene and Your Ongoing Business Needs?
Summary of Questions
About Lucid Imagination
Recommended Reading
Appendix: Solr/Lucene Features and Benefits




Introduction
Whether you are undertaking a new search application or have a working search application running on
a platform you are considering leaving behind, there are a lot of questions you’ll need to answer to be
prepared for the effort.
Good search is hard. While the basics of search technology can be deceptively simple, the art and science
of applying that technology to relevant business and content processing problems are daunting. By its
very nature, search can span an almost infinite variety of content, formats, subject matter, relevancy
criteria, and more. Add in the fact that there are almost as many ways to judge relevant results as there
are individual end users, and you can see the challenge.
This Open Source Readiness Checklist is organized into four broad categories, each with a discussion of
the issues and opportunities you’ll need to consider as you prepare for your search application. Where
applicable, we’ll provide additional references for further study or research.
        • Why do you need a search application?
        • What are the key technical characteristics of your search application?
        • What is your search application’s technology environment?
        • How can you ensure the best fit between Solr/Lucene and your ongoing business needs?
This guide is not intended to replace a design strategy, architectural rigor, or a formal requirements
document. By considering answers to the issues it sets forth, we believe you’ll be better prepared for
getting your Solr/Lucene application up and running.
If you are replacing a legacy commercial platform, you may wonder: Can Solr/Lucene be a complete
search platform if you can’t just “drop it in” and replace what you now have, function for function,
feature for feature? Consider first that, owing to the great variation of search problems, search
technology providers have historically taken different approaches to developing their toolkits, so an
effort to imitate one with another will not cut it. We believe you will be best served by a fresh look at the
problem search was meant to solve, unburdened by the details of prior implementations. More
importantly, the flexibility and adaptive nature of Solr/Lucene open source will both enable an immediate
transition and lay the foundation for evolving your application to meet emerging needs.
The key measure of readiness for the transition is a solid grip on the value of the effort. Lucid
Imagination’s customers report that Solr/Lucene technology delivers tremendous benefits in flexibility,
result quality, performance—and most importantly, an ability to control their business and technology
destiny with search. Those same customers use Lucid Imagination’s services and solutions to lock in
those gains, and cement the competitive advantage achieved with Solr/Lucene.
We believe an understanding of these advantages will help you apply Solr/Lucene most effectively and
identify where Lucid Imagination can help you design, develop, and deploy your search
application with confidence.

I.     Why Do You Need a Search Application?
In understanding the motivation behind your search application, consider how best to align three factors:
your users, your data, and your business objectives.
When you build a search application, you face end users with expectations driven by their experience
with the large consumer search engines on the public Internet, such as Google, Bing, and Yahoo. Certainly,
the billions of dollars spent on billions of end users searching trillions of documents have delivered
broad-ranging innovations.
It’s a fundamentally different proposition to build your own search application. Internet searches may
produce millions of results in milliseconds, but they rely on measures like website popularity or on URLs
and domain names—not generally applicable to purpose-built applications for businesses. Relying on
generalized relevancy for a global population of all Internet users, the big Internet search engines are not
tied to your business rules, business process logic, or the opportunity cost of improved precision for your
specific set of data or your search users—and their business interests are not yours.
Retrieval of unstructured, heterogeneous documents and data is where Lucene/Solr search technology
excels. Much of that data has been stored in relational databases, which offer robust storage and
stability, but whose query and retrieval model is ill-suited to the more varied, dynamic modern data
landscape.
Solr/Lucene search technology offers extraordinarily broad applicability, flexibility, scalability, and
adaptability. Open source provenance contributes directly to those benefits in many ways. It provides a
broad community of professional developers who test and perfect the technology against a tremendous
variety of use cases, and it subjects changes and improvements to strict peer review, creating a broad
foundation for innovation. Add to that faceting, geo-search, numeric range queries, speed and
scalability into the billions of documents, near-real-time indexing, and many more innovations that have
broken barriers to building effective search applications.
RECOMMENDED READING:
   Starting a Search Application
      Marc Krellenstein, CTO and Founder, Lucid Imagination
   The Case for Lucene/Solr: Real World Open Source Search Applications
      A Lucid Imagination White Paper
Another great capability inherent in the Solr/Lucene platform is helping you anticipate the future needs of
a broad range of users. With adaptive and editorial relevancy-boosting techniques, query corrections and
suggestions, recommended results, and faceted search, search applications built with Solr/Lucene help
your business control the quality of experience between your users and your data, and fit that
experience to your business objectives.




1. What business objectives are (or should be) achieved with your search application?
   Free software, such as Lucene and Solr open source search, does not mean search is free of effort. If
   your search project is successful, consider how you will prove it: Which of these would you be able to
   point to?
       (a)   Save money? How much or how much more?
       (b)   Save time? How much or how much more?
       (c)   Increase revenue? How much or how much more?
       (d)   Increase end-user satisfaction? For which users?
       (e)   Create advantage over competitors?
       (f)   Decrease risk? How much or how much more?
       (g)   More than one of the above?


2. What objectives are (or are not) being met with your current search implementation?
   Most organizations have a system for finding information, often a legacy commercial search system.
   Why is it unsatisfactory? If you were to replace or improve it, which of the results in the previous
   question would it affect? By how much?


3. Which improvements in search behavior contribute to improved business results?
   Which of the following properties of your search application (one or more) would have the most
   impact on the business results you are looking for?
       (a)   Speed with which new content is available.
       (b)   Likelihood the user’s chosen result is in the top n results returned.
       (c)   Completeness of the full set of results the system delivers.
       (d)   Speed with which queries deliver result sets.
       (e)   Flexibility with which the system handles different types of queries.
       (f)   Ability of the system to never deliver “zero” results.
       (g)   Ranking of particular results for particular queries.
       (h)   Reduced effort required for users to find previously unknown content.
       (i)   Likelihood the user will return to use the search system again and again.




4. How much control do you need over the results that end users see?
   Within the realm of search behaviors, special attention needs to be paid to the control of search
   results. Often, the application of algorithms, business rules, and access rights ties directly to the
   economic benefits of search. Solr/Lucene offer great depth in this dimension. The previous question
   asked about general changes in search behavior; here, consider specifically how important direct
   control of results is to the success of the application.
       (a) Do you need to adjust the likelihood that particular results or documents appear at a certain
           time, or in relationship to other results?
       (b) Are there certain documents or data that should be delivered to certain users, but proscribed
           from others?
       (c) Are there algorithms that you need the system to account for programmatically, in automated
           fashion during the course of search, such as performing probability calculations?
       (d) How important is it that you understand why the search returned a particular set of results,
           and are able to adjust the search behavior as a result?
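To make these controls concrete, here is a minimal sketch, assuming a Solr 1.4-era `/select` request handler, of how items (a), (b), and (d) can map onto standard Solr request parameters: a boost query (`bq`, honored by the DisMax query parser) for editorial boosting, a filter query (`fq`) that limits results to the groups a user belongs to, and `debugQuery` to return per-document scoring explanations. The field names `title`, `body`, `doc_type`, and `acl_groups`, and the boost values, are hypothetical; access control of this kind also assumes each document's permitted groups were indexed with it.

```python
from urllib.parse import urlencode


def build_controlled_query(user_query, user_groups, explain=False):
    """Return (name, value) pairs for a Solr /select request.

    Field names and boost values are hypothetical placeholders.
    """
    if not user_groups:
        raise ValueError("user must belong to at least one group")
    params = [
        ("q", user_query),
        ("defType", "dismax"),        # DisMax parser: searches the fields listed in qf
        ("qf", "title^2 body"),       # weight title matches twice as heavily as body matches
        ("bq", "doc_type:policy^5"),  # (a) editorial boost for one class of documents
    ]
    # (b) only return documents tagged with at least one of the user's groups;
    # assumes group names are simple tokens that need no escaping
    params.append(("fq", "acl_groups:(" + " OR ".join(user_groups) + ")"))
    if explain:
        # (d) ask Solr to include scoring explanations in the response
        params.append(("debugQuery", "true"))
    return params


if __name__ == "__main__":
    print(urlencode(build_controlled_query("travel policy", ["staff", "finance"], explain=True)))
```

Keeping the access filter in `fq` rather than in `q` means it restricts which documents match without changing how the remaining documents are scored.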


5. How much do your end users know about the content they are searching for?
   The behavior of your search application will be judged by its end users; how much do you know
   about those users and the queries they are likely to submit? Consider the following contrasts. Are
   your end users likely to:
       (a) Express their queries in terms or phrases that will narrow in on results quickly, or submit
           broad, general words that retrieve broad results?
       (b) Spell the terms they are searching for correctly?
       (c) Search for known results in an unknown location (e.g., “Find the e-mail I sent to Carol on
           Tuesday, August 10”)? Or undertake a search without knowing which content they might
           find?
       (d) Browse through interim sets of results in order to narrow or refine their search queries?
       (e) Specify quantitative parameters, such as distances, prices, locations, or dates, as part of their
           search?
       (f) Use logic-oriented language (e.g., Boolean queries or wildcard characters) or natural
           language?




II.    What Are the Key Technical Characteristics of Your Search Application?
Given the flexibility and broad applicability of Solr/Lucene open source search technology, there is a rich
set of design decisions to be made in setting up the application to meet your business objectives within
the scope of your technology. In this section, we’ll explore some of the key inputs you’ll need to consider
before you begin the exercise of architecture and design of your search application. In most, if not all, of
the permutations of search needs implied by the questions below, the flexibility of Solr/Lucene search
can address your needs.
It’s important to note that these questions are not intended to replace a formal design process or
substitute for rigorous architectural assessment of how you can use Solr/Lucene to build a successful
search application. Rather, they will help establish your intent with respect to key functional and system
behaviors.
More than with the questions in the previous section, you may find that your answers to the scoping
questions below change over time. As you familiarize yourself with the capabilities and possibilities
available with the Solr/Lucene search platform, you may well want to refine or revise your
understanding of what constitutes desired behavior.
Often, organizations build a working prototype of their search application in order to validate their
assumptions, as well as the design and implementation of the system intended to put those assumptions
into action. While formal development methodologies that exploit this discover-by-doing effect differ in
many nuances, they share a common pattern of implementation, iteration, learning, improvement, and
change.
RECOMMENDED READING:
   Faceted Search with Solr
      Yonik Seeley, creator of Apache Solr and co-founder of Lucid Imagination
   Optimizing Findability in Lucene and Solr
      Grant Ingersoll, Chair, Apache Lucene PMC and co-founder of Lucid Imagination
It is strongly recommended that you prepare at least two sets of answers to the questions below: one for
a prototype implementation, and one or more for later revisions of that implementation, once you
accumulate experience and discover the full range of possibilities.




1. In what formats are the documents and data you will search?
   Much as documents and data can live in different repositories, they come packaged in different
   formats, based on where they originated and who created them. A good understanding of these
   formats enables successful content processing for search. Different format types require different
   levels of interpretation and composition to separate searchable text content and metadata
   (information about the document or its content), which can inform a search, from visual presentation
   details such as colors, fonts, or software-specific content. For each of the formats, there are further
   considerations of version; to cite just one example, the formatting and file structure of Microsoft
   Word 97 *.doc documents differ from those of the Office 2007 *.docx version.
   Solr/Lucene can leverage a range of tools, both built-in and extensions, including open source
   and commercial offerings. Which of the following document format types will you be indexing and
   searching?
       (a)   XML documents
       (b)   Database records
       (c)   HTML documents
       (d)   Microsoft Office documents: *.doc or *.docx for Word; *.ppt or *.pptx for PowerPoint; *.xls or
             *.xlsx for Excel
       (e)   PDF documents
       (f)   CSV (comma-separated values) or TSV (tab-separated values) files
       (g)   OpenOffice documents
       (h)   Engineering drawings from CAD/CAM/CAE systems
       (i)   Others
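Different formats typically enter Solr through different update handlers. As a sketch only, assuming the handler paths registered in the Solr 1.4 example configuration (`/update` for Solr's own XML update format, `/update/csv` for delimited text, and `/update/extract` for Solr Cell, which uses Apache Tika to pull text and metadata out of rich documents), a crawler might route files by extension as follows. The extension table and the use of the file name as the document ID (`literal.id`) are hypothetical choices; database records would usually arrive through a separately configured importer such as the DataImportHandler instead.

```python
import os

# Hypothetical routing table; handler paths follow the Solr 1.4 example solrconfig.xml.
HANDLER_BY_EXTENSION = {
    ".xml": "/update",           # only for files already in Solr's <add><doc> format
    ".csv": "/update/csv",
    ".tsv": "/update/csv",       # same handler, with a tab separator
    ".html": "/update/extract",  # Solr Cell (Apache Tika) handles rich formats
    ".pdf": "/update/extract",
    ".doc": "/update/extract",
    ".docx": "/update/extract",
    ".ppt": "/update/extract",
    ".pptx": "/update/extract",
    ".xls": "/update/extract",
    ".xlsx": "/update/extract",
    ".odt": "/update/extract",
}


def route_document(path):
    """Return (handler_path, extra_request_params) for a file to be indexed."""
    ext = os.path.splitext(path)[1].lower()
    handler = HANDLER_BY_EXTENSION.get(ext)
    if handler is None:
        # e.g. CAD drawings: these need a custom or commercial extractor
        raise ValueError("no handler configured for %r (%s)" % (ext, path))
    params = {}
    if ext == ".tsv":
        params["separator"] = "\t"
    if handler == "/update/extract":
        # literal.id sets a field value directly; using the file name is a hypothetical ID scheme
        params["literal.id"] = os.path.basename(path)
    return handler, params
```

Raising an error for unknown extensions, rather than silently skipping them, surfaces the "Others" category early, which is exactly the inventory this question asks for.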


2. Document collection composition: how big are documents?
   Configuring your search system requires an understanding of your document sizes, as performance
   and throughput depend heavily on accounting for the size of documents to be indexed. What
   percentage or fraction of your documents are:
       (a) Under 1 KB                                         (f)   5 MB to 10 MB
       (b) 1 KB to 100 KB                                     (g)   10 MB to 50 MB
       (c) 100 KB to 500 KB                                   (h)   50 MB to 100 MB
       (d) 500 KB to 1 MB                                     (i)   100 MB to 250 MB
       (e) 1 MB to 5 MB                                       (j)   250 MB and up
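If your documents already sit on a file system, a short script can answer this question directly. The sketch below, assuming 1 KB = 1,024 bytes and that each band includes its lower bound and excludes its upper bound, tallies file sizes into bands (a) through (j) above and returns the fraction in each; how you gather the sizes (for example, with `os.path.getsize`) is up to you.

```python
from bisect import bisect_right
from collections import Counter

KB = 1024
MB = 1024 * KB

# Exclusive upper bounds for bands (a) through (i); anything larger falls into (j).
UPPER_BOUNDS = [1 * KB, 100 * KB, 500 * KB, 1 * MB, 5 * MB,
                10 * MB, 50 * MB, 100 * MB, 250 * MB]
BANDS = "abcdefghij"


def size_profile(sizes_in_bytes):
    """Return {band letter: fraction of documents in that band}."""
    if not sizes_in_bytes:
        raise ValueError("no document sizes supplied")
    # bisect_right gives the index of the first bound strictly greater than size
    counts = Counter(BANDS[bisect_right(UPPER_BOUNDS, size)] for size in sizes_in_bytes)
    total = len(sizes_in_bytes)
    return {band: counts[band] / total for band in BANDS}
```

For example, `size_profile([500, 2_000, 2 * MB, 300 * MB])` places one document in each of bands (a), (b), (e), and (j), giving each of those 0.25 and every other band 0.0.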




3. How much new content do you presently add per unit time? How many documents are updated per unit time?
   The quality of your search results can be affected by the interval between when a document is
   complete or ready, and when it appears in the index for searching. Which of the following patterns
   describe how your content changes?
       (a) Millions of very small documents—in the form of tweets, comments, messages, log files, etc.—
           appear continuously as users or systems create these content snippets.
       (b) Existing documents are revised, either by users or by machines (in the latter case, for
           example, reports and data output indexed by your search application).
       (c) New documents are available less frequently, perhaps even on a regular schedule, which in
           turn drives user expectations of when they can be searched.
       (d) Changes to content come in particular windows, busier at some times than others.
   Consider the question of change to your collection in two ways: First, at what interval does the
   amount of content in your collection change? Second, what fraction of the total documents are you
   adding to the overall collection within each interval?
       (a) From minute to minute                           (e) Daily
       (b) About two to four times per hour                (f) Weekly
       (c) No more than twice per hour                     (g) Monthly
       (d) No more than once every 4 hours
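Both answers can be turned into numbers that feed capacity planning. A small worked example, using hypothetical figures: adding 50,000 documents per day to a 10-million-document collection works out to about 0.58 documents per second on average, and 0.5% of the collection per day.

```python
SECONDS_PER_DAY = 86_400


def change_rate(docs_per_interval, interval_seconds, collection_size):
    """Return (average documents per second, fraction of the collection added per interval)."""
    return docs_per_interval / interval_seconds, docs_per_interval / collection_size


docs_per_second, fraction = change_rate(50_000, SECONDS_PER_DAY, 10_000_000)
print(f"{docs_per_second:.2f} docs/s, {fraction:.1%} of the collection per day")
```

An average rate understates the load when changes arrive in busy windows, as item (d) in the earlier list describes, so it is worth repeating the calculation for your busiest interval.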


4. What is the rate of queries you expect from your user population?
   Consider the population of users who drive your search application. How many users are there, and
   how many queries might they submit? Consider especially that queries in the search application do
   not always map one-to-one with a single string entered by a user in a search box. Use these questions
   to characterize how many queries your search application will need to handle per unit time, typically
   in queries per second.
       (a) How often do they need access to the application?
       (b) Will they submit queries one at a time on an occasional or ad-hoc basis, or will they rely on
           the search application for continuous constant use?
       (c) Do they have the expertise necessary to narrow quickly on search results, or will they require
           continuous iteration, using one set of results to inform a series of subsequent queries?
       (d) Will they have the expertise to write queries that conform precisely to the search
           application’s expectation, or will you rely on the search application to analyze and decompose
           their terms and phrases to ensure efficient execution and relevant results?
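Answers to (a) through (c) can be combined into a rough peak load figure. The sketch below is a back-of-the-envelope model, not a Solr feature: the number of concurrently active users, searches per user per hour, backend queries per user search (for instance, extra requests issued for as-you-type suggestions), and a peak-to-average factor are all hypothetical inputs you would estimate from your own usage logs.

```python
def peak_queries_per_second(active_users, searches_per_user_per_hour,
                            backend_queries_per_search, peak_factor):
    """Rough peak query load in queries per second (a planning estimate, not a measurement)."""
    average_qps = (active_users * searches_per_user_per_hour
                   * backend_queries_per_search) / 3600
    return average_qps * peak_factor


# 2,000 active users, 6 searches each per hour, 3 backend queries per search,
# and a busiest period running at 3x the average.
print(peak_queries_per_second(2_000, 6, 3, 3))  # 30.0
```

The `backend_queries_per_search` multiplier is where the point above about queries not mapping one-to-one with search-box entries shows up in the arithmetic.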




5. Does your content require faceting or a taxonomy in order to support productive navigation
   and discovery by end users?
   Faceted search provides an effective way to allow users to refine search results, continually drilling
   down until the desired items are found. For example, on an e-commerce site, Solr/Lucene can present
   a list of the different brands of flat-screen television that match a query, and let the user navigate
   into those results. Facets can span virtually any list of attributes, from sets of terms within a field to
   dates to numeric ranges and the like. In addition to document-driven faceting, some search
   applications add an external taxonomy platform to derive metadata, that is, to extract what
   documents are about and append fields that support guided navigation through results.
       (a) Do documents contain data or metadata that allow users to narrow results?
       (b) Are there consistent rules of document analysis you can create and apply to derive attributes
           from documents?
       (c) If documents lack native metadata, can you use a third-party taxonomy platform to identify
           attributes for faceted navigation?
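In Solr, faceting is requested per query through request parameters. The following is a minimal sketch, assuming hypothetical `brand` and `price` fields (with whole-number prices) for the e-commerce example above: `facet.field` asks for a count of matching documents per brand, each `facet.query` asks for the count within one price band, and the user's drill-down selections are applied as filter queries (`fq`).

```python
from urllib.parse import urlencode


def facet_params(user_query, selected_filters=()):
    """Return (name, value) pairs for a faceted Solr /select request."""
    params = [
        ("q", user_query),
        ("facet", "true"),
        ("facet.field", "brand"),             # count of matches per brand (hypothetical field)
        ("facet.mincount", "1"),              # omit brands with no matches
        ("facet.query", "price:[0 TO 499]"),  # one count per price band (hypothetical field)
        ("facet.query", "price:[500 TO 999]"),
        ("facet.query", "price:[1000 TO *]"),
    ]
    # Each facet value the user clicks narrows the next request.
    params.extend(("fq", f) for f in selected_filters)
    return params


print(urlencode(facet_params("flat screen tv", ['brand:"Acme"'])))
```

Because drill-down selections are plain filter queries, a taxonomy platform that appends fields to documents (question (c)) feeds the same mechanism: its fields simply become additional `facet.field` values.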

6. Which advanced search features do you expect to use in order to improve how users can
   submit queries and choose among the options presented?
   Solr/Lucene offers a broad set of powerful query and search tools that can help users quickly choose
   from available options, either before or after they submit a query. Which of the following features can
   help improve the speed and efficacy of the experience for your end users?
       (a) Autosuggest/as-you-type: The search application prompts the user with possible alternate
           queries implied by a partial or complete search term, as they type in the search box.
       (b) Spellchecking: The search application can interpret search terms that are not necessarily
           spelled correctly, either prompting the user with correctly spelled alternatives, and/or
           automatically retrieving results that match terms that most closely resemble the misspelled
           word in the query.
       (c) Did you mean: Similar to spell checking, the search application can offer alternate matches to
           terms that resemble the user’s query, even when those terms were not typed in explicitly.
       (d) More like this: The search application allows the user to drill down into a particular element
           of one result set to find additional results that resemble it.
       (e) Hit highlighting: The search application can mark or emphasize specific terms from the
           query in snippets of the document result, showing the user which terms match the query.
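Several of these features are switched on per request in Solr. The sketch below, offered as an illustration under stated assumptions, adds the standard parameters for hit highlighting (`hl`, `hl.fl`, `hl.snippets`), spellchecking with a collated "did you mean" query (`spellcheck`, `spellcheck.collate`), and more like this (`mlt`, `mlt.fl`, `mlt.count`). It assumes hypothetical `title` and `body` fields and that the spellcheck component has been configured for the request handler in `solrconfig.xml`; autosuggest usually relies on a separately configured handler and is left out.

```python
def feature_params(user_query, highlight=True, spellcheck=True, more_like_this=False):
    """Return (name, value) pairs enabling optional Solr search features."""
    params = [("q", user_query)]
    if highlight:
        # (e) mark matching terms in up to two snippets per field
        params += [("hl", "true"), ("hl.fl", "title,body"), ("hl.snippets", "2")]
    if spellcheck:
        # (b)/(c) suggest corrections and a rewritten "did you mean" query
        params += [("spellcheck", "true"), ("spellcheck.collate", "true")]
    if more_like_this:
        # (d) return up to five similar documents for each result, compared on body text
        params += [("mlt", "true"), ("mlt.fl", "body"), ("mlt.count", "5")]
    return params
```

Keeping each feature behind its own flag makes it easy to prototype with everything enabled and then measure which features actually earn their extra query cost.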




III.   What Is the Technology Environment in Which You Are Building Your Search Application?
Driven by the opportunities and demands of the growing universe of available content and by the
versatility of Solr/Lucene open source search, search is evolving from a standalone facility to an enabling
framework.
Search was once considered a black-box application that ingested content and delivered results to users
opaquely. No more. Today, developers are turning to Solr/Lucene to extend the data access and
management power of their applications into the realm of unstructured text: documents, articles,
product descriptions, case studies, informal notes, websites, forums, wikis, inventory data, patient
records, e-mail messages, resumes, patents, legal decisions, tweets, log files, traditional relational data
stores, and nontraditional data infrastructure. The examples are endless. Effective retrieval of timely,
actionable content in the face of such diversity means treating search as an application development
platform or an enabling framework, not an end-unto-itself application.
RECOMMENDED READING:
   Full Text Search Engine vs. RDBMS
      Marc Krellenstein, CTO and co-founder, Lucid Imagination
   Scaling Lucene and Solr
      Mark Miller, Lucid Imagination; Apache Lucene and Solr Committer

Like any application development effort, creating search applications and enabling existing
applications with search must be driven by business considerations. With an understanding of your
business needs in hand from the previous sections, we now turn to the constraints and capabilities of the
technology context in which the search application is to be developed and deployed, exploring key
attributes of your technology environment tied to search application development.


1. What Programming Skills Do Your Developers Bring to Your Search Application?
   Solr and Lucene search applications are typically developed as web applications. High-level search
   functions that can be accessed programmatically include queries, indexing commands, relevance
   algorithms, performance settings, and the like, generally presented by Solr as services and
   configuration options. Solr offers a particularly broad base of client libraries, which means it can be
   accessed through a large variety of programming languages.
   In which of the following languages, formats, and environments supported by Solr is your
   application development team skilled and experienced?
       (a) JSON                                             (f)   Python
       (b) Java                                             (g)   .Net
       (c) Ruby                                             (h)   C#
       (d) PHP                                              (i)   Perl
       (e) Ajax                                             (j)   JavaScript
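Because Solr is reached over HTTP and can return JSON (`wt=json`), any of the languages above that has an HTTP client and a JSON parser can work with it, even without a dedicated client library. The Python sketch below builds a `/select` URL and unpacks the `response` section of a JSON reply; the server address is an assumption based on the single-core Solr 1.4 example install and will differ in your deployment.

```python
import json
from urllib.parse import urlencode

# Assumed address of a single-core Solr 1.4 example install; adjust for your deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def select_url(params):
    """Build a /select URL that asks Solr for a JSON response (wt=json)."""
    return SOLR_SELECT_URL + "?" + urlencode(list(params) + [("wt", "json")])


def parse_select_response(body):
    """Return (total number of matches, list of returned documents) from a JSON response."""
    response = json.loads(body)["response"]
    return response["numFound"], response["docs"]
```

Fetching the URL with any HTTP client (in Python, for example, `urllib.request.urlopen(select_url([("q", "ipod")])).read()`) and passing the body to `parse_select_response` completes the round trip; the same two steps carry over to Ruby, PHP, Perl, .Net, or JavaScript.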

2. Is your development team skilled and experienced in working with Open Source?
   For most intents and purposes, open source software has “crossed the chasm” into mainstream
   usage, with a broad range of government, nonprofit, and corporate sectors running well-established
   portions of their IT infrastructure on the LAMP stack: Linux, Apache, MySQL, and PHP/Perl/Python.
   A recent survey of 300 large corporations by the global consultancy firm Accenture shows the
   majority of respondents committing strategic technology initiatives to open source. To gauge the
   depth of open source utilization, which of the following major open source projects are broadly
   used in your organization?
       (a)   Linux for server operating systems
       (b)   MySQL or Postgres for RDBMS
       (c)   Eclipse for integrated software development
       (d)   PHP for web application integration
       (e)   Apache HTTP Server for HTTP services
       (f)   Tomcat for web application containers
       (g)   JBoss for application business logic


3. How and where are the data and documents stored, independent of format?
   Most individuals are acquainted with searching for content either within their own personal
   computer environments, such as a file system or e-mail, or through one of the popular,
   advertising-driven, consumer-facing commercial Internet search services. In the context of enterprise
   or commercial search, the diversity of data storage methods spans a much broader range of
   technologies, not necessarily tied to formats for individual file objects. Which of the following data
   repositories will your search application access?
       (a)   Traditional directory-oriented file servers, fileshares, and filesystems
       (b)   Web servers
       (c)   Relational databases, including Oracle, MySQL, SQL Server, Informix, Postgres, DB2
       (d)   Nonrelational (AKA NoSQL) data stores, such as Hadoop, Cassandra, Memcached
       (e)   Proprietary collaboration stores e.g., Lotus Notes, Sharepoint
       (f)   Open Source content management systems, e.g., Drupal, Joomla, Alfresco.
       (g)   Proprietary Enterprise content management systems, e.g., Documentum, Vignette, OpenText
       (h)   XML-oriented data stores, such as Mark Logic




4. On what operating system platform(s) or environments will your search application run?

   IT organizations can achieve significant setup and deployment economies by standardizing
   hardware, software, and operating practices at the platform level. Because Solr runs in a Java
   servlet container and its indexes are portable across platforms, it can operate in any of the
   mainstream operating system environments, virtualized environments, and cloud platforms
   available in today’s marketplace.
       (a)   Linux
       (b)   Solaris
       (c)   Windows/NT Server/.NET framework
       (d)   Mac OS
       (e)   Amazon EC2 (including the above OS environments)
       (f)   VMware (including the above OS environments)


5. Should you use Lucene or Solr?

   Solr and Lucene are complementary technologies that offer very similar underlying capabilities. Solr
   is the Lucene search server. Lucene is the set of Java libraries that run inside the Solr search server,
   and those libraries are also available independently of the server.
   As the Lucene search server, Solr presents a web service layer built on top of the Lucene search library
   and extends it into a ready-to-use search platform for application users. From its Lucene core, Solr
   offers search speed, relevancy ranking, complete query capabilities, portability, scalability,
   low-overhead indexes, and rapid incremental indexing. Its server encapsulation of Lucene adds
   operational and administrative capabilities like web services, faceting, configurable schema,
   caching, replication, and administrative tools for configuration, data loading, statistics, logging,
   cache management, and more.
   Lucene gives Solr its search power. With only a small number of exceptions, organizations building
   search applications should start with Solr rather than a direct implementation of the Lucene libraries.
   Applications that do otherwise often began their efforts before Solr was available.
   Solr provides the starting point for most developers who are building a Lucene-based search
   application. Organizations that build with Solr find themselves better able to adapt their application
   to changing data structures, query needs, user behaviors, and infrastructure configuration. These
   benefits translate into lower cost of ownership, improved flexibility, and a broader pool of search
   application developers available in the marketplace.




6. Are application development practices in your organization structured to address time-to-market
   constraints or technical complexity?

   Successful application development depends on the professional practice of software development.
   While there are many theories, approaches, and development models, successful application
   development organizations share a key set of development disciplines. Does your application
   development team understand the tools, methodologies, and mechanisms involved in the following
   software development competencies?
       (a)   Requirements analysis
       (b)   Iterative design
       (c)   Documentation
       (d)   Test planning
       (e)   Change control
       (f)   Architectural description
       (g)   Formal design
       (h)   Build and release engineering

7. What service level availability does your search application need to deliver to end users? What
   is the cost or impact of outages or service unavailability?

   Solr’s ability to run on a distributed infrastructure provides robust application availability and
   performance at scale, so you can expand to meet growth in both your document collection and
   your user workload. As with all infrastructure, you should understand in advance what impact a
   service outage would have on your end users. That understanding lets you make appropriate choices
   about networking, servers, storage, and operating procedures, because the system is only as strong
   as its weakest link. What is the longest interval during which your end users can remain productive
   without access to your search application? And how often can they tolerate such unscheduled outages?

             Duration                                          Frequency
             (a) 1 minute                                      (a) Once per year
             (b) 30 minutes                                    (b) Once per month
             (c) 1 hour                                        (c) Once per week
             (d) 4 hours                                       (d) Once per day
             (e) 1 day                                         (e) Once an hour or more
             (f) Longer than 1 day
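The duration and frequency answers can be combined into a rough availability target. The sketch below converts a tolerated outage profile into downtime per year and an availability percentage. It rests on simplifications that are ours, not the checklist's: a 365-day year, outages of equal length, and no scheduled maintenance counted as downtime.

```python
# Rough availability estimate from the checklist's duration and frequency answers.
# Assumptions (ours, not the checklist's): a 365-day year, equal-length outages,
# and no scheduled maintenance windows counted as downtime.
MINUTES_PER_YEAR = 365 * 24 * 60

# Frequency choices expressed as outages per year. "Once an hour or more" is
# treated as exactly once per hour, so its result is an upper bound on uptime.
OUTAGES_PER_YEAR = {
    "once per year": 1,
    "once per month": 12,
    "once per week": 52,
    "once per day": 365,
    "once an hour": 365 * 24,
}


def downtime_minutes_per_year(outage_minutes, frequency):
    """Total unscheduled downtime per year for a given outage profile."""
    return outage_minutes * OUTAGES_PER_YEAR[frequency]


def availability(outage_minutes, frequency):
    """Fraction of the year the search service is up (never below zero)."""
    downtime = downtime_minutes_per_year(outage_minutes, frequency)
    return max(0.0, 1 - downtime / MINUTES_PER_YEAR)


# Example: users tolerate a 1-hour outage (answer c) once per month (answer b).
print(f"{availability(60, 'once per month'):.3%}")  # 99.863%
print(downtime_minutes_per_year(60, "once per month") / 60, "hours per year")  # 12.0 hours per year
```

This reading shows how quickly targets tighten. The same one-hour outage tolerated only once per year corresponds to roughly 99.99% availability, which is more likely to call for a distributed, multi-server deployment than a single server.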




IV. How can you ensure fit between Solr/Lucene
    and your ongoing business needs?
The best test of technology in the enterprise is its ability to deliver on business needs consistently. It
must strike the optimal balance between features/functions and the continuous achievement of
competitive advantage for the business paying for it. Search is the same, only more so: It must constantly
do a better job of delivering results that derive competitive advantage from matching end users to
valuable information in a timely fashion.
Open source can be a two-edged sword: Its innovation is unmatched, but the timing of that innovation
(as is often the case with innovation in any domain) is not always predictable. While the marketplace
challenges a company faces are constant and dynamic, its technology infrastructure demands a strong
degree of stability and predictability. The design, building, and maintenance of applications must handle
change without adding instability to the problems they aim to solve.
At Lucid Imagination, we specialize in capturing the best that open source Solr/Lucene search offers and
delivering it into business-critical application development efforts in a way that improves stability. We
provide predictability without sacrificing the power, scalability, or flexibility of open source. With
time-driven support, deep expertise, and a broad solution platform of stable value-added software, we
transform open source search into a stable foundation that lets you accelerate with confidence.
In this section, we’ll present considerations for taking advantage of the power of open source in the
context of the enterprise. Unlike previous sections, which were shaped by various options, these
questions are designed to help you consider the risks and dynamics of your development effort and its
ability to bridge the gap between the open source innovation you need to compete and the enterprise
foundation you rely on to reap the benefits of that innovation.


1. What is your “bench-depth” in designing and deploying search applications?

   If there is one element all search applications share, it is their diversity: Each set of content, queries,
   and end user requirements is unique. One of the great strengths of open source search is that it is a
   robust, general-purpose platform shaped by input from a broad variety of search use cases.
   Even when you have top talent, your search application may be limited by their experience. Others,
   inside or outside the public open source discussion archives, might have experience that could
   benefit their efforts.
   For example, the foundations of your search application's ambitions are laid early: Your
   development team must make critical architecture and design decisions, with significant downstream
   impact throughout subsequent releases of your application to customers. Breadth of experience
   makes a critical difference in whether those decisions accommodate necessary future changes or
   introduce unnecessary constraints that hobble your application when the time comes to seize new
   opportunities.

2. How does your organization find and incorporate changes to code or source code fixes for your
   applications?

   Open source code is the raw material of your application development effort. The less it costs to
   ensure inbound quality and stability, the more you reduce risks to the application you are building.
   Open source software does not stand still. Even between major releases, the team of committers and
   programmers developing fixes and improvements is constantly adding new ideas and features to
   their project. Some of these changes are available as patches. Others are built into trunk and available
   through nightly builds. Any of them may or may not meet your acceptance criteria.
   Solr/Lucene is no different: Its committers, working as a consensus-driven meritocracy, produce
   changes that may or may not be compatible with your implementation. Identifying which of those to
   incorporate into development and assessing their impact on other elements of the system is a critical
   success factor. The right choice may not be obvious when a change first becomes available.


3. What is the cost-benefit tradeoff of timely fixes and availability of expertise?

   In building prototypes, you may or may not be able to wait for the community of experts to work on
   your need or provide advice. Once you reach a production, business-critical scenario, you’ll need
   things done on your timetable, not theirs. Or, you may not want your particular effort to have any
   public exposure at all, in which case you’ll need a communications channel that meets the needs of
   your business in your marketplace.
   Many problems can be solved given enough time and effort. If your design and deployment efforts
   follow a schedule where speed has value, weigh the relative costs and benefits of internal
   trial-and-error against predictably delivered expertise available on demand.

4. Does the cost-benefit tradeoff of fix timeliness change once your application moves into a
   production environment?

   Once an application’s user base extends beyond the developers who built it, its owners must be ready
   to deliver consistent, predictable availability, performance, and scalability. The person who wrote the
   software cannot always meet the service needs of end users in real time, because developers move on
   to other projects or leave the company.
   The heterogeneity of your content collection, particularly as it changes and grows, can introduce new,
   unanticipated sources of performance anomalies. Similarly, it is difficult to anticipate the full range of
   user queries and demands on the system, and the application is often unable to meet these new,
   previously unaccounted-for requirements. Delivering timely fixes for these organic changes may well
   be beyond the reach of your development team or your IT organization.
   Last but not least, lessons learned by experts across a diverse range of deployments can help ensure
   that the release process itself meets its intended thresholds of performance, throughput, and other
   systemic qualities.


5. How will you ensure a consistent, authoritative base of knowledge and skills for your development
   team to work from?

   Critical mass of expertise in development is directly correlated with the overall effectiveness and
   velocity of your development efforts. The Solr/Lucene open source community provides developers
   with a rich, diverse base of resources to use in bootstrapping their skills, including mailing list
   forums, examples, peer-to-peer resources, and much more. The enterprise developer can swim far
   and wide in the sea of information, learning by wandering among other implementations and other
   discussions.
   At the same time, organizations driven by a development and business timetable need a more
   structured, organized, and directed approach to building a solid, consistent foundation based on
   authoritative sources. Working from a pedagogically oriented set of materials, developers can not
   only acquire a clearer sense of what the technology is and does, but also how best to apply search
   engine technologies to business requirements. Best practices distilled from years of experience of a
   broad base of experts can give your team a quicker start, reduce the setup and execution time, and
   improve how effectively they contend with problems as and when they emerge.




Summary of Questions

I.    Why do you need a search application?
      1. What business objectives are (or should be) achieved with your search application?
      2. What objectives are (or are not) being met with your current search implementation?
      3. Which improvements in search behavior contribute to improved business results?
      4. How much control do you need over the results that end users see?
      5. How much do your end users know about the content they are searching for?

II.   What are the key technical characteristics of your search application?
      1. In what formats are the documents and data you will search?
      2. Document composition: how big are documents?
      3. How much new content do you presently add per unit time?
         How many documents are updated per unit time?
      4. What is the rate of queries you expect from your user population?
      5. Does your content require faceting or a taxonomy in order
         to support productive navigation and discovery by end users?
      6. Which advanced search features do you expect to use
         in order to improve how users can submit queries and choose?

III.  What is the technology environment in which you are building your search application?
      1. What programming skills do your developers bring to your search application?
      2. Is your development team skilled and experienced in working with Open Source?
      3. How and where are the data and documents stored, independent of format?
      4. On what operating system platform(s) or operating environments will your search application run?
      5. Should you use Lucene or Solr?
      6. Are application development practices in your organization
         structured to address time-to-market constraints or technical complexity?
      7. What service level availability does your search application need
         to deliver to end users? What is the cost or impact of outages or service unavailability?

IV.   How can you ensure continuous fit between Solr/Lucene and your business needs?
      1. What is your “bench-depth” in designing and deploying search applications?
      2. How does your organization find and incorporate changes to code or source code fixes for your applications?
      3. What is the cost-benefit tradeoff of timely fixes and availability of expertise?
      4. Does the cost-benefit tradeoff of fix timeliness change once your application moves into a production
         environment?
      5. How will you ensure a consistent, authoritative base of
         knowledge and skills for your development team to work from?




About Lucid Imagination
Lucid Imagination can help you use Solr/Lucene to get the most from your search applications. Lucid
Imagination has the world-class expertise, resources, support, and services needed to cost-effectively
architect, implement, and optimize Solr/Lucene-based solutions. We provide commercial-grade support,
training, and consulting and offer certified, tested versions of Lucene and Solr. Lucid Imagination’s goal is
to serve as a central resource for the entire Lucene community and marketplace, to make enterprise
search application developers more productive. We also provide access to Solr/Lucene experts, well-
organized information, and documentation.
We’ve helped hundreds of companies get the most out of their search infrastructure. Customers include
AT&T, Buy.com, Cisco, Ford, Macy’s, Sears, Shopzilla, The Motley Fool, Verizon, Edmunds.com, GSI
Commerce, Zappos (Amazon), and many other household names. Lucid Imagination is a privately held
venture-funded company. The investors include Granite Ventures, Walden International, In-Q-Tel, and
Shasta Ventures. To learn more please visit http://www.lucidimagination.com or
http://www.lucidimagination.com/solutions.
For more information on what Lucid Imagination can do to help your employees, customers, and partners
get the most out of your e-commerce efforts, contact sales@lucidimagination.com or call
+1.650.353.4057.

Recommended Reading
      Starting a Search Application by Marc Krellenstein
       http://www.lucidimagination.com/developers/whitepapers/starting-search-application
      The Case for Lucene/Solr: Real World Open Source Search Applications
       http://www.lucidimagination.com/solutions/whitepapers/Managers-Guide-to-Real-World-Open-Source-Search-Applications
      Faceted Search with Solr by Yonik Seeley
       http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr
      Optimizing Findability in Lucene and Solr by Grant Ingersoll
       http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Optimizing-Findability-Lucene-and-Solr
      Full Text Search Engine vs. RDBMS by Marc Krellenstein
       http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr
      Scaling Lucene and Solr by Mark Miller
       http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr



Appendix: Solr/Lucene Features and Benefits
Lucene and Solr are complementary technologies that offer very similar underlying capabilities. In
choosing a search solution that is best suited for your requirements, key factors to consider are
application scope, development environment, and software development preferences.
Lucene is a Java technology-based search library that offers speed, relevancy ranking, complete query
capabilities, portability, scalability, low-overhead indexes, and rapid incremental indexing.
Solr is the Lucene search server. It presents a web service layer built on top of the Lucene search library
and extends it into a ready-to-use search platform for application users. Solr brings with it operational
and administrative capabilities like web services, faceting, configurable schema, caching, replication, and
administrative tools for configuration, data loading, statistics, logging, cache management, and more.
Lucene presents a collection of directly callable Java libraries and requires coding and solid information
retrieval experience. Solr extends the capabilities of Lucene to provide an enterprise-ready search
platform, eliminating the need for extensive programming.
Solr provides the starting point for most developers who are building a Lucene-based search application.
It comes ready to run in a servlet container such as Tomcat or Jetty, making it ready to scale in a
production Java environment.
With convenient REST-like web-service interfaces callable over HTTP, and transparent XML-based
configuration files, Solr can greatly accelerate application development and maintenance. In fact, Lucene
programmers have often reported that they find Solr contains “the same features I was going to build
myself as a framework for Lucene, but already very well implemented.” Using Solr, enterprises can
customize the search application to their requirements without the cost and risk of writing the code
from scratch.
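Content typically reaches Solr through this HTTP interface, for example as an XML `<add>` message containing `<doc>` and `<field name="…">` elements sent to the update handler. The sketch below only builds such a message with Python's standard library. The field names `id`, `title`, and `category` are hypothetical and would have to match your own schema. Posting the message to an update URL, such as an assumed `http://localhost:8983/solr/update`, is left out.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Serialize a list of dicts as a Solr XML <add> message.

    Each dict becomes one <doc>. A list value is treated as a multi-valued
    field and becomes repeated <field> elements with the same name.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)  # ElementTree escapes &, <, > on output
    return ET.tostring(add, encoding="unicode")


# Hypothetical document; field names must exist in your schema.
message = build_add_message([
    {"id": "doc-1",
     "title": "Search Readiness Checklist",
     "category": ["white paper", "search"]},
])
print(message)
```

Solr also expects a commit, for example a separate `<commit/>` message, before newly added documents become visible to searches.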
Lucene provides greater control of your source code and works best in development environments
where resources need to be controlled exclusively by Java API calls. It is the better fit when constructing
and embedding a state-of-the-art search engine, because programmers can assemble and compile it inside
a native Java application. When working with Lucene, programmers can directly control a large set of
sophisticated features through low-level access to data and state.
Enterprises that do not require strict control of low-level Java libraries generally prefer Solr, as it
provides ease of use and scalable search power out of the box.
As functional siblings, Lucene and Solr have become popular alternatives for search applications; the two
differ mainly in the style of application development used. Key benefits of search with Solr/Lucene
include:
    Search Quality: Speed, Relevance, and Precision
    Solr/Lucene provides near-real-time search and strong relevance ranking to deliver contextually
    relevant and accurate results very quickly. Tailor-made coding for relevancy ranking and
    sophisticated search capabilities like faceted search help users sort, organize, classify, and structure
    retrieved information, so that search delivers the desired results. Search with Solr/Lucene also
    provides proximity operators, wildcards, fielded searching, term/field/document weights, find-similar
    functions, spell checking, multilingual search, and much more.
    Lower Cost and Greater Flexibility, Plug and Play Architecture
    Solr/Lucene reduces recurring and nonrecurring costs, lowering your TCO. As open source software,
    it does not require purchase of a license and is freely available for use. The open source code can be
    used as is, modified, customized, and updated as appropriate to your needs. Solr is easily embedded
    in your enterprise’s existing infrastructure, reducing costs of installation, configuration, and
    management.
    Open Source Platform for Portability and Easy Deployment
    Because Solr/Lucene is an open source software solution, it is based on open standards and
    community-driven development processes. It is highly portable and can run on any platform that
    supports Java. For instance, you can build an index on Linux, copy it to a Microsoft Windows machine,
    and search it there. This portability lets you keep your search application in step with your company’s
    evolving infrastructure. Lucene, in turn, has been implemented in other environments, including C#,
    C, Python, and PHP. At deployment time, Solr offers very flexible options: It can be easily deployed on
    a single server as well as on distributed, multiserver systems.
    Largest Installed Base of Applications, Increasing Customer Base
    Solr/Lucene is the most widely used open source search system and is installed in around 4,000
    organizations worldwide. Publicly visible search sites that use Solr/Lucene include CNET, LinkedIn,
    Monster, Digg, Zappos, MySpace, Netflix, and Wikipedia. Solr/Lucene is also in use at Apple, HP,
    IBM, Iron Mountain, and Los Alamos National Laboratory.
    Large Developer Base and Adaptability
    As community-developed software, Solr/Lucene provides transparent development and easy access
    to updates and releases. Developers can work with open source code and customize the software
    according to business-specific needs and objectives. The open source model gives developers the
    freedom and flexibility to evolve the software with changing requirements, liberating them from the
    constraints of commercial vendors.
    Commercial-Grade Support for Mission Critical Search Applications from Lucid Imagination
    Lucid Imagination provides the expertise, resources, and services needed to help enterprises deploy
    and develop Lucene-based search solutions efficiently and cost-effectively. Lucid helps enterprises
    achieve optimal search performance and accuracy with its broad range of expertise, which includes
    indexing and metadata management, content analysis, business rule application, and natural
    language processing. Lucid Imagination also offers certified distributions of Lucene and Solr,
    commercial-grade SLA-based support, training, high-level consulting, and value-added software
    extensions to enable customers to create powerful and successful search applications.
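Several search-quality features named in the list above are expressed directly in the query string. The sketch below assembles Solr request parameters that combine Lucene query-parser syntax with Solr's standard faceting parameters (`facet`, `facet.field`, `facet.mincount`). The query syntax covers fielded search (`title:solr`), wildcards (`search*`), proximity (`"open source"~3`), and term boosting (`^2`). The field names `title`, `body`, and `category` are hypothetical, and the helper functions are illustrative rather than part of any Solr client library.

```python
from urllib.parse import urlencode

# Field names below (title, body, category) are hypothetical examples; real
# names come from your own Solr schema.


def fielded(field, expr, boost=None):
    """Restrict a query expression to one field, optionally boosting its weight."""
    clause = f"{field}:{expr}"
    return f"{clause}^{boost}" if boost is not None else clause


def proximity(phrase, slop):
    """Phrase query whose terms may appear up to `slop` positions apart."""
    return f'"{phrase}"~{slop}'


q = " OR ".join([
    fielded("title", "solr", boost=2),             # weighted fielded term
    fielded("body", "search*"),                    # wildcard: search, searches, ...
    fielded("body", proximity("open source", 3)),  # proximity search
])

params = [
    ("q", q),
    ("facet", "true"),            # enable faceting
    ("facet.field", "category"),  # count matching documents per category value
    ("facet.mincount", 1),        # leave out categories with zero matches
    ("wt", "json"),
]

print(q)  # title:solr^2 OR body:search* OR body:"open source"~3
print(urlencode(params))
```

In the response, Solr would return a `facet_counts` section next to the matching documents. That section lists each category value with its hit count, which is the raw material for faceted navigation.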




Lucene/Solr Open Source Search Readiness Checklist
A Lucid Imagination White Paper • September 2010                                              Page 19

Weitere ähnliche Inhalte

Was ist angesagt?

Getting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionGetting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionSplunk
 
Personalized Search and Job Recommendations - Simon Hughes, Dice.com
Personalized Search and Job Recommendations - Simon Hughes, Dice.comPersonalized Search and Job Recommendations - Simon Hughes, Dice.com
Personalized Search and Job Recommendations - Simon Hughes, Dice.comLucidworks
 
SharePoint 2013 search improvements
SharePoint 2013 search improvementsSharePoint 2013 search improvements
SharePoint 2013 search improvementsKunaal Kapoor
 
Accelerating Open and Private Data Development
Accelerating Open and Private Data DevelopmentAccelerating Open and Private Data Development
Accelerating Open and Private Data DevelopmentKallex
 
SplunkLive! - Getting started with Splunk
SplunkLive! - Getting started with SplunkSplunkLive! - Getting started with Splunk
SplunkLive! - Getting started with SplunkSplunk
 
In search of: A meetup about Liferay and Search 2016-04-20
In search of: A meetup about Liferay and Search   2016-04-20In search of: A meetup about Liferay and Search   2016-04-20
In search of: A meetup about Liferay and Search 2016-04-20Tibor Lipusz
 
SplunkLive! Beginner Session
SplunkLive! Beginner SessionSplunkLive! Beginner Session
SplunkLive! Beginner SessionSplunk
 
Do you need an external search platform for Adobe Experience Manager?
Do you need an external search platform for Adobe Experience Manager?Do you need an external search platform for Adobe Experience Manager?
Do you need an external search platform for Adobe Experience Manager?therealgaston
 
Improving Enterprise Findability: Presented by Jayesh Govindarajan, Salesforce
Improving Enterprise Findability: Presented by Jayesh Govindarajan, SalesforceImproving Enterprise Findability: Presented by Jayesh Govindarajan, Salesforce
Improving Enterprise Findability: Presented by Jayesh Govindarajan, SalesforceLucidworks
 
Webinar: Lucidworks + Thomson Reuters for Improved Investment Performance
Webinar: Lucidworks + Thomson Reuters for Improved Investment PerformanceWebinar: Lucidworks + Thomson Reuters for Improved Investment Performance
Webinar: Lucidworks + Thomson Reuters for Improved Investment PerformanceLucidworks
 
This Ain't Your Parent's Search Engine
This Ain't Your Parent's Search EngineThis Ain't Your Parent's Search Engine
This Ain't Your Parent's Search EngineGrant Ingersoll
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Splunk Ninjas: New features, pivot, and search dojo
Splunk Ninjas: New features, pivot, and search dojoSplunk Ninjas: New features, pivot, and search dojo
Splunk Ninjas: New features, pivot, and search dojoSplunk
 
Keep Your Code Low, Low, Low, Low, Low: Getting to Digitally Driven With Orac...
Keep Your Code Low, Low, Low, Low, Low: Getting to Digitally Driven With Orac...Keep Your Code Low, Low, Low, Low, Low: Getting to Digitally Driven With Orac...
Keep Your Code Low, Low, Low, Low, Low: Getting to Digitally Driven With Orac...Jim Czuprynski
 
Click-through relevance ranking in solr &  lucid works enterprise - By Andrz...
 Click-through relevance ranking in solr &  lucid works enterprise - By Andrz... Click-through relevance ranking in solr &  lucid works enterprise - By Andrz...
Click-through relevance ranking in solr &  lucid works enterprise - By Andrz...lucenerevolution
 
How to migrate from any CMS (thru the front-door), by ICF CIRCUIT
Recommendation engine
PatSeer Overview, by Gridlogics
Apache atlas sydney 2017-v4, by Nigel Jones

Others also liked

Practical Search with Solr: Beyond just Looking it Up, by Lucidworks (Archived)
Tennis, by aritz
Guidelines for Managers: What Lucene and Solr Open Source Search can do for E..., by Lucidworks (Archived)
Oslb office365, by 彰 村地
Updated: Marketing your Technology, by Marty Kaszubowski
Technology opportunities in hampton roads (kaszubowski ), nasa technology day..., by Marty Kaszubowski
What’s new in apache lucene 3.0
Seeley yonik solr performance key innovations, by Lucidworks (Archived)
Lucene rev preso bialecki solr crawlers-lr, by Lucidworks (Archived)
A haiti, by tanica
Azure と世間様, by 彰 村地
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search, by Lucidworks (Archived)
Network Forensics Puzzle Contest に挑戦 #2, by 彰 村地
Kelly Clarkson, by tanica
Maroon5, by tanica
Updated: Preparing an investor presentation, by Marty Kaszubowski
Spanish bombss, by tanica
Indexing Text and HTML Files with Solr, by Lucidworks (Archived)
What’s new in apache solr 1.4

Similar to Checklist for Moving to Solr/Lucene Open Source Search

Liwp consider opensource2010
What Lucene and Solr Open Source Search can do for Enterprise Search, by Lucidworks (Archived)
OSTS_White_Paper
3RDi Platform for Enterprise Search, Discovery & Analytics, by The Digital Group
Establishing an Open Source Program Office, by Lee Calcote
Leading Your Firm To Success With SharePoint & Office 365 - ILTASPS, by Richard Harbridge
How Big Data can drive innovative technologies and new approaches in large or..., by Nick Brown
apache solr web development.pdf, by Tasnim Jahan
Intranets in the Cloud: What You Need to Know at Unity Connect Online #UCO16, by Kanwal Khipple
Intranets In The Cloud: What You Need To Know, by Richard Harbridge
Apache Solr vs Oracle Endeca
Accelerate Innovation & Productivity With Rapid Prototyping & Development - ..., by Attivio
Intranets and hybrid search, by Intranätverk
Interactive and Conversational Search with Google Cloud and Elasticsearch, by Inexture Solutions
1° Sessione Oracle CRUI: Analytics Data Lab, the power of Big Data Investiga..., by Jürgen Ambrosi
Krellenstein lucene revolution_2011_keynote_once_future_history_enterprise se..., by lucenerevolution
UX-plosive stuff - user experience to come first (ADF Enterprise Mobility Con..., by Lucas Jellema
Intranets in the Cloud: What you need to know #spsmontreal, by Kanwal Khipple
fusion-apps-new-standard-bus-wp-505097, by Carina Kordan
Transform unstructured e&p information

More from Lucidworks (Archived)

Integrating Hadoop & Solr
The Data-Driven Paradigm
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
What's new in solr june 2014
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Building a data driven search application with LucidWorks SiLK
Introducing LucidWorks App for Splunk Enterprise webinar

Recently uploaded

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation, by Safe Software
Nell’iperspazio con Rocket: il Framework Web di Rust!, by Commit University
Artificial intelligence in cctv survelliance.pptx, by hariprasad279825
Powerpoint exploring the locations used in television show Time Clash, by charlottematthew16
Unleash Your Potential - Namagunga Girls Coding Club, by Kalema Edgar
Anypoint Exchange: It’s Not Just a Repo!, by Manik S Magar
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024, by BookNet Canada
"Federated learning: out of reach no matter how close", Oleksandr Lapshyn, by Fwdays
Vertex AI Gemini Prompt Engineering Tips, by Miki Katsuragi
Search Engine Optimization SEO PDF for 2024.pdf, by RankYa
Install Stable Diffusion in windows machine, by Padma Pradeep
Streamlining Python Development: A Guide to a Modern Project Setup, by Florian Wilhelm
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024, by BookNet Canada
Story boards and shot lists for my a level piece, by charlottematthew16
My Hashitalk Indonesia April 2024 Presentation, by Ridwan Fadjar
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF), by Mark Simos
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack, by Fwdays
My INSURER PTE LTD - Insurtech Innovation Award 2024, by The Digital Insurer
The Future of Software Development - Devin AI Innovative Approach.pdf, by SeasiaInfotech2
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost, by Zilliz

Kürzlich hochgeladen (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 

Checklist for Moving to Solr/Lucene Open Source Search

  • 1. Search Readiness Checklist: Moving to Solr/Lucene Open Source Search
    A Lucid Imagination White Paper
  • 2. Abstract
    Search was once considered a black-box application that ingested content and delivered results to users opaquely. However, driven by the opportunities and demands of the growing universe of content and by the versatility of Solr/Lucene open source search technology, search applications are evolving from a standalone facility to an enabling framework.
    Good search is hard. While the basics of search technology can be deceptively simple, the art and science of applying that technology to relevant business and content processing problems are daunting. By its very nature, search can span an almost infinite variety of content, formats, subject matter, relevancy criteria, and more.
    This Open Source Search Readiness Checklist is organized into four broad categories:
      Why do you need a search application?
      What are the key technical characteristics of your search application?
      What is your search application’s technology environment?
      How can you ensure the best fit between Solr/Lucene and your ongoing business needs?
    Each category details key issues to consider in moving to open source search. Whether you are undertaking a new search application or have a working search application running on a platform you are considering leaving behind, this checklist provides a working foundation to help you make the transition smoothly.
    Lucid Imagination, the commercial company for Solr/Lucene open source search technology, offers packaged solutions that simplify and streamline search application development, lower the cost of growth through a flexible, adaptable architecture, and deliver the reliable backing of unmatched expertise in enterprise search and open source.
    Lucene/Solr Open Source Search Readiness Checklist A Lucid Imagination White Paper • September 2010 Page i
  • 3. Contents
    Introduction
    I. Why Do You Need a Search Application?
    II. What Are the Key Technical Characteristics of Your Search Application?
    III. What Is the Technology Environment in Which You Are Building Your Search Application?
    IV. How Can You Ensure Fit between Solr/Lucene and Your Ongoing Business Needs?
    Summary of Questions
    About Lucid Imagination
    Recommended Reading
    Appendix: Solr/Lucene Features and Benefits
  • 4. Introduction
    Whether you are undertaking a new search application or have a working search application running on a platform you are considering leaving behind, there are many questions you’ll need to answer to be prepared for the effort.
    Good search is hard. While the basics of search technology can be deceptively simple, the art and science of applying that technology to relevant business and content processing problems are daunting. By its very nature, search can span an almost infinite variety of content, formats, subject matter, relevancy criteria, and more. Add in the fact that there are almost as many ways to judge relevant results as there are individual end users, and you can see the challenge.
    This Open Source Readiness Checklist is organized into four broad categories, each with a discussion of the issues and opportunities you’ll need to consider as you prepare for your search application. Where applicable, we’ll provide additional references for further study or research.
      Why do you need a search application?
      What are the key technical characteristics of your search application?
      What is your search application’s technology environment?
      How can you ensure the best fit between Solr/Lucene and your ongoing business needs?
    This guide is not intended to replace a design strategy, architectural rigor, or a formal requirements document. By considering answers for the issues it sets forth, we believe you’ll be better prepared for getting your Solr/Lucene application up and running.
    If you are replacing a legacy commercial platform, you may wonder: can Solr/Lucene be a complete search platform if you can’t just “drop it in” and replace what you now have, function for function, feature for feature? Consider first that, owing to the great variation of search problems, search technology providers have historically taken different approaches to developing their own toolkits, so an effort to imitate one toolkit with another will not cut it.
    We believe you will be best served by a fresh look at the problem search was meant to solve, unburdened by the details of prior implementations. More importantly, the flexibility and adaptive nature of Solr/Lucene open source will both enable an immediate transition and lay the foundation for evolving your application to meet emerging needs.
    The key measure of readiness for the transition is a solid grip on the value of the effort. Lucid Imagination’s customers report that Solr/Lucene technology delivers tremendous benefits in flexibility, result quality, and performance, and, most importantly, an ability to control their business and technology destiny with search. Those same customers use Lucid Imagination’s services and solutions to lock in those gains and cement the competitive advantage achieved with Solr/Lucene. We believe an understanding of these advantages will lead you to apply Solr/Lucene most effectively and to identify where Lucid Imagination can help you design, develop, and deploy your search application with confidence.
  • 5. I. Why Do You Need a Search Application?
    In understanding the motivation behind your search application, consider how best to align three factors: your users, your data, and your business objectives.
    When you build a search application, you face end users with expectations driven by their experience with the large consumer search engines on the public Internet, such as Google, Bing, and Yahoo. Certainly, the billions of dollars spent serving billions of end users searching trillions of documents have delivered broad-ranging innovations.
    It’s a fundamentally different proposition to build your own search application. Internet searches may produce millions of results in milliseconds, but they rely on measures like website popularity or on URLs and domain names, which are not generally applicable to purpose-built business applications. Relying on generalized relevancy for a global population of all Internet users, the big Internet search engines are not tied to your business rules, your business process logic, or the opportunity cost of improved precision for your specific data or your search users, and their business interests are not yours.
    Retrieval of unstructured, heterogeneous documents and data is where Lucene/Solr search technology excels. Much of that data has been stored in relational databases, which offer robust storage and stability, but their query and retrieval model is ill-suited to the more varied, dynamic modern data landscape.
    Solr/Lucene search technology offers extraordinarily broad applicability, flexibility, scalability, and adaptability. Its open source provenance contributes directly to those benefits in many ways. It provides a broad community of professional developers who test and perfect the technology against a tremendous variety of use cases, and its changes and improvements are strictly peer-reviewed, creating a broad foundation of innovation. It also brings faceting, geo-search, numeric range queries, speed and scalability into the billions of documents, near-real-time indexing, and many more innovations that have broken barriers to building effective search applications.
    Another great capability of the Solr/Lucene platform is its ability to anticipate the future needs of a broad range of users. With adaptive and editorial relevancy boosting, query corrections and suggestions, recommended results, and faceted search, search applications built with Solr/Lucene help your business control the quality of experience between your users and your data, and fit that experience to your business objectives.
    RECOMMENDED READING:
      Starting a Search Application, Marc Krellenstein, CTO and Founder, Lucid Imagination
      The Case for Lucene/Solr: Real World Open Source Search Applications, A Lucid Imagination White Paper
  • 6.
    1. What business objectives are (or should be) achieved with your search application?
    Free software, such as Lucene and Solr open source search, does not mean search is free of effort. If your search project is successful, consider how you will prove it. Which of these would you be able to point to?
      (a) Save money? How much, or how much more?
      (b) Save time? How much, or how much more?
      (c) Increase revenue? How much, or how much more?
      (d) Increase end user satisfaction? Which ones?
      (e) Create advantage over competitors?
      (f) Decrease risk? How much, or how much more?
      (g) More than one of the above?
    2. What objectives are (or are not) being met with your current search implementation?
    Most organizations already have a system for finding information, often a legacy commercial search system. Why is it unsatisfactory? If you were to replace or improve it, which of the results in the previous question would it affect? By how much?
    3. Which improvements in search behavior contribute to improved business results?
    Which of the following properties of your search application (one or more) would have the most impact on the business results you are looking for?
      (a) Speed with which new content becomes available.
      (b) Likelihood that the user’s chosen result is in the top n results returned.
      (c) Completeness of the full set of results the system delivers.
      (d) Speed with which queries deliver result sets.
      (e) Flexibility with which the system handles different types of queries.
      (f) Ability of the system to never deliver “zero” results.
      (g) Ranking of particular results for particular queries.
      (h) Reduced effort required for users to find previously unknown content.
      (i) Likelihood that the user will return to the search system again and again.
  • 7.
    4. How much control do you need over the results that end users see?
    Within the realm of search behaviors, special attention needs to be paid to the control of search results. Often, the application of algorithms, business rules, and access rights ties directly to the economic benefits of search. Solr/Lucene offer great depth in this dimension. The previous question asked about general changes in search behavior; here, consider specifically how important direct control of results is to the success of the application.
      (a) Do you need to adjust the likelihood that particular results or documents appear at a certain time, or in relation to other results?
      (b) Are there certain documents or data that should be delivered to certain users but withheld from others?
      (c) Are there algorithms that you need the system to apply programmatically, in automated fashion during the course of a search, such as probability calculations?
      (d) How important is it that you understand why the search returned a particular set of results, and are able to adjust the search behavior as a result?
    5. How much do your end users know about the content they are searching for?
    The behavior of your search application will be judged by its end users; how much do you know about those users and the queries they are likely to submit? Consider the following contrasts. Are your end users likely to:
      (a) Express their queries in terms or phrases that will narrow in on results quickly, or submit broad, general words that retrieve broad results?
      (b) Spell the terms they are searching for correctly?
      (c) Search for known results in an unknown location (e.g., “Find the e-mail I sent to Carol on Tuesday, August 10”)? Or undertake a search without knowing which content they might find?
      (d) Browse through interim sets of results in order to narrow or refine their search queries?
      (e) Specify quantitative parameters, such as distances, prices, locations, or dates, as part of their search?
      (f) Use logic-oriented language (e.g., Boolean queries or wildcard characters) or natural language?
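Question 4 raises three control points: raising particular documents, restricting documents to certain users, and explaining a ranking. A small in-memory sketch can picture them. It illustrates the ideas under stated assumptions and is not Solr's API. The `Doc` type, the `allowed_groups` access-control field, and the additive `editorial_boost` are hypothetical. The one-point-per-matched-term score stands in for real relevancy ranking. In a Solr deployment, the same ideas are commonly expressed as filter queries (`fq`) for access restrictions and the `debugQuery=true` parameter for score explanations.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    text: str
    allowed_groups: set = field(default_factory=set)  # hypothetical ACL field
    editorial_boost: float = 0.0  # hypothetical manual boost added to the score


def search(docs, query, user_groups):
    """Return (doc_id, score, explanation) tuples, best first.

    Toy scoring: one point per distinct query term found in the text, plus the
    document's editorial boost. Documents the user may not see are filtered out
    before scoring, so they can never appear in the results.
    """
    terms = set(query.lower().split())
    results = []
    for doc in docs:
        # Access control: skip documents shared with none of the user's groups.
        if not (doc.allowed_groups & user_groups):
            continue
        words = set(doc.text.lower().split())
        matched = sorted(terms & words)
        if not matched:
            continue
        score = len(matched) + doc.editorial_boost
        # A human-readable record of why this score was assigned.
        explanation = f"matched={matched} boost={doc.editorial_boost}"
        results.append((doc.doc_id, score, explanation))
    # Highest score first; ties broken by id so the order is stable.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


# Hypothetical sample documents.
DOCS = [
    Doc("a", "quarterly sales report", {"sales"}),
    Doc("b", "sales forecast draft", {"finance"}),
    Doc("c", "sales report template", {"sales"}, editorial_boost=2.0),
]

if __name__ == "__main__":
    for hit in search(DOCS, "sales report", {"sales"}):
        print(hit)
```

For a user in the `sales` group, `c` ranks above `a` because of its editorial boost. `b` is never returned, because the filter runs before scoring.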
  • 8. II. What Are the Key Technical Characteristics of Your Search Application?
    Given the flexibility and broad applicability of Solr/Lucene open source search technology, there is a rich set of design decisions to be made in setting up the application to meet your business objectives within the scope of your technology. In this section, we’ll explore some of the key inputs you’ll need to consider before you begin the architecture and design of your search application.
    In most, if not all, of the permutations of search needs implied by the questions below, the flexibility of Solr/Lucene search can address your needs. It’s important to note that these questions are not intended to replace a formal design process or substitute for a rigorous architectural assessment of how you can use Solr/Lucene to build a successful search application. Rather, they will help establish your intent with respect to key functional and system behaviors.
    More than in the previous section, you may find that the answers to the scoping questions below change over time. As you familiarize yourself with the capabilities and possibilities available with the Solr/Lucene search platform, you may well want to refine or revise your understanding of what constitutes desired behavior.
    Often, organizations build a working prototype of their search application in order to validate the assumptions, as well as the design and implementation of the system intended to put those assumptions into action. While there are many nuances to formal development methodologies that exploit this discover-by-doing effect, they share a common pattern of implementation, iteration, learning, improvement, and change.
    It is strongly recommended that you consider at least two sets of answers to the questions below: first for a prototype implementation, and then for one or more revisions of that implementation going forward, once you accumulate experience and discover the full range of possibilities.
    RECOMMENDED READING:
      Faceted Search with Solr, Yonik Seeley, creator of Apache Solr and co-founder of Lucid Imagination
      Optimizing Findability in Lucene and Solr, Grant Ingersoll, Chair, Apache Lucene PMC and co-founder of Lucid Imagination
  • 9.
    1. In what formats are the documents and data you will search?
    Much as documents and data can live in different repositories, they come packaged in different formats, based on where they originated and who created them. A good understanding of these formats enables successful content processing for search. Different format types require different levels of interpretation and decomposition to separate searchable text content and metadata (information about the document or its content), which can inform a search, from visual presentation details such as colors, fonts, or software-specific content. For each format, there are further considerations of version; to cite just one example, the formatting and file structure of Microsoft Word 97 *.doc documents differ from those of the Office 2007 *.docx version. Solr/Lucene can leverage a range of tools, both built-in and extensions, from open source as well as commercial sources. Which of the following document format types will you be indexing and searching?
      (a) XML documents
      (b) Database records
      (c) HTML documents
      (d) Microsoft Office documents: *.doc or *.docx for Word; *.ppt or *.pptx for PowerPoint; *.xls or *.xlsx for Excel
      (e) PDF documents
      (f) CSV (comma-separated values) or TSV (tab-separated values) files
      (g) OpenOffice documents
      (h) Engineering drawings from CAD/CAM/CAE systems
      (i) Others
    2. Document collection composition: how big are documents?
    Configuring your search system requires an understanding of your document sizes, as performance and throughput depend heavily on the size of the documents to be indexed. What percentage of your documents are:
      (a) Under 1 KB
      (b) 1 KB to 100 KB
      (c) 100 KB to 500 KB
      (d) 500 KB to 1 MB
      (e) 1 MB to 5 MB
      (f) 5 MB to 10 MB
      (g) 10 MB to 50 MB
      (h) 50 MB to 100 MB
      (i) 100 MB to 250 MB
      (j) 250 MB and up
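Question 2 is easier to answer from measured numbers than from guesses. A short script can walk a sample of the collection and report what percentage of files falls in each of the buckets above. This is a minimal sketch with several assumptions: units are binary (1 KB = 1,024 bytes), each file on disk is one document, and a size exactly on a boundary belongs to the larger bucket. The path `/data/collection` is hypothetical.

```python
import bisect
import os
from collections import Counter

KB = 1024  # assumption: binary kilobytes
MB = 1024 * KB

# Exclusive upper bounds of buckets (a) through (i); anything larger is (j).
BOUNDS = [1 * KB, 100 * KB, 500 * KB, 1 * MB, 5 * MB,
          10 * MB, 50 * MB, 100 * MB, 250 * MB]
LABELS = ["(a) under 1 KB", "(b) 1-100 KB", "(c) 100-500 KB",
          "(d) 500 KB-1 MB", "(e) 1-5 MB", "(f) 5-10 MB", "(g) 10-50 MB",
          "(h) 50-100 MB", "(i) 100-250 MB", "(j) 250 MB and up"]


def bucket(size_in_bytes):
    """Map a size to its label; a size equal to a bound goes to the higher bucket."""
    return LABELS[bisect.bisect_right(BOUNDS, size_in_bytes)]


def size_profile(sizes_in_bytes):
    """Return the percentage of documents in each size bucket."""
    sizes = list(sizes_in_bytes)
    if not sizes:
        return {label: 0.0 for label in LABELS}
    counts = Counter(bucket(s) for s in sizes)
    return {label: 100.0 * counts[label] / len(sizes) for label in LABELS}


def sizes_under(root):
    """Yield the size in bytes of every file below a directory tree."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.getsize(os.path.join(dirpath, name))


if __name__ == "__main__":
    # Hypothetical path; point this at a representative copy of your content.
    for label, pct in size_profile(sizes_under("/data/collection")).items():
        print(f"{label:20} {pct:5.1f}%")
```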
  • 10.
    3. How much new content do you presently add per unit time? How many documents are updated per unit time?
    The quality of your search results can be affected by the interval between when a document is complete or ready and when it appears in the index for searching.
      (a) Millions of very small documents (tweets, comments, messages, log files, etc.) appear continuously as users or systems create these content snippets.
      (b) Existing documents are revised, either by users or by machines; in the latter case, examples include reports and data output indexed by your search application.
      (c) New documents become available less frequently, perhaps even on a regular schedule, which in turn drives user expectations of when they can be searched.
      (d) Changes to content come in particular windows, busier at some times than others.
    Consider the question of change to your collection in two ways. First, at what interval does the amount of content in your collection change? Second, what fraction of the total documents are you adding to the overall collection within each interval?
      (a) From minute to minute
      (b) About to four times per hour
      (c) No more than twice per hour
      (d) No more than once every 4 hours
      (e) Daily
      (f) Weekly
      (g) Monthly
    4. What is the rate of queries you expect from your user population?
    Consider the population of users who drive your search application. How many are there, and how many queries might they submit? Consider especially that queries in the search application do not always map one-to-one to a single string entered by a user in a search box. Use these questions to characterize how many queries your search application will need to handle per unit time, typically measured in queries per second.
      (a) How often do users need access to the application?
      (b) Will they submit queries one at a time on an occasional or ad hoc basis, or will they rely on the search application for continuous, constant use?
      (c) Do they have the expertise necessary to narrow quickly on search results, or will they require continuous iteration, using one set of results to inform a series of subsequent queries?
      (d) Will they have the expertise to write queries that conform precisely to the search application’s expectations, or will you rely on the search application to analyze and decompose their terms and phrases to ensure efficient execution and relevant results?
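Once you have estimates, questions 3 and 4 come down to simple arithmetic. The sketch below turns hypothetical planning inputs into a peak request rate and a per-interval change fraction. Several of its pieces are illustrative assumptions, not Solr settings:

- the function names;
- the `requests_per_query` multiplier, which accounts for one typed query fanning out into several requests;
- the `peak_factor`.

```python
def peak_requests_per_second(users, queries_per_user_per_hour,
                             requests_per_query=1.0, peak_factor=1.0):
    """Rough peak request rate that the search service must sustain.

    requests_per_query covers queries that are not one-to-one with what a user
    types (autosuggest lookups, facet refinements, follow-up queries).
    peak_factor scales the hourly average up to the busiest period.
    """
    return (users * queries_per_user_per_hour * requests_per_query
            * peak_factor / 3600.0)


def fraction_changed(docs_added_or_updated, total_docs):
    """Share of the collection that changes within one update interval."""
    return docs_added_or_updated / total_docs


if __name__ == "__main__":
    # All inputs below are hypothetical planning numbers.
    print(peak_requests_per_second(2000, 30, requests_per_query=4, peak_factor=3))
    print(fraction_changed(5000, 1_000_000))
```

Take 2,000 users each running 30 searches an hour, about four requests per search, and a peak three times the hourly average. The estimate comes to 200 requests per second. Adding 5,000 documents per interval to a 1,000,000-document collection changes 0.5% of it each interval.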
  • 11.
    5. Does your content require faceting or a taxonomy in order to support productive navigation and discovery by end users?
    Faceted search provides an effective way to allow users to refine search results, continually drilling down until the desired items are found. For example, on an e-commerce site, Solr/Lucene can present a list of the different brands of flat-screen television found in the results and let the user navigate into them. Facets can span virtually any list of attributes, from sets of terms within a field to dates, numeric ranges, and the like. In addition to document-driven faceting, some search applications add an external taxonomy platform to derive metadata, i.e., to extract what documents are about and append fields that support guided navigation through results.
      (a) Do documents contain data or metadata that allow users to narrow results?
      (b) Are there consistent rules of document analysis you can create and apply to derive attributes from documents?
      (c) If documents lack native metadata, can you use a third-party taxonomy platform to identify attributes for faceted navigation?
    6. Which advanced search features do you expect to use in order to improve how users submit queries and choose among results?
    Solr/Lucene offers a broad set of powerful query and search tools that can help users quickly choose from available options, either before or after they submit a query. Which of the following features can help improve the speed and efficacy of the experience for your end users?
      (a) Autosuggest/as-you-type: The search application prompts the user with possible alternate queries implied by a partial or complete search term, as they type in the search box.
      (b) Spellchecking: The search application can interpret search terms that are not spelled correctly, either prompting the user with correctly spelled alternatives and/or automatically retrieving results that match the terms that most closely resemble the misspelled word in the query.
      (c) Did you mean: Similar to spellchecking, the search application can offer alternate matches to terms that resemble the user’s query, even when those terms were not typed in explicitly.
      (d) More like this: The search application allows the user to drill down into a particular element of one result set to find additional results that resemble it.
      (e) Hit highlighting: The search application can mark or emphasize specific terms from the query in snippets of the document result, showing the user which terms match the query.
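Faceting, as described in question 5, amounts to two steps. First, count how many matching documents carry each attribute value. Then, when the user picks a value, narrow the result set and recount. Solr computes these counts on the server. The in-memory sketch below only shows the idea, using a hypothetical list of television results with `brand` and `price` fields.

```python
from collections import Counter

# Hypothetical result set for a query such as "flat screen tv".
RESULTS = [
    {"id": 1, "brand": "Acme", "price": 399},
    {"id": 2, "brand": "Nova", "price": 649},
    {"id": 3, "brand": "Acme", "price": 899},
    {"id": 4, "brand": "Orbit", "price": 1299},
]


def field_facet(docs, field):
    """Count documents per distinct value of a field (term faceting)."""
    return Counter(doc[field] for doc in docs if field in doc)


def range_facet(docs, field, edges):
    """Count documents per half-open numeric range [low, high)."""
    counts = {}
    for low, high in zip(edges, edges[1:]):
        counts[f"{low}-{high}"] = sum(
            1 for doc in docs if field in doc and low <= doc[field] < high)
    return counts


def drill_down(docs, field, value):
    """Apply a facet selection: keep only documents with field == value."""
    return [doc for doc in docs if doc.get(field) == value]


if __name__ == "__main__":
    print(field_facet(RESULTS, "brand"))
    print(range_facet(RESULTS, "price", [0, 500, 1000, 2000]))
    acme = drill_down(RESULTS, "brand", "Acme")
    # Facet counts are recomputed over the narrowed result set.
    print(range_facet(acme, "price", [0, 500, 1000, 2000]))
```

Selecting the Acme brand narrows the results to two documents. The price ranges are then recomputed over that narrower set: one under 500 and one between 500 and 1000. Recounting at each step is what lets users keep drilling down.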
  • 12. III. What Is the Technology Environment in Which You Are Building Your Search Application?
    Driven by the opportunities and demands of the growing universe of available content and by the versatility of Solr/Lucene open source search, search is evolving from a standalone facility to an enabling framework. Search was once considered a black-box application that ingested content and delivered results to users opaquely. No more. Today, developers are turning to Solr/Lucene to extend the data access and management power of their applications into the realm of unstructured text: documents, articles, product descriptions, case studies, informal notes, websites, forums, wikis, inventory data, patient records, e-mail messages, resumes, patents, legal decisions, tweets, log files, traditional relational data stores, and nontraditional data infrastructure. The examples are endless. Effective retrieval of timely, actionable content in the face of such diversity means treating search as an application development platform or an enabling framework, not as an end-unto-itself application.
    Like any application development effort, the exercise of creating search applications and enabling existing applications with search must be driven by business considerations. With an understanding of your business needs in hand from the previous sections, we now turn to the constraints and capabilities of the technology context in which the search application is to be developed and deployed, exploring key attributes of your technology environment tied to search application development.
    RECOMMENDED READING:
      Full Text Search Engine vs. RDBMS, Marc Krellenstein, CTO and co-founder, Lucid Imagination
      Scaling Lucene and Solr, Mark Miller, Lucid Imagination; Apache Lucene and Solr Committer
    1. What Programming Skills Do Your Developers Bring to Your Search Application?
    Solr and Lucene search applications are typically developed as web applications. High-level search functions that can be accessed programmatically include queries, indexing commands, relevance algorithms, performance settings, and the like, generally presented by Solr as services and configuration options. Solr offers a particularly broad base of client libraries, which means it can be accessed from a large variety of programming languages. In which of the following languages/environments supported by Solr is your application development team skilled and experienced?
      (a) JSON
      (b) Java
      (c) Ruby
      (d) PHP
      (e) Ajax
      (f) Python
      (g) .Net
      (h) C#
      (i) Perl
      (j) JavaScript
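Whatever client language your team chooses, Solr requests ultimately travel as HTTP parameters. The Python sketch below builds a `/select` URL with the standard `q`, `rows`, `wt`, `facet`, `facet.field`, `hl`, and `hl.fl` parameters and reads facet counts from a JSON response. The host, port, core name `products`, and field names are assumptions. `SAMPLE` is a hand-written response in the shape Solr uses for JSON facet output, not captured output.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: a local Solr on its default port with a hypothetical core
# named "products". Older single-core installs used /solr/select instead.
SOLR_SELECT = "http://localhost:8983/solr/products/select"


def build_select_url(query, facet_fields=(), highlight_field=None, rows=10):
    """Build a Solr /select URL from standard request parameters."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        # facet.field may be repeated, once per field to facet on.
        params.extend(("facet.field", name) for name in facet_fields)
    if highlight_field:
        params.append(("hl", "true"))
        params.append(("hl.fl", highlight_field))
    return SOLR_SELECT + "?" + urlencode(params)


def facet_pairs(response, field):
    """Convert Solr's flat [term, count, term, count, ...] list into pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


# Hand-written sample shaped like a JSON response; not real Solr output.
SAMPLE = json.loads("""
{"response": {"numFound": 3, "start": 0, "docs": []},
 "facet_counts": {"facet_fields": {"brand": ["Acme", 2, "Nova", 1]}}}
""")

if __name__ == "__main__":
    print(build_select_url("flat screen tv", ["brand", "price"], "description"))
    # Against a running server, you could fetch the URL with
    # urllib.request.urlopen and decode the body with json.load.
    print(facet_pairs(SAMPLE, "brand"))
```

In Solr's default JSON layout, `facet_fields` arrives as a flat list that alternates term and count. That is why `facet_pairs` zips the even and odd positions together.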
  • 13. For most intents and purposes, open-source software has “crossed the chasm” into mainstream 2. Is your development team skilled and experienced in working with Open Source? usage, with a broad range of government, nonprofit, and corporate sectors running well-established portions of their IT infrastructure on the LAMP stack—Linux, Apache, MySQL, and PHP/Perl/Python. A recent survey of 300 large corporations by the global consultancy firm Accenture shows the majority of respondents committing strategic technology initiatives to open source. To gauge the depth of open source utilization, which of the following major open source projects are broadly utilized in your organization? (a) Linux for server operating systems (b) MySQL or Postgres for RDBMS (c) Eclipse for integrated software development (d) PHP for web application integration (e) Apache for http services (f) Tomcat for web application containers (g) JBOSS for application business logic Most individuals are acquainted with searching for content stored either in the context of their own 3. How and where are the data and documents stored, independent of format? personal computer environments, such as a file system, in e-mail, or in one of the popular, advertising-driven consumer-facing commercial Internet search service. In the context of enterprise or commercial search, the diversity of data storage methods spans a much broader range of technologies, not necessarily tied to formats for individual file objects. Which of the following data repositories will your search application access? (a) Traditional directory-oriented file servers, fileshares, and filesystems (b) Web servers (c) Relational databases, including Oracle, MySQL, SQL Server, Informix, Postgres, DB2 (d) Nonrelational (AKA NoSQL) data stores, such as Hadoop, Cassandra, Memcached (e) Proprietary collaboration stores e.g., Lotus Notes, Sharepoint (f) Open Source content management systems, e.g., Drupal, Joomla, Alfresco. 
(g) Proprietary enterprise content management systems, e.g., Documentum, Vignette, OpenText
(h) XML-oriented data stores, such as Mark Logic
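When the content lives in a relational database, each row (often the result of a join) has to be flattened into a search document before it can be indexed, because a Solr document is a flat set of fields rather than a set of related tables. The sketch below illustrates that flattening with Python's built-in sqlite3 module standing in for Oracle, MySQL, or the other databases listed. The tables, columns, and the rows_to_documents helper are hypothetical, and a real application would send the resulting documents to Solr's update handler rather than keep them in memory.

```python
import sqlite3


def rows_to_documents(conn, sql, id_field="id"):
    """Yield one flat search document (field -> value) per SQL result row.

    Hypothetical helper: each column becomes a field of the same name, NULL
    columns are skipped, and the id column is stringified as the unique key.
    """
    cur = conn.execute(sql)
    columns = [desc[0] for desc in cur.description]
    for row in cur:
        doc = {col: val for col, val in zip(columns, row) if val is not None}
        doc[id_field] = str(doc[id_field])
        yield doc


def demo():
    """Denormalize a small, made-up products/categories schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT,
                               description TEXT, category_id INTEGER);
        INSERT INTO categories VALUES (1, 'books'), (2, 'music');
        INSERT INTO products VALUES
            (10, 'Solr Guide', 'Search server handbook', 1),
            (11, 'Jazz Sampler', NULL, 2);
    """)
    # The join copies the category name into every product document.
    sql = """SELECT p.id AS id, p.name AS name, p.description AS description,
                    c.name AS category
             FROM products p JOIN categories c ON p.category_id = c.id
             ORDER BY p.id"""
    return list(rows_to_documents(conn, sql))


if __name__ == "__main__":
    for doc in demo():
        print(doc)
```

Copying the category name into each product document trades some index size for the ability to search and facet on it without a join at query time.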
4. On what operating system platform(s) or environments will your search application run?
IT organizations can achieve significant setup and deployment economies by standardizing hardware, software, and operating practices at the platform level. Because Solr runs in a Java servlet container, and its indexes are portable across platforms, it can operate in any mix of the mainstream operating systems, virtualized environments, and cloud platforms available in today’s marketplace.
(a) Linux
(b) Solaris
(c) Windows/NT Server/.Net framework
(d) Mac OS
(e) Amazon EC2 (including the above OS environments)
(f) VMware (including the above OS environments)

5. Should you use Lucene or Solr?
Solr and Lucene are complementary technologies that offer very similar underlying capabilities. Solr is the Lucene search server; Lucene is the set of Java libraries that run inside the Solr search server, and it is also available independently of the server implementation. As the Lucene search server, Solr presents a web service layer built atop the Lucene search library, extending it to provide application users with a ready-to-use search platform.
From its Lucene core, Solr gets search speed, relevancy ranking, complete query capabilities, portability, scalability, low-overhead indexes, and rapid incremental indexing. Its server encapsulation of Lucene adds operational and administrative capabilities like web services, faceting, configurable schema, caching, replication, and administrative tools for configuration, data loading, statistics, logging, cache management, and more. Lucene gives Solr its search power.
In all but a small number of cases, organizations building search applications should start with Solr rather than a direct implementation of the Lucene libraries. Applications that do otherwise often began their efforts before Solr was available.
Solr provides the starting point for most developers who are building a Lucene-based search application. Organizations that build with Solr find themselves better able to adapt their application to changing data structures, query needs, user behaviors, and infrastructure configurations. These benefits translate into lower “costs of ownership,” improved flexibility, and a broader pool of search application developers available in the marketplace.
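Solr’s web service layer also covers indexing: documents are added by posting an update message over HTTP rather than by calling Lucene’s Java API. A minimal Python sketch of building Solr’s XML add message follows. It assumes the example server’s update URL (http://localhost:8983/solr/update) and documents represented as dictionaries; the helper names are hypothetical. After the documents are posted, a separate commit message makes them visible to searches.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: Solr example server's update handler; adjust for your deployment.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_message(docs):
    """Serialize a list of {field: value} dicts as a Solr XML <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A multi-valued field repeats <field> once per value.
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field_el = ET.SubElement(doc_el, "field", name=name)
                field_el.text = str(v)
    # ElementTree escapes characters such as & and < in the field text.
    return ET.tostring(add, encoding="unicode")


def make_update_request(xml_body):
    """Prepare (but do not send) an HTTP POST carrying an XML update message."""
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    body = build_add_message([{"id": "10", "title": "Solr & Lucene",
                               "tag": ["search", "java"]}])
    print(body)
    # urllib.request.urlopen(make_update_request(body)) would send it, followed
    # by the same call with "<commit/>" as the body; both need a live server.
```

Keeping message construction separate from sending makes the indexing format easy to test without a running server.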
6. Are application development practices in your organization structured to address time-to-market constraints or technical complexity?
Successful application development depends on the professional practice of software development. While there are many theories, approaches, and development models, there is a key set of development disciplines practiced by successful application development organizations. Does your application development team understand the tools, methods, and mechanisms involved in the following software development competencies?
(a) Requirements analysis
(b) Iterative design
(c) Documentation
(d) Test planning
(e) Change control
(f) Architectural description
(g) Formal design
(h) Build and release engineering

7. What service level availability does your search application need to deliver to end users? What is the cost or impact of outages or service unavailability?
Solr’s ability to run on a distributed infrastructure provides robust application availability and performance at scale, allowing you to expand to meet growth in both your document collection and your user workload. As with all infrastructure, it is important to understand in advance what impact a service outage would have on your end users, so that you can make appropriate choices about networking, servers, storage, and operating procedures; the system is only as strong as its weakest link. What is the longest interval during which your end users can be productive without access to your search application? And how often can they tolerate such unscheduled outages?

Duration                   Frequency
(a) 1 minute               (a) Once per year
(b) 30 minutes             (b) Once per month
(c) 1 hour                 (c) Once per week
(d) 4 hours                (d) Once per day
(e) 1 day                  (e) Once an hour or more
(f) Longer than 1 day
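The duration and frequency answers above can be combined into an implied availability target. A minimal worked sketch in Python follows, assuming a 365-day year and that each tolerated outage lasts its full duration; the function name and frequency table are illustrative. For example, four-hour outages once per month add up to 4 × 60 × 12 = 2,880 minutes of downtime out of 525,600 minutes per year, or roughly 99.45% availability.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored

# Outage frequencies from the checklist, expressed as outages per year.
OUTAGES_PER_YEAR = {
    "once per year": 1,
    "once per month": 12,
    "once per week": 52,
    "once per day": 365,
    "once an hour": 365 * 24,
}


def availability(outage_minutes, outages_per_year):
    """Fraction of the year the service is up, given outage length and frequency."""
    downtime = outage_minutes * outages_per_year
    return max(0.0, 1.0 - downtime / MINUTES_PER_YEAR)


if __name__ == "__main__":
    # Print the implied availability for each duration/frequency combination.
    for minutes in (1, 30, 60, 240, 1440):
        for label, per_year in OUTAGES_PER_YEAR.items():
            pct = availability(minutes, per_year)
            print(f"{minutes:>5} min, {label:<15} {pct:.4%}")
```

Reading the table this way makes clear which combinations are realistic for a single server and which imply redundant infrastructure.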
IV. How can you ensure fit between Solr/Lucene and your ongoing business needs?
The best test of technology in the enterprise is its ability to deliver on business needs consistently. It must strike the optimal balance between features and functions and the continuous achievement of competitive advantage for the business paying for it. Search is the same, only more so: it must do an ever better job of delivering results that create competitive advantage by matching end users to valuable information in a timely fashion.
Open source can be a two-edged sword: it is unmatched in its innovation, but the timing of that innovation (as is often the case with innovation in any domain) is not always predictable. While the marketplace challenges a company faces are constant and dynamic, its technology infrastructure demands a strong degree of stability and predictability. The design, building, and maintenance of applications must handle change without adding instability to the problems they aim to solve.
At Lucid Imagination, we specialize in capturing the best that open source Solr/Lucene search offers and delivering it into business-critical application development efforts in a way that improves stability, providing predictability without sacrificing the power, scalability, or flexibility of open source. With time-driven support, deep expertise, and a broad solution platform of stable value-added software, we transform open source search into a stable foundation that lets you accelerate with confidence.
In this section, we present considerations for taking advantage of the power of open source in the context of the enterprise.
Unlike the previous sections, which were organized around multiple-choice options, these questions are designed to help you consider the risks and dynamics of your development effort and its ability to bridge the gap between the open source innovation you need to compete and the enterprise foundation you rely on to reap the benefits of that innovation.

1. What is your “bench-depth” in designing and deploying search applications?
If there is one element all search applications share, it is their diversity: each set of content, queries, and end user requirements is unique. One of the great strengths of open source search is that it is a robust, general-purpose platform shaped by input from a broad variety of search use cases. Even when you have top talent, your search application may be limited by that team’s experience; others, whether or not they take part in the public open source discussion archives, may have experience that could benefit the effort. For example, the foundations of ambition for your search application are laid early: your development team must make critical architecture and design decisions with significant downstream impact on subsequent releases of your application to customers. Breadth of experience makes a critical difference in whether those early assumptions lend themselves to necessary future changes or introduce unnecessary constraints that hobble your application when the time comes to seize new opportunities.
2. How does your organization find and incorporate changes to code or source code fixes for your applications?
Open source code is the raw material of your application development effort. The less it costs to ensure inbound quality and stability, the more you reduce risks to the application you are building.
Open source software does not stand still. Even between major releases, the committers and programmers developing fixes and improvements are constantly adding new ideas and features to their project. Some of these changes are available as patches; others are committed to trunk and available through nightly builds; and they may or may not meet your acceptance criteria. Solr/Lucene is no different: driven by a consensus-based meritocracy, its community produces changes that may or may not be compatible with your implementation. Identifying which changes to incorporate into development, and assessing their impact on other elements of the system, is a critical success factor, and the right choice may not be obvious at the time the changes become available.

3. What is the cost-benefit tradeoff of timely fixes and availability of expertise?
In building prototypes, you may or may not be able to wait for the community of experts to work on your need or provide advice; once you reach a production, business-critical scenario, you’ll need things done on your timetable, not theirs. Or you may not wish your particular effort to have any public exposure at all, in which case you’ll want a communications channel that meets the needs of your business in your marketplace.
Many problems can be solved given enough time and effort. If your design and deployment efforts follow a schedule where speed has value, consider the relative costs and benefits of internal trial and error versus predictably delivered expertise available on demand.

4. Does the cost-benefit tradeoff of fix timeliness change once your application moves into a production environment?
Once an application’s user base extends beyond the developers who built it, its owners must be ready to deliver consistent, predictable availability, performance, and scalability. Meeting the service needs of end users cannot always be done in real time by the person who wrote the software; developers move on to other projects or leave the company. The heterogeneity of your content collection, particularly as it changes and grows, can introduce new, unanticipated performance anomalies. Similarly, it is difficult to anticipate the full range of user queries and demands on the system, which can leave the application unable to meet new, previously unanticipated requirements. Delivering fixes quickly enough to accommodate these organic changes may well be beyond the reach of your development team or your IT organization. Last but not least, ensuring that the release process itself can meet its intended thresholds of performance, throughput, and other systemic qualities can benefit from lessons learned by experts experienced across a diverse range of deployments.
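One way to make acceptance criteria concrete before adopting a patch, nightly build, or new release is a relevance regression check: run a fixed set of representative queries against both your current build and the candidate, and flag queries whose top results change substantially. The Python sketch below assumes you have already captured the ranked document IDs for each query (for instance, through Solr’s HTTP select interface); the overlap metric, the 0.8 threshold, and the function names are illustrative choices, not a Solr feature.

```python
def overlap_at_k(baseline_ids, candidate_ids, k=10):
    """Share of the baseline's top-k document IDs still in the candidate's top k."""
    base = baseline_ids[:k]
    if not base:
        return 1.0  # an empty baseline has nothing to lose
    return len(set(base) & set(candidate_ids[:k])) / len(base)


def flag_regressions(baseline, candidate, k=10, threshold=0.8):
    """List (query, overlap) pairs whose overlap falls below `threshold`.

    `baseline` and `candidate` map each test query to its ranked list of
    document IDs from the current build and the patched build respectively.
    A query missing from `candidate` is treated as returning nothing.
    """
    flagged = []
    for query, base_ids in baseline.items():
        score = overlap_at_k(base_ids, candidate.get(query, []), k)
        if score < threshold:
            flagged.append((query, score))
    return sorted(flagged)


if __name__ == "__main__":
    current = {"solr": ["a", "b", "c", "d"], "lucene": ["x", "y"]}
    patched = {"solr": ["a", "b", "d", "c"], "lucene": ["x", "z"]}
    for query, score in flag_regressions(current, patched, k=4):
        print(f"review {query!r}: only {score:.0%} of top results unchanged")
```

Overlap ignores ordering within the top k; a team that cares about rank positions might substitute a rank-aware measure, but the workflow of baseline, candidate, and threshold stays the same.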
5. How will you ensure a consistent, authoritative base of knowledge and skills for your development team to work from?
A critical mass of development expertise is directly correlated with the overall effectiveness and velocity of your development efforts. The Solr/Lucene open source community provides developers with a rich, diverse base of resources for bootstrapping their skills, including mailing list forums, examples, peer-to-peer resources, and much more. The enterprise developer can swim far and wide in this sea of information, learning by wandering among other implementations and discussions.
At the same time, organizations driven by a development and business timetable need a more structured, organized, and directed approach to building a solid, consistent foundation based on authoritative sources. Working from a pedagogically oriented set of materials, developers can acquire not only a clearer sense of what the technology is and does, but also how best to apply search engine technology to business requirements. Best practices distilled from years of experience across a broad base of experts can give your team a quicker start, reduce setup and execution time, and improve how effectively they contend with problems as they emerge.
Summary of Questions

I. Why do you need a search application?
1. What business objectives are (or should be) achieved with your search application?
2. What objectives are (or are not) being met with your current search implementation?
3. Which improvements in search behavior contribute to improved business results?
4. How much control do you need over the results that end users see?
5. How much do your end users know about the content they are searching for?

II. What are the key technical characteristics of your search application?
1. In what formats are the documents and data you will search?
2. Document composition: how big are documents?
3. How much new content do you presently add per unit time? How many documents are updated per unit time?
4. What is the rate of queries you expect from your user population?
5. Does your content require faceting or a taxonomy in order to support productive navigation and discovery by end users?
6. Which advanced search features do you expect to use in order to improve how users can submit queries and choose?

III. What is the technology environment in which you are building your search application?
1. What programming skills do your developers bring to your search application?
2. Is your development team skilled and experienced in working with Open Source?
3. How and where are the data and documents stored, independent of format?
4. On what operating system platform(s) or operating environments will your search application run?
5. Should you use Lucene or Solr?
6. Are application development practices in your organization structured to address time-to-market constraints or technical complexity?
7. What service level availability does your search application need to deliver to end users? What is the cost or impact of outages or service unavailability?

IV. How can you ensure continuous fit between Solr/Lucene and your business needs?
1. What is your “bench-depth” in designing and deploying search applications?
2. How does your organization find and incorporate changes to code or source code fixes for your applications?
3. What is the cost-benefit tradeoff of timely fixes and availability of expertise?
4. Does the cost-benefit tradeoff of fix timeliness change once your application moves into a production environment?
5. How will you ensure a consistent, authoritative base of knowledge and skills for your development team to work from?
About Lucid Imagination
Lucid Imagination can help you use Solr/Lucene to get the most from your search applications. Lucid Imagination has the world-class expertise, resources, support, and services needed to cost-effectively architect, implement, and optimize Solr/Lucene-based solutions. We provide commercial-grade support, training, and consulting, and offer certified, tested versions of Lucene and Solr. Lucid Imagination’s goal is to serve as a central resource for the entire Lucene community and marketplace, making enterprise search application developers more productive. We also provide access to Solr/Lucene experts, well-organized information, and documentation.
We’ve helped hundreds of companies get the most out of their search infrastructure. Customers include AT&T, Buy.com, Cisco, Ford, Macy’s, Sears, Shopzilla, The Motley Fool, Verizon, Edmunds.com, GSI Commerce, Zappos (Amazon), and many other household names.
Lucid Imagination is a privately held, venture-funded company. Its investors include Granite Ventures, Walden International, In-Q-Tel, and Shasta Ventures.
To learn more, please visit http://www.lucidimagination.com or http://www.lucidimagination.com/solutions. For more information on what Lucid Imagination can do to help your employees, customers, and partners get the most out of your e-commerce efforts, contact sales@lucidimagination.com or call +1.650.353.4057.
Recommended Reading
Starting a Search Application by Marc Krellenstein
http://www.lucidimagination.com/developers/whitepapers/starting-search-application
The Case for Lucene/Solr Real World Open Source Search Applications
http://www.lucidimagination.com/solutions/whitepapers/Managers-Guide-to-Real-World-Open-Source-Search-Applications
Faceted Search with Solr by Yonik Seeley
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr
Optimizing Findability in Lucene and Solr by Grant Ingersoll
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Optimizing-Findability-Lucene-and-Solr
Full Text Search Engine vs. RDBMS by Marc Krellenstein
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr
Scaling Lucene and Solr by Mark Miller
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr
Appendix: Solr/Lucene Features and Benefits
Lucene and Solr are complementary technologies that offer very similar underlying capabilities. In choosing a search solution that is best suited to your requirements, key factors to consider are application scope, development environment, and software development preferences.
Lucene is a Java technology-based search library that offers speed, relevancy ranking, complete query capabilities, portability, scalability, low-overhead indexes, and rapid incremental indexing.
Solr is the Lucene search server. It presents a web service layer built atop the Lucene search library, extending it to provide application users with a ready-to-use search platform. Solr brings with it operational and administrative capabilities like web services, faceting, configurable schema, caching, replication, and administrative tools for configuration, data loading, statistics, logging, cache management, and more.
Lucene presents a collection of directly callable Java libraries and requires coding and solid information retrieval experience. Solr extends the capabilities of Lucene to provide an enterprise-ready search platform, eliminating the need for extensive programming.
Solr provides the starting point for most developers who are building a Lucene-based search application. It comes ready to run in a servlet container such as Tomcat or Jetty, making it ready to scale in a production Java environment. With convenient REST-like web service interfaces callable over HTTP, and transparent XML-based configuration files, Solr can greatly accelerate application development and maintenance.
In fact, Lucene programmers have often reported that they find Solr contains “the same features I was going to build myself as a framework for Lucene, but already very well implemented.” Using Solr, enterprises can customize the search application to their requirements without the cost and risk of writing the code from scratch.
Lucene provides greater control of your source code and works best in development environments where resources need to be controlled exclusively through Java API calls. It is well suited to constructing and embedding a state-of-the-art search engine, allowing programmers to assemble and compile it inside a native Java application. When working with Lucene, programmers can directly control its large set of sophisticated features through low-level access to data and state. Enterprises that do not require strict control of low-level Java libraries generally prefer Solr, as it provides ease of use and scalable search power out of the box.
As functional siblings, Lucene and Solr have both become popular choices for search applications; the two differ mainly in the style of application development they support. Key benefits of search with Solr/Lucene include:

Search Quality: Speed, Relevance, and Precision
Solr/Lucene provides near-real-time search and strong relevance ranking to deliver contextually relevant and accurate results very quickly. Tailor-made relevancy ranking and sophisticated search capabilities such as faceted search help users sort, organize, classify, and structure retrieved information, so that search delivers the desired results. Search with Solr/Lucene also provides proximity operators, wildcards, fielded searching, term/field/document weights, find-similar functions, spell checking, multilingual search, and much more.

Lower Cost and Greater Flexibility, Plug and Play Architecture
Solr/Lucene reduces recurring and nonrecurring costs, lowering your total cost of ownership (TCO). As open source software, it requires no license purchase and is freely available for use. The open source code can be used as is, modified, customized, and updated as appropriate to your needs. Solr is easily embedded in your enterprise’s existing infrastructure, reducing the costs of installation, configuration, and management.

Open Source Platform for Portability and Easy Deployment
As open source software, Solr/Lucene is built on open standards and community-driven development processes. It is highly portable and can run on any platform that supports Java. For instance, you can build an index on Linux, copy it to a Microsoft Windows machine, and search it there. This portability lets you keep your search application in step with your company’s evolving infrastructure. Lucene has also been implemented in other environments, including C#, C, Python, and PHP. At deployment time, Solr offers very flexible options: it can be deployed on a single server as well as on distributed, multiserver systems.

Largest Installed Base of Applications, Increasing Customer Base
Solr/Lucene is the most widely used open source search system and is installed in around 4,000 organizations worldwide. Publicly visible search sites that use Solr/Lucene include CNET, LinkedIn, Monster, Digg, Zappos, MySpace, Netflix, and Wikipedia. Solr/Lucene is also in use at Apple, HP, IBM, Iron Mountain, and Los Alamos National Laboratory.

Large Developer Base and Adaptability
As community-developed software, Solr/Lucene provides transparent development and easy access to updates and releases. Developers can work with open source code and customize the software to business-specific needs and objectives. The open source model gives developers the freedom and flexibility to evolve the software as requirements change, liberating them from the constraints of commercial vendors.

Commercial-Grade Support for Mission Critical Search Applications from Lucid Imagination
Lucid Imagination provides the expertise, resources, and services needed to help enterprises deploy and develop Lucene-based search solutions efficiently and cost-effectively. Lucid helps enterprises achieve optimal search performance and accuracy with its broad range of expertise, which includes indexing and metadata management, content analysis, business rule application, and natural language processing. Lucid Imagination also offers certified distributions of Lucene and Solr, commercial-grade SLA-based support, training, high-level consulting, and value-added software extensions to enable customers to create powerful and successful search applications.
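Several of the search features listed under Search Quality (fielded searching, proximity operators, wildcards, and term weights) are expressed directly in the query string that Solr hands to Lucene’s query parser. The Python sketch below composes such strings in the classic Lucene query syntax. The helper names are hypothetical, and the escaping set is assumed to cover the characters that parser treats as operators; the resulting string would be sent as Solr’s q parameter.

```python
# Characters with special meaning in Lucene's classic query parser syntax.
# ("/" only became special in later versions; escaping it is harmless.)
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(text):
    """Backslash-escape query-syntax characters so they are searched literally."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


def fielded(field, term, boost=None):
    """Fielded search (field:term), optionally weighted with ^boost."""
    clause = f"{field}:{escape_term(term)}"
    return clause if boost is None else f"{clause}^{boost}"


def near(field, words, slop):
    """Proximity search: the words must occur within `slop` positions."""
    # Inside a quoted phrase only backslashes and quotes need escaping.
    phrase = " ".join(words).replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{phrase}"~{slop}'


def prefix(field, stem):
    """Wildcard search matching any term that starts with `stem`."""
    return f"{field}:{escape_term(stem)}*"


def all_of(*clauses):
    """Require every clause to match."""
    return " AND ".join(f"({c})" for c in clauses)


if __name__ == "__main__":
    print(all_of(fielded("title", "solr", boost=2),
                 near("body", ["open", "source"], 5),
                 prefix("author", "krell")))
```

Building queries from small helpers like these keeps user input escaped and leaves the ranking-related parts (boosts, slop) in one place where they can be tuned.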