Introduction to Search Engines

ENTERPRISE SEARCH an introduction

Web Search, Desktop Search, Enterprise Search

Any search application has two major components:
INDEXING component - of importance to us developers (read: headache)
SEARCH component - of importance to the users

Data is indexed into INDEX FILES by the INDEXING component. The user sends a search query to the SEARCH component and receives search results.

is it easy to search here . . .

so what all needs to be Indexed and Searched ?

various FILE FORMATS: Text Files, HTML, PDF, MS Word, PPT

coming from various DATA SOURCES: Emails, CMS, File System, Database, Web Pages

Data (documents) is fed to the Analyzer as the text that should be indexed. The Analyzer processes it by:
- removing stop words such as "a" or "the"
- converting all text to lowercase letters for case-insensitive searching
- stemming (a stemming algorithm reduces the words "fishing", "fished", "fish", and "fisher" to the root word, "fish")
The tokenized text is passed to the Index Writer, which writes the INDEX FILES. The user sends a search query and receives search results.
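The three Analyzer steps above can be sketched in a few lines of Python. This is only an illustration: the stop-word set contains just the two words named on the slide, and `naive_stem` is a hypothetical toy suffix stripper, not the stemming algorithm a real engine such as Lucene would use.

```python
import re

# Assumption: only the two stop words named above; real lists are much longer.
STOP_WORDS = {"a", "the"}
# Assumption: a toy suffix list, just enough for the "fish" example.
SUFFIXES = ("ing", "ed", "er")


def naive_stem(token):
    """Strip one known suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Lowercase the text, split it into tokens, drop stop words, stem the rest."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    print(analyze("The fisher fished a fish while fishing"))
```

The list returned by `analyze` is what would be handed to the Index Writer in the flow above.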

Document 1: Coffee isn't my cup of tea.
Document 2: Chocolate, men, coffee - some things are better rich.

INDEX
coffee - 1,2
cup - 1
tea - 1
chocolate - 2
men - 2
things - 2
better - 2
rich - 2
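An index of this kind (an inverted index: each term maps to the documents that contain it) can be built with a small Python sketch. The stop-word set below is an assumption, chosen only so that the output contains the same terms as the slide; each term maps to the sorted IDs of the documents in which it occurs.

```python
import re
from collections import defaultdict

DOCS = {
    1: "Coffee isn't my cup of tea.",
    2: "Chocolate, men, coffee - some things are better rich.",
}

# Assumption: stop words picked to match the terms listed on the slide.
STOP_WORDS = {"isn't", "my", "of", "some", "are"}


def build_index(documents):
    """Map each non-stop-word term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in re.findall(r"[a-z']+", text.lower()):
            if token not in STOP_WORDS:
                index[token].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, term):
    """Look up a single search term; unknown terms match no documents."""
    return index.get(term.lower(), [])


if __name__ == "__main__":
    index = build_index(DOCS)
    for term, doc_ids in index.items():
        print(term, "-", ",".join(map(str, doc_ids)))
```

Answering a query is then a dictionary lookup rather than a scan of every document, which is the point of indexing up front.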

Data is indexed into INDEX FILES. The user sends a search query; its search terms are looked up in the INDEX FILES, and the user receives search results.

Search Request Terms
- checked against the Spelling Index: Correct Search Terms + Incorrect Search Terms
- expanded using the Taxonomy: Search Terms + Related Terms from Taxonomy + Concept IDs
- sent to the Search engine (INDEX)
Search results are returned with 1) Actual Location of the result 2) Rank 3) Details 4) Facet Categorization, and are shown on the Results’ Page.

Ways of storing fields of any document:
Indexed - means the field is searchable.
Stored - means the content can be displayed in the search results, even if you choose not to make the field searchable. Example: "summary associated with a page".
Tokenized - means the field is run through an Analyzer, which converts the content into a sequence of tokens.
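As a rough illustration of how the three options combine, here is a minimal Python sketch. `FieldOptions` and `process_field` are hypothetical names invented for this example, and splitting on whitespace stands in for a real Analyzer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldOptions:
    indexed: bool    # field is searchable
    stored: bool     # original value can be displayed in search results
    tokenized: bool  # value is run through an analyzer before indexing


def process_field(value, opts):
    """Return (terms_to_index, value_to_store) for one field of a document."""
    terms = []
    if opts.indexed:
        # Assumption: lowercase + whitespace split stands in for a real Analyzer.
        terms = value.lower().split() if opts.tokenized else [value]
    stored = value if opts.stored else None
    return terms, stored


if __name__ == "__main__":
    # A page summary: shown in results, but not searchable.
    print(process_field("A short summary", FieldOptions(False, True, False)))
    # A page body: searchable via its tokens, but not kept for display.
    print(process_field("Intro to Search", FieldOptions(True, False, True)))
```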

introducing SOLR

[Diagram: Solr, Lucene, Index]

schema.xml: field indexing and display definitions

solrconfig.xml: defines cache size, faceted field type, request handler customization

Deleting Documents

Search Results http://localhost:8983/solr/select?q=video&start=0&rows=2&fl=name,price

Default Parameters
http://localhost:8983/solr/select?q=video&start=0&rows=2&fl=name,price

param   default    description
q                  The query
start   0          Offset into the list of matches
rows    10         Number of documents to return
fl      *          Stored fields to return
qt      standard   Query type; maps to query handler
df      (schema)   Default field to search
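A request URL like the one above can be assembled from these parameters with Python's standard library. In this sketch `build_select_url` is a hypothetical helper name; its defaults mirror the `start`, `rows` and `fl` defaults listed above, and the base URL is the local one from the slide.

```python
from urllib.parse import urlencode

# Base URL taken from the example above (a local Solr instance).
BASE_URL = "http://localhost:8983/solr/select"


def build_select_url(query, start=0, rows=10, fl="*"):
    """Build a select URL; defaults follow the parameter defaults listed above."""
    params = {"q": query, "start": start, "rows": rows, "fl": fl}
    # Keep ',' and '*' readable in the field list instead of percent-encoding them.
    return f"{BASE_URL}?{urlencode(params, safe=',*')}"


if __name__ == "__main__":
    print(build_select_url("video", rows=2, fl="name,price"))
```

Paging through results only requires changing `start`, for example `build_select_url("video", start=2, rows=2)` for the next two matches.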

<response>
  <responseHeader>
    <status>0</status>
    <QTime>1</QTime>
  </responseHeader>
  <result numFound="16173" start="0">
    <doc>
      <str name="name">Apple 60 GB iPod with Video</str>
      <float name="price">399.0</float>
    </doc>
    <doc>
      <str name="name">ASUS Extreme N7800GTX/2DHTV</str>
      <float name="price">479.95</float>
    </doc>
  </result>
</response>
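A client can read such an XML response with Python's built-in `xml.etree.ElementTree`. The sketch below parses the sample response shown above; `parse_results` is a hypothetical helper name, and it only handles the `str` and `float` element types that appear in this sample.

```python
import xml.etree.ElementTree as ET

# The sample response from above, embedded so the example is self-contained.
SAMPLE = """<response>
  <responseHeader><status>0</status><QTime>1</QTime></responseHeader>
  <result numFound="16173" start="0">
    <doc>
      <str name="name">Apple 60 GB iPod with Video</str>
      <float name="price">399.0</float>
    </doc>
    <doc>
      <str name="name">ASUS Extreme N7800GTX/2DHTV</str>
      <float name="price">479.95</float>
    </doc>
  </result>
</response>"""


def parse_results(xml_text):
    """Return (total number of matches, list of field dicts for the returned docs)."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    docs = []
    for doc in result.findall("doc"):
        fields = {}
        for element in doc:
            # Only <str> and <float> occur in the sample; other types stay as text.
            value = float(element.text) if element.tag == "float" else element.text
            fields[element.get("name")] = value
        docs.append(fields)
    return int(result.get("numFound")), docs


if __name__ == "__main__":
    total, docs = parse_results(SAMPLE)
    print(total, "matches;", len(docs), "returned")
    for doc in docs:
        print(doc["name"], doc["price"])
```

Note that `numFound` is the total number of matches, while the number of `doc` elements is limited by the `rows` parameter of the request.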

[Diagram: Solr architecture. Components shown: Solr Core, Lucene, Admin Interface, Standard Request Handler, Disjunction Max Request Handler, Custom Request Handler, Update Handler, Caching, XML Update Interface, Config, Analysis, HTTP Request Servlet, Concurrency, Update Servlet, XML Response Writer, Replication, Schema. Annotations: "Search Requests hit here", "New document to be added here".]
