ENTERPRISE SEARCH: an introduction
Web Search | Desktop Search | Enterprise Search
So what is a search engine?
Any search application has two major components:
- the SEARCH component, of importance to the users
- the INDEXING component, of importance to us developers (read: headache)
[Diagram] INDEXING component: data is indexed into INDEX FILES. SEARCH component: the user sends a search query and receives search results from the INDEX FILES.
Let’s start with INDEXING
Is it easy to search here . . .
or here . . . ?
So what needs to be indexed and searched?
Various FILE FORMATS: text files, HTML, PDF, MS Word, PPT
coming from various DATA SOURCES: emails, CMS, file system, database, web pages
[Diagram] Data (documents) → Analyzer → tokenized text → Index Writer → INDEX FILES; the user sends a search query and receives search results.
The text that should be indexed is fed to an Analyzer, which:
- removes stop words such as "a" or "the"
- converts all text to lowercase letters for case-insensitive searching
- applies stemming (a stemming algorithm reduces the words "fishing", "fished", "fish", and "fisher" to the root word "fish")
The resulting tokenized text is passed to the Index Writer.
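The analysis steps above can be sketched as a small pipeline. This is a minimal illustration, not Lucene's Analyzer API: the stop-word list and the suffix-stripping stemmer are hypothetical stand-ins (a real analyzer would use a proper algorithm such as Porter stemming and a much larger stop list).

```python
import re

# Hypothetical stop-word list for illustration; real analyzers ship larger lists.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}

# Hypothetical suffixes for a deliberately naive stemmer (NOT the Porter algorithm).
SUFFIXES = ("ing", "ed", "er", "s")


def naive_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list[str]:
    """Lowercase and tokenize, drop stop words, then stem the remaining tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())  # lowercase for case-insensitive search
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(analyze("Fishing, fished, fish and the fisher"))
```

Running it on the slide's stemming example reduces all four words to "fish"; "and" and "the" are dropped as stop words.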
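Document 1: Coffee isn't my cup of tea.
Document 2: Chocolate, men, coffee - some things are better rich.
INDEX (term - documents containing it):
coffee - 1, 2
cup - 1
tea - 1
chocolate - 2
men - 2
things - 2
better - 2
rich - 2
(Words such as "isn't", "my", "of", "some" and "are" are left out as stop words.)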
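An index like this one is an inverted index: a map from each term to the documents that contain it. A minimal sketch, assuming a hypothetical stop-word list chosen so the output matches the slide (no stemming is applied here):

```python
import re
from collections import defaultdict

# Hypothetical stop words, chosen so the result matches the slide's index.
STOP_WORDS = {"isn't", "my", "of", "some", "are"}


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each non-stop-word term to the set of document IDs containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z']+", text.lower()):
            if term not in STOP_WORDS:
                index[term].add(doc_id)
    return dict(index)


docs = {
    1: "Coffee isn't my cup of tea.",
    2: "Chocolate, men, coffee - some things are better rich.",
}
index = build_index(docs)
for term, ids in index.items():
    print(term, "-", ", ".join(str(i) for i in sorted(ids)))
```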
And now the SEARCH component
[Diagram] Data is indexed into INDEX FILES; the user sends a search query, its search terms are looked up in the INDEX FILES, and the user receives search results.
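Looking up search terms in the index amounts to intersecting the document sets of each term. The sketch below hard-codes the coffee/tea example index and assumes AND semantics (every term must match); real engines also support OR, phrases and scoring.

```python
# The coffee/tea example index, hard-coded: term -> IDs of documents containing it.
INDEX = {
    "coffee": {1, 2}, "cup": {1}, "tea": {1},
    "chocolate": {2}, "men": {2}, "things": {2}, "better": {2}, "rich": {2},
}


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """AND search: IDs of documents that contain every query term."""
    # Queries need the same analysis as documents; here that is just lowercasing.
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy so the index is never mutated
    for term in terms[1:]:
        result &= index.get(term, set())  # intersect posting sets
    return result


print(search(INDEX, "Coffee tea"))
```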
Search request flow:
1. Search request terms → spelling index → correct search terms + incorrect search terms
2. → taxonomy → search terms + related terms from the taxonomy + concept IDs
3. → search engine (INDEX)
4. → search results with: 1) actual location of the result, 2) rank, 3) details, 4) facet categorization
5. → results page
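The flow above can be sketched end to end on toy data. Everything here is an assumption for illustration: the index, the taxonomy and the per-document category (used as the facet) are hypothetical, spelling correction is swapped in as Python's `difflib` similarity matching against the indexed terms rather than a real spelling index, and rank is simply the number of matched terms.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical data for illustration only.
INDEX = {"coffee": {1, 2}, "tea": {1}, "chocolate": {2}, "cocoa": {3}}
CATEGORY = {1: "drinks", 2: "food", 3: "food"}  # facet value per document
TAXONOMY = {"chocolate": ["cocoa"]}             # term -> related terms


def correct(term: str) -> str:
    """Replace a misspelled term with the closest indexed term, if one is close enough."""
    if term in INDEX:
        return term
    matches = get_close_matches(term, INDEX.keys(), n=1, cutoff=0.8)
    return matches[0] if matches else term


def expand(terms: list[str]) -> list[str]:
    """Add related terms from the taxonomy, keeping order and dropping duplicates."""
    expanded: list[str] = []
    for term in terms:
        for t in [term, *TAXONOMY.get(term, [])]:
            if t not in expanded:
                expanded.append(t)
    return expanded


def run_query(query: str) -> tuple[list[int], Counter]:
    """Correct, expand, look up, rank by matched-term count, and count facets."""
    terms = expand([correct(t) for t in query.lower().split()])
    hits = Counter(doc for t in terms for doc in INDEX.get(t, ()))
    ranked = [doc for doc, _ in hits.most_common()]
    facets = Counter(CATEGORY[doc] for doc in ranked)
    return ranked, facets


print(run_query("coffee chocolat"))
```

The misspelled "chocolat" is corrected to "chocolate", expanded with "cocoa", and document 2 ranks first because it matches two terms.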
Introducing LUCENE
 
 
Ways of storing the fields of any document:
- Indexed: the field is searchable.
- Stored: the field's content can be displayed in the search results; you may choose not to make such a field searchable. Example: the summary associated with a page.
- Tokenized: the field is run through an Analyzer, which converts the content into a sequence of tokens.
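The three options are independent flags on a field. The following is a conceptual model only, not the Lucene `Field` API: the `Field` class, the helper functions and the trivial regex analyzer are hypothetical names used to show what each flag controls.

```python
import re
from dataclasses import dataclass


@dataclass
class Field:
    """Conceptual model of a field's storage options (not the Lucene API)."""
    name: str
    value: str
    indexed: bool = True    # searchable
    stored: bool = True     # can be displayed in search results
    tokenized: bool = True  # run through an analyzer before indexing


def index_terms(field: Field) -> list[str]:
    """Terms this field contributes to the index."""
    if not field.indexed:
        return []
    if field.tokenized:
        return re.findall(r"\w+", field.value.lower())  # trivial stand-in analyzer
    return [field.value]  # untokenized: the whole value is a single term


def display_values(fields: list[Field]) -> dict[str, str]:
    """What a search result can show: stored fields only."""
    return {f.name: f.value for f in fields if f.stored}
```

For example, a page summary would be `Field("summary", "...", indexed=False)`: shown in results but never matched, while a product code might be indexed untokenized so it matches only as a whole.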
Introducing SOLR
[Diagram: Solr is layered on top of Lucene, which manages the Lucene index]
Adding Documents to SOLR
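Solr accepts new documents as XML update messages of the form `<add><doc><field name="...">...</field></doc></add>`, followed by a `<commit/>` to make them searchable. A minimal sketch: the update URL is an assumption matching the single-core `localhost:8983` setup used in this deck (newer Solr versions put a core name in the path), and the field names are hypothetical and must exist in your schema.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request, urlopen

UPDATE_URL = "http://localhost:8983/solr/update"  # assumed single-core update URL


def add_xml(docs: list[dict]) -> str:
    """Build an <add> update message with one <doc> per dict."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, < and >
    return ET.tostring(add, encoding="unicode")


def post(xml_body: str, url: str = UPDATE_URL) -> str:
    """POST an update message; needs a running Solr, so it is not called here."""
    req = Request(url, data=xml_body.encode("utf-8"),
                  headers={"Content-Type": "text/xml; charset=utf-8"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


print(add_xml([{"id": "SOLR1000", "name": "Solr, the Enterprise Search Server"}]))
```

Against a live server you would call `post(add_xml([...]))` and then `post("<commit/>")`.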
schema.xml: field indexing and display definitions
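In schema.xml, each `<field>` declares whether it is indexed (searchable) and stored (displayable). The fragment below is illustrative: the field names and types are hypothetical, and in real Solr an omitted `indexed`/`stored` attribute is inherited from the field type, whereas this sketch simply treats it as false.

```python
import xml.etree.ElementTree as ET

# Illustrative schema.xml fragment (hypothetical field names and types).
SCHEMA = """
<schema name="example" version="1.1">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="name" type="text" indexed="true" stored="true"/>
    <field name="summary" type="text" indexed="false" stored="true"/>
    <field name="body" type="text" indexed="true" stored="false"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
"""


def field_flags(schema_xml: str) -> dict[str, tuple[bool, bool]]:
    """Return {field name: (indexed, stored)} for every <field> element."""
    root = ET.fromstring(schema_xml.strip())
    return {
        f.get("name"): (f.get("indexed") == "true", f.get("stored") == "true")
        for f in root.iter("field")
    }


for name, (indexed, stored) in field_flags(SCHEMA).items():
    print(f"{name}: indexed={indexed} stored={stored}")
```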
solrconfig.xml: defines cache sizes, faceted field types, and request handler customizations
Deleting Documents
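Solr's XML update format removes documents either by unique key (`<delete><id>...</id></delete>`) or by query (`<delete><query>...</query></delete>`); as with adds, the change becomes visible after a `<commit/>`. A minimal sketch that only builds the messages; the IDs and query are hypothetical.

```python
import xml.etree.ElementTree as ET


def delete_by_id_xml(doc_id: str) -> str:
    """Message that removes the document with this unique key."""
    root = ET.Element("delete")
    ET.SubElement(root, "id").text = str(doc_id)
    return ET.tostring(root, encoding="unicode")


def delete_by_query_xml(query: str) -> str:
    """Message that removes every document matching the query."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


print(delete_by_id_xml("SP2514N"))
print(delete_by_query_xml("name:DDR"))
```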
Search Results
http://localhost:8983/solr/select?q=video&start=0&rows=2&fl=name,price
Default Parameters
http://localhost:8983/solr/select?q=video&start=0&rows=2&fl=name,price

param | default  | description
q     | (none)   | The query
start | 0        | Offset into the list of matches
rows  | 10       | Number of documents to return
fl    | *        | Stored fields to return
qt    | standard | Query type; maps to query handler
df    | (schema) | Default field to search
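A select URL is just these parameters URL-encoded onto the `/solr/select` endpoint; any parameter left out falls back to its default. A minimal sketch using Python's standard library; `select_url` and `search` are hypothetical helper names, and `search` needs a running Solr at the slide's URL, so it is not executed here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SELECT_URL = "http://localhost:8983/solr/select"  # URL from the slide


def select_url(q: str, **params) -> str:
    """Build a select URL; omitted parameters use Solr's defaults (start=0, rows=10, fl=*)."""
    # safe=",*" keeps values such as fl=name,price readable instead of %2C-encoded
    return SELECT_URL + "?" + urlencode({"q": q, **params}, safe=",*")


def search(q: str, **params) -> str:
    """Fetch the raw XML response (requires a running Solr)."""
    with urlopen(select_url(q, **params)) as resp:
        return resp.read().decode("utf-8")


print(select_url("video", start=0, rows=2, fl="name,price"))
```

This reproduces the slide's example URL exactly.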
<response>
  <responseHeader>
    <status>0</status>
    <QTime>1</QTime>
  </responseHeader>
  <result numFound="16173" start="0">
    <doc>
      <str name="name">Apple 60 GB iPod with Video</str>
      <float name="price">399.0</float>
    </doc>
    <doc>
      <str name="name">ASUS Extreme N7800GTX/2DHTV</str>
      <float name="price">479.95</float>
    </doc>
  </result>
</response>
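In this response format, each result field is an element whose tag gives the type (`str`, `float`, ...) and whose `name` attribute gives the field name. A minimal parsing sketch with the standard library, using the response shown on the slide; the type-conversion table covers only the types used here plus `int`, as an assumption.

```python
import xml.etree.ElementTree as ET

RESPONSE = """<response><responseHeader><status>0</status><QTime>1</QTime></responseHeader>
<result numFound="16173" start="0">
<doc><str name="name">Apple 60 GB iPod with Video</str><float name="price">399.0</float></doc>
<doc><str name="name">ASUS Extreme N7800GTX/2DHTV</str><float name="price">479.95</float></doc>
</result></response>"""

# Element tag -> Python conversion; unknown tags are kept as strings.
CONVERT = {"float": float, "int": int}


def parse_results(xml_text: str) -> tuple[int, list[dict]]:
    """Return (numFound, list of {field name: value}) from a select response."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    docs = []
    for doc in result.findall("doc"):
        docs.append({el.get("name"): CONVERT.get(el.tag, str)(el.text) for el in doc})
    return int(result.get("numFound")), docs


total, docs = parse_results(RESPONSE)
print(total, docs)
```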
[Diagram: Solr architecture] Components shown: Lucene; Solr Core; Admin Interface; Standard Request Handler, Disjunction Max Request Handler, Custom Request Handler; Update Handler; Caching; XML Update Interface; Config; Analysis; HTTP Request Servlet; Update Servlet; XML Response Writer; Concurrency; Replication; Schema. Callouts: "Search requests hit here" and "New document to be added here".
 

