Using SOLR as Open-Source Search Platform for Organizational Research Experts Retrieval
Dr. Gan Keng Hoon
School of Computer Sciences
Universiti Sains Malaysia
GKH USM 1
7th Annual Open Source Summit 2020
Day 5 - 15th December
Talk Outline
Text Search with SOLR
What is SOLR
What is Text Search
Text Search in an Organization
Information Retrieval Concept and SOLR in Use
Demo of Organizational Research Expert Retrieval
Part I: Text Search with SOLR
What is SOLR?
SOLR
Solr is the popular, blazing-fast, open source enterprise search platform built on
Apache Lucene™.
Solr Resources
Official Website: https://lucene.apache.org/solr/
Documentation: https://lucene.apache.org/solr/resources.html
What is Text Search?
Are you thinking about Google?
Let’s start with something we are familiar with …
Users want to type in a few simple keywords and get back great results.
This involves matching query terms to documents.
Ranked Results
Return “ranked” documents for a query.
A search engine returns documents sorted in descending order by a
score that indicates the strength of the match of the document to the
query.
Ranking by relevancy is important.
Side Question: How is relevancy determined?
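To make the idea of ranked results concrete, here is a minimal Python sketch. It assumes a toy score that simply counts how often the query terms occur in a document; this is an illustration only and not the ranking function Solr uses. The sample documents are made up.

```python
from collections import Counter


def score(query: str, document: str) -> int:
    """Toy relevance score: total occurrences of the query terms in the document."""
    term_counts = Counter(document.lower().split())
    return sum(term_counts[term] for term in query.lower().split())


def rank(query: str, documents: list[str]) -> list[tuple[int, str]]:
    """Return (score, document) pairs for matching documents, best match first."""
    scored = [(score(query, doc), doc) for doc in documents]
    matches = [pair for pair in scored if pair[0] > 0]
    # Sort in descending order of score, as a search engine would.
    return sorted(matches, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample documents.
docs = [
    "solr is a search platform",
    "search engines rank search results",
    "lucene is a java library",
]
ranked = rank("search results", docs)
for doc_score, doc in ranked:
    print(doc_score, doc)
```

Documents that share no term with the query are dropped, and the rest come back with the strongest match first.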
What Else
A search engine also:
Returns images, videos, social feeds, products, etc.
Provides search suggestions, autocomplete, etc.
Gives answers, facts, files, etc.
Text Search Solution in Your Organization
How do you perform search within your organization?
With the document-finding tool in your operating system?
By querying data stored in your database using SQL?
Would you implement a Google-like search engine for
your organization?
Questions for Audience
1. List one TEXT or DOCUMENT
LOOKUP/SEARCH/NAVIGATION PROBLEM that you have
encountered in your organization.
2. Is there any existing search tool or feature implemented
within your organization that you can adapt to address the
problem?
Enterprise Search
Managing search solutions within an organization or for the benefit of an
organization …
IR and Search Engines
Information Retrieval
Relevance - Effective ranking
Evaluation - Testing and measuring
Information needs - User interaction
Search Engines
Performance - Efficient search and indexing
Incorporating new data - Coverage and freshness
Scalability - Growing with data and users
Adaptability - Tuning for applications
Specific problems - e.g. spam
Enterprise Search Engines
Enterprise Search Features
1. Not only Texts.
Unifying structured and unstructured data.
2. Not only Search.
Search + Analytics.
3. Not for Everyone.
Not addressing a general problem, but specific to
application/domain/business needs.
Part II: IR Concept & SOLR in Use
High Level Information Retrieval Concept
[Diagram: Documents are turned into a Document Representation by Indexing; Information Needs are turned into a Query by Formulation; a Retrieval Function matches the two to produce Retrieved Documents, with Relevance Feedback.]
Diagram of the main components of Solr 4
Image source: Solr in Action, Grainger & Potter
[Diagram annotated with the IR concepts above: Documents, Indexing, Document Representation, Query, Retrieval Function, Retrieved Documents]
Glimpse of Solr In Use
SOLR Downloads Site
Unzipping SOLR into Directory
For Windows users, we highly recommend that you extract Solr to a directory that
doesn’t have spaces in the name; that is, avoid extracting Solr into directories like
C:\Documents and Settings or C:\Program Files. For example, use a path like
C:\solr-8.2.0 instead.
For Linux users, choose a location like /opt/solr/.
View SOLR in Your Directory
Example directory listing of the solr-8.2.0 installation after extracting the
downloaded archive on your computer. We’ll refer to the top-level
directory as $SOLR_INSTALL/ throughout the rest of the slides.
Start Solr
To start Solr, run the solr script located in the bin folder.
For example, if you placed Solr at C:\solr-8.2.0,
open a command line and enter the following:
$ cd \ #this gets you to the base directory
$ cd $SOLR_INSTALL #this gets you to your Solr folder
$ bin/solr start #this runs the solr script in the bin folder
Note: cd means change directory. On Windows, the script is bin\solr.cmd.
Start Solr
Admin Console - http://localhost:8983/solr/
Create Your First Core
Your running server is empty.
Create your first core, called “techproduct”.
Go to the command prompt and enter:
$ bin/solr create -c techproduct
Meet Your First Core
View Properties of The Core
* You can also add or rename a core from the admin page.
Add Some Example Documents
When you first start Solr, there are no documents in the index. It’s an empty server
waiting to be filled with data to search.
Let’s add some documents from the example/exampledocs directory.
What are the example file types in your
$SOLR_INSTALL/example/exampledocs?
Use Post Tool to Add Documents
For Unix users, call the Post Tool from bin:
$ bin/post -c techproduct example/exampledocs/*.xml
For Windows users, navigate to the example\exampledocs folder:
$ cd \
$ cd $SOLR_INSTALL\example\exampledocs
$ java -jar -Dc=techproduct post.jar *.xml
-Dc=techproduct specifies the core.
*.xml specifies the files to be added. In this case, we are adding all files of type .xml.
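The Post Tool sends files to the core's update handler over HTTP. As a rough Python sketch of the same idea, the code below builds a Solr XML update message (an add element wrapping doc and field elements) and posts it to the core's /update endpoint. The field names and values are assumptions for illustration; the core name and port follow the slides.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build a Solr XML update message: <add><doc><field name="...">...</field></doc></add>."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def post_to_solr(xml_body, core="techproduct", base_url="http://localhost:8983/solr"):
    """POST the XML to the core's update handler and commit (needs a running Solr)."""
    request = urllib.request.Request(
        f"{base_url}/{core}/update?commit=true",
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Hypothetical document; "id" and "name" are illustrative field names.
payload = build_add_xml([{"id": "demo-1", "name": "Example product"}])
print(payload)
# post_to_solr(payload)  # uncomment once Solr is running locally
```

Building the message and sending it are kept separate so the XML can be inspected before anything reaches the server.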
Status of Added Files
What is the speed of indexing?
Let’s Search
Go to http://localhost:8983/solr/
Select techproduct core
Select Query tab
Enter *:* in the query form.
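The Query tab sends an HTTP request to the core's /select handler. A minimal Python sketch of building the same request URL, assuming the default port and the core name used in these slides (the helper name select_url is made up):

```python
from urllib.parse import urlencode


def select_url(query, core="techproduct", base_url="http://localhost:8983/solr", **params):
    """Build the URL of a Solr /select request for the given query string."""
    query_string = urlencode({"q": query, **params})
    return f"{base_url}/{core}/select?{query_string}"


# The match-all query *:* is URL-encoded when placed in the query string.
url = select_url("*:*")
print(url)
```

Opening the printed URL in a browser, with Solr running, returns the same results as the admin query form.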
Search results from executing the find-all-documents query (*:*).
View More Search Results
In the query form,
• Change start to 0
• Change rows to 32
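The start and rows parameters control which slice of the result list comes back; by default Solr returns 10 rows. A small Python sketch of turning a page number into these two parameters (the helper name page_params is made up for illustration):

```python
def page_params(page, rows=10):
    """Translate a 1-based page number into Solr's start/rows parameters."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {"start": (page - 1) * rows, "rows": rows}


# All 32 documents on a single page, as in the slide.
print(page_params(1, rows=32))
# Third page of 10 results.
print(page_params(3))
```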
14 files were indexed, but the search *:* found 32 documents.
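The likely explanation for the mismatch is that a single XML file can hold several doc elements inside one add element, and each doc becomes its own document in the index. A short Python sketch that counts documents rather than files; the sample XML is made up and is not one of the shipped example files:

```python
import xml.etree.ElementTree as ET

# Hypothetical file content: one file, three documents.
SAMPLE_FILE = """<add>
  <doc><field name="id">A-1</field></doc>
  <doc><field name="id">A-2</field></doc>
  <doc><field name="id">A-3</field></doc>
</add>"""


def count_docs(xml_text):
    """Count the <doc> elements in one Solr XML update file."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter("doc"))


print(count_docs(SAMPLE_FILE))
```

Summing this count over every file in the exampledocs folder is one way to check the number reported by the *:* query.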
Part III: Demo of Organizational
Research Expert Retrieval
Organization Research Experts Retrieval
Target Users: Students, Researchers, Collaborators looking for expertise
from School of Computer Sciences, Universiti Sains Malaysia.
Data Set: Scopus publication data for all academics at the school.
Status: Prototype.
Purpose: In-house solution.
Focused search and analytics capabilities.
1. Design Document/Retrieval Unit
2. Create Solr Core
Create a new core to store the collection.
3. Perform Indexing
Perform indexing at the backend using addDocuments() from the Solarium PHP library:
https://github.com/solariumphp
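The prototype indexes through Solarium in PHP. As a rough Python counterpart, and not Solarium's API, the sketch below maps publication records to Solr documents and prepares a JSON request for the core's /update handler. The core name "experts", the field names, and the sample record are all assumptions for illustration.

```python
import json
import urllib.request


def to_solr_doc(publication):
    """Map one publication record to a flat Solr document (hypothetical fields)."""
    return {
        "id": publication["id"],
        "title": publication["title"],
        "authors": publication["authors"],
        "year": publication["year"],
    }


def build_update_request(docs, core="experts", base_url="http://localhost:8983/solr"):
    """Prepare a POST request that sends a JSON array of documents to /update."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{core}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
    )


# Made-up sample record, not real publication data.
records = [{"id": "pub-1", "title": "Sample paper A", "authors": ["A. Author"], "year": 2020}]
request = build_update_request([to_solr_doc(r) for r in records])
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request)  # uncomment once the core exists and Solr is running
```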
4. Implement Search Front End
5. Format the Response into Results Page
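A Solr JSON response carries the hits under response.docs, together with numFound and start. The Python sketch below turns such a response into plain result lines; the sample response, the field names, and the assumption that title is a single value are made up for illustration.

```python
def format_results(solr_response):
    """Turn a parsed Solr JSON response into numbered result lines."""
    body = solr_response["response"]
    lines = [f"{body['numFound']} result(s) found"]
    first_rank = body.get("start", 0) + 1
    for rank, doc in enumerate(body["docs"], start=first_rank):
        title = doc.get("title", "(untitled)")
        lines.append(f"{rank}. {title} [{doc['id']}]")
    return "\n".join(lines)


# Made-up sample response in the shape Solr returns for wt=json.
sample = {
    "responseHeader": {"status": 0},
    "response": {
        "numFound": 2,
        "start": 0,
        "docs": [
            {"id": "pub-1", "title": "Sample paper A"},
            {"id": "pub-2", "title": "Sample paper B"},
        ],
    },
}
page = format_results(sample)
print(page)
```

A web front end would emit HTML instead of text lines, but the walk over response.docs is the same.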
6. Demo
Visit the prototype at
http://ir.cs.usm.my/exsearch4/
Try searching for “cryptography”.
Thank you
Visit our school at cs.usm.my
See the work of the IR research group at ir.cs.usm.my
Drop me an email at khganATusm.my
