Using SOLR as Open-Source Search Platform for Organizational Research Experts Retrieval
Dr. Gan Keng Hoon
School of Computer Sciences
Universiti Sains Malaysia
GKH USM 1
7th Annual Open Source Summit 2020
Day 5 - 15th December
Talk Outline
Text Search with SOLR
What is SOLR
What is Text Search
Text Search in an Organization
Information Retrieval Concept and SOLR in Use
Demo of Organizational Research Expert Retrieval
Part I: Text Search with SOLR
What is SOLR?
SOLR
Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene™.
Solr Resources
Official Website: https://lucene.apache.org/solr/
Documentation: https://lucene.apache.org/solr/resources.html
What is Text Search?
Are you thinking about Google?
Let’s start with something we are familiar with …
Users want to type in a few simple keywords and get back great results.
This involves matching query terms to documents.
Ranked Results
Return “ranked” documents for a query.
A search engine returns documents sorted in descending order by a
score that indicates the strength of the match of the document to the
query.
Ranking by relevancy is important.
Side question: How is relevancy determined?
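One way to make the side question concrete is a minimal sketch, assuming a toy score that simply counts how often the query terms occur in each document. This is not how Solr scores documents (Solr's default similarity is BM25); the function names and the sample documents are made up for illustration.

```python
from __future__ import annotations

from collections import Counter


def score(query: str, document: str) -> int:
    """Toy relevance score: total occurrences of the query terms in the document."""
    term_counts = Counter(document.lower().split())
    return sum(term_counts[term] for term in query.lower().split())


def rank(query: str, documents: dict[str, str]) -> list[tuple[str, int]]:
    """Return (doc_id, score) pairs for matching documents, highest score first."""
    scored = [(doc_id, score(query, text)) for doc_id, text in documents.items()]
    matches = [pair for pair in scored if pair[1] > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample documents, not taken from the talk.
docs = {
    "d1": "solr is a search platform",
    "d2": "search engines rank search results",
    "d3": "open source software",
}
ranked = rank("search results", docs)
```

Documents that share no term with the query are left out, and the rest come back in descending order of score, which is the behaviour the slide describes.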
What Else
A search engine also:
Returns images, videos, social feeds, products, etc.
Provides search suggestions, autocomplete, etc.
Gives answers, facts, files, etc.
Text Search Solution in Your Organization
How do you perform search within your organization?
With the document-finding tool in your operating system?
By querying data stored in your database using SQL?
Would you implement a Google-like search engine for your organization?
Questions for Audience
1. List one TEXT or DOCUMENT
LOOKUP/SEARCH/NAVIGATION PROBLEM that you have
encountered in your organization.
2. Is there any existing search tool or feature implemented
within your organization that you can adapt to address the
problem?
Enterprise Search
Managing search solutions within an organization or for the benefit of an
organization …
IR and Search Engines
Information Retrieval
Relevance
-Effective ranking
Evaluation
-Testing and measuring
Information needs
-User interaction
Search Engines
Performance
-Efficient search and indexing
Incorporating new data
-Coverage and freshness
Scalability
-Growing with data and users
Adaptability
-Tuning for applications
Specific problems
-e.g. Spam
Enterprise Search Engines
Enterprise Search Features
1. Not only Texts.
Unifying structured and unstructured data.
2. Not only Search.
Search + Analytics.
3. Not for Everyone.
Not addressing a general problem, but specific to
application/domain/business needs.
Part II: IR Concept & SOLR in Use
High Level Information Retrieval Concept
Documents -> (Indexing) -> Document Representation
Information Needs -> (Formulation) -> Query
Document Representation + Query -> (Retrieval Function) -> Retrieved Documents
Relevance Feedback (from Retrieved Documents back to the Query)
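The indexing and retrieval steps of this concept can be sketched in a few lines. This is a minimal illustration, assuming whitespace tokenization and a plain inverted index held in memory; Solr and Lucene do far more (analysis chains, scoring, on-disk segments). All names and sample documents below are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Indexing: map each term to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def retrieve(index: dict[str, set[str]], query: str) -> set[str]:
    """Retrieval function: ids of the documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents standing in for the collection.
documents = {
    "p1": "information retrieval with solr",
    "p2": "solr search platform",
    "p3": "text search",
}
index = build_index(documents)          # document representation
hits = retrieve(index, "solr search")   # query -> retrieved documents
```

The index is the "document representation" of the diagram: once it is built, a query is answered by looking up terms instead of scanning every document.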
Diagram of the main components of Solr 4
Image Source: Solr in Action, Grainger & Potter
Diagram labels: Documents -> Indexing -> Document Representation; Query -> Retrieval Function -> Retrieved Documents
Glimpse of Solr In Use
SOLR Downloads Site
Unzipping SOLR into Directory
For Windows users, we highly recommend that you extract Solr to a directory that
doesn’t have spaces in its name; that is, avoid extracting Solr into directories like
C:\Documents and Settings or C:\Program Files. For example, use a path like
C:\solr-8.2.0 instead.
For Linux users, choose a location like /opt/solr/.
View SOLR in Your Directory
Example directory listing of the solr-8.2.0 installation after extracting the
downloaded archive on your computer. We’ll refer to the top-level
directory as $SOLR_INSTALL/ throughout the rest of the slides.
Start Solr
To start Solr, run the solr script located in the bin folder.
For example, if you placed Solr at C:\solr-8.2.0,
open a command line and enter the following:
$ cd \ #on Windows, this gets you to the base directory
$ cd $SOLR_INSTALL #this gets you to your Solr folder
$ bin/solr start #on Windows: bin\solr.cmd start
Note: cd means change directory.
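Once the start command returns, it can be useful to confirm that the server answers. The sketch below assumes the default address http://localhost:8983/solr used in these slides and Solr's system information endpoint (admin/info/system); the function names are made up for illustration.

```python
import json
import urllib.error
import urllib.request


def system_info_url(base_url: str = "http://localhost:8983/solr") -> str:
    """Build the URL of Solr's system information endpoint."""
    return base_url.rstrip("/") + "/admin/info/system?wt=json"


def solr_is_up(base_url: str = "http://localhost:8983/solr", timeout: float = 3.0) -> bool:
    """Return True if the server answers the system info request with valid JSON."""
    try:
        with urllib.request.urlopen(system_info_url(base_url), timeout=timeout) as response:
            json.load(response)  # raises ValueError if the body is not JSON
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON answer.
        return False
```

Calling solr_is_up() after bin/solr start should return True when the server is reachable; the command bin/solr status gives similar information from the command line.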
Admin Console - http://localhost:8983/solr/
Create Your First Core
Your running server is empty.
Create your first core, called “techproduct”.
At the command prompt, enter:
$ bin/solr create -c techproduct
Meet Your First Core
View Properties of The Core
* You can also add or rename a core on the admin page.
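The same core properties can also be read programmatically. This is a sketch, assuming Solr's CoreAdmin STATUS action at admin/cores and a JSON response with one entry per core under "status"; the helper names and the trimmed sample response are hypothetical.

```python
from urllib.parse import urlencode


def core_status_url(core: str, base_url: str = "http://localhost:8983/solr") -> str:
    """Build the CoreAdmin STATUS request URL for one core."""
    params = urlencode({"action": "STATUS", "core": core, "wt": "json"})
    return f"{base_url}/admin/cores?{params}"


def core_exists(status_response: dict, core: str) -> bool:
    """A core that does not exist is reported with an empty entry under "status"."""
    return bool(status_response.get("status", {}).get(core))


def num_docs(status_response: dict, core: str) -> int:
    """Number of documents currently in the core's index."""
    return status_response["status"][core]["index"]["numDocs"]


# Hypothetical, heavily trimmed response for a freshly created core.
sample_response = {
    "status": {"techproduct": {"name": "techproduct", "index": {"numDocs": 0}}}
}
url = core_status_url("techproduct")
```

For a new core, numDocs is 0, which matches the empty server described on the next slide.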
Add Some Example Documents
When you first start Solr, there are no documents in the index. It’s an empty server
waiting to be filled with data to search.
Let’s add some documents from the example/exampledocs directory.
What are the example file types in your
$SOLR_INSTALL/example/exampledocs directory?
Use Post Tool to Add Documents
For Unix users, call the post tool from bin:
$ bin/post -c techproduct example/exampledocs/*.xml
For Windows users, navigate to the example\exampledocs folder and run post.jar:
$ cd \
$ cd $SOLR_INSTALL\example\exampledocs
$ java -jar -Dc=techproduct post.jar *.xml
-Dc=techproduct specifies the core.
*.xml specifies the files to be added. In this case, we
are adding all files with the .xml extension.
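Documents can also be sent straight to the core's update handler over HTTP. The sketch below only builds such a request, assuming the JSON update format (a list of documents posted to /update with commit=true); the helper names and the sample document fields are made up for illustration.

```python
import json
import urllib.request


def build_update_request(core, documents, base_url="http://localhost:8983/solr"):
    """Prepare a POST request that adds the given documents and commits them."""
    url = f"{base_url}/{core}/update?commit=true"
    body = json.dumps(documents).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the prepared request to a running Solr server and return its JSON answer."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical document; the field names are not taken from the example files.
request = build_update_request("techproduct", [{"id": "demo-1", "name": "Example item"}])
```

Calling send(request) requires the server from the previous slides to be running; building the request does not touch the network.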
Status of Added Files
What is the speed of indexing?
Let’s Search
Go to http://localhost:8983/solr/
Select the techproduct core
Select the Query tab
Enter *:* in the query form.
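The query form in the admin console sends a request to the core's select handler. The same request can be sketched as follows, assuming the default address used in these slides; the helper names and the trimmed sample response are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlencode


def select_url(core: str, query: str = "*:*",
               base_url: str = "http://localhost:8983/solr", **params) -> str:
    """Build a select URL; extra keyword arguments become query parameters."""
    query_params = {"q": query, "wt": "json", **params}
    return f"{base_url}/{core}/select?{urlencode(query_params)}"


def summarize(response: dict) -> tuple[int, list[str]]:
    """Return the total number of hits and the ids of the returned documents."""
    result = response["response"]
    return result["numFound"], [doc["id"] for doc in result["docs"]]


# Hypothetical, trimmed response: 32 hits in total, two documents shown.
sample_response = {
    "response": {"numFound": 32, "start": 0, "docs": [{"id": "doc-a"}, {"id": "doc-b"}]}
}
url = select_url("techproduct")
total, ids = summarize(sample_response)
```

The *:* query matches every document, so numFound reports the size of the whole index, while docs holds only the rows that were requested (10 by default).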
Search results from executing the find-all-documents query (*:*).
View More Search Results
In the query form,
• Change start to 0
• Change rows to 32
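Setting rows to 32 shows everything on one page. An alternative is to keep rows small and move start forward page by page. A minimal sketch of that arithmetic, assuming the start and rows parameters behave as an offset and a page size; the function name is made up.

```python
from __future__ import annotations


def page_windows(num_found: int, rows: int) -> list[tuple[int, int]]:
    """Return (start, count) pairs that together cover num_found results."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    return [
        (start, min(rows, num_found - start))
        for start in range(0, num_found, rows)
    ]


# With 32 matching documents and the default page size of 10:
windows = page_windows(32, 10)
```

Each pair gives the start value to request and how many documents that page is expected to contain.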
14 files were indexed, but the search *:*
found 32 documents.
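A likely explanation for the difference is that one XML file can hold several documents: in Solr's XML update format, a single add element may wrap many doc elements. The sketch below counts them, assuming that format; the sample content is made up and is not one of the shipped example files.

```python
import xml.etree.ElementTree as ET


def count_docs(xml_text: str) -> int:
    """Count the <doc> elements in one Solr XML update file."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter("doc"))


# Hypothetical file content with two documents in a single file.
sample_file = """<add>
  <doc>
    <field name="id">EXAMPLE-1</field>
  </doc>
  <doc>
    <field name="id">EXAMPLE-2</field>
  </doc>
</add>"""
docs_in_file = count_docs(sample_file)
```

Summing this count over all the .xml files in example/exampledocs is one way to check where the 32 documents come from.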
Part III: Demo of Organizational
Research Expert Retrieval
Organization Research Experts Retrieval
Target Users: Students, Researchers, Collaborators looking for expertise
from School of Computer Sciences, Universiti Sains Malaysia.
Data Set: Scopus publication data for all academics at the school.
Status: Prototype.
Purpose: In house solution.
Focused search and analytics capabilities.
1. Design Document/Retrieval Unit
2. Create Solr Core
Create a new core to store the collection.
3. Perform Indexing
Perform indexing at the backend using addDocuments() from the Solarium PHP library.
https://github.com/solariumphp
4. Implement Search Front End
5. Format the Response into Results Page
6. Demo
Visit the prototype at
http://ir.cs.usm.my/exsearch4/
Try searching for “cryptography”.
Thank you
Visit our school at cs.usm.my
See the IR research work at ir.cs.usm.my
Drop me an email at khganATusm.my
