SlideShare ist ein Scribd-Unternehmen logo
1 von 27
Downloaden Sie, um offline zu lesen
Workshop
Yasas Senarath
Visiting Instructor & Research Assistant
Dept. of Computer Science and Engineering,
University of Moratuwa
Solr
Introduction [Recall]
● Search Platform
● Open-Source
● Search Applications
● Built on top of Lucene
● Why…
○ Enterprise-ready
○ Fast
○ Highly Scalable
● Search + NoSQL
○ Non Relational Data Storage
Features of Apache Solr [Recall]
● Restful APIs
○ No Java programming skills Required
● Full text search
○ tokens, phrases, spell check, wildcard, and auto-complete
● Enterprise ready
● Flexible and Extensible
● NoSQL database
● Admin Interface
● Highly Scalable
● Text-Centric and Sorted by Relevance
How do Search Engines Work?
Installing Solr
● Go to Solr Website and Download Binary Version of Solr-8.1.1 (Latest Version
of Slor)
● Extract the Downloaded Compressed File to Your System
● Now in the Terminal Run Command (should change directory of terminal to
Extracted Solr Folder)
○ Unix*: bin/solr start
○ Windows: binsolr.cmd start
● Goto http://localhost:8983/
Techproducts Example
● Starting Solr with Example
○ Unix*: bin/solr -e techproducts
○ Windows: binsolr.cmd -e techproducts
● To verify that Solr is running, you can do this:
○ Unix*: bin/solr status
○ Windows: binsolr.cmd status
● Access Admin Panel
○ http://localhost:8983/solr/
Adding Documents
● Open example/exampledocs/sd500.xml
● Add files to Solr using post.jar
○ cd example/exampledocs
○ java -Dc=techproducts -jar post.jar sd500.xml
● 2 main ways
○ HTTP
○ Native client
<add><doc>
<field name="id">9885A004</field>
<field name="name">Canon PowerShot SD500</field>
<field name="manu">Canon Inc.</field>
...
<field name="inStock">true</field>
</doc></add>
Searching Overview
● Select API Command
○ http://localhost:8983/solr/ techproducts/select?q=sd500&wt=json
● Need only Name and ID of all elements?
○ http://localhost:8983/solr/ techproducts/select?q=inStock:false&wt=jso
n&fl=id,name
● Shutdown
○ Unix*: bin/solr stop
○ Windows: binsolr.cmd stop
● Delete Collection
○ Unix*: bin/solr delete -c techproducts
○ Windows: binsolr.cmd delete -c techproducts
Basic Solr Concepts
● Inverted Index
● Index consists of one or more Documents
● Document consists of one or more Fields
● Every field has a Field Type
● Schema
○ Before adding documents to Solr, you need to specify the schema ! (very important)
○ Schema File: schema.xml
● Schema declares
○ what kinds of fields there are
○ which field should be used as the unique/primary key
○ which fields are required
○ how to index and search each field
Basic Solr Concepts [Contd..]
● Field Types
○ float
○ long
○ double
○ date
○ Text
● Define new field types!
<fieldtype name="phonetic" stored="false" indexed="true" class="solr.TextField" >
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
</analyzer>
</fieldtype>
Basic Solr Concepts [Contd..]
● Defining a Field
○ name: Name of the field
○ type: Field type
○ indexed: Should this field be added to the inverted index?
○ stored: Should the original value of this field be stored?
○ multiValued: Can this field have multiple values
<field name="id" type="text" indexed="true" stored="true" multiValued="true"/>
Example Documents
● Use your own project corpus
● Movie Dataset: URL: https://bit.ly/2JhpEhF
Create a Collection
● Start Solr
○ Unix*: bin/solr start
○ Windows: binsolr.cmd start
● Create Collection
○ Unix*: bin/solr create -c movies
○ Windows: binsolr.cmd create -c movies
● Defining Schema
○ Two Approaches
■ Schemaless with “field guessing” feature (Managed Schema)
■ Use schema.xml with custom schema
Custom Schema
● Rename managed_schema file to schema.xml
● schema.xml
○ <field name="title" type="text_general" indexed="true" stored="true" multiValued="false"/>
○ <field name="tagline" type="text_general" indexed="true" stored="true" multiValued="false"/>
○ <field name="overview" type="text_general" indexed="true" stored="true" multiValued="false"/>
○ <field name="status" type="text_general" indexed="true" stored="true" multiValued="false"/>
○ <field name="budget" type="plong" indexed="true" stored="true" multiValued="false"/>
○ <field name="popularity" type="pdouble" indexed="true" stored="true" multiValued="false"/>
○ <field name="release_date" type="pdate" indexed="true" stored="true" multiValued="false"/>
○ <field name="revenue" type="plong" indexed="true" stored="true" multiValued="false"/>
○ <field name="runtime" type="pint" indexed="true" stored="true" multiValued="false"/>
○ <field name="vote_average" type="pfloat" indexed="true" stored="true" multiValued="false"/>
○ <field name="vote_count" type="pint" indexed="true" stored="true" multiValued="false"/>
● solrconfig.xml
○ <schemaFactory class="ClassicIndexSchemaFactory"/>
○ ${update.autoCreateFields:false}
Add Documents
Curl "http://localhost:8983/solr/movies/update?commit=true"
--data-binary @example/movies/movies_metadata.csv -H
"Content-type:application/csv"
Basic Queries
Get All Documents:
http://localhost:8983/solr/movies/select?q=*:*&wt=json
Search Documents Containing “Toy Story” in “title” field:
http://localhost:8983/solr/movies/select?q=title:Toy%20Story&
wt=json
Search Documents Containing “Toy Story”:
http://localhost:8983/solr/movies/select?q=Toy%20Story&wt=j
son (!)
The Fix… (Copy Field)
● Add a Copy Field
<copyField source="*" dest="_text_"/>
● Is it ok? No!
● Only Few Fields
● Which Fields?
○ Title
○ Tagline
○ Overview
Custom Copy Fields
● Add following to schema.xml
<copyField source="title" dest="_text_"/>
<copyField source="tagline" dest="_text_"/>
<copyField source="overview" dest="_text_"/>
● Note that the destination should be marked multiValued="true"
<field name="_text_" type="text_general" indexed="true"
stored="false" multiValued="true"/>
Analyzers
● Analyzers are specified as a child of the <fieldType>
<fieldType name="nametext" class="solr.TextField">
<analyzer class="org.apache.lucene.analysis.core.WhitespaceAnalyzer"/>
</fieldType>
● Using simple processing steps
<fieldType name="nametext" class="solr.TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory"/>
<filter class="solr.EnglishPorterFilterFactory"/>
</analyzer>
</fieldType>
● Create custom Text Field: text_title
● Filters used in Analyzers
○ Tokenize : Tokenizer
<tokenizer class="solr.StandardTokenizerFactory"/>
○ Stopwords : Filter (stopwords.txt)
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
○ LowerCase: Filter
<filter class="solr.LowerCaseFilterFactory"/>
○ Synonyms : Filter (synonyms.txt)
<filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
Filters
Analysis Phases
● Separate Analyzers for Index and Query
<fieldType name="nametext" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeepWordFilterFactory" words="keepwords.txt"/>
<filter class="solr.SynonymFilterFactory" synonyms="syns.txt"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
Synonyms (synonyms.txt)
● Add Some Synonyms to synonyms.txt
○ story, story, tale, fiction
○ heat, heat, hot, warm
○ se7en, se7en, seven, 7
● Spell correction with Synonyms
○ stores => stories
Toy Stories Example
Advanced Queries
● Search title:Mask AND tagline:hero
○ title:Mask AND tagline:hero
○ http://localhost:8983/solr/movies/select?q=title%3AMask%20AND%20tagline%3Ahero
● Search The Mask in title or Mask in title with hero in tagline
○ title:Mask AND tagline:hero
○ http://localhost:8983/solr/movies/select?q=(title%3AMask%20AND%20tagline%3Ahero)%20O
R%20title%3A%22The%20Mask%22
● Wildcard matching: Search movies that have a title starting with “The”
○ title: ^the
○ http://localhost:8983/solr/movies/select?q=title%3A%22the*%22
● Proximity matching: Search “exorcist spirits" with proximity of 4 words in the
overview field
○ “exorcist spirits"~4
○ http://localhost:8983/solr/movies/select?q=overview%3A%22exorcist%20spirits%22~4
● Range Queries
○ Inclusive Range Query: Square brackets [ & ]
■ budget:[500000 TO *]
○ Exclusive Range Query: Curly brackets { & }
■ budget:{500000 TO *}
● Boosting a Term with ^
○ Want a term to be more relevant?
■ toy^4 story
● For more about Queries:
○ https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html
Advanced Queries
The Schemaless Approach
● Let's do the same in Schemaless Approach
Questions?
Yasas Senarath
Visiting Instructor & Research Assistant
Dept. of Computer Science and Engineering,
University of Moratuwa

Weitere ähnliche Inhalte

Was ist angesagt?

Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrSease
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?lucenerevolution
 
Apache Solr-Webinar
Apache Solr-WebinarApache Solr-Webinar
Apache Solr-WebinarEdureka!
 
XXE: How to become a Jedi
XXE: How to become a JediXXE: How to become a Jedi
XXE: How to become a JediYaroslav Babin
 
How the Lucene More Like This Works
How the Lucene More Like This WorksHow the Lucene More Like This Works
How the Lucene More Like This WorksSease
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash courseTommaso Teofili
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineTrey Grainger
 
Twitter 與 ELK 基本使用
Twitter 與 ELK 基本使用Twitter 與 ELK 基本使用
Twitter 與 ELK 基本使用Mark Dai
 
Apache Lucene/Solr Document Classification
Apache Lucene/Solr Document ClassificationApache Lucene/Solr Document Classification
Apache Lucene/Solr Document ClassificationSease
 
Consuming RealTime Signals in Solr
Consuming RealTime Signals in Solr Consuming RealTime Signals in Solr
Consuming RealTime Signals in Solr Umesh Prasad
 
Iocp 기본 구조 이해
Iocp 기본 구조 이해Iocp 기본 구조 이해
Iocp 기본 구조 이해Nam Hyeonuk
 
Natural Language Search in Solr
Natural Language Search in SolrNatural Language Search in Solr
Natural Language Search in SolrTommaso Teofili
 
The Semantic Web #9 - Web Ontology Language (OWL)
The Semantic Web #9 - Web Ontology Language (OWL)The Semantic Web #9 - Web Ontology Language (OWL)
The Semantic Web #9 - Web Ontology Language (OWL)Myungjin Lee
 
[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자
[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자
[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자Donghyeok Kang
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Lucidworks
 

Was ist angesagt? (20)

Apache Solr
Apache SolrApache Solr
Apache Solr
 
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
 
JSON in Solr: from top to bottom
JSON in Solr: from top to bottomJSON in Solr: from top to bottom
JSON in Solr: from top to bottom
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?
 
Apache Solr-Webinar
Apache Solr-WebinarApache Solr-Webinar
Apache Solr-Webinar
 
XXE: How to become a Jedi
XXE: How to become a JediXXE: How to become a Jedi
XXE: How to become a Jedi
 
How the Lucene More Like This Works
How the Lucene More Like This WorksHow the Lucene More Like This Works
How the Lucene More Like This Works
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
SolrCloud and Shard Splitting
SolrCloud and Shard SplittingSolrCloud and Shard Splitting
SolrCloud and Shard Splitting
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engine
 
Twitter 與 ELK 基本使用
Twitter 與 ELK 基本使用Twitter 與 ELK 基本使用
Twitter 與 ELK 基本使用
 
Apache Lucene/Solr Document Classification
Apache Lucene/Solr Document ClassificationApache Lucene/Solr Document Classification
Apache Lucene/Solr Document Classification
 
Consuming RealTime Signals in Solr
Consuming RealTime Signals in Solr Consuming RealTime Signals in Solr
Consuming RealTime Signals in Solr
 
Iocp 기본 구조 이해
Iocp 기본 구조 이해Iocp 기본 구조 이해
Iocp 기본 구조 이해
 
How to Design Indexes, Really
How to Design Indexes, ReallyHow to Design Indexes, Really
How to Design Indexes, Really
 
Natural Language Search in Solr
Natural Language Search in SolrNatural Language Search in Solr
Natural Language Search in Solr
 
Search@flipkart
Search@flipkartSearch@flipkart
Search@flipkart
 
The Semantic Web #9 - Web Ontology Language (OWL)
The Semantic Web #9 - Web Ontology Language (OWL)The Semantic Web #9 - Web Ontology Language (OWL)
The Semantic Web #9 - Web Ontology Language (OWL)
 
[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자
[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자
[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
 

Ähnlich wie Solr workshop

Searching for AI - Leveraging Solr for classic Artificial Intelligence tasks
Searching for AI - Leveraging Solr for classic Artificial Intelligence tasksSearching for AI - Leveraging Solr for classic Artificial Intelligence tasks
Searching for AI - Leveraging Solr for classic Artificial Intelligence tasksAlexandre Rafalovitch
 
Get the most out of Solr search with PHP
Get the most out of Solr search with PHPGet the most out of Solr search with PHP
Get the most out of Solr search with PHPPaul Borgermans
 
Using Search API, Search API Solr and Facets in Drupal 8
Using Search API, Search API Solr and Facets in Drupal 8Using Search API, Search API Solr and Facets in Drupal 8
Using Search API, Search API Solr and Facets in Drupal 8Websolutions Agency
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Alexandre Rafalovitch
 
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverApache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverLucidworks (Archived)
 
Journey through high performance django application
Journey through high performance django applicationJourney through high performance django application
Journey through high performance django applicationbangaloredjangousergroup
 
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)Kai Chan
 
Introduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataIntroduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataJihoon Son
 
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve contentOpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve contentAlkacon Software GmbH & Co. KG
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrJayesh Bhoyar
 
Information Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampInformation Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampKais Hassan, PhD
 
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Kai Chan
 
Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6DEEPAK KHETAWAT
 
MySQL Performance Optimization
MySQL Performance OptimizationMySQL Performance Optimization
MySQL Performance OptimizationMindfire Solutions
 

Ähnlich wie Solr workshop (20)

Searching for AI - Leveraging Solr for classic Artificial Intelligence tasks
Searching for AI - Leveraging Solr for classic Artificial Intelligence tasksSearching for AI - Leveraging Solr for classic Artificial Intelligence tasks
Searching for AI - Leveraging Solr for classic Artificial Intelligence tasks
 
Get the most out of Solr search with PHP
Get the most out of Solr search with PHPGet the most out of Solr search with PHP
Get the most out of Solr search with PHP
 
Apache solr liferay
Apache solr liferayApache solr liferay
Apache solr liferay
 
Using Search API, Search API Solr and Facets in Drupal 8
Using Search API, Search API Solr and Facets in Drupal 8Using Search API, Search API Solr and Facets in Drupal 8
Using Search API, Search API Solr and Facets in Drupal 8
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
 
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverApache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
 
Journey through high performance django application
Journey through high performance django applicationJourney through high performance django application
Journey through high performance django application
 
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
 
Introduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataIntroduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big Data
 
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve contentOpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Manticore 6.pdf
Manticore 6.pdfManticore 6.pdf
Manticore 6.pdf
 
Information Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampInformation Retrieval - Data Science Bootcamp
Information Retrieval - Data Science Bootcamp
 
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
 
Python for web security - beginner
Python for web security - beginnerPython for web security - beginner
Python for web security - beginner
 
Nzitf Velociraptor Workshop
Nzitf Velociraptor WorkshopNzitf Velociraptor Workshop
Nzitf Velociraptor Workshop
 
Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6
 
Solr5
Solr5Solr5
Solr5
 
MySQL Performance Optimization
MySQL Performance OptimizationMySQL Performance Optimization
MySQL Performance Optimization
 
Solr02 fields
Solr02 fieldsSolr02 fields
Solr02 fields
 

Mehr von Yasas Senarath

Aspect Based Sentiment Analysis
Aspect Based Sentiment AnalysisAspect Based Sentiment Analysis
Aspect Based Sentiment AnalysisYasas Senarath
 
Forecasting covid 19 by states with mobility data
Forecasting covid 19 by states with mobility data Forecasting covid 19 by states with mobility data
Forecasting covid 19 by states with mobility data Yasas Senarath
 
Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...
Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...
Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...Yasas Senarath
 
Affect Level Opinion Mining
Affect Level Opinion MiningAffect Level Opinion Mining
Affect Level Opinion MiningYasas Senarath
 
Data science / Big Data
Data science / Big DataData science / Big Data
Data science / Big DataYasas Senarath
 
Lecture on Deep Learning
Lecture on Deep LearningLecture on Deep Learning
Lecture on Deep LearningYasas Senarath
 
Twitter sentiment analysis
Twitter sentiment analysisTwitter sentiment analysis
Twitter sentiment analysisYasas Senarath
 

Mehr von Yasas Senarath (7)

Aspect Based Sentiment Analysis
Aspect Based Sentiment AnalysisAspect Based Sentiment Analysis
Aspect Based Sentiment Analysis
 
Forecasting covid 19 by states with mobility data
Forecasting covid 19 by states with mobility data Forecasting covid 19 by states with mobility data
Forecasting covid 19 by states with mobility data
 
Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...
Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...
Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...
 
Affect Level Opinion Mining
Affect Level Opinion MiningAffect Level Opinion Mining
Affect Level Opinion Mining
 
Data science / Big Data
Data science / Big DataData science / Big Data
Data science / Big Data
 
Lecture on Deep Learning
Lecture on Deep LearningLecture on Deep Learning
Lecture on Deep Learning
 
Twitter sentiment analysis
Twitter sentiment analysisTwitter sentiment analysis
Twitter sentiment analysis
 

Kürzlich hochgeladen

Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Patrick Viafore
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...FIDO Alliance
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimaginedpanagenda
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfFIDO Alliance
 
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdfBreaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdfUK Journal
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FIDO Alliance
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandIES VE
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101vincent683379
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftshyamraj55
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaCzechDreamin
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfFIDO Alliance
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxJennifer Lim
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfFIDO Alliance
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераMark Opanasiuk
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceSamy Fodil
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe中 央社
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty SecureFemke de Vroome
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?Mark Billinghurst
 

Kürzlich hochgeladen (20)

Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdfBreaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & Ireland
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty Secure
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 

Solr workshop

  • 1. Workshop Yasas Senarath Visiting Instructor & Research Assistant Dept. of Computer Science and Engineering, University of Moratuwa Solr
  • 2. Introduction [Recall] ● Search Platform ● Open-Source ● Search Applications ● Built on top of Lucene ● Why… ○ Enterprise-ready ○ Fast ○ Highly Scalable ● Search + NoSQL ○ Non Relational Data Storage
  • 3. Features of Apache Solr [Recall] ● Restful APIs ○ No Java programming skills Required ● Full text search ○ tokens, phrases, spell check, wildcard, and auto-complete ● Enterprise ready ● Flexible and Extensible ● NoSQL database ● Admin Interface ● Highly Scalable ● Text-Centric and Sorted by Relevance
  • 4. How do Search Engines Work?
  • 5. Installing Solr ● Go to Solr Website and Download Binary Version of Solr-8.1.1 (Latest Version of Slor) ● Extract the Downloaded Compressed File to Your System ● Now in the Terminal Run Command (should change directory of terminal to Extracted Solr Folder) ○ Unix*: bin/solr start ○ Windows: binsolr.cmd start ● Goto http://localhost:8983/
  • 6. Techproducts Example ● Starting Solr with Example ○ Unix*: bin/solr -e techproducts ○ Windows: binsolr.cmd -e techproducts ● To verify that Solr is running, you can do this: ○ Unix*: bin/solr status ○ Windows: binsolr.cmd status ● Access Admin Panel ○ http://localhost:8983/solr/
  • 7. Adding Documents ● Open example/exampledocs/sd500.xml ● Add files to Solr using post.jar ○ cd example/exampledocs ○ java -Dc=techproducts -jar post.jar sd500.xml ● 2 main ways ○ HTTP ○ Native client <add><doc> <field name="id">9885A004</field> <field name="name">Canon PowerShot SD500</field> <field name="manu">Canon Inc.</field> ... <field name="inStock">true</field> </doc></add>
  • 8. Searching Overview ● Select API Command ○ http://localhost:8983/solr/ techproducts/select?q=sd500&wt=json ● Need only Name and ID of all elements? ○ http://localhost:8983/solr/ techproducts/select?q=inStock:false&wt=jso n&fl=id,name ● Shutdown ○ Unix*: bin/solr stop ○ Windows: binsolr.cmd stop ● Delete Collection ○ Unix*: bin/solr delete -c techproducts ○ Windows: binsolr.cmd delete -c techproducts
  • 9. Basic Solr Concepts ● Inverted Index ● Index consists of one or more Documents ● Document consists of one or more Fields ● Every field has a Field Type ● Schema ○ Before adding documents to Solr, you need to specify the schema ! (very important) ○ Schema File: schema.xml ● Schema declares ○ what kinds of fields there are ○ which field should be used as the unique/primary key ○ which fields are required ○ how to index and search each field
  • 10. Basic Solr Concepts [Contd..] ● Field Types ○ float ○ long ○ double ○ date ○ Text ● Define new field types! <fieldtype name="phonetic" stored="false" indexed="true" class="solr.TextField" > <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/> </analyzer> </fieldtype>
  • 11. Basic Solr Concepts [Contd..] ● Defining a Field ○ name: Name of the field ○ type: Field type ○ indexed: Should this field be added to the inverted index? ○ stored: Should the original value of this field be stored? ○ multiValued: Can this field have multiple values <field name="id" type="text" indexed="true" stored="true" multiValued="true"/>
  • 12. Example Documents ● Use your own project corpus ● Movie Dataset: URL: https://bit.ly/2JhpEhF
  • 13. Create a Collection ● Start Solr ○ Unix*: bin/solr start ○ Windows: binsolr.cmd start ● Create Collection ○ Unix*: bin/solr create -c movies ○ Windows: binsolr.cmd create -c movies ● Defining Schema ○ Two Approaches ■ Schemaless with “field guessing” feature (Managed Schema) ■ Use schema.xml with custom schema
  • 14. Custom Schema ● Rename managed_schema file to schema.xml ● schema.xml ○ <field name="title" type="text_general" indexed="true" stored="true" multiValued="false"/> ○ <field name="tagline" type="text_general" indexed="true" stored="true" multiValued="false"/> ○ <field name="overview" type="text_general" indexed="true" stored="true" multiValued="false"/> ○ <field name="status" type="text_general" indexed="true" stored="true" multiValued="false"/> ○ <field name="budget" type="plong" indexed="true" stored="true" multiValued="false"/> ○ <field name="popularity" type="pdouble" indexed="true" stored="true" multiValued="false"/> ○ <field name="release_date" type="pdate" indexed="true" stored="true" multiValued="false"/> ○ <field name="revenue" type="plong" indexed="true" stored="true" multiValued="false"/> ○ <field name="runtime" type="pint" indexed="true" stored="true" multiValued="false"/> ○ <field name="vote_average" type="pfloat" indexed="true" stored="true" multiValued="false"/> ○ <field name="vote_count" type="pint" indexed="true" stored="true" multiValued="false"/> ● solrconfig.xml ○ <schemaFactory class="ClassicIndexSchemaFactory"/> ○ ${update.autoCreateFields:false}
  • 15. Add Documents Curl "http://localhost:8983/solr/movies/update?commit=true" --data-binary @example/movies/movies_metadata.csv -H "Content-type:application/csv"
  • 16. Basic Queries Get All Documents: http://localhost:8983/solr/movies/select?q=*:*&wt=json Search Documents Containing “Toy Story” in “title” field: http://localhost:8983/solr/movies/select?q=title:Toy%20Story& wt=json Search Documents Containing “Toy Story”: http://localhost:8983/solr/movies/select?q=Toy%20Story&wt=j son (!)
  • 17. The Fix… (Copy Field) ● Add a Copy Field <copyField source="*" dest="_text_"/> ● Is it ok? No! ● Only Few Fields ● Which Fields? ○ Title ○ Tagline ○ Overview
  • 18. Custom Copy Fields ● Add following to schema.xml <copyField source="title" dest="_text_"/> <copyField source="tagline" dest="_text_"/> <copyField source="overview" dest="_text_"/> ● Note that the destination should be marked multiValued="true" <field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
  • 19. Analyzers ● Analyzers are specified as a child of the <fieldType> <fieldType name="nametext" class="solr.TextField"> <analyzer class="org.apache.lucene.analysis.core.WhitespaceAnalyzer"/> </fieldType> ● Using simple processing steps <fieldType name="nametext" class="solr.TextField"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StandardFilterFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.StopFilterFactory"/> <filter class="solr.EnglishPorterFilterFactory"/> </analyzer> </fieldType>
  • 20. ● Create custom Text Field: text_title ● Filters used in Analyzers ○ Tokenize : Tokenizer <tokenizer class="solr.StandardTokenizerFactory"/> ○ Stopwords : Filter (stopwords.txt) <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" /> ○ LowerCase: Filter <filter class="solr.LowerCaseFilterFactory"/> ○ Synonyms : Filter (synonyms.txt) <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> Filters
  • 21. Analysis Phases ● Separate Analyzers for Index and Query <fieldType name="nametext" class="solr.TextField"> <analyzer type="index"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"/> <filter class="solr.SynonymFilterFactory" synonyms="syns.txt"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> </analyzer> </fieldType>
  • 22. Synonyms (synonyms.txt) ● Add Some Synonyms to synonyms.txt ○ story, story, tale, fiction ○ heat, heat, hot, warm ○ se7en, se7en, seven, 7 ● Spell correction with Synonyms ○ stores => stories
  • 24. Advanced Queries ● Search title:Mask AND tagline:hero ○ title:Mask AND tagline:hero ○ http://localhost:8983/solr/movies/select?q=title%3AMask%20AND%20tagline%3Ahero ● Search The Mask in title or Mask in title with hero in tagline ○ title:Mask AND tagline:hero ○ http://localhost:8983/solr/movies/select?q=(title%3AMask%20AND%20tagline%3Ahero)%20O R%20title%3A%22The%20Mask%22 ● Wildcard matching: Search movies that have a title starting with “The” ○ title: ^the ○ http://localhost:8983/solr/movies/select?q=title%3A%22the*%22 ● Proximity matching: Search “exorcist spirits" with proximity of 4 words in the overview field ○ “exorcist spirits"~4 ○ http://localhost:8983/solr/movies/select?q=overview%3A%22exorcist%20spirits%22~4
  • 25. ● Range Queries ○ Inclusive Range Query: Square brackets [ & ] ■ budget:[500000 TO *] ○ Exclusive Range Query: Curly brackets { & } ■ budget:{500000 TO *} ● Boosting a Term with ^ ○ Want a term to be more relevant? ■ toy^4 story ● For more about Queries: ○ https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html Advanced Queries
  • 26. The Schemaless Approach ● Let's do the same in Schemaless Approach
  • 27. Questions? Yasas Senarath Visiting Instructor & Research Assistant Dept. of Computer Science and Engineering, University of Moratuwa