Introduction to Apache Lucene

Sumit Luthra
Agenda
What is Apache Lucene?
Focus of Apache Lucene
Lucene Architecture
Core Indexing Classes
Core Searching Classes
Demo
Questions & Answers
What is Apache Lucene?
"Apache Lucene is a high-performance, full-featured text search
engine library written entirely in Java."
It is also known as an information retrieval (IR) library.
Lucene is specifically an API, not an application.
Open Source
Focus
Indexing Documents
Searching Documents

Note:
You can use Lucene to provide consistent full-text indexing across
both database objects and documents in various formats (Microsoft
Office documents, PDF, HTML, plain text, e-mails and so on), provided
their text is first extracted; Lucene itself indexes text, not file formats.
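Lucene Architecture
(Diagram: the two halves of a search application built on Lucene.)
Indexing: Raw Content → Acquire content → Build document → Analyze document → Index document → Index
Searching: Users → Search UI → Build query → Run query (against the Index) → Render results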
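To make the two halves of the pipeline concrete, here is a minimal plain-Java sketch of the same flow: documents are analyzed into terms and recorded in an inverted index (term → set of document ids), and a query is analyzed the same way and looked up. It is a toy under simplifying assumptions (lowercasing plus splitting on non-letters as the "analyzer", AND semantics across query terms). It does not use Lucene's API or reproduce its index format, and the class `TinyIndex` is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy class, not part of Lucene: an in-memory inverted index.
class TinyIndex {
    // Stored documents, addressed by position (the "document id").
    private final List<String> docs = new ArrayList<>();
    // Inverted index: term -> sorted set of ids of documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // "Analyze document": lowercase and split on runs of non-letters.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // "Index document": keep the raw text and add a posting for each term.
    int addDocument(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // "Build query" + "Run query": analyze the query the same way and
    // intersect the posting sets, so every query term must match.
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : analyze(query)) {
            Set<Integer> ids = postings.getOrDefault(term, Set.of());
            if (result == null) result = new TreeSet<>(ids);
            else result.retainAll(ids);
        }
        return result == null ? new TreeSet<>() : result;
    }

    // "Render results": return the stored text of a hit.
    String get(int id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.addDocument("Hello World");
        index.addDocument("Hello Lucene");
        for (int id : index.search("hello lucene")) {
            System.out.println(id + ": " + index.get(id));
        }
    }
}
```

Running `main` prints `1: Hello Lucene`, because only the second document contains both query terms.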
Indexing Documents
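import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;

// directory (e.g. an FSDirectory) and analyzer (e.g. a StandardAnalyzer) are assumed to exist
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, analyzer);
config.setOpenMode(IndexWriterConfig.OpenMode.CREATE); // replaces the old "create = true" flag
IndexWriter writer = new IndexWriter(directory, config);
Document doc = new Document();
// Field.Index.ANALYZED is the current name of the older Field.Index.TOKENIZED
doc.add(new Field("content", "Hello World", Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("name", "filename.txt", Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("path", "http://myfile/", Field.Store.YES, Field.Index.ANALYZED));
// [...]
writer.addDocument(doc);
writer.close(); // commits pending changes and releases the write lock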
Core indexing classes
IndexWriter
Directory
Analyzer
Document
Field
IndexWriter construction
// Deprecated
IndexWriter(Directory d, Analyzer a, // default analyzer
IndexWriter.MaxFieldLength mfl);

// Preferred
IndexWriter(Directory d,
IndexWriterConfig c);
Directory
FSDirectory
RAMDirectory
DbDirectory
FileSwitchDirectory
JEDirectory
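Directory is Lucene's storage abstraction: index files are written and read by name through it, so the same indexing and searching code can run against disk (FSDirectory) or memory (RAMDirectory). The sketch below illustrates only that idea with a hypothetical interface (`ToyDirectory`) that moves whole files as byte arrays; Lucene's real Directory works with streaming `IndexOutput`/`IndexInput` objects and is not reproduced here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical storage abstraction, loosely modeled on the role of Lucene's Directory.
interface ToyDirectory {
    void write(String name, byte[] data);
    byte[] read(String name);
}

// RAMDirectory-like: "files" live in a map and disappear when the process exits.
class ToyRamDirectory implements ToyDirectory {
    private final Map<String, byte[]> files = new HashMap<>();

    public void write(String name, byte[] data) {
        files.put(name, data.clone());
    }

    public byte[] read(String name) {
        byte[] data = files.get(name);
        if (data == null) throw new IllegalArgumentException("no such file: " + name);
        return data.clone();
    }
}

// FSDirectory-like: files live under a directory on disk.
class ToyFsDirectory implements ToyDirectory {
    private final Path root;

    ToyFsDirectory(Path root) {
        this.root = root;
    }

    public void write(String name, byte[] data) {
        try {
            Files.write(root.resolve(name), data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public byte[] read(String name) {
        try {
            return Files.readAllBytes(root.resolve(name));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class DirectoryDemo {
    // Code written against the interface runs unchanged on either backend.
    static String roundTrip(ToyDirectory dir) {
        dir.write("demo.dat", "hello".getBytes(StandardCharsets.UTF_8));
        return new String(dir.read("demo.dat"), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip(new ToyRamDirectory()));
        System.out.println(roundTrip(new ToyFsDirectory(Files.createTempDirectory("toy-index"))));
    }
}
```

Both calls in `main` print `hello`; only the place where the bytes end up differs.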
Analyzers
An Analyzer tokenizes the input text.
Common analyzers:
– WhitespaceAnalyzer: splits tokens on whitespace
– SimpleAnalyzer: splits tokens on non-letters, then lowercases
– StopAnalyzer: same as SimpleAnalyzer, but also removes stop words
– StandardAnalyzer: the most sophisticated of these; it recognizes certain token types (such as e-mail addresses and company names), lowercases, removes stop words, ...
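The first three tokenization rules can be approximated in a few lines of plain Java. This is a simplified sketch, not Lucene's code: the class `ToyAnalyzers` is hypothetical, it uses a short stop-word subset chosen for illustration (Lucene's StopAnalyzer ships its own English list), and it omits StandardAnalyzer, whose grammar-based tokenizer does not fit in a few lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy approximations of three Lucene analyzers (hypothetical class, not Lucene API).
class ToyAnalyzers {
    // Illustrative subset of English stop words, not Lucene's full list.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of", "to", "in", "is");

    // WhitespaceAnalyzer-like: split on whitespace, keep case and punctuation.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // SimpleAnalyzer-like: split on non-letters, then lowercase.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) tokens.add(t.toLowerCase());
        }
        return tokens;
    }

    // StopAnalyzer-like: SimpleAnalyzer output minus stop words.
    static List<String> stop(String text) {
        List<String> tokens = simple(text);
        tokens.removeIf(STOP_WORDS::contains);
        return tokens;
    }

    public static void main(String[] args) {
        String s = "The quick brown fox jumped over the lazy dog";
        System.out.println(whitespace(s));
        System.out.println(simple(s));
        System.out.println(stop(s));
    }
}
```

With the sample sentence, `main` prints `[The, quick, brown, fox, jumped, over, the, lazy, dog]`, then the same tokens with `the` lowercased, then `[quick, brown, fox, jumped, over, lazy, dog]`.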
Analysis examples
Input: "The quick brown fox jumped over the lazy dog"
– WhitespaceAnalyzer: [The] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dog]
– SimpleAnalyzer: [the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dog]
– StopAnalyzer: [quick] [brown] [fox] [jumped] [over] [lazy] [dog]
– StandardAnalyzer: [quick] [brown] [fox] [jumped] [over] [lazy] [dog]
More analysis examples
Input: "XY&Z Corporation - xyz@example.com"
– WhitespaceAnalyzer: [XY&Z] [Corporation] [-] [xyz@example.com]
– SimpleAnalyzer: [xy] [z] [corporation] [xyz] [example] [com]
– StopAnalyzer: [xy] [z] [corporation] [xyz] [example] [com]
– StandardAnalyzer: [xy&z] [corporation] [xyz@example.com]
Document & Fields
A Document is the atomic unit of indexing and searching. It contains Fields.
Fields have a name and a value:
– You have to translate raw content into Fields
– Examples: title, author, date, abstract, body, URL, keywords, ...
– Different documents can have different fields
Field options
Field.Store
– NO: don't store the field value in the index
– YES: store the field value in the index
Field.Index
– ANALYZED: tokenize with an Analyzer
– NOT_ANALYZED: do not tokenize (the whole value is indexed as a single term)
– NO: do not index this field
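The effect of the Store and Index options can be modeled in a few lines. The sketch below is a hypothetical plain-Java model (`ToyFields`), not Lucene's Field class: a stored value can be read back from a hit, while an indexed value contributes searchable terms, either whole (NOT_ANALYZED) or tokenized (ANALYZED, approximated here by lowercasing and splitting on non-letters). Terms are shown as `name:term` purely for readability.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of Field.Store / Field.Index; not Lucene's Field class.
class ToyFields {
    enum Store { YES, NO }
    enum Index { ANALYZED, NOT_ANALYZED, NO }

    static final class Field {
        final String name;
        final String value;
        final Store store;
        final Index index;

        Field(String name, String value, Store store, Index index) {
            this.name = name;
            this.value = value;
            this.store = store;
            this.index = index;
        }
    }

    // Searchable terms contributed by a field, shown as "name:term".
    static List<String> indexedTerms(Field f) {
        List<String> terms = new ArrayList<>();
        switch (f.index) {
            case ANALYZED:
                // Rough stand-in for an Analyzer: lowercase, split on non-letters.
                for (String t : f.value.toLowerCase().split("[^a-z]+")) {
                    if (!t.isEmpty()) terms.add(f.name + ":" + t);
                }
                break;
            case NOT_ANALYZED:
                terms.add(f.name + ":" + f.value); // whole value is one term
                break;
            default:
                break; // Index.NO: the field cannot be searched
        }
        return terms;
    }

    // Values retrievable from a search hit: only fields with Store.YES.
    static Map<String, String> storedValues(List<Field> doc) {
        Map<String, String> stored = new LinkedHashMap<>();
        for (Field f : doc) {
            if (f.store == Store.YES) stored.put(f.name, f.value);
        }
        return stored;
    }

    public static void main(String[] args) {
        List<Field> doc = List.of(
            new Field("content", "Hello World", Store.NO, Index.ANALYZED),
            new Field("path", "http://myfile/", Store.YES, Index.NOT_ANALYZED),
            new Field("checksum", "abc123", Store.YES, Index.NO));
        for (Field f : doc) {
            System.out.println(f.name + " -> " + indexedTerms(f));
        }
        System.out.println(storedValues(doc));
    }
}
```

Here `content` is searchable but cannot be displayed, `checksum` can be displayed but not searched, and `path` is both, matched only as one exact term. The field names and values in `main` are made up for illustration.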
Searching an Index
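import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.*; // IndexSearcher, Query, TopDocs, ScoreDoc
import org.apache.lucene.util.Version;
// directory, analyzer, fieldName, wordSearched and noOfHits are assumed to be defined
IndexReader reader = IndexReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
QueryParser parser = new QueryParser(Version.LUCENE_36, fieldName, analyzer);
Query query = parser.parse(wordSearched); // throws ParseException on bad syntax
TopDocs hits = searcher.search(query, noOfHits);
ScoreDoc[] scoreDocs = hits.scoreDocs;
if (scoreDocs.length > 0) {
    Document doc = searcher.doc(scoreDocs[0].doc); // look at first match
    System.out.println("name=" + doc.get("name"));
}
searcher.close();
reader.close();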
Core searching classes
IndexSearcher
Query
QueryParser
TopDocs
ScoreDoc
IndexSearcher
Constructors:
– IndexSearcher(Directory d); // Deprecated
– IndexSearcher(IndexReader r);
Construct the IndexReader with the static method IndexReader.open(dir).
Query
Common Query subclasses:
– TermQuery: constructed from a Term
– TermRangeQuery
– NumericRangeQuery
– PrefixQuery
– BooleanQuery
– PhraseQuery
– WildcardQuery
– FuzzyQuery
– MatchAllDocsQuery
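Several of these query types differ mainly in which indexed terms they accept. As a rough illustration (plain Java, hypothetical class `ToyTermMatchers`, not Lucene's implementation), the sketch below filters a small term dictionary with term, prefix, wildcard and fuzzy criteria. The fuzzy check uses Levenshtein edit distance with a caller-chosen maximum; Lucene's FuzzyQuery is also edit-distance based, but its similarity parameters differ between versions, so the threshold here is an assumption.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical toy matchers approximating how some query types select terms.
class ToyTermMatchers {
    // TermQuery-like: exact term match.
    static Predicate<String> term(String t) {
        return s -> s.equals(t);
    }

    // PrefixQuery-like: the term starts with the prefix.
    static Predicate<String> prefix(String p) {
        return s -> s.startsWith(p);
    }

    // WildcardQuery-like: '*' matches any sequence, '?' exactly one character.
    static Predicate<String> wildcard(String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') regex.append(".*");
            else if (c == '?') regex.append('.');
            else regex.append(Pattern.quote(String.valueOf(c)));
        }
        Pattern compiled = Pattern.compile(regex.toString());
        return s -> compiled.matcher(s).matches();
    }

    // FuzzyQuery-like: Levenshtein distance of at most maxEdits.
    static Predicate<String> fuzzy(String t, int maxEdits) {
        return s -> levenshtein(s, t) <= maxEdits;
    }

    // Dynamic-programming edit distance (insert, delete, substitute), two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Keep the terms from a term dictionary that a matcher accepts.
    static List<String> select(List<String> terms, Predicate<String> matcher) {
        return terms.stream().filter(matcher).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> terms = List.of("java", "javascript", "jakarta", "lava", "junit");
        System.out.println(select(terms, term("java")));
        System.out.println(select(terms, prefix("java")));
        System.out.println(select(terms, wildcard("ja*a")));
        System.out.println(select(terms, fuzzy("java", 1)));
    }
}
```

`main` prints `[java]`, `[java, javascript]`, `[java, jakarta]` and `[java, lava]`: the fuzzy match accepts `lava` because it is one substitution away from `java`.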
QueryParser
Constructor:
– QueryParser(Version matchVersion, String defaultField, Analyzer analyzer);
Parsing methods:
– Query parse(String query) throws ParseException;
– ... and many more
QueryParser syntax examples
Query expression | Document matches if…
java | contains the term java in the default field
java junit, or java OR junit | contains the term java or junit, or both, in the default field (the default operator can be changed to AND)
+java +junit, or java AND junit | contains both java and junit in the default field
title:ant | contains the term ant in the title field
title:extreme -subject:sports | contains extreme in the title field and not sports in the subject field
(agile OR extreme) AND java | matches the Boolean expression
title:"junit in action" | matches the phrase in the title field
title:"junit action"~5 | matches the terms within 5 positions of each other in the title field (proximity)
java* | contains a term starting with java (wildcard)
java~ | contains a term similar in spelling to java (fuzzy)
lastmodified:[1/1/09 TO 12/31/09] | has a lastmodified value in the inclusive range (range match)
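The difference between the default OR operator and required (+) or AND clauses comes down to set operations on each term's posting list. The minimal sketch below assumes hypothetical in-memory postings keyed by `field:term` (with `body` standing in for the default field) rather than a real Lucene index, and it ignores scoring, which Lucene computes on top of the matching: `java junit` is a union, `+java +junit` an intersection, and `title:extreme -subject:sports` a difference.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical posting lists (field:term -> document ids), not a Lucene index.
class ToyBooleanOps {
    static final Map<String, Set<Integer>> POSTINGS = Map.of(
        "body:java", Set.of(1, 2, 3),
        "body:junit", Set.of(2, 4),
        "title:extreme", Set.of(3, 5, 6),
        "subject:sports", Set.of(5));

    // Mutable, sorted copy of one posting set (empty if the term is unknown).
    static Set<Integer> docs(String fieldTerm) {
        return new TreeSet<>(POSTINGS.getOrDefault(fieldTerm, Set.of()));
    }

    // Default OR (optional clauses): union of the posting sets.
    static Set<Integer> or(String a, String b) {
        Set<Integer> r = docs(a);
        r.addAll(docs(b));
        return r;
    }

    // +a +b, or a AND b (required clauses): intersection.
    static Set<Integer> and(String a, String b) {
        Set<Integer> r = docs(a);
        r.retainAll(docs(b));
        return r;
    }

    // a -b (prohibited clause): difference.
    static Set<Integer> andNot(String a, String b) {
        Set<Integer> r = docs(a);
        r.removeAll(docs(b));
        return r;
    }

    public static void main(String[] args) {
        System.out.println(or("body:java", "body:junit"));             // java junit
        System.out.println(and("body:java", "body:junit"));            // +java +junit
        System.out.println(andNot("title:extreme", "subject:sports")); // title:extreme -subject:sports
    }
}
```

With these made-up postings, `main` prints `[1, 2, 3, 4]`, `[2]` and `[3, 6]`.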
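TopDocs
Holds the top N ranked results for a query: the total number of
matching documents (totalHits) and the best hits (scoreDocs).

ScoreDoc
A single hit: the internal document number (doc) and its relevance
score (score). TopDocs.scoreDocs is an array of ScoreDoc, best hit first.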
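Collecting the top N hits does not require sorting every match: a bounded min-heap keeps only the N best scores seen so far and evicts the weakest one when a better hit arrives. The sketch below mimics the shape of TopDocs/ScoreDoc with hypothetical plain-Java names (`TopNDemo`, `Hit`, `topN`) and made-up scores; it illustrates the technique and is not Lucene's collector code (tie-breaking between equal scores, for instance, is left unspecified here).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopNDemo {
    // Hypothetical stand-in for ScoreDoc: a document number plus its score.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }

        @Override
        public String toString() {
            return "doc=" + doc + " score=" + score;
        }
    }

    // Keep the n highest-scoring hits using a min-heap of size n.
    static List<Hit> topN(int[] docs, float[] scores, int n) {
        Comparator<Hit> byScore = Comparator.comparingDouble(h -> h.score);
        PriorityQueue<Hit> heap = new PriorityQueue<>(byScore); // weakest hit at the head
        for (int i = 0; i < docs.length; i++) {
            Hit h = new Hit(docs[i], scores[i]);
            if (heap.size() < n) {
                heap.add(h);
            } else if (n > 0 && h.score > heap.peek().score) {
                heap.poll(); // evict the current weakest hit
                heap.add(h);
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(byScore.reversed()); // best hit first, like TopDocs.scoreDocs
        return result;
    }

    public static void main(String[] args) {
        int[] docs = {0, 1, 2, 3, 4};
        float[] scores = {0.2f, 0.9f, 0.5f, 0.7f, 0.1f};
        for (Hit h : topN(docs, scores, 3)) {
            System.out.println(h);
        }
    }
}
```

`main` prints `doc=1 score=0.9`, `doc=3 score=0.7` and `doc=2 score=0.5`. The heap keeps memory proportional to N rather than to the number of matches.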
Demo of simple indexing and searching
using Apache Lucene

You will need lucene-core-x.y.jar on the classpath for this demo.
Any questions?
Thank You.

Weitere ähnliche Inhalte

Was ist angesagt?

What is the best full text search engine for Python?
What is the best full text search engine for Python?What is the best full text search engine for Python?
What is the best full text search engine for Python?Andrii Soldatenko
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...Databricks
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.Jurriaan Persyn
 
Elastic Search
Elastic SearchElastic Search
Elastic SearchNavule Rao
 
Introduction to Apache solr
Introduction to Apache solrIntroduction to Apache solr
Introduction to Apache solrKnoldus Inc.
 
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...Rahul K Chauhan
 
Next-level integration with Spring Data Elasticsearch
Next-level integration with Spring Data ElasticsearchNext-level integration with Spring Data Elasticsearch
Next-level integration with Spring Data ElasticsearchElasticsearch
 
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Edureka!
 
About elasticsearch
About elasticsearchAbout elasticsearch
About elasticsearchMinsoo Jun
 
Elasticsearch for beginners
Elasticsearch for beginnersElasticsearch for beginners
Elasticsearch for beginnersNeil Baker
 
Deep Dive Into Elasticsearch
Deep Dive Into ElasticsearchDeep Dive Into Elasticsearch
Deep Dive Into ElasticsearchKnoldus Inc.
 
elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리Junyi Song
 
Elastic Search (엘라스틱서치) 입문
Elastic Search (엘라스틱서치) 입문Elastic Search (엘라스틱서치) 입문
Elastic Search (엘라스틱서치) 입문SeungHyun Eom
 
Indexing with MongoDB
Indexing with MongoDBIndexing with MongoDB
Indexing with MongoDBMongoDB
 
ElasticSearch : Architecture et Développement
ElasticSearch : Architecture et DéveloppementElasticSearch : Architecture et Développement
ElasticSearch : Architecture et DéveloppementMohamed hedi Abidi
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic IntroductionMayur Rathod
 
Elasticsearch V/s Relational Database
Elasticsearch V/s Relational DatabaseElasticsearch V/s Relational Database
Elasticsearch V/s Relational DatabaseRicha Budhraja
 

Was ist angesagt? (20)

What is the best full text search engine for Python?
What is the best full text search engine for Python?What is the best full text search engine for Python?
What is the best full text search engine for Python?
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.
 
Elastic Search
Elastic SearchElastic Search
Elastic Search
 
Elasticsearch Introduction
Elasticsearch IntroductionElasticsearch Introduction
Elasticsearch Introduction
 
Introduction to Apache solr
Introduction to Apache solrIntroduction to Apache solr
Introduction to Apache solr
 
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
 
Next-level integration with Spring Data Elasticsearch
Next-level integration with Spring Data ElasticsearchNext-level integration with Spring Data Elasticsearch
Next-level integration with Spring Data Elasticsearch
 
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
 
About elasticsearch
About elasticsearchAbout elasticsearch
About elasticsearch
 
Spark
SparkSpark
Spark
 
Elasticsearch for beginners
Elasticsearch for beginnersElasticsearch for beginners
Elasticsearch for beginners
 
Deep Dive Into Elasticsearch
Deep Dive Into ElasticsearchDeep Dive Into Elasticsearch
Deep Dive Into Elasticsearch
 
elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리
 
Elastic Search (엘라스틱서치) 입문
Elastic Search (엘라스틱서치) 입문Elastic Search (엘라스틱서치) 입문
Elastic Search (엘라스틱서치) 입문
 
Indexing with MongoDB
Indexing with MongoDBIndexing with MongoDB
Indexing with MongoDB
 
ElasticSearch : Architecture et Développement
ElasticSearch : Architecture et DéveloppementElasticSearch : Architecture et Développement
ElasticSearch : Architecture et Développement
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic Introduction
 
Elasticsearch V/s Relational Database
Elasticsearch V/s Relational DatabaseElasticsearch V/s Relational Database
Elasticsearch V/s Relational Database
 

Andere mochten auch

Architecture and implementation of Apache Lucene
Architecture and implementation of Apache LuceneArchitecture and implementation of Apache Lucene
Architecture and implementation of Apache LuceneJosiane Gamgo
 
Devinsampa nginx-scripting
Devinsampa nginx-scriptingDevinsampa nginx-scripting
Devinsampa nginx-scriptingTony Fabeen
 
Munching & crunching - Lucene index post-processing
Munching & crunching - Lucene index post-processingMunching & crunching - Lucene index post-processing
Munching & crunching - Lucene index post-processingabial
 
From Lucene to Elasticsearch, a short explanation of horizontal scalability
From Lucene to Elasticsearch, a short explanation of horizontal scalabilityFrom Lucene to Elasticsearch, a short explanation of horizontal scalability
From Lucene to Elasticsearch, a short explanation of horizontal scalabilityStéphane Gamard
 
Lucandra
LucandraLucandra
Lucandraotisg
 
Intelligent crawling and indexing using lucene
Intelligent crawling and indexing using luceneIntelligent crawling and indexing using lucene
Intelligent crawling and indexing using luceneSwapnil & Patil
 
An introduction to inverted index
An introduction to inverted indexAn introduction to inverted index
An introduction to inverted indexweedge
 
Apache Solr/Lucene Internals by Anatoliy Sokolenko
Apache Solr/Lucene Internals  by Anatoliy SokolenkoApache Solr/Lucene Internals  by Anatoliy Sokolenko
Apache Solr/Lucene Internals by Anatoliy SokolenkoProvectus
 
Berlin Buzzwords 2013 - How does lucene store your data?
Berlin Buzzwords 2013 - How does lucene store your data?Berlin Buzzwords 2013 - How does lucene store your data?
Berlin Buzzwords 2013 - How does lucene store your data?Adrien Grand
 
Architecture and Implementation of Apache Lucene: Marter's Thesis
Architecture and Implementation of Apache Lucene: Marter's ThesisArchitecture and Implementation of Apache Lucene: Marter's Thesis
Architecture and Implementation of Apache Lucene: Marter's ThesisJosiane Gamgo
 
Lucene Introduction
Lucene IntroductionLucene Introduction
Lucene Introductionotisg
 
The search engine index
The search engine indexThe search engine index
The search engine indexCJ Jenkins
 

Andere mochten auch (20)

Architecture and implementation of Apache Lucene
Architecture and implementation of Apache LuceneArchitecture and implementation of Apache Lucene
Architecture and implementation of Apache Lucene
 
Solr
SolrSolr
Solr
 
Search Lucene
Search LuceneSearch Lucene
Search Lucene
 
Devinsampa nginx-scripting
Devinsampa nginx-scriptingDevinsampa nginx-scripting
Devinsampa nginx-scripting
 
Munching & crunching - Lucene index post-processing
Munching & crunching - Lucene index post-processingMunching & crunching - Lucene index post-processing
Munching & crunching - Lucene index post-processing
 
Index types
Index typesIndex types
Index types
 
From Lucene to Elasticsearch, a short explanation of horizontal scalability
From Lucene to Elasticsearch, a short explanation of horizontal scalabilityFrom Lucene to Elasticsearch, a short explanation of horizontal scalability
From Lucene to Elasticsearch, a short explanation of horizontal scalability
 
Text Indexing / Inverted Indices
Text Indexing / Inverted IndicesText Indexing / Inverted Indices
Text Indexing / Inverted Indices
 
Lucene
LuceneLucene
Lucene
 
Lucene and MySQL
Lucene and MySQLLucene and MySQL
Lucene and MySQL
 
Lucandra
LucandraLucandra
Lucandra
 
Inverted index
Inverted indexInverted index
Inverted index
 
Intelligent crawling and indexing using lucene
Intelligent crawling and indexing using luceneIntelligent crawling and indexing using lucene
Intelligent crawling and indexing using lucene
 
An introduction to inverted index
An introduction to inverted indexAn introduction to inverted index
An introduction to inverted index
 
Apache Solr/Lucene Internals by Anatoliy Sokolenko
Apache Solr/Lucene Internals  by Anatoliy SokolenkoApache Solr/Lucene Internals  by Anatoliy Sokolenko
Apache Solr/Lucene Internals by Anatoliy Sokolenko
 
Berlin Buzzwords 2013 - How does lucene store your data?
Berlin Buzzwords 2013 - How does lucene store your data?Berlin Buzzwords 2013 - How does lucene store your data?
Berlin Buzzwords 2013 - How does lucene store your data?
 
Introduction to solr
Introduction to solrIntroduction to solr
Introduction to solr
 
Architecture and Implementation of Apache Lucene: Marter's Thesis
Architecture and Implementation of Apache Lucene: Marter's ThesisArchitecture and Implementation of Apache Lucene: Marter's Thesis
Architecture and Implementation of Apache Lucene: Marter's Thesis
 
Lucene Introduction
Lucene IntroductionLucene Introduction
Lucene Introduction
 
The search engine index
The search engine indexThe search engine index
The search engine index
 

Ähnlich wie Introduction To Apache Lucene

Tutorial 5 (lucene)
Tutorial 5 (lucene)Tutorial 5 (lucene)
Tutorial 5 (lucene)Kira
 
Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAsad Abbas
 
Examiness hints and tips from the trenches
Examiness hints and tips from the trenchesExaminess hints and tips from the trenches
Examiness hints and tips from the trenchesIsmail Mayat
 
Lucene Bootcamp -1
Lucene Bootcamp -1 Lucene Bootcamp -1
Lucene Bootcamp -1 GokulD
 
Search Engine Capabilities - Apache Solr(Lucene)
Search Engine Capabilities - Apache Solr(Lucene)Search Engine Capabilities - Apache Solr(Lucene)
Search Engine Capabilities - Apache Solr(Lucene)Manish kumar
 
Wanna search? Piece of cake!
Wanna search? Piece of cake!Wanna search? Piece of cake!
Wanna search? Piece of cake!Alex Kursov
 
Full Text Search with Lucene
Full Text Search with LuceneFull Text Search with Lucene
Full Text Search with LuceneWO Community
 
Solr中国6月21日企业搜索
Solr中国6月21日企业搜索Solr中国6月21日企业搜索
Solr中国6月21日企业搜索longkeyy
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to ElasticsearchClifford James
 
Using elasticsearch with rails
Using elasticsearch with railsUsing elasticsearch with rails
Using elasticsearch with railsTom Z Zeng
 
Search Me: Using Lucene.Net
Search Me: Using Lucene.NetSearch Me: Using Lucene.Net
Search Me: Using Lucene.Netgramana
 

Ähnlich wie Introduction To Apache Lucene (20)

Tutorial 5 (lucene)
Tutorial 5 (lucene)Tutorial 5 (lucene)
Tutorial 5 (lucene)
 
Apache Lucene Basics
Apache Lucene BasicsApache Lucene Basics
Apache Lucene Basics
 
Lucene basics
Lucene basicsLucene basics
Lucene basics
 
Apache lucene
Apache luceneApache lucene
Apache lucene
 
IR with lucene
IR with luceneIR with lucene
IR with lucene
 
Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using Lucene
 
Examiness hints and tips from the trenches
Examiness hints and tips from the trenchesExaminess hints and tips from the trenches
Examiness hints and tips from the trenches
 
Lucene Bootcamp -1
Lucene Bootcamp -1 Lucene Bootcamp -1
Lucene Bootcamp -1
 
Search Engine Capabilities - Apache Solr(Lucene)
Search Engine Capabilities - Apache Solr(Lucene)Search Engine Capabilities - Apache Solr(Lucene)
Search Engine Capabilities - Apache Solr(Lucene)
 
Wanna search? Piece of cake!
Wanna search? Piece of cake!Wanna search? Piece of cake!
Wanna search? Piece of cake!
 
Full Text Search In PostgreSQL
Full Text Search In PostgreSQLFull Text Search In PostgreSQL
Full Text Search In PostgreSQL
 
Full Text Search with Lucene
Full Text Search with LuceneFull Text Search with Lucene
Full Text Search with Lucene
 
Solr中国6月21日企业搜索
Solr中国6月21日企业搜索Solr中国6月21日企业搜索
Solr中国6月21日企业搜索
 
Building a Search Engine Using Lucene
Building a Search Engine Using LuceneBuilding a Search Engine Using Lucene
Building a Search Engine Using Lucene
 
Fast track to lucene
Fast track to luceneFast track to lucene
Fast track to lucene
 
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to Elasticsearch
 
Using elasticsearch with rails
Using elasticsearch with railsUsing elasticsearch with rails
Using elasticsearch with rails
 
Search Me: Using Lucene.Net
Search Me: Using Lucene.NetSearch Me: Using Lucene.Net
Search Me: Using Lucene.Net
 

Mehr von Mindfire Solutions (20)

Physician Search and Review
Physician Search and ReviewPhysician Search and Review
Physician Search and Review
 
diet management app
diet management appdiet management app
diet management app
 
Business Technology Solution
Business Technology SolutionBusiness Technology Solution
Business Technology Solution
 
Remote Health Monitoring
Remote Health MonitoringRemote Health Monitoring
Remote Health Monitoring
 
Influencer Marketing Solution
Influencer Marketing SolutionInfluencer Marketing Solution
Influencer Marketing Solution
 
ELMAH
ELMAHELMAH
ELMAH
 
High Availability of Azure Applications
High Availability of Azure ApplicationsHigh Availability of Azure Applications
High Availability of Azure Applications
 
IOT Hands On
IOT Hands OnIOT Hands On
IOT Hands On
 
Glimpse of Loops Vs Set
Glimpse of Loops Vs SetGlimpse of Loops Vs Set
Glimpse of Loops Vs Set
 
Oracle Sql Developer-Getting Started
Oracle Sql Developer-Getting StartedOracle Sql Developer-Getting Started
Oracle Sql Developer-Getting Started
 
Adaptive Layout In iOS 8
Adaptive Layout In iOS 8Adaptive Layout In iOS 8
Adaptive Layout In iOS 8
 
Introduction to Auto-layout : iOS/Mac
Introduction to Auto-layout : iOS/MacIntroduction to Auto-layout : iOS/Mac
Introduction to Auto-layout : iOS/Mac
 
LINQPad - utility Tool
LINQPad - utility ToolLINQPad - utility Tool
LINQPad - utility Tool
 
Get started with watch kit development
Get started with watch kit developmentGet started with watch kit development
Get started with watch kit development
 
Swift vs Objective-C
Swift vs Objective-CSwift vs Objective-C
Swift vs Objective-C
 
Material Design in Android
Material Design in AndroidMaterial Design in Android
Material Design in Android
 
Introduction to OData
Introduction to ODataIntroduction to OData
Introduction to OData
 
Ext js Part 2- MVC
Ext js Part 2- MVCExt js Part 2- MVC
Ext js Part 2- MVC
 
ExtJs Basic Part-1
ExtJs Basic Part-1ExtJs Basic Part-1
ExtJs Basic Part-1
 
Spring Security Introduction
Spring Security IntroductionSpring Security Introduction
Spring Security Introduction
 

Kürzlich hochgeladen

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 

Kürzlich hochgeladen (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Introduction To Apache Lucene

  • 1. Introduction to Apache Lucene Sumit Luthra
  • 2. Agenda What is Apache Lucene ? Focus of Apache Lucene Lucene Architecture Core Indexing Classes Core Searching Classes Demo Questions & Answers
  • 3. What is Apache Lucene?
    – "Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java."
    – Also known as an information retrieval (IR) library.
    – Lucene is specifically an API, not an application.
    – Open source.
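  • 4. Focus
    – Indexing documents
    – Searching documents
    Note: You can use Lucene to provide consistent full-text indexing across both database objects and documents in various formats (Microsoft Office documents, PDF, HTML, text, emails and so on). Lucene itself indexes text, so the text of such documents must first be extracted with a separate parser (for example, Apache Tika).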
  • 5. Lucene Architecture (diagram)
    – Indexing path: Raw Content → Acquire content → Build document → Analyze document → Index document → Index
    – Search path: Users → Search UI → Build query → Run query (against the Index) → Render results
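To make the two paths of the diagram concrete, here is a minimal plain-Java sketch of an inverted index. It is not Lucene code: the class ToyPipeline and its methods are hypothetical, and the analysis step is simplified to lowercasing and splitting on non-letters (roughly what SimpleAnalyzer does). A query returns the documents that contain any of its terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java model of the indexing and search paths; NOT Lucene's implementation.
final class ToyPipeline {
    // The "Index": each term maps to the sorted ids of the documents containing it.
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();
    // Raw content kept so that results can be rendered.
    private final List<String> contents = new ArrayList<>();

    // "Analyze document" (also used for queries): lowercase, then split on non-letters.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // "Build document" -> "Analyze document" -> "Index document"; returns the new doc id.
    int addDocument(String rawContent) {
        int docId = contents.size();
        contents.add(rawContent);
        for (String term : analyze(rawContent)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // "Build query" -> "Run query": ids of documents containing ANY query term.
    Set<Integer> search(String userText) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyze(userText)) {
            hits.addAll(postings.getOrDefault(term, new TreeSet<>()));
        }
        return hits;
    }

    // "Render results": look up the raw content of a hit.
    String stored(int docId) {
        return contents.get(docId);
    }

    public static void main(String[] args) {
        ToyPipeline index = new ToyPipeline();
        index.addDocument("Hello World");
        index.addDocument("Hello, Lucene!");
        for (int docId : index.search("world")) {
            System.out.println(docId + ": " + index.stored(docId));
        }
    }
}
```

Lucene's real index records more than this term-to-documents mapping (for example term frequencies and positions), which is what makes scoring and phrase queries possible.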
  • 6. Indexing Documents
      IndexWriterConfig config = new IndexWriterConfig(matchVersion, analyzer);
      config.setOpenMode(IndexWriterConfig.OpenMode.CREATE); // overwrite any existing index
      IndexWriter writer = new IndexWriter(directory, config);
      Document doc = new Document();
      doc.add(new Field("content", "Hello World", Field.Store.YES, Field.Index.ANALYZED));
      doc.add(new Field("name", "filename.txt", Field.Store.YES, Field.Index.ANALYZED));
      doc.add(new Field("path", "http://myfile/", Field.Store.YES, Field.Index.ANALYZED));
      // [...]
      writer.addDocument(doc);
      writer.close();
  • 8. IndexWriter construction
      // Deprecated
      IndexWriter(Directory d, Analyzer a, // default analyzer
                  IndexWriter.MaxFieldLength mfl);

      // Preferred
      IndexWriter(Directory d, IndexWriterConfig c);
  • 10. Analyzers
    Analyzers tokenize the input text.
    Common analyzers:
    – WhitespaceAnalyzer: splits tokens on whitespace
    – SimpleAnalyzer: splits tokens on non-letters, then lowercases
    – StopAnalyzer: same as SimpleAnalyzer, but also removes stop words
    – StandardAnalyzer: the most sophisticated; recognizes certain token types (such as e-mail addresses and company names), lowercases, removes stop words, ...
  • 11. Analysis examples
    Input: "The quick brown fox jumped over the lazy dog"
    – WhitespaceAnalyzer: [The] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dog]
    – SimpleAnalyzer: [the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dog]
    – StopAnalyzer: [quick] [brown] [fox] [jumped] [over] [lazy] [dog]
    – StandardAnalyzer: [quick] [brown] [fox] [jumped] [over] [lazy] [dog]
  • 12. More analysis examples
    Input: "XY&Z Corporation - xyz@example.com"
    – WhitespaceAnalyzer: [XY&Z] [Corporation] [-] [xyz@example.com]
    – SimpleAnalyzer: [xy] [z] [corporation] [xyz] [example] [com]
    – StopAnalyzer: [xy] [z] [corporation] [xyz] [example] [com]
    – StandardAnalyzer: [xy&z] [corporation] [xyz@example.com] (classic StandardAnalyzer grammar; since Lucene 3.1 this behavior is also available as ClassicAnalyzer)
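The rules of the first three analyzers are simple enough to model directly. The sketch below is plain Java, not Lucene's Analyzer API: ToyAnalyzers and its STOP_WORDS set are hypothetical, and the stop list is a small assumed subset rather than Lucene's default English list. StandardAnalyzer's grammar is not modeled. On the two example inputs it produces the WhitespaceAnalyzer, SimpleAnalyzer and StopAnalyzer token lists shown on slides 11 and 12.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Plain-Java model of three tokenization rules; NOT Lucene's analyzer classes.
final class ToyAnalyzers {
    // Assumed, illustrative stop-word subset (not Lucene's full default list).
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "in", "of", "on", "or", "the", "to");

    // Like WhitespaceAnalyzer: split on whitespace, keep case and punctuation.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Like SimpleAnalyzer: every run of letters is a token, lowercased (BMP chars only).
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) { // a non-letter ends the current token
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Like StopAnalyzer: the simple() tokens minus stop words.
    static List<String> stop(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : simple(text)) {
            if (!STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "XY&Z Corporation - xyz@example.com";
        System.out.println(whitespace(s));
        System.out.println(simple(s));
        System.out.println(stop(s));
    }
}
```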
  • 13. Document & Fields
    – A Document is the atomic unit of indexing and searching; it contains Fields.
    – Fields have a name and a value.
    – You have to translate raw content into Fields.
    – Examples: title, author, date, abstract, body, URL, keywords, ...
    – Different documents can have different fields.
  • 14. Field options
    Field.Store
    – NO: don't store the field value in the index
    – YES: store the field value in the index (so it can be retrieved from a search result)
    Field.Index
    – ANALYZED: tokenize with an Analyzer
    – NOT_ANALYZED: do not tokenize; index the whole value as a single term
    – NO: do not index this field (it cannot be searched)
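The effect of the Store and Index options can be sketched in plain Java. Store, Index and Field below are hypothetical stand-ins named after Lucene's options, not Lucene's classes, and ANALYZED uses a simplified lowercase-and-split analysis. The sketch shows which terms each option puts into the index and which values a hit can hand back.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java model of the Field.Store / Field.Index semantics; NOT Lucene's classes.
final class ToyFields {
    enum Store { YES, NO }
    enum Index { ANALYZED, NOT_ANALYZED, NO }

    static final class Field {
        final String name;
        final String value;
        final Store store;
        final Index index;

        Field(String name, String value, Store store, Index index) {
            this.name = name;
            this.value = value;
            this.store = store;
            this.index = index;
        }
    }

    // Terms a field contributes to the index, written as "field:term".
    static List<String> indexedTerms(Field f) {
        List<String> terms = new ArrayList<>();
        switch (f.index) {
            case ANALYZED: // simplified analysis: lowercase, split on non-alphanumerics
                for (String t : f.value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                    if (!t.isEmpty()) {
                        terms.add(f.name + ":" + t);
                    }
                }
                break;
            case NOT_ANALYZED: // the whole value becomes one term, case preserved
                terms.add(f.name + ":" + f.value);
                break;
            case NO: // nothing is indexed, so the field cannot be searched
                break;
        }
        return terms;
    }

    // What a search hit can return: only the fields stored with Store.YES.
    static Map<String, String> storedFields(List<Field> document) {
        Map<String, String> stored = new LinkedHashMap<>();
        for (Field f : document) {
            if (f.store == Store.YES) {
                stored.put(f.name, f.value);
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        Field path = new Field("path", "http://myfile/", Store.YES, Index.NOT_ANALYZED);
        System.out.println(indexedTerms(path));
    }
}
```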
  • 15. Searching an Index
      IndexReader reader = IndexReader.open(directory);
      IndexSearcher searcher = new IndexSearcher(reader);
      QueryParser parser = new QueryParser(matchVersion, field_name, analyzer);
      Query query = parser.parse(WORD_SEARCHED);
      TopDocs hits = searcher.search(query, noOfHits);
      ScoreDoc[] scoreDocs = hits.scoreDocs;
      if (scoreDocs.length > 0) {
          Document doc = searcher.doc(scoreDocs[0].doc); // look at first match
          System.out.println("name=" + doc.get("name"));
      }
      searcher.close();
      reader.close();
  • 17. IndexSearcher
    Constructors:
    – IndexSearcher(Directory d); // Deprecated
    – IndexSearcher(IndexReader r);
      Construct the IndexReader with the static method IndexReader.open(dir).
  • 18. Query
    – TermQuery: constructed from a Term
    – TermRangeQuery
    – NumericRangeQuery
    – PrefixQuery
    – BooleanQuery
    – PhraseQuery
    – WildcardQuery
    – FuzzyQuery
    – MatchAllDocsQuery
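As an illustration of what some of these query types match, the following plain-Java sketch models TermQuery, PrefixQuery, an inclusive TermRangeQuery, and BooleanQuery clauses using occurrence names borrowed from Lucene's BooleanClause.Occur (MUST, SHOULD, MUST_NOT). It only decides match or no match (there is no scoring), and each document is assumed to be a set of already-analyzed terms from a single field; ToyQueries and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Plain-Java model of how some query types select documents; NOT Lucene's classes.
// Assumption: a document is a set of already-analyzed terms from one field.
final class ToyQueries {
    interface Query {
        boolean matches(Set<String> docTerms);
    }

    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause {
        final Query query;
        final Occur occur;

        Clause(Query query, Occur occur) {
            this.query = query;
            this.occur = occur;
        }
    }

    // Like TermQuery: the exact term must be present.
    static Query term(String t) {
        return doc -> doc.contains(t);
    }

    // Like PrefixQuery: some term starts with the prefix.
    static Query prefix(String p) {
        return doc -> doc.stream().anyMatch(t -> t.startsWith(p));
    }

    // Like an inclusive TermRangeQuery: some term sorts between lower and upper.
    static Query range(String lower, String upper) {
        return doc -> doc.stream().anyMatch(t -> t.compareTo(lower) >= 0 && t.compareTo(upper) <= 0);
    }

    // Like BooleanQuery: every MUST matches, no MUST_NOT matches, and without any MUST
    // clause at least one SHOULD must match (so a purely negative query matches nothing).
    static Query bool(List<Clause> clauses) {
        return doc -> {
            boolean hasMust = false;
            boolean anyShouldMatched = false;
            for (Clause c : clauses) {
                boolean matched = c.query.matches(doc);
                switch (c.occur) {
                    case MUST:
                        if (!matched) return false;
                        hasMust = true;
                        break;
                    case MUST_NOT:
                        if (matched) return false;
                        break;
                    case SHOULD:
                        anyShouldMatched |= matched;
                        break;
                }
            }
            return hasMust || anyShouldMatched;
        };
    }

    // Positions (ids) of the documents the query matches.
    static List<Integer> search(List<Set<String>> docs, Query q) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (q.matches(docs.get(i))) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(Set.of("java", "junit"), Set.of("java", "ant"));
        System.out.println(search(docs, prefix("ju")));
    }
}
```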
  • 19. QueryParser
    Constructor:
    – QueryParser(Version matchVersion, String defaultField, Analyzer analyzer);
    Parsing methods:
    – Query parse(String query) throws ParseException;
    – ... and many more
  • 20. QueryParser syntax examples
      Query expression                   Document matches if...
      java                               Contains the term java in the default field
      java junit                         Contains the term java or junit, or both, in the default field
      java OR junit                      (same; the default operator can be changed to AND)
      +java +junit                       Contains both java and junit in the default field
      java AND junit                     (same)
      title:ant                          Contains the term ant in the title field
      title:extreme -subject:sports      Contains extreme in the title field and not sports in the subject field
      (agile OR extreme) AND java        Boolean expression matches
      title:"junit in action"            Phrase matches in title
      title:"junit action"~5             Proximity matches (within 5 positions) in title
      java*                              Wildcard matches
      java~                              Fuzzy matches
      lastmodified:[1/1/09 TO 12/31/09]  Range matches
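A few rows of the syntax table can be reproduced with a toy evaluator. This is not Lucene's QueryParser: ToyQueryParser is hypothetical, handles only space-separated clauses with an optional + or - prefix and an optional field:term qualifier, treats unprefixed clauses as OR (the default operator), and assumes terms are already lowercase. Phrases, parentheses, the AND/OR keywords, wildcards, fuzzy and range syntax are left out.

```java
import java.util.Map;
import java.util.Set;

// Toy evaluator for a small subset of the query syntax; NOT Lucene's QueryParser.
// Supported: "term", "field:term", "+clause" (required), "-clause" (prohibited).
final class ToyQueryParser {
    private final String defaultField;

    ToyQueryParser(String defaultField) {
        this.defaultField = defaultField;
    }

    // doc maps each field name to the set of terms indexed for that field.
    boolean matches(String query, Map<String, Set<String>> doc) {
        boolean anyRequired = false;
        boolean anyOptionalMatched = false;
        for (String raw : query.trim().split("\\s+")) {
            if (raw.isEmpty()) continue;
            char sign = raw.charAt(0);
            String clause = (sign == '+' || sign == '-') ? raw.substring(1) : raw;
            int colon = clause.indexOf(':');
            String field = colon >= 0 ? clause.substring(0, colon) : defaultField;
            String term = colon >= 0 ? clause.substring(colon + 1) : clause;
            boolean matched = doc.getOrDefault(field, Set.of()).contains(term);
            if (sign == '+') {
                if (!matched) return false; // a required clause is missing
                anyRequired = true;
            } else if (sign == '-') {
                if (matched) return false;  // a prohibited clause is present
            } else {
                anyOptionalMatched |= matched;
            }
        }
        // Without any + clause, at least one optional clause must match (default OR).
        return anyRequired || anyOptionalMatched;
    }

    public static void main(String[] args) {
        ToyQueryParser parser = new ToyQueryParser("content");
        Map<String, Set<String>> doc = Map.of("content", Set.of("java", "junit"));
        System.out.println(parser.matches("+java +junit", doc));
    }
}
```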
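  • 21. TopDocs and ScoreDoc
    – TopDocs: holds the top N ranked documents (results) that match a given query, plus the total number of matches (totalHits).
    – ScoreDoc: TopDocs.scoreDocs is an array of ScoreDoc, one per returned result; each holds the internal document id (doc), which is passed to IndexSearcher.doc(int), and the score.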
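A common way to collect the top N hits is a bounded min-heap. The plain-Java sketch below mirrors the roles of TopDocs (a total hit count plus a best-first scoreDocs array) and ScoreDoc (a doc id plus a score); ToyTopDocs and ToyScoreDoc are hypothetical, scores are supplied precomputed, and breaking ties in favour of the lower doc id is an assumption of this sketch. The doc field of each entry is the id a result is fetched by, which is why the first match is scoreDocs[0].doc rather than document 0.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Plain-Java model of top-N hit collection; ToyTopDocs/ToyScoreDoc only mirror the
// roles of Lucene's TopDocs/ScoreDoc and are NOT Lucene's classes.
final class ToyTopDocs {
    static final class ToyScoreDoc {
        final int doc;      // internal document id
        final float score;  // relevance score

        ToyScoreDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    final int totalHits;            // number of matching documents
    final ToyScoreDoc[] scoreDocs;  // at most n hits, best first

    private ToyTopDocs(int totalHits, ToyScoreDoc[] scoreDocs) {
        this.totalHits = totalHits;
        this.scoreDocs = scoreDocs;
    }

    // scores[i] is document i's score, or Float.NaN if document i does not match.
    static ToyTopDocs collect(float[] scores, int n) {
        // "Worst first": lower score first; on equal scores the higher doc id counts as worse.
        Comparator<ToyScoreDoc> worstFirst = (a, b) -> {
            int byScore = Float.compare(a.score, b.score);
            return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
        };
        // Min-heap of size <= n: its head is the weakest hit kept so far.
        PriorityQueue<ToyScoreDoc> heap = new PriorityQueue<>(worstFirst);
        int total = 0;
        for (int doc = 0; doc < scores.length; doc++) {
            if (Float.isNaN(scores[doc])) continue;
            total++;
            heap.add(new ToyScoreDoc(doc, scores[doc]));
            if (heap.size() > n) {
                heap.poll(); // evict the weakest hit
            }
        }
        ToyScoreDoc[] top = new ToyScoreDoc[heap.size()];
        for (int i = top.length - 1; i >= 0; i--) {
            top[i] = heap.poll(); // weakest fills the end, so top[0] is the best hit
        }
        return new ToyTopDocs(total, top);
    }

    public static void main(String[] args) {
        ToyTopDocs hits = collect(new float[] {0.5f, Float.NaN, 2.0f, 1.0f}, 2);
        for (ToyScoreDoc sd : hits.scoreDocs) {
            System.out.println(sd.doc + " " + sd.score);
        }
    }
}
```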
  • 22. Demo of simple indexing and searching using Apache Lucene. This demo requires lucene-core-x.y.jar on the classpath.