Faceted Search with Lucene
Shai Erera
Researcher, IBM
Who Am I
• Working at IBM – Information Retrieval Research
• Lucene/Solr committer and PMC member
• http://shaierera.blogspot.com
• shaie@apache.org
Lucene Facets 101
Faceted Search
• Technique for accessing documents that were classified into a taxonomy of categories
  – Flat: Author/John Doe, Tags/Lucene, Popularity/High
  – Hierarchical: Computers/Software/Information Retrieval/Fulltext/Apache Lucene (ODP)
• Quick overview of the breakdown of the search results
  – How many documents are in category Committed Paths/lucene/core vs. Committed Paths/lucene/facet
• Simplifies interaction with the search application
  – Drill down to issues that were updated in the Past 2 days by clicking a link
  – No knowledge required about search syntax and index schema
http://jirasearch.mikemccandless.com
Lucene Facets
• Contributed by IBM in 2011, released in 3.4.0
• Major changes since 4.1.0+
  – NRT support
  – Nearly 400% search speedups
  – Complete API revamp
  – New features (SortedSet, range faceting, drill-sideways)
• Two main indexing-time modes
  – Taxonomy-based: hierarchical facets, managed by a sidecar index, low NRT reopen cost
  – SortedSetDocValues: flat facets only, no sidecar index, higher NRT reopen cost
• Runtime modes
  – Range facets (on NumericDocValues fields)
• Other implementations: Solr, ElasticSearch, Bobo Browse
Lucene Facet Components
• TaxonomyWriter/Reader
  – Manage the taxonomy information
• FacetFields
  – Add facets information to documents (DocValues fields, drilldown terms)
• FacetRequest
  – Defines which facets to aggregate and the FacetsAggregator (aggregation function)
• FacetsCollector
  – Collects matching documents and computes the top-K categories for each facet request (invokes FacetsAccumulator)
• DrillDownQuery / DrillSideways
  – Execute drilldown and drill-sideways requests
Sample Code – Indexing
// Builds the taxonomy as documents are indexed, multi-threaded, single instance
TaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir);
// Adds facets information to a document, can be initialized once per thread
FacetFields facetFields = new FacetFields(taxoWriter);
// List of categories to add to the document
List<CategoryPath> cats = new ArrayList<CategoryPath>();
cats.add(new CategoryPath("Author", "Erik Hatcher"));
cats.add(new CategoryPath("Author/Otis Gospodnetić", '/'));
cats.add(new CategoryPath("Pub Date", "2004", "December", "1"));
Document bookDoc = new Document();
bookDoc.add(new TextField("title", "lucene in action", Store.YES));
// add categories fields (DocValues, Postings)
facetFields.addFields(bookDoc, cats);
// index the document
indexWriter.addDocument(bookDoc);
Sample Code – Search
// Open an NRT TaxonomyReader
TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoWriter);
// Define the facets to aggregate (top-10 categories for each)
FacetSearchParams fsp = new FacetSearchParams();
fsp.addFacetRequest(new CountFacetRequest(new CategoryPath("Author"), 10));
fsp.addFacetRequest(new CountFacetRequest(new CategoryPath("Pub Date"), 10));

// Collect both top-K facets and top-N matching documents
TopDocsCollector tdc = TopScoreDocCollector.create(10, true);
FacetsCollector fc = FacetsCollector.create(fsp, searcher.getIndexReader(), taxoReader);
Query q = new TermQuery(new Term("title", "lucene"));
searcher.search(q, MultiCollector.wrap(tdc, fc));
// Traverse the top facets
for (FacetResult fres : fc.getFacetResults()) {
  FacetResultNode root = fres.getFacetResultNode();
  // value is a floating-point number, so cast it before formatting with %d
  System.out.println(String.format("%s (%d)", root.label, (int) root.value));
  for (FacetResultNode cat : root.subResults) {
    // components[0] is the dimension; components[1] is the category under it
    System.out.println("  " + cat.label.components[1] + " (" + (int) cat.value + ")");
  }
}
Drilldown and Drill-Sideways
• Drilldown adds a filter to the search
  – Multiple categories can be OR'd

// Drilldown: filter results to "Component/core/index";
// All other "Component/*" and "Component/core/*" get count 0
Query base = new MatchAllDocsQuery();
DrillDownQuery ddq = new DrillDownQuery(facetIndexingParams, base);
ddq.add(new CategoryPath("Component/core/index", '/'));

• Drill-sideways allows drilldown, yet still aggregates "sideways" categories

// Drill-Sideways: drilldown on "Component/core/index";
// Other "Component/*" and "Component/core/*" are counted too
DrillSideways ds = new DrillSideways(searcher, taxoReader);
DrillSidewaysResult sidewaysRes = ds.search(null, ddq, 10, fsp);
http://blog.mikemccandless.com/2013/02/drill-sideways-faceting-with-lucene.html
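The difference between the two counting modes can be sketched with plain JDK collections. This is not the Lucene DrillSideways API; the class name, the sample documents and the one-value-per-dimension document model are assumptions made only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain JDK collections, not the Lucene DrillSideways API.
class DrillSidewaysSketch {

    /** Hypothetical sample index: each document maps a dimension to one value. */
    static List<Map<String, String>> sampleDocs() {
        return List.of(
            Map.of("Component", "core/index", "Status", "Open"),
            Map.of("Component", "core/search", "Status", "Open"),
            Map.of("Component", "core/index", "Status", "Closed"),
            Map.of("Component", "facet", "Status", "Open"));
    }

    /**
     * Counts the values of dim. With sideways=false every selection filters the
     * documents (plain drilldown); with sideways=true the selection on dim itself
     * is skipped, so sibling values of the drilled-down dimension keep their counts.
     */
    static Map<String, Integer> counts(List<Map<String, String>> docs,
                                       Map<String, String> selections,
                                       String dim, boolean sideways) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            boolean matches = true;
            for (Map.Entry<String, String> sel : selections.entrySet()) {
                boolean skip = sideways && sel.getKey().equals(dim);
                if (!skip && !sel.getValue().equals(doc.get(sel.getKey()))) {
                    matches = false;
                    break;
                }
            }
            String value = doc.get(dim);
            if (matches && value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> sel = Map.of("Component", "core/index");
        System.out.println(counts(sampleDocs(), sel, "Component", false)); // drilldown
        System.out.println(counts(sampleDocs(), sel, "Component", true));  // drill-sideways
    }
}
```

With the sample data, plain drilldown reports only core/index for the Component dimension, while the sideways pass also reports core/search and facet.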
Dynamic Facets
• Range facets on NumericDocValues fields
  – Define the buckets of interest at query time
  – Supports any arbitrary ValueSource (Lucene 4.6.0)

// Aggregate matching documents into buckets
RangeAccumulator a = new RangeAccumulator(new RangeFacetRequest<LongRange>("field",
    new LongRange("1-5", 1L, true, 5L, true),
    new LongRange("6-20", 6L, true, 20L, true),
    // both bounds inclusive, so that 21 and 100 fall into this bucket
    new LongRange("21-100", 21L, true, 100L, true),
    new LongRange("over 100", 100L, false, Long.MAX_VALUE, true)));
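What range faceting computes can be sketched in plain Java: each matching document's numeric value is tested against buckets defined at query time. The Range class below only mirrors the shape of the constructor arguments shown above; it is a hypothetical stand-in, not Lucene's LongRange, and the sample values are invented.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: plain JDK code, not the Lucene RangeAccumulator.
class RangeFacetSketch {

    /** A labeled bucket whose bounds are independently inclusive or exclusive. */
    static final class Range {
        final String label;
        final long min, max;
        final boolean minInclusive, maxInclusive;

        Range(String label, long min, boolean minInclusive, long max, boolean maxInclusive) {
            this.label = label;
            this.min = min;
            this.minInclusive = minInclusive;
            this.max = max;
            this.maxInclusive = maxInclusive;
        }

        boolean accept(long v) {
            boolean aboveMin = minInclusive ? v >= min : v > min;
            boolean belowMax = maxInclusive ? v <= max : v < max;
            return aboveMin && belowMax;
        }
    }

    /** Buckets defined at query time; nothing about them is stored in the index. */
    static List<Range> sampleRanges() {
        return List.of(
            new Range("1-5", 1L, true, 5L, true),
            new Range("6-20", 6L, true, 20L, true),
            new Range("21-100", 21L, true, 100L, true),
            new Range("over 100", 100L, false, Long.MAX_VALUE, true));
    }

    /** Counts how many values fall into each bucket, in bucket order. */
    static Map<String, Integer> count(long[] values, List<Range> ranges) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Range r : ranges) {
            counts.put(r.label, 0);
        }
        for (long v : values) {
            for (Range r : ranges) {
                if (r.accept(v)) {
                    counts.merge(r.label, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] values = {1, 5, 6, 20, 21, 100, 101, 250}; // hypothetical field values
        System.out.println(count(values, sampleRanges()));
    }
}
```

Because the buckets are only an argument to the count, changing them needs no re-indexing.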
Facet Associations
• Not all facets created equal
  – Categories added by an automatic categorization system, e.g. Category/Apache Lucene (0.74) (confidence level is 0.74)
  – Important metadata about the facet, e.g. Contracts/US ($5M) (total $$$ generated from contracts)
  – Complex structures, e.g. Users/Shai Erera (lastAccess=YYYY/MM/DD, numUpdates=8…)
• Categories can have values associated with them per document
  – They are later aggregated by these values
  – NOTE: ≠ NumericDocValuesFields!
• Facet associations are completely customizable – encoded as a byte[] per document
http://shaierera.blogspot.com/2013/01/facet-associations.html
More Features
• Complements
  – Holds the count of each category in-memory, per IndexReader
  – When number of search results is >50% of the index, count the "complement set"
  – Useful for "overview" queries, e.g. MatchAllDocsQuery
• Sampling
  – Aggregate a sampled set of the search results
  – Optionally re-count top-K facets for accurate values
• Partitions
  – Partition the taxonomy space to control memory usage during faceted search
  – Useful for very big taxonomies (10s of millions of categories)
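The complements idea can be sketched in a few lines of plain Java: when more than half of the index matches, it is cheaper to count the documents that did not match and subtract from precomputed totals. The array-based document model and all names below are assumptions for illustration, not the Lucene implementation.

```java
// Illustrative only: a toy model of complement counting, not Lucene code.
class ComplementSketch {

    /** Counts category ordinals over the documents whose match flag equals wanted. */
    static int[] countDocs(int[][] docOrdinals, boolean[] matches, boolean wanted, int numCategories) {
        int[] counts = new int[numCategories];
        for (int doc = 0; doc < docOrdinals.length; doc++) {
            if (matches[doc] == wanted) {
                for (int ord : docOrdinals[doc]) {
                    counts[ord]++;
                }
            }
        }
        return counts;
    }

    /**
     * totalCounts holds the count of every category over the whole index
     * (precomputed once). If more than 50% of the documents match, the
     * non-matching documents are counted and subtracted from the totals.
     */
    static int[] count(int[][] docOrdinals, boolean[] matches, int[] totalCounts) {
        int numMatches = 0;
        for (boolean m : matches) {
            if (m) {
                numMatches++;
            }
        }
        if (2 * numMatches <= matches.length) {
            return countDocs(docOrdinals, matches, true, totalCounts.length);
        }
        int[] complement = countDocs(docOrdinals, matches, false, totalCounts.length);
        int[] counts = new int[totalCounts.length];
        for (int i = 0; i < counts.length; i++) {
            counts[i] = totalCounts[i] - complement[i];
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] docOrdinals = {{0, 1}, {0}, {1, 2}, {0, 2}}; // ordinals per document
        int[] totals = {3, 2, 2};                             // counts over all 4 documents
        boolean[] matches = {true, true, true, false};        // 3 of 4 documents match
        System.out.println(java.util.Arrays.toString(count(docOrdinals, matches, totals)));
    }
}
```

Both paths return the same counts; only the number of documents visited differs.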
Lucene Facets Under the Hood
The Taxonomy Index
• The taxonomy maps categories to integer codes (referred to as ordinals)
  – Kind of like a Map<CategoryPath,Integer>, with hierarchy support
  – Provides taxonomy browsing services
  – DirectoryTaxonomyWriter is managed as a sidecar Lucene index
• Categories are broken down to their path components, e.g. Date/2012/March/20 becomes:
  – Date, with ordinal=1
  – Date/2012, with ordinal=2
  – Date/2012/March, with ordinal=3
  – Date/2012/March/20, with ordinal=4
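The "Map with hierarchy support" idea can be sketched as follows. This is an in-memory toy, not DirectoryTaxonomyWriter. The class and method names are hypothetical, and reserving ordinal 0 for a root node is an assumption chosen so that the numbering matches the example above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: an in-memory toy taxonomy, not the Lucene taxonomy index.
class TaxonomySketch {
    private final Map<String, Integer> ordinals = new HashMap<>();
    private final List<Integer> parents = new ArrayList<>();

    TaxonomySketch() {
        ordinals.put("", 0); // assumed root node, ordinal 0
        parents.add(-1);     // the root has no parent
    }

    /**
     * Adds a category and every missing prefix of its path, returning the
     * ordinal of the full path. Components are assumed not to contain '/'.
     */
    int addCategory(String... components) {
        int parent = 0;
        StringBuilder path = new StringBuilder();
        for (String component : components) {
            if (path.length() > 0) {
                path.append('/');
            }
            path.append(component);
            String key = path.toString();
            Integer ord = ordinals.get(key);
            if (ord == null) {
                ord = parents.size(); // next free ordinal
                ordinals.put(key, ord);
                parents.add(parent);
            }
            parent = ord;
        }
        return parent;
    }

    /** Returns the ordinal of a path such as "Date/2012", or -1 if unknown. */
    int getOrdinal(String path) {
        Integer ord = ordinals.get(path);
        return ord == null ? -1 : ord;
    }

    /** Returns the parent ordinal, which is what enables hierarchy browsing. */
    int getParent(int ordinal) {
        return parents.get(ordinal);
    }

    public static void main(String[] args) {
        TaxonomySketch taxo = new TaxonomySketch();
        System.out.println(taxo.addCategory("Date", "2012", "March", "20"));
        System.out.println(taxo.getOrdinal("Date/2012"));
        System.out.println(taxo.getParent(taxo.getOrdinal("Date/2012/March")));
    }
}
```

Adding a sibling such as Date/2012/April afterwards reuses ordinals 1 and 2 and allocates only one new ordinal.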
The Search Index
• Categories are added as drilldown terms, e.g. for Date/2012/March/20:
  – $facets:Date
  – $facets:Date/2012
  – …
• All category ordinals associated with the document are added as a BinaryDocValuesField
  – All path components' ordinals are added, not just the leaves'
  – Encoded as VInt + gap for efficient compression and speed
    • Other compression methods attempted, but were slower to decode (LUCENE-4609)
  – Used during faceted search to read all the associated ordinals and aggregate accordingly (e.g. count)
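A minimal sketch of "VInt + gap" encoding: the ordinals are sorted, each one is replaced by its difference from the previous one, and each difference is written with 7 payload bits per byte. The byte layout here is the common low-bits-first varint and is an assumption; the exact layout of Lucene's default encoder may differ. Ordinals are assumed to be non-negative.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Illustrative only: a generic d-gap + varint codec, not Lucene's encoder classes.
class DGapVIntSketch {

    /** Sorts the ordinals, then writes each gap to the previous ordinal as a varint. */
    static byte[] encode(int[] ordinals) {
        int[] sorted = ordinals.clone();
        Arrays.sort(sorted);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sorted) {
            int gap = ord - prev;
            prev = ord;
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80); // high bit set: more bytes follow
                gap >>>= 7;
            }
            out.write(gap);
        }
        return out.toByteArray();
    }

    /** Reads the varints back and turns the gaps into ordinals with a running sum. */
    static int[] decode(byte[] bytes) {
        int[] result = new int[bytes.length]; // at most one ordinal per byte
        int n = 0, prev = 0, value = 0, shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) != 0) {
                shift += 7;
            } else {
                prev += value;
                result[n++] = prev;
                value = 0;
                shift = 0;
            }
        }
        return Arrays.copyOf(result, n);
    }

    public static void main(String[] args) {
        int[] ordinals = {1000, 4, 300};
        byte[] encoded = encode(ordinals);
        System.out.println(encoded.length + " bytes instead of " + (4 * ordinals.length));
        System.out.println(Arrays.toString(decode(encoded)));
    }
}
```

Small gaps fit in a single byte, which is why storing a path's parents alongside the leaf adds little space when their ordinals are close together.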
SortedSet Facets
• SortedSetFacetFields add SortedSetDocValuesFields and drilldown terms to documents
• Local-segment SortedSet ordinals are mapped to global ones through SortedSetDocValuesReaderState
• Use SortedSetDocValuesAccumulator to accumulate SortedSet facets
• Advantages:
  – Taxonomy representation requires less RAM (flat taxonomy)
  – No sidecar index
  – Tie-breaks by label-sort order
• Disadvantages:
  – Not full taxonomy
  – Overall uses more RAM (local-to-global ordinal mapping)
  – Adds NRT reopen cost
  – Slower than taxonomy-based facets
Global Ordinals
• Per-segment integer codes (as used by the SortedSet approach) are less efficient
  – Different ordinals for same categories across segments
  – Hold in-memory codes map (e.g. local-to-global) – more RAM and less scalable
  – Resolve top-K on the String representation of categories – more CPU
• Global ordinals allow efficient per-segment faceting and aggregation
  – No translation maps required (no extra RAM, highly scalable)
  – Aggregation, top-K computation done on integer codes
• But, do not play well with IndexWriter.addIndexes(Directory…)
  – Must use IndexWriter.addIndexes(IndexReader…), so that the ordinals in the input search index are mapped to the destination's
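The local-to-global map that per-segment codes require can be sketched in plain Java. Each segment is modeled as a sorted array of labels whose index is the local ordinal; this model and all names are assumptions for illustration, not the SortedSetDocValuesReaderState implementation.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Illustrative only: a toy model of per-segment ordinals and their global mapping.
class GlobalOrdinalsSketch {

    /** Merges the sorted label arrays of all segments into one sorted global array. */
    static String[] globalLabels(String[][] segments) {
        TreeSet<String> all = new TreeSet<>();
        for (String[] segment : segments) {
            all.addAll(Arrays.asList(segment));
        }
        return all.toArray(new String[0]);
    }

    /** Builds map[segment][localOrdinal] = globalOrdinal; this map is the extra RAM. */
    static int[][] localToGlobal(String[][] segments, String[] global) {
        int[][] map = new int[segments.length][];
        for (int s = 0; s < segments.length; s++) {
            map[s] = new int[segments[s].length];
            for (int local = 0; local < segments[s].length; local++) {
                map[s][local] = Arrays.binarySearch(global, segments[s][local]);
            }
        }
        return map;
    }

    /** Translates counts collected per segment on local ordinals into global counts. */
    static int[] globalCounts(int[][] perSegmentCounts, int[][] map, int numGlobal) {
        int[] counts = new int[numGlobal];
        for (int s = 0; s < perSegmentCounts.length; s++) {
            for (int local = 0; local < perSegmentCounts[s].length; local++) {
                counts[map[s][local]] += perSegmentCounts[s][local];
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Local ordinal 0 means "Lucene" in the first segment but "Bobo" in the second.
        String[][] segments = {{"Lucene", "Solr"}, {"Bobo", "Solr"}};
        String[] global = globalLabels(segments);
        int[][] map = localToGlobal(segments, global);
        int[][] perSegmentCounts = {{3, 1}, {2, 4}}; // hypothetical counts
        System.out.println(Arrays.toString(global));
        System.out.println(Arrays.toString(globalCounts(perSegmentCounts, map, global.length)));
    }
}
```

With taxonomy ordinals that are already global, the translation step and its map disappear, and counts from all segments can be added directly.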
Two-Phase Aggregation
• FacetsCollector works in two steps:
  – Collects matching documents (and optionally their scores)
  – Invokes FacetsAccumulator to accumulate the top-K facets
• Performance tests show that this improves faceted search (LUCENE-4600)
  – Locality of reference?
• Useful for Sampling and Complements
  – Hard to do otherwise
FacetIndexingParams
• Determine how facets are encoded
  – Partition size
  – Facet delimiter character (for drilldown terms, default \u001F)
  – CategoryListParams
• CategoryListParams holds parameters for a category list
  – Encoder/Decoder (default DGapVInt)
  – OrdinalPolicy (how path components are encoded): ALL_PARENTS, NO_PARENTS and ALL_BUT_DIMENSION (default)
• CategoryListParams can be used to group facets together
  – Default: all facets are put in the same "category list" (i.e. one BinaryDocValues field)
  – Expert: separate categories by dimension into different category lists
    • Useful when sets of categories are always aggregated together, but not with other categories
• FacetIndexingParams are currently not recorded per-segment and therefore you should be careful if you suddenly change them!
Questions?

High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadooplucenerevolution
 
A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...lucenerevolution
 
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting PlatformHow Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting Platformlucenerevolution
 

Mehr von lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadoop
 
A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...
 
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting PlatformHow Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
 

Kürzlich hochgeladen

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Faceted Search with Lucene

  • 2. Faceted Search with Lucene
      Shai Erera, Researcher, IBM
  • 3. Who Am I
      • Working at IBM – Information Retrieval Research
      • Lucene/Solr committer and PMC member
      • http://shaierera.blogspot.com
      • shaie@apache.org
  • 5. Faceted Search
      • Technique for accessing documents that were classified into a taxonomy of categories
          – Flat: Author/John Doe, Tags/Lucene, Popularity/High
          – Hierarchical: Computers/Software/Information Retrieval/Fulltext/Apache Lucene (ODP)
      • Quick overview of the breakdown of the search results
          – How many documents are in category Committed Paths/lucene/core vs. Committed Paths/lucene/facet
      • Simplifies interaction with the search application
          – Drilldown to issues that were updated in Past 2 days by clicking a link
          – No knowledge required about search syntax and index schema
      http://jirasearch.mikemccandless.com
  • 6. Lucene Facets
      • Contributed by IBM in 2011, released in 3.4.0
      • Major changes since 4.1.0+
          – NRT support
          – Nearly 400% search speedups
          – Complete API revamp
          – New features (SortedSet, range faceting, drill-sideways)
      • Two main indexing-time modes
          – Taxonomy-based: hierarchical facets, managed by a sidecar index, low NRT reopen cost
          – SortedSetDocValues: flat facets only, no sidecar index, higher NRT reopen cost
      • Runtime modes
          – Range facets (on NumericDocValues fields)
      • Other implementations: Solr, ElasticSearch, Bobo Browse
  • 7. Lucene Facet Components
      • TaxonomyWriter/Reader
          – Manage the taxonomy information
      • FacetFields
          – Add facets information to documents (DocValues fields, drilldown terms)
      • FacetRequest
          – Defines which facets to aggregate and the FacetsAggregator (aggregation function)
      • FacetsCollector
          – Collects matching documents and computes the top-K categories for each facet request (invokes FacetsAccumulator)
      • DrillDownQuery / DrillSideways
          – Execute drilldown and drill-sideways requests
  • 8. Sample Code – Indexing
      // Builds the taxonomy as documents are indexed, multi-threaded, single instance
      TaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir);

      // Adds facets information to a document, can be initialized once per thread
      FacetFields facetFields = new FacetFields(taxoWriter);

      // List of categories to add to the document
      List<CategoryPath> cats = new ArrayList<CategoryPath>();
      cats.add(new CategoryPath("Author", "Erik Hatcher"));
      cats.add(new CategoryPath("Author/Otis Gospodnetić", '/'));
      cats.add(new CategoryPath("Pub Date", "2004", "December", "1"));

      Document bookDoc = new Document();
      bookDoc.add(new TextField("title", "lucene in action", Store.YES));

      // Add categories fields (DocValues, Postings)
      facetFields.addFields(bookDoc, cats);

      // Index the document
      indexWriter.addDocument(bookDoc);
  • 9. Sample Code – Search
      // Open an NRT TaxonomyReader
      TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoWriter);

      // Define the facets to aggregate (top-10 categories for each)
      FacetSearchParams fsp = new FacetSearchParams();
      fsp.addFacetRequest(new CountFacetRequest(new CategoryPath("Author"), 10));
      fsp.addFacetRequest(new CountFacetRequest(new CategoryPath("Pub Date"), 10));

      // Collect both top-K facets and top-N matching documents
      TopDocsCollector tdc = TopScoredDocCollector.create(10, true);
      FacetsCollector fc = FacetsCollector.create(fsp, indexReader, taxoReader);
      Query q = new TermQuery(new Term("title", "lucene"));
      searcher.search(q, MultiCollector.wrap(tdc, fc));

      // Traverse the top facets
      for (FacetResult fres : fc.getFacetResults()) {
        FacetResultNode root = fres.getFacetResultNode();
        System.out.println(String.format("%s (%d)", root.label, root.value));
        for (FacetResultNode cat : root.getSubResults()) {
          System.out.println("  " + cat.label.components[0] + " (" + cat.value + ")");
        }
      }
  • 10. Drilldown and Drill-Sideways
      • Drilldown adds a filter to the search
          – Multiple categories can be OR'd

      // Drilldown: filter results to "Component/core/index";
      // all other "Component/*" and "Component/core/*" get count 0
      Query base = new MatchAllDocsQuery();
      DrillDownQuery ddq = new DrillDownQuery(facetIndexingParams, base);
      ddq.add(new CategoryPath("Component/core/index", '/'));

      • Drill-sideways allows drilldown while still aggregating the "sideways" categories

      // Drill-Sideways: drilldown on "Component/core/index";
      // other "Component/*" and "Component/core/*" are counted too
      DrillSideways ds = new DrillSideways(searcher, taxoReader);
      DrillSidewaysResult sidewaysRes = ds.search(null, ddq, 10, fsp);

      http://blog.mikemccandless.com/2013/02/drill-sideways-faceting-with-lucene.html
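The difference between the two counting modes on slide 10 can be illustrated without Lucene. The following is a minimal sketch under simplifying assumptions: documents carry a single flat value per dimension, matching is exact, and the class `DrillSidewaysSketch` with its methods `doc` and `countDim` are hypothetical names invented for this illustration, not part of the Lucene API. In sideways mode, the filter on the dimension being counted is skipped, so sibling values keep non-zero counts.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of drilldown vs. drill-sideways counting; not Lucene code. */
final class DrillSidewaysSketch {

    /** Builds a document as a dimension-to-value map from alternating key/value arguments. */
    static Map<String, String> doc(String... kv) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    /**
     * Counts the values of dimension dim over the documents that pass the drilldown filters.
     * When sideways is true, the filter on dim itself is ignored.
     */
    static Map<String, Integer> countDim(List<Map<String, String>> docs,
                                         Map<String, String> drill,
                                         String dim,
                                         boolean sideways) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> d : docs) {
            boolean matches = true;
            for (Map.Entry<String, String> f : drill.entrySet()) {
                if (sideways && f.getKey().equals(dim)) {
                    continue; // do not filter on the dimension being counted
                }
                if (!f.getValue().equals(d.get(f.getKey()))) {
                    matches = false;
                    break;
                }
            }
            String value = d.get(dim);
            if (matches && value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

With a drilldown on one Component value, plain drilldown counting reports only that value for the Component dimension, while sideways counting still reports the other Component values.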
  • 11. Dynamic Facets
      • Range facets on NumericDocValues fields
          – Define the buckets of interest at query time
          – Supports any arbitrary ValueSource (Lucene 4.6.0)

      // Aggregate matching documents into buckets
      RangeAccumulator a = new RangeAccumulator(new RangeFacetRequest<LongRange>("field",
          new LongRange("1-5", 1L, true, 5L, true),
          new LongRange("6-20", 6L, true, 20L, true),
          new LongRange("21-100", 21L, true, 100L, true),
          new LongRange("over 100", 100L, false, Long.MAX_VALUE, true)));
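To make the inclusive/exclusive bounds of range buckets concrete, here is a small self-contained sketch of range counting. The class `RangeCountSketch` and its nested `Bucket` type are hypothetical stand-ins written for this illustration; they only mimic the (label, min, minInclusive, max, maxInclusive) argument order seen on the slide and are not the Lucene classes.

```java
/** Hypothetical illustration of range-bucket counting; not the Lucene RangeAccumulator. */
final class RangeCountSketch {

    /** A labeled numeric range with independently inclusive or exclusive bounds. */
    static final class Bucket {
        final String label;
        final long min;
        final boolean minInclusive;
        final long max;
        final boolean maxInclusive;

        Bucket(String label, long min, boolean minInclusive, long max, boolean maxInclusive) {
            this.label = label;
            this.min = min;
            this.minInclusive = minInclusive;
            this.max = max;
            this.maxInclusive = maxInclusive;
        }

        /** Returns true if the value falls inside this bucket. */
        boolean accept(long v) {
            boolean aboveMin = minInclusive ? v >= min : v > min;
            boolean belowMax = maxInclusive ? v <= max : v < max;
            return aboveMin && belowMax;
        }
    }

    /** Counts how many values fall into each bucket; a value may match zero or several buckets. */
    static int[] count(long[] values, Bucket... buckets) {
        int[] counts = new int[buckets.length];
        for (long v : values) {
            for (int i = 0; i < buckets.length; i++) {
                if (buckets[i].accept(v)) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }
}
```

Because each bound is checked separately, a value that sits exactly on an exclusive bound of every bucket is counted nowhere, which is worth checking when the buckets are meant to tile a range.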
  • 12. Facet Associations
      • Not all facets are created equal
          – Categories added by an automatic categorization system, e.g. Category/Apache Lucene (0.74) (confidence level is 0.74)
          – Important metadata about the facet, e.g. Contracts/US ($5M) (total $$$ generated from contracts)
          – Complex structures, e.g. Users/Shai Erera (lastAccess=YYYY/MM/DD, numUpdates=8…)
      • Categories can have values associated with them per document
          – They are later aggregated by these values
          – NOTE: ≠ NumericDocValuesFields!
      • Facet associations are completely customizable
          – Encoded as a byte[] per document
      http://shaierera.blogspot.com/2013/01/facet-associations.html
  • 13. More Features
      • Complements
          – Holds the count of each category in-memory, per IndexReader
          – When number of search results is >50% of the index, count the "complement set"
          – Useful for "overview" queries, e.g. MatchAllDocsQuery
      • Sampling
          – Aggregate a sampled set of the search results
          – Optionally re-count top-K facets for accurate values
      • Partitions
          – Partition the taxonomy space to control memory usage during faceted search
          – Useful for very big taxonomies (10s of millions of categories)
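The complements idea from slide 13 can be sketched as follows: when more than half of the documents match, it is cheaper to visit the non-matching documents and subtract their categories from cached totals. This is a minimal illustration under stated assumptions, not the Lucene implementation; the class `ComplementSketch`, the array-based document model and both method names are hypothetical.

```java
/** Hypothetical illustration of complement counting; not Lucene code. */
final class ComplementSketch {

    /** Counts every category over the whole index; this is what would be cached per reader. */
    static int[] totalCounts(int[][] docOrdinals, int numOrdinals) {
        int[] totals = new int[numOrdinals];
        for (int[] ords : docOrdinals) {
            for (int ord : ords) {
                totals[ord]++;
            }
        }
        return totals;
    }

    /**
     * Counts categories of the matching documents. If more than 50% of the documents
     * match, starts from the cached totals and subtracts the non-matching documents.
     */
    static int[] count(int[][] docOrdinals, boolean[] matches, int[] totals) {
        int numMatches = 0;
        for (boolean m : matches) {
            if (m) {
                numMatches++;
            }
        }
        boolean useComplement = numMatches * 2 > matches.length;
        int[] counts = useComplement ? totals.clone() : new int[totals.length];
        for (int doc = 0; doc < matches.length; doc++) {
            if (matches[doc] == useComplement) {
                continue; // only visit the smaller of the two document sets
            }
            for (int ord : docOrdinals[doc]) {
                counts[ord] += useComplement ? -1 : 1;
            }
        }
        return counts;
    }
}
```

Both paths return the same counts; the complement path simply touches fewer documents for broad queries such as a match-all query.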
  • 15. The Taxonomy Index
      • The taxonomy maps categories to integer codes (referred to as ordinals)
          – Kind of like a Map<CategoryPath,Integer>, with hierarchy support
          – Provides taxonomy browsing services
          – DirectoryTaxonomyWriter is managed as a sidecar Lucene index
      • Categories are broken down to their path components, e.g. Date/2012/March/20 becomes:
          – Date, with ordinal=1
          – Date/2012, with ordinal=2
          – Date/2012/March, with ordinal=3
          – Date/2012/March/20, with ordinal=4
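The ordinal assignment described on slide 15 can be sketched as a map with hierarchy support. The class `TaxonomySketch` below is a hypothetical, in-memory stand-in and not DirectoryTaxonomyWriter; reserving ordinal 0 for a root entry is an assumption of this sketch, chosen so that the first real category gets ordinal 1 as on the slide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory taxonomy; illustrates ordinal assignment only. */
final class TaxonomySketch {
    private final Map<String, Integer> ordinals = new LinkedHashMap<>();
    private final List<Integer> parents = new ArrayList<>();

    TaxonomySketch() {
        ordinals.put("", 0); // assumption: ordinal 0 is reserved for the root
        parents.add(-1);     // the root has no parent
    }

    /** Adds a category and all of its ancestors; returns the ordinal of the full path. */
    int addCategory(String... components) {
        int parent = 0;
        StringBuilder path = new StringBuilder();
        for (String component : components) {
            if (path.length() > 0) {
                path.append('/');
            }
            path.append(component);
            String key = path.toString();
            Integer ord = ordinals.get(key);
            if (ord == null) {
                ord = ordinals.size(); // next free ordinal
                ordinals.put(key, ord);
                parents.add(parent);
            }
            parent = ord;
        }
        return parent;
    }

    /** Returns the ordinal of a '/'-delimited path, or -1 if it is unknown. */
    int ordinalOf(String path) {
        Integer ord = ordinals.get(path);
        return ord == null ? -1 : ord;
    }

    /** Returns the parent ordinal, which is what enables taxonomy browsing. */
    int parentOf(int ordinal) {
        return parents.get(ordinal);
    }
}
```

Adding Date/2012/March/20 to an empty instance assigns ordinals 1 through 4 to the four path prefixes; adding a sibling such as Date/2012/April afterwards reuses the existing ordinals for Date and Date/2012.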
  • 16. The Search Index
      • Categories are added as drilldown terms, e.g. for Date/2012/March/20:
          – $facets:Date
          – $facets:Date/2012
          – …
      • All category ordinals associated with the document are added as a BinaryDocValuesField
          – Ordinals of all path components are added, not just those of the leaves
          – Encoded as VInt + gap for efficient compression and speed
              • Other compression methods attempted, but were slower to decode (LUCENE-4609)
          – Used during faceted search to read all the associated ordinals and aggregate accordingly (e.g. count)
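The "VInt + gap" encoding mentioned on slide 16 can be illustrated with a short sketch: ordinals are sorted, each is replaced by its difference from the previous one, and each gap is written as a variable-length integer using 7 payload bits per byte. The class `DGapVIntSketch` is hypothetical, and the byte layout chosen here (low-order bits first, high bit as continuation flag) is an assumption of this sketch that may differ from the layout Lucene's own encoder uses. Ordinals are assumed to be non-negative.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical d-gap + variable-length integer codec; illustrates the idea only. */
final class DGapVIntSketch {

    /** Sorts the ordinals, then writes each gap to the previous ordinal as a VInt. */
    static byte[] encode(int[] ordinals) {
        int[] sorted = ordinals.clone();
        Arrays.sort(sorted);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sorted) {
            int gap = ord - prev;
            prev = ord;
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80); // 7 payload bits, continuation bit set
                gap >>>= 7;
            }
            out.write(gap); // last byte of this value, continuation bit clear
        }
        return out.toByteArray();
    }

    /** Reads the gaps back and sums them to restore the sorted ordinals. */
    static int[] decode(byte[] bytes) {
        List<Integer> result = new ArrayList<>();
        int prev = 0;
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) != 0) {
                shift += 7; // more bytes follow for this gap
            } else {
                prev += value;
                result.add(prev);
                value = 0;
                shift = 0;
            }
        }
        int[] ordinals = new int[result.size()];
        for (int i = 0; i < ordinals.length; i++) {
            ordinals[i] = result.get(i);
        }
        return ordinals;
    }
}
```

Since a document stores the ordinals of all path components, and parent and child ordinals are often close together, the gaps tend to be small and usually fit in a single byte each.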
  • 17. SortedSet Facets
      • SortedSetFacetFields add SortedSetDocValuesFields and drilldown terms to documents
      • Local-segment SortedSet ordinals are mapped to global ones through SortedSetDocValuesReaderState
      • Use SortedSetDocValuesAccumulator to accumulate SortedSet facets
      • Advantages:
          – Taxonomy representation requires less RAM (flat taxonomy)
          – No sidecar index
          – Tie-breaks by label-sort order
      • Disadvantages:
          – Not full taxonomy
          – Overall uses more RAM (local-to-global ordinal mapping)
          – Adds NRT reopen cost
          – Slower than taxonomy-based facets
  • 18. Global Ordinals
      • Per-segment integer codes (as used by the SortedSet approach) are less efficient
          – Different ordinals for same categories across segments
          – Hold in-memory codes map (e.g. local-to-global): more RAM and less scalable
          – Resolve top-K on the String representation of categories: more CPU
      • Global ordinals allow efficient per-segment faceting and aggregation
          – No translation maps required (no extra RAM, highly scalable)
          – Aggregation, top-K computation done on integer codes
      • But they do not play well with IndexWriter.addIndexes(Directory…)
          – Must use IndexWriter.addIndexes(IndexReader…), so that the ordinals in the input index are mapped to the destination's
  • 19. Two-Phase Aggregation
      • FacetsCollector works in two steps:
          – Collects matching documents (and optionally their scores)
          – Invokes FacetsAccumulator to accumulate the top-K facets
      • Performance tests show that this improves faceted search (LUCENE-4600)
          – Locality of reference?
      • Useful for Sampling and Complements
          – Hard to do otherwise
  • 20. FacetIndexingParams
      • Determine how facets are encoded
          – Partition size
          – Facet delimiter character (for drilldown terms, default \u001F)
          – CategoryListParams
      • CategoryListParams holds parameters for a category list
          – Encoder/Decoder (default DGapVInt)
          – OrdinalPolicy (how path components are encoded): ALL_PARENTS, NO_PARENTS and ALL_BUT_DIMENSION (default)
      • CategoryListParams can be used to group facets together
          – Default: all facets are put in the same "category list" (i.e. one BinaryDocValues field)
          – Expert: separate categories by dimension into different category lists
              • Useful when sets of categories are always aggregated together, but not with other categories
      • FacetIndexingParams are currently not recorded per-segment and therefore you should be careful if you suddenly change them!