Nested Documents in Lucene
High-performance support for parent/child document relations
mark@searcharea.co.uk
Problem
The Lucene data model is based on Documents, Fields and Terms. However, many real-world data structures cannot be properly represented when collapsed into a single Lucene document.
[Diagram: single Lucene document]
Problem: “Cross-matching”
When two or more data structures of the same type are jumbled up into a single Lucene field, matching logic becomes confused, e.g. more than one qualification in a resume.
Source data: John has A1 in Maths and E1 in Science.
Single Lucene document: Name: John; Grade: A1, E1; Subject: Maths, Science
! False match for query: Grade:A1 AND Subject:Science
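The cross-matching problem can be reproduced outside Lucene. The following is a minimal sketch, assuming a flattened document is modelled as a dictionary of multi-valued fields; the names `flat_resume`, `matches_flat` and `matches_records` are hypothetical and are not part of any Lucene API.

```python
# Hypothetical flattened resume: both qualifications collapsed into one document.
flat_resume = {
    "name": ["John"],
    "grade": ["A1", "E1"],
    "subject": ["Maths", "Science"],
}


def matches_flat(doc, grade, subject):
    """Field-level AND: each term only has to occur somewhere in the document."""
    return grade in doc["grade"] and subject in doc["subject"]


# The same data kept as separate (grade, subject) records.
qualifications = [
    {"grade": "A1", "subject": "Maths"},
    {"grade": "E1", "subject": "Science"},
]


def matches_records(records, grade, subject):
    """Both criteria must hold within the same qualification record."""
    return any(r["grade"] == grade and r["subject"] == subject for r in records)


# John has no A1 in Science, yet the flattened document matches.
false_match = matches_flat(flat_resume, "A1", "Science")
true_answer = matches_records(qualifications, "A1", "Science")
```

The flattened form loses the pairing between a grade and its subject, which is exactly the information the query depends on.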
Unacceptable solution #1
One modeling approach is to store related items in the same field and use proximity operators in queries.
Source data: John has A1 in Maths and E1 in Science.
Single Lucene document: Name: John; GradeAndSubject: A1 Maths . E1 Science
Example query: GradeAndSubject:"A1 Science"~2
 ! Slow
 ! Not scalable with number of fields
 Proximity distances must grow.
 Only one choice of Analyzer for a given field
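Why proximity distances must grow can be illustrated with a toy model. This is a sketch under stated assumptions: `within` is a simplified stand-in for a proximity operator (two terms match if their token positions differ by at most `slop`) and is not Lucene's actual sloppy-phrase algorithm. The third field (a year) is a hypothetical addition for illustration.

```python
def positions(tokens):
    """Map each token to the list of positions where it occurs."""
    pos = {}
    for i, token in enumerate(tokens):
        pos.setdefault(token, []).append(i)
    return pos


def within(tokens, a, b, slop):
    """True if some occurrence of a and some occurrence of b are at most slop apart."""
    pos = positions(tokens)
    return any(abs(i - j) <= slop for i in pos.get(a, []) for j in pos.get(b, []))


# Two fields per record: a distance of 1 pairs grade with subject correctly.
two_fields = ["A1", "Maths", "E1", "Science"]
ok_two = within(two_fields, "A1", "Maths", 1)
no_cross_two = within(two_fields, "A1", "Science", 1)

# Three fields per record (year is hypothetical): grade to year now needs distance 2,
# and at that distance terms from neighbouring records start to pair up.
three_fields = ["A1", "Maths", "2009", "E1", "Science", "2010"]
needs_more = within(three_fields, "A1", "2009", 1)
ok_three = within(three_fields, "A1", "2009", 2)
cross_record = within(three_fields, "Maths", "E1", 2)
```

In this model every extra field per record widens the distance needed for a legitimate match, and the wider distance lets terms from adjacent records match each other.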
Solution: Nested Document Queries
Nested documents need to be queried using a new NestedDocumentQuery class which understands document relationships.
[Diagram: parent document (Name: John, docType: resume) with two child documents (Grade: A1, Subject: Maths) and (Grade: E1, Subject: Science)]
New NestedDocumentQuery:
 Reports any matches as a match on the parent document not the child
 Super-fast evaluation of joins between child and parent
 Requires an indexed field to identify parent documents?
Solution: Example Query
Find the resume of a person called “John” with an A1 grade in Maths.
[Diagram: parent document (Name: John, docType: resume) with two child documents (Grade: A1, Subject: Maths) and (Grade: E1, Subject: Science)]
The NestedDocumentQuery wrapper simply translates the stream of reported matches from the child-level query criteria into matches on the parent, for evaluation of all the parent-level logic.
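The translation from child matches to parent matches might look roughly like the sketch below. It is not the proposed NestedDocumentQuery class. It assumes that doc IDs are list positions, that each parent is immediately followed by its children, and that parents are identified by a docType field. All function names and the second resume ("Jane") are hypothetical.

```python
# Hypothetical index: doc IDs are list positions; each parent precedes its children.
index = [
    {"docType": "resume", "name": "John"},   # 0: parent
    {"grade": "A1", "subject": "Maths"},     # 1: child of 0
    {"grade": "E1", "subject": "Science"},   # 2: child of 0
    {"docType": "resume", "name": "Jane"},   # 3: parent
    {"grade": "A1", "subject": "Science"},   # 4: child of 3
]


def is_parent(doc):
    return doc.get("docType") == "resume"


def child_query(doc, grade, subject):
    """Child-level criteria: both terms must occur in the same child document."""
    return doc.get("grade") == grade and doc.get("subject") == subject


def nested_matches(index, grade, subject):
    """Yield each parent doc ID once if at least one of its children matches."""
    current_parent = None
    last_reported = None
    for doc_id, doc in enumerate(index):
        if is_parent(doc):
            current_parent = doc_id
        elif current_parent is not None and child_query(doc, grade, subject):
            if current_parent != last_reported:
                last_reported = current_parent
                yield current_parent


def find_resumes(index, name, grade, subject):
    """Parent-level logic (the name) is evaluated on the translated parent matches."""
    return [p for p in nested_matches(index, grade, subject)
            if index[p]["name"] == name]


john_a1_maths = find_resumes(index, "John", "A1", "Maths")
john_a1_science = find_resumes(index, "John", "A1", "Science")
```

Because the child criteria are evaluated per child document, John's A1 in Maths no longer combines with his E1 in Science to produce a false match.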
Solution: Join speed
Unlike in a database, a join (child to parent) is blisteringly fast.
1) Match reported on document #356,675
2) Index directly into cached BitSet at position #356,675
3) Find first prior set bit, e.g. position #356,670
4) Attribute match to doc #356,670
BitSet: 100000100000000100000001000000010000001000010000000001000000100000100001
[Diagram labels: ParentQuery, NestedDocumentQuery, ChildQuery]
The BitSet for defining parents is obtained from a Filter and can be cached aggressively with minimal memory cost (one bit per document in the index).
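The four steps above can be sketched with a Python integer standing in for the cached BitSet. This is an illustration only: `build_parent_bits` and `prev_set_bit` are hypothetical helper names, and the parent positions other than #356,670 are made up for the example.

```python
def build_parent_bits(parent_doc_ids):
    """Set one bit per parent document; all other bits stay zero."""
    bits = 0
    for doc_id in parent_doc_ids:
        bits |= 1 << doc_id
    return bits


def prev_set_bit(bits, doc_id):
    """Return the highest set bit position <= doc_id, or -1 if there is none."""
    masked = bits & ((1 << (doc_id + 1)) - 1)  # drop every bit above doc_id
    return masked.bit_length() - 1


# Hypothetical parent positions; only 356,670 comes from the slide.
parent_bits = build_parent_bits([356_660, 356_670, 356_684])

child_match = 356_675                                    # 1) match reported on a child
parent_of_match = prev_set_bit(parent_bits, child_match)  # 2) and 3) look up prior set bit
# 4) the match is attributed to parent_of_match
```

The lookup needs no per-document parent pointer, only the bit set, which is why the memory cost stays at one bit per document.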
Other advantages
 Parent-child document relationships can also be used to limit child results from any one parent (e.g. efficiently control the maximum number of pages returned from any one website).
 Nesting levels can be arbitrarily deep.
 Very powerful multi-child queries are possible, e.g. find people likely to know person X using resumes' employment histories (multiple employer names/URLs and related date ranges).
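Limiting child results per parent could be sketched as follows, assuming child hits arrive in ranked order and each child's parent is the nearest parent doc ID at or before it. The function `limit_per_parent` and the sample doc IDs are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def limit_per_parent(child_hits, parent_ids, max_per_parent):
    """Keep at most max_per_parent hits per parent, preserving rank order.

    child_hits: child doc IDs in ranked order.
    parent_ids: sorted parent doc IDs; each parent precedes its children.
    """
    kept = []
    seen = Counter()
    for doc_id in child_hits:
        pos = bisect_right(parent_ids, doc_id) - 1
        if pos < 0:
            continue  # no parent precedes this document
        parent = parent_ids[pos]
        if seen[parent] < max_per_parent:
            seen[parent] += 1
            kept.append(doc_id)
    return kept


# Hypothetical example: three websites (parents) and ranked page hits (children).
site_ids = [0, 10, 20]
ranked_pages = [3, 12, 4, 5, 21, 13, 6]
top_pages = limit_per_parent(ranked_pages, site_ids, 2)
```

Here pages 5 and 6 are dropped because the website at doc ID 0 has already contributed two pages.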

Proposal for nested document support in Lucene

  • 1. Nested Documents in Lucene: High-performance support for parent/child document relations. mark@searcharea.co.uk
  • 2. Problem: The Lucene data model is based on Documents, Fields and Terms. However, many real-world data structures cannot be properly represented when collapsed into a single Lucene document.
  • 3. Problem: “Cross-matching”. When two or more data structures of the same type are jumbled up into a single Lucene field, matching logic becomes confused, e.g. when a resume holds more than one qualification. John has an A1 in Maths and an E1 in Science; collapsed into a single document this becomes Name: John; Grade: A1, E1; Subject: Maths, Science. The result is a false match for the query Grade:A1 AND Subject:Science.
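  • 4. Unacceptable solution #1: One modeling approach is to store related items in the same field (e.g. a GradeAndSubject field holding "A1 Maths . E1 Science") and use proximity operators in queries. Example query: GradeAndSubject:"A1 Science"~2. This is slow and does not scale with the number of fields: proximity distances must grow, and there is only one choice of Analyzer for a given field.
  • 7. Solution: Nested Document Queries. Nested documents need to be queried using a new NestedDocumentQuery class which understands document relationships. The new NestedDocumentQuery:
  • 8. Reports any matches as a match on the parent document, not the child
  • 9. Provides super-fast evaluation of joins between child and parent
  • 10. Requires an indexed field to identify parent documents?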
  • 11. Solution: Example Query. Find the resume of a person called “John” with an A1 grade in Maths. (Diagram: a parent document with Name: John, docType: resume, and two child documents, one with Grade: A1, Subject: Maths and one with Grade: E1, Subject: Science.) The NestedDocumentQuery wrapper simply translates the stream of reported matches from the child-level query criteria into matches on the parent for evaluation of all the parent-level logic.
  • 12. Solution: Join speed. Unlike in a database, a join (child to parent) is blisteringly fast: 1) A match is reported on document #356,675. 2) Index directly into the cached BitSet at position #356,675. 3) Find the first prior set bit, e.g. position #356,670. 4) Attribute the match to doc #356,670. Example BitSet: 100000100000000100000001000000010000001000010000000001000000100000100001. (Diagram labels: ParentQuery, NestedDocumentQuery, ChildQuery.) The BitSet for defining parents is obtained from a Filter and can be cached aggressively with minimal memory cost (one bit per document in the index).
  • 13. Other advantages. Parent-child document relationships can also be used to limit child results from any one parent (e.g. to efficiently control the maximum number of pages returned from any one website). Nesting levels can be arbitrarily deep. Very powerful multi-child queries are possible, e.g. find people likely to know person X using resumes' employment histories (multiple employer names/URLs and related date ranges).
  • 14. “Lucene is not a database”, but... Structure matters: many data sources are a mix of structured and unstructured content (e.g. microformats), and this is unlikely to change. Lucene has historically been about unstructured text but has steadily been adding structured capability (Trie, spatial, facets) and has become a great solution for hybrid data. However, support for modeling and querying non-trivial data structures is currently missing. Relationships matter: this proposal is not to recreate the full capabilities of a SQL database with arbitrary relationships, but we can benefit greatly from providing simple parent-child relationships. We have some unique capabilities: parent-child joins are very fast; unlike SQL, we can return partial, relevance-ranked matches; the result is probably more akin to XML databases than to SQL databases.
  • 15. Next steps. Existing code and unit tests can be released to the Lucene project if there is sufficient interest. This software has been deployed in production on large datasets. The matching approach relies on parents and children being held in the same Lucene index segment. Additional control is needed to enforce this more rigorously, either by adding more user control over IndexWriter segment creation, where applications understand and control parent-child dependencies, OR by making Lucene aware of parent-child relationships, e.g. with a new method Document.add(Document). Query parser support: XML Query Parser support is available; the end-user query parser could add new syntax, e.g. +candidateLocale:UK +child(grade:A1 AND subject:music)
  • 16. Thoughts? Feedback encouraged on dev@lucene.apache.org
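
The cross-matching problem and the parent BitSet join described in the slides can be illustrated with a minimal sketch. This is not Lucene code and not the proposal's implementation: it uses plain Python, a list of dicts as a stand-in for an index, and an integer as a stand-in for the cached BitSet. All names (`flat_match`, `build_parent_bits`, `prev_set_bit`, `nested_query`) and the second resume ("Mary") are hypothetical, and it assumes the layout shown on the join-speed slide, where each parent is indexed immediately before its children.

```python
# Flattened model: one document per resume, with multi-valued fields.
flat_doc = {"name": ["John"], "grade": ["A1", "E1"], "subject": ["Maths", "Science"]}


def flat_match(doc, grade, subject):
    """Grade:X AND Subject:Y against a flattened document (cross-matches)."""
    return grade in doc["grade"] and subject in doc["subject"]


# Nested model: each parent is followed directly by its children.
index = [
    {"docType": "resume", "name": "John"},   # doc 0 (parent)
    {"grade": "A1", "subject": "Maths"},     # doc 1 (child of 0)
    {"grade": "E1", "subject": "Science"},   # doc 2 (child of 0)
    {"docType": "resume", "name": "Mary"},   # doc 3 (parent, hypothetical)
    {"grade": "A1", "subject": "Science"},   # doc 4 (child of 3)
]


def build_parent_bits(docs):
    """One bit per document; a set bit marks a parent (docType == resume)."""
    bits = 0
    for doc_id, doc in enumerate(docs):
        if doc.get("docType") == "resume":
            bits |= 1 << doc_id
    return bits


def prev_set_bit(bits, doc_id):
    """Position of the nearest set bit at or before doc_id, or -1 if none."""
    masked = bits & ((1 << (doc_id + 1)) - 1)
    return masked.bit_length() - 1


def nested_query(docs, parent_bits, parent_pred, child_pred):
    """Report child matches as matches on the owning parent document."""
    hits = []
    for doc_id, doc in enumerate(docs):
        if (parent_bits >> doc_id) & 1:
            continue  # parents are not evaluated as children
        if child_pred(doc):
            parent_id = prev_set_bit(parent_bits, doc_id)
            if parent_id >= 0 and parent_pred(docs[parent_id]) and parent_id not in hits:
                hits.append(parent_id)
    return hits


parent_bits = build_parent_bits(index)
is_john = lambda p: p["name"] == "John"
a1_science = lambda c: c.get("grade") == "A1" and c.get("subject") == "Science"
a1_maths = lambda c: c.get("grade") == "A1" and c.get("subject") == "Maths"

print(flat_match(flat_doc, "A1", "Science"))                 # flattened model
print(nested_query(index, parent_bits, is_john, a1_science)) # nested model
print(nested_query(index, parent_bits, is_john, a1_maths))
```

In this sketch the flattened document matches Grade:A1 AND Subject:Science even though John holds no such qualification, while the nested query only reports a parent when a single child satisfies both criteria. The join itself is just a backward scan for the previous set bit, which is why one bit per document is enough to attribute a child match to its parent.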