Document relations
                              Martijn van Groningen
                                 @mvgroningen




Tuesday, November 6, 12
Overview
     • Background
     • Document relations with joining.
     • Various solutions in Lucene and Elasticsearch



Background - Lucene model
     • Lucene is document based.
     • Lucene doesn’t store information about relations
       between documents.


     • Data often holds relations.
     • Good free text search over relational data.

Background - Common solutions
     • Compound documents.
           •     May result in documents with many fields.

     • Subsequent searches.
           •     May cause a lot of network overhead.




     • Non Lucene based approach:
           •     Use Lucene in combination with a relational database.




Background - Example
     • Product
           •     Name
           •     Description

     • Product-item
           •     Color
           •     Size
           •     Price




Background - Example
     • Compound Product & Product-items document.
     • Each product-item has its own field prefix.
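The compound document approach can be sketched as follows. This is a minimal illustration, not code from the talk: the function name and the `item<N>_` prefix scheme are hypothetical, and the sample values are borrowed from the product example used later in the deck.

```python
def to_compound_document(product, items):
    """Flatten a product and its items into one document with prefixed fields."""
    doc = {"name": product["name"], "description": product["description"]}
    for index, item in enumerate(items, start=1):
        prefix = f"item{index}_"  # hypothetical prefix scheme, one prefix per product-item
        for field, value in item.items():
            doc[prefix + field] = value
    return doc


product = {"name": "Polo shirt", "description": "Made of 100% cotton"}
items = [
    {"color": "red", "size": "s", "price": 999},
    {"color": "red", "size": "m", "price": 1099},
]
compound = to_compound_document(product, items)
```

Every additional product-item adds three more fields, which is how compound documents end up with many fields.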




Background - Other solutions
     • Lucene offers solutions for a 'relational'-like
       search.
           •     Joining
           •     Result grouping


     • Elasticsearch builds on top of the joining
       capabilities.


     • These solutions aren't naturally supported.

Joining
     • Join support available since Lucene 3.4
           •     Not a SQL join!


     • Two distinct joining types:
           •     Index time join
           •     Query time join


     • Joining provides a solution to handle document
       relations.


Joining - What is out there?
     • Index time:
           •     Lucene’s block join implementation.
           •     Elasticsearch’s nested filter, query and facets.
                •         Built on top of Lucene’s block join support.


     • Query time:
           •     Lucene’s query time join utility.
           •     Solr’s join query parser.
           •     Elasticsearch’s various parent-child queries and filters.



Index time join
                            And nested documents.




Joining - Block join query
     • Lucene block join queries:
           •     ToParentBlockJoinQuery
           •     ToChildBlockJoinQuery


     • Lucene collector:
           •     ToParentBlockJoinCollector


     • Index time join requires block indexing.

Joining - Block indexing
     • Atomically adding documents.
           •     A block of documents.


     • Each document gets a sequentially assigned Lucene
       document id.


     • IndexWriter#addDocuments(docs);
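The effect of block indexing can be illustrated with a toy in-memory index. This is only a sketch of the idea: `ToyIndex` and `add_documents` are hypothetical stand-ins, not Lucene's API (in Lucene the call is `IndexWriter#addDocuments`, as noted above).

```python
class ToyIndex:
    """In-memory stand-in for an index segment; not Lucene's API."""

    def __init__(self):
        self.docs = []  # the position in this list plays the role of the doc id

    def add_documents(self, block):
        """Append a whole block at once and return the assigned doc ids."""
        first_id = len(self.docs)
        self.docs.extend(block)
        return list(range(first_id, first_id + len(block)))


index = ToyIndex()
block_one = [{"color": "red"}, {"color": "blue"}, {"name": "Polo shirt"}]
block_two = [{"color": "green"}, {"name": "Sweater"}]
ids_one = index.add_documents(block_one)
ids_two = index.add_documents(block_two)
```

Because a block is added in one call, its documents receive adjacent ids, which is the property the block join queries rely on.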


Joining - Block indexing
     • Index doesn't record blocks.
     • App is responsible for identifying block documents.
     • Segment merging doesn’t re-order documents in a
       segment.


     • Adding a document to a block requires you to
       reindex the whole block.



Joining - block join query

      • Parent is the last document in a block.




Block join - ToChildBlockJoinQuery

     [Code figures not preserved. Callouts: marking parent documents; add block (shown twice).]

     • Parent filter marks the parent documents.

     • Child query is executed in the parent space.
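How a parent filter lets a join move between children and parents can be sketched with plain lists. This is an illustration under assumptions, not Lucene's implementation: the function names are hypothetical, the parent filter is modelled as a list of booleans indexed by doc id, and every block is assumed to end with its parent.

```python
def join_to_parents(child_hits, is_parent):
    """Map matching child doc ids to the parent that closes their block."""
    parents = []
    for child_id in sorted(child_hits):
        parent_id = child_id
        # Walk forward to the next marked parent: the parent is the last doc in a block.
        while not is_parent[parent_id]:
            parent_id += 1
        if not parents or parents[-1] != parent_id:
            parents.append(parent_id)
    return parents


def children_of(parent_id, is_parent):
    """Return the child doc ids that precede a parent within its block."""
    children = []
    doc_id = parent_id - 1
    while doc_id >= 0 and not is_parent[doc_id]:
        children.append(doc_id)
        doc_id -= 1
    return sorted(children)


# Two blocks: docs 0-2 are children of parent 3, doc 4 is the child of parent 5.
is_parent = [False, False, False, True, False, True]
matched_parents = join_to_parents({1, 2, 4}, is_parent)
block_children = children_of(3, is_parent)
```

No relation is stored anywhere: the parent marks plus the doc id order are enough to resolve both directions.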




Block join & Elasticsearch
     • In Elasticsearch exposed as nested objects.
     • Documents are constructed as JSON.
           •     JSON’s nested structure works nicely with block
                 indexing.


     • Elasticsearch takes care of block indexing and also
       keeps track of the nested documents.




Elasticsearch’s nested support
     • Support for a nested type in mappings.
     • Nested query.
     • Nested filter.
     • Nested facets.


Nested type
     • The nested type enables Lucene’s block indexing.
     • Index: products. Type: product. Nested field: offers.

                          curl -XPUT 'localhost:9200/products' -d '{
                             "mappings" : {
                               "product" : {
                                  "properties" : {
                                    "offers" : { "type" : "nested" }
                                  }
                               }
                             }
                          }'




Indexing nested objects
     • Index: products. Type: product.
     • The offers array holds the nested objects.

                          curl -XPOST 'localhost:9200/products/product' -d '{
                             "name" : "Polo shirt",
                             "description" : "Made of 100% cotton",
                             "offers" : [
                                  {
                                     "color" : "red",
                                     "size" : "s",
                                     "price" : 999
                                  },
                                  {
                                     "color" : "red",
                                     "size" : "m",
                                     "price" : 1099
                                  },
                                  {
                                     "color" : "blue",
                                     "size" : "s",
                                     "price" : 999
                                  }
                             ]
                          }'




Nested query
     • "path" is the nested field path in the mapping.
     • "score_mode" : "total" sums the individual nested matches.
     • Color red would match the previous document.

     curl -XPOST 'localhost:9200/products/product/_search' -d '{
        "query" : {
          "nested" : {
              "path" : "offers",
              "score_mode" : "total",
              "query" : {
                  "bool" : {
                       "must" : [
                           {
                               "term" : {
                                   "color" : "blue"
                               }
                           },
                           {
                               "term" : {
                                   "size" : "m"
                               }
                           }
                       ]
                  }
              }
          }
        }
     }'
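Why blue and m does not match the polo shirt, while red and m does, can be shown with a small sketch. It is an illustration only: both function names are hypothetical, and `matches_flattened` models the behaviour one would get if the offers were not nested and their values were pooled per field.

```python
def matches_flattened(product, color, size):
    """Match when the values occur anywhere in the offers, even in different ones."""
    colors = {offer["color"] for offer in product["offers"]}
    sizes = {offer["size"] for offer in product["offers"]}
    return color in colors and size in sizes


def matches_nested(product, color, size):
    """Match only when a single offer carries both values."""
    return any(
        offer["color"] == color and offer["size"] == size
        for offer in product["offers"]
    )


polo = {
    "name": "Polo shirt",
    "offers": [
        {"color": "red", "size": "s", "price": 999},
        {"color": "red", "size": "m", "price": 1099},
        {"color": "blue", "size": "s", "price": 999},
    ],
}
blue_m_flattened = matches_flattened(polo, "blue", "m")
blue_m_nested = matches_nested(polo, "blue", "m")
red_m_nested = matches_nested(polo, "red", "m")
```

The flattened check reports a blue shirt in size m that is not actually on offer; the nested check evaluates each offer on its own and avoids that false match.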




Nested facets
     • A facet for the nested field offers.

     curl -XPOST 'localhost:9200/products/product/_search' -d '{
        "facets" : {
          "color" : {
             "terms_stats" : {
                "key_field" : "size",
                "value_field" : "price"
             },
             "nested" : "offers"
          }
        }
     }'

     • Response (counts 2 nested documents for term: s):

     "facets":{
         "color":{
             "_type":"terms_stats",
             "missing":0,
             "terms":[
                 {
                     "term":"s",
                     "count":2,
                     "total_count":2,
                     "min":999.0,
                     "max":999.0,
                     "total":1998.0,
                     "mean":999.0
                 },
                 ...
             ]
         }
     }
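The numbers in the facet response can be reproduced by hand. The sketch below is an assumption-labelled illustration, not Elasticsearch code: `terms_stats` is a hypothetical helper that groups the nested offers by the key field and aggregates the value field.

```python
def terms_stats(nested_docs, key_field, value_field):
    """Group nested documents by key_field and compute statistics over value_field."""
    stats = {}
    for doc in nested_docs:
        value = float(doc[value_field])
        entry = stats.setdefault(
            doc[key_field],
            {"count": 0, "min": value, "max": value, "total": 0.0},
        )
        entry["count"] += 1
        entry["min"] = min(entry["min"], value)
        entry["max"] = max(entry["max"], value)
        entry["total"] += value
    for entry in stats.values():
        entry["mean"] = entry["total"] / entry["count"]
    return stats


offers = [
    {"color": "red", "size": "s", "price": 999},
    {"color": "red", "size": "m", "price": 1099},
    {"color": "blue", "size": "s", "price": 999},
]
size_stats = terms_stats(offers, "size", "price")
```

The entry for size s covers two nested documents with a total of 1998.0, in line with the response shown above.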




Query time join
                           and parent & child relations.




Query time joining
     • Documents are joined during query time.
           •     More expensive, but more flexible.


     • Two types of query time joins:
           •     Parent child joining.
           •     Field based joining.




Lucene’s query time join
     • Query time joining is executed in two phases.
     • Field based joining:
           •     ‘from’ field
           •     ‘to’ field




     • Doesn’t require block indexing.
Query time join - JoinUtil
     • First phase collects all the terms in the fromField
       for the documents that match with the original
       query.


     • The second phase returns the documents that
       match with the collected terms from the previous
       phase in the toField.


     • One public method:
           •     JoinUtil#createJoinQuery(...)
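The two phases can be sketched over plain dictionaries. This is a simplified model, not the `JoinUtil` implementation: `join_query` and the field names `product_id` and `id` are hypothetical, and the "query" is an ordinary predicate function.

```python
def join_query(from_docs, to_docs, from_field, to_field, from_query):
    """Two-phase field based join over in-memory documents."""
    # Phase 1: collect the from_field terms of the documents matching the original query.
    terms = {doc[from_field] for doc in from_docs if from_query(doc)}
    # Phase 2: return the documents whose to_field holds one of the collected terms.
    return [doc for doc in to_docs if doc[to_field] in terms]


products = [{"id": "1", "name": "Polo shirt"}, {"id": "2", "name": "Sweater"}]
offers = [
    {"product_id": "1", "color": "red", "size": "m"},
    {"product_id": "1", "color": "blue", "size": "s"},
    {"product_id": "2", "color": "blue", "size": "l"},
]
joined = join_query(
    offers, products, "product_id", "id", lambda offer: offer["color"] == "red"
)
```

Nothing in the sketch requires the two document lists to live in the same index, and no block indexing is involved; the only link is the shared term.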



Joining - JoinUtil




     [Code figure not preserved. Callout: refers to the product id.]



     [Code figure not preserved. Callout: join utility.]

     • Result will contain one product.
     • Possible to do ‘join’ across indices.

Elasticsearch’s query time join
     • A parent child solution.
     • Not related to Lucene’s query time join.
     • Support consists of:
           •     The _parent field.
           •     The top_children query.
           •     The has_parent & has_child filter & query.
           •     Scoped facets.


The _parent field
     • Points to the parent type.
     • Mapping attribute to be defined on the child type.
                           curl -XPUT 'localhost:9200/products' -d '{
                              "mappings" : {
                                "offer" : {
                                   "_parent" : {
                                     "type" : "product"
                                   }
                                }
                              }
                           }'



     • Elasticsearch uses the _parent field to build an id
       cache.
           •       Makes parent/child queries & filters fast.



Indexing parent & child documents
     • Parent document:
              curl -XPOST 'localhost:9200/products/product/1' -d '{
                 "name" : "Polo shirt",
                 "description" : "Made of 100% cotton"
              }'

     • Child documents (parent=1 is the id of the parent
       document; it is also used for routing):

             curl -XPOST 'localhost:9200/products/offer?parent=1' -d '{
                "color" : "red",
                "size" : "s",
                "price" : 999
             }'

             curl -XPOST 'localhost:9200/products/offer?parent=1' -d '{
                "color" : "red",
                "size" : "m",
                "price" : 1099
             }'

             curl -XPOST 'localhost:9200/products/offer?parent=1' -d '{
                "color" : "blue",
                "size" : "s",
                "price" : 999
             }'
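Using the parent id for routing means a child is placed on the same shard as its parent. The sketch below only illustrates that idea: the function names are hypothetical and CRC32 is a stand-in hash chosen for the example, not the hash Elasticsearch uses.

```python
import zlib


def shard_for(routing_value, number_of_shards):
    """Pick a shard from a routing value (CRC32 is a stand-in hash)."""
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards


def shard_for_parent(parent_id, number_of_shards):
    """A parent document is routed by its own id."""
    return shard_for(parent_id, number_of_shards)


def shard_for_child(child_id, parent_id, number_of_shards):
    """A child document is routed by its parent's id; its own id is ignored."""
    return shard_for(parent_id, number_of_shards)


parent_shard = shard_for_parent("1", 5)
child_shards = {shard_for_child(child_id, "1", 5) for child_id in ("a", "b", "c")}
```

Whatever hash is used, routing every child by the parent id keeps a parent and all of its children together, so the join never has to cross shards.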




The ‘top_children’ query
      curl -XPOST 'localhost:9200/products/_search' -d '{
         "query" : {
           "top_children" : {
              "type" : "offer",
              "query" : {
                 "term" : {
                   "size" : "m"
                 }
              },
              "score" : "sum"
           }
         }
      }'

     • "type" is the child type, "query" the child query
       and "score" the score mode.
     • Internally the child query is potentially executed
       several times in order to get enough parent hits.
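The repeated execution can be pictured as a loop that asks for more and more child hits until enough distinct parents are found. This is a rough sketch, not the Elasticsearch implementation: `top_parents`, `initial_factor` and `growth_factor` are hypothetical names, and taking a longer slice of a pre-sorted hit list stands in for re-running the child query with a larger window.

```python
def top_parents(child_hits, parent_of, size, initial_factor=5, growth_factor=2):
    """child_hits: list of (child_id, score) pairs sorted by descending score."""
    requested = size * initial_factor
    while True:
        totals = {}
        # "Run" the child query for the current window and sum scores per parent.
        for child_id, score in child_hits[:requested]:
            parent_id = parent_of[child_id]
            totals[parent_id] = totals.get(parent_id, 0.0) + score
        if len(totals) >= size or requested >= len(child_hits):
            break
        requested *= growth_factor  # not enough parents yet: widen the window and retry
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:size]


child_hits = [("c1", 3.0), ("c2", 2.0), ("c3", 1.0), ("c4", 0.5)]
parent_of = {"c1": "p1", "c2": "p1", "c3": "p1", "c4": "p2"}
result = top_parents(child_hits, parent_of, size=2, initial_factor=1)
```

In this run the first window of two child hits yields only one parent, so the window is doubled and the second pass finds both parents.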



The ‘has_child’ query
      curl -XPOST 'localhost:9200/products/_search' -d '{
         "query" : {
           "has_child" : {
              "type" : "offer",
              "query" : {
                "term" : {
                   "size" : "m"
                }
              }
           }
         }
      }'

     • "type" is the child type and "query" the child query.
     • Doesn’t map the child scores into the matching
       parent doc. Works as a filter.

     • The has_parent query matches child documents
       instead.
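The difference between the two directions can be sketched with in-memory lists. This is an illustration under assumptions: the function names mirror the query names but are hypothetical helpers, the `parent` key models the _parent field, and no scores are produced, in line with the filter-like behaviour described above.

```python
def has_child(parents, children, child_query):
    """Return the parents that have at least one child matching child_query."""
    matching_parent_ids = {
        child["parent"] for child in children if child_query(child)
    }
    return [parent for parent in parents if parent["id"] in matching_parent_ids]


def has_parent(parents, children, parent_query):
    """Return the children whose parent matches parent_query."""
    matching_parent_ids = {
        parent["id"] for parent in parents if parent_query(parent)
    }
    return [child for child in children if child["parent"] in matching_parent_ids]


products = [{"id": "1", "name": "Polo shirt"}, {"id": "2", "name": "Sweater"}]
offers = [
    {"parent": "1", "size": "s"},
    {"parent": "1", "size": "m"},
    {"parent": "2", "size": "l"},
]
products_with_m = has_child(products, offers, lambda offer: offer["size"] == "m")
sweater_offers = has_parent(products, offers, lambda product: product["name"] == "Sweater")
```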

Scoped facets
     • Execute facets inside a specific scope.

     curl -XPOST 'localhost:9200/products/_search' -d '{
        "query" : {
           "has_child" : {
             "type" : "offer",
             "query" : {
                "term" : {
                  "size" : "m"
                }
             },
             "_scope" : "my_scope"
           }
        },
        "facets" : {
           "color" : {
             "terms_stats" : {
                "key_field" : "size",
                "value_field" : "price"
             },
             "scope" : "my_scope"
           }
        }
     }'




Conclusion
     • Block join & nested objects are fast and efficient,
       but lack flexibility.


     • Query time and parent child join are flexible at the
       cost of performance and memory.
           •     Field based query time joining is the most flexible.
           •     Parent child based joining is the fastest.


     • Faceting in combination with document relations
       gives a nice analytical view.


Any questions?



