Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Martijn van Groningen
mvg@apache.org
@mvgroningen
Document relations
Wednesday, June 5, 13
Topics
• Background
• Parent / child support
• Nested support
• Future developments
Background
Background
[Diagram: C → Query, answered with a local join]
Background
• We need more capacity.
• But how to divide the relational data?
Background
[Diagram: C → Query, split into sub-queries]
Background
[Diagram: C → Query → sub-query against a de-normalized document]
Background
Background
[Diagram: C → Query → sub-queries, each performing a local join]
Background
• When dealing with relations, you pay the price either at
write time or at read time.
• Alternatively, document relations can balance
the cost between read and write time.
For example: a single join to reduce duplicated data.
• Supporting “many-to-many” joins in a
distributed system is difficult.
It leads either to unbalanced partitions or to very expensive joins.
The query time join
Parent child
Parent child
• Parent / child is a query time join between
different document types in the same index.
• Parent and child documents are stored as
separate documents in the same index.
• A child document can point to only one parent.
• A parent document can be referred to by multiple child documents.
• A parent document can also itself be a child
document of a different parent.
Parent child
• A parent document and its child
documents are routed to the same shard.
• The parent id is used as the routing value.
• Combined with an in-memory data structure
of parent ids, the parent-child join is fast.
• Use the warmer API to preload it!
• The size of the parent ids data structure was significantly reduced in
version 0.90.1.
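A minimal sketch of why routing by parent id keeps a parent and its children on one shard. It assumes the shard is chosen as hash(routing value) modulo the number of shards; the CRC32 hash, `NUM_SHARDS` and the helper names are hypothetical stand-ins, not Elasticsearch's actual implementation, and the in-memory parent ids structure is not modeled.

```python
import zlib
from typing import Optional

NUM_SHARDS = 5  # hypothetical shard count


def shard_for(routing_value: str, num_shards: int = NUM_SHARDS) -> int:
    # Hypothetical hash: Elasticsearch uses its own hash function, not CRC32.
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def route(doc_id: str, parent_id: Optional[str] = None) -> int:
    # Children are routed by their parent's id; everything else by its own id.
    return shard_for(parent_id if parent_id is not None else doc_id)


product_shard = route("p2345")
offer_shards = [route(offer_id, parent_id="p2345") for offer_id in ("12", "13", "14")]
print(all(shard == product_shard for shard in offer_shards))  # True
```

Whatever hash is used, the outcome is the same: every offer pointing to `p2345` lands on the same shard as that product, so the join never has to leave the shard.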
Parent child - Indexing
• The parent document doesn’t need to exist at
the time a child document is indexed.
curl -XPUT 'localhost:9200/products' -d '{
  "mappings" : {
     "offer" : {
        "_parent" : { "type" : "product" }
     }
  }
}'
An offer document
is a child of a
product document
curl -XPUT 'localhost:9200/products/offer/12?parent=p2345' -d '{
  "valid_from" : "2013-05-01",
  "valid_to" : "2013-10-01",
  "price" : 26.87
}'
Then, when
indexing an offer, use the
parent parameter to specify which
product it points to.
Parent child - Querying
• The has_child query returns parent
documents based on matches in their child
documents.
• The optional “score_mode” defines how the scores of matching
children are mapped to their parent document.
curl -XGET 'localhost:9200/products/_search' -d '{
  "query" : {
    "has_child" : {
      "type" : "offer",
      "query" : {
        "range" : {
          "price" : {
            "lte" : 50
          }
        }
      }
    }
  }
}'
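The join can be pictured with a small in-memory sketch, assuming a single shard in which every child stores its parent id. The data, the `has_child` helper and the score-mode names other than the slides' “avg” are hypothetical; the sketch only mirrors the semantics described above: parents are returned based on matching children, and a child may be indexed before its parent exists.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical single-shard data: parents by id, children carry their parent id.
products = {"p2345": {"name": "product A"}, "p6789": {"name": "product B"}}
offers = [
    {"id": "12", "parent": "p2345", "price": 26.87},
    {"id": "13", "parent": "p2345", "price": 45.00},
    {"id": "14", "parent": "p6789", "price": 80.00},
    # Indexed before its parent exists: allowed, but it cannot produce a hit yet.
    {"id": "15", "parent": "p9999", "price": 10.00},
]

# Illustrative score modes; each matching child contributes a score of 1.0 here.
SCORE_MODES = {"max": max, "sum": sum, "avg": mean}


def has_child(child_matches, score_mode=None):
    """Return {parent_id: score} for existing parents with at least one matching child."""
    per_parent = defaultdict(list)
    for child in offers:
        if child_matches(child):
            per_parent[child["parent"]].append(1.0)
    result = {}
    for parent_id, scores in per_parent.items():
        if parent_id not in products:
            continue  # the parent has not been indexed (yet)
        result[parent_id] = SCORE_MODES[score_mode](scores) if score_mode else 1.0
    return result


hits = has_child(lambda offer: offer["price"] <= 50, score_mode="sum")
print(hits)  # {'p2345': 2.0}
```

Offer 15 matches the range but produces no hit, because its parent `p9999` is missing. Once that product is indexed, the same query would return it.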
The index time join
Nested objects
Nested objects
• In many cases, related entities in a domain model share the same
write / update life cycle.
• Books & Chapters.
• Movies & Actors.
• De-normalizing results in the fastest queries.
• Compared to using parent/child queries.
• Nested objects allow smart de-normalization.
Nested objects
{
"title" : "Elasticsearch",
"author" : "Clinton Gormley",
"categories" : ["programming", "information retrieval"],
"published_year" : 2013,
"summary" : "The definitive guide for Elasticsearch ...",
"chapter_1_title" : "Introduction",
"chapter_1_summary" : "Short introduction about Elasticsearch’s features ...",
"chapter_1_number_of_pages" : 12,
"chapter_2_title" : "Data in, Data out",
"chapter_2_summary" : "How to manage your data with Elasticsearch ...",
"chapter_2_number_of_pages" : 39,
...
}
Too verbose!
Nested objects
{
"title" : "Elasticsearch",
"author" : "Clinton Gormley",
"categories" : ["programming", "information retrieval"],
"published_year" : 2013,
"summary" : "The definitive guide for Elasticsearch ...",
"chapters" : [
{
"title" : "Introduction",
"summary" : "Short introduction about Elasticsearch’s features ...",
"number_of_pages" : 12
},
{
"title" : "Data in, Data out",
"summary" : "How to manage your data with Elasticsearch ...",
"number_of_pages" : 39
},
...
]
}
• JSON allows complex nesting of objects.
• But how does this get indexed?
Nested objects
Original JSON document:
{
"title" : "Elasticsearch",
...
"chapters" : [
{"title" : "Introduction", "summary" : "Short ...", "number_of_pages" : 12},
{"title" : "Data in, ...", "summary" : "How to ...", "number_of_pages" : 39},
...
]
}
Lucene document structure:
{
"title" : "Elasticsearch",
...
"chapters.title" : ["Data in, Data out", "Introduction"],
"chapters.summary" : ["How to ...", "Short ..."],
"chapters.number_of_pages" : [12, 39]
}
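The flattening above loses the association between the fields of a single chapter. A minimal Python sketch that approximates the default object flattening (not Elasticsearch code; `flatten` is a hypothetical helper) shows the resulting false match:

```python
def flatten(doc, prefix=""):
    """Flatten inner objects into multi-valued 'path.field' entries."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Values of all inner objects are merged into one list per field path.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat.setdefault(sub_path, []).extend(sub_values)
            else:
                flat.setdefault(path, []).append(item)
    return flat


book = {
    "title": "Elasticsearch",
    "chapters": [
        {"title": "Introduction", "number_of_pages": 12},
        {"title": "Data in, Data out", "number_of_pages": 39},
    ],
}

flat = flatten(book)
print(flat["chapters.title"])            # ['Introduction', 'Data in, Data out']
print(flat["chapters.number_of_pages"])  # [12, 39]

# "A chapter titled Introduction with more than 30 pages": no such chapter exists,
# yet the flattened document matches because the per-chapter association is gone.
false_match = "Introduction" in flat["chapters.title"] and any(
    pages > 30 for pages in flat["chapters.number_of_pages"]
)
print(false_match)  # True
```

Keeping each chapter as its own Lucene document, which is what the nested type does, avoids this.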
Nested objects - Mapping
• The nested type triggers Lucene’s block
indexing.
• Multiple levels of nested inner objects are possible.
curl -XPUT 'localhost:9200/books' -d '{
  "mappings" : {
    "book" : {
      "properties" : {
        "chapters" : {
          "type" : "nested"
        }
      }
    }
  }
}'
Document type: “book”
Field type of “chapters”: ‘nested’
Nested objects - Block indexing
Lucene document structure:
{"chapters.title" : "Intro...", "chapters.summary" : "...", "chapters.number_of_pages" : 12},
{"chapters.title" : "Data...", "chapters.summary" : "...", "chapters.number_of_pages" : 39},
...
{
"title" : "Elasticsearch",
...
}
• The inner objects are inlined as separate Lucene
documents, right before the root document.
• The root document and its nested documents
always remain in the same block.
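A sketch of the block layout, assuming a list of dicts stands in for Lucene documents and a list of booleans stands in for the root documents bitset. Both are hypothetical simplifications, as is the second book.

```python
def block_index(books):
    """Lay out each book as one block: its chapter docs first, the root doc last.

    Returns the flat list of docs (stand-ins for Lucene documents) and a parallel
    list of booleans marking root docs (a stand-in for the root documents bitset).
    """
    docs, root_bits = [], []
    for book in books:
        for chapter in book.get("chapters", []):
            docs.append({f"chapters.{key}": value for key, value in chapter.items()})
            root_bits.append(False)
        docs.append({key: value for key, value in book.items() if key != "chapters"})
        root_bits.append(True)
    return docs, root_bits


books = [
    {"title": "Elasticsearch", "chapters": [
        {"title": "Introduction", "number_of_pages": 12},
        {"title": "Data in, Data out", "number_of_pages": 39},
    ]},
    # Hypothetical second book, to show where one block ends and the next begins.
    {"title": "Another book", "chapters": [{"title": "Chapter one", "number_of_pages": 30}]},
]

docs, root_bits = block_index(books)
for doc_id, (doc, is_root) in enumerate(zip(docs, root_bits)):
    print(doc_id, "root  " if is_root else "nested", doc)
print([doc_id for doc_id, is_root in enumerate(root_bits) if is_root])  # [2, 4]
```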
Nested objects - Nested query
• The nested query returns the complete “book” (the root
document) as a hit.
curl -XGET 'localhost:9200/books/book/_search' -d '{
  "query" : {
    "nested" : {
      "path" : "chapters",
      "score_mode" : "avg",
      "query" : {
        "match" : {
          "chapters.summary" : {
            "query" : "indexing data"
          }
        }
      }
    }
  }
}'
“path” specifies the nested level,
“query” is the chapter-level query,
and “score_mode” sets how the chapter scores are combined into the book’s score.
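How `score_mode` pushes chapter scores up to a book can be sketched as follows. The chapter scores are made-up numbers, and apart from the slide's “avg” the mode names are illustrative assumptions, not a list of the modes Elasticsearch accepts.

```python
from statistics import mean

# Hypothetical scores of the chapters that matched the inner "chapters.summary" query,
# grouped by the book (root document) they belong to.
chapter_scores = {
    "Elasticsearch": [0.75, 0.25],
    "Another book": [0.5],
}

# "avg" appears on the slide; the other mode names are illustrative assumptions.
AGGREGATORS = {
    "avg": mean,
    "max": max,
    "sum": sum,
    "none": lambda scores: 1.0,  # matching only filters; chapter scores are ignored
}


def nested_scores(scores_by_root, score_mode="avg"):
    """Aggregate the nested scores of each root document with the given mode."""
    aggregate = AGGREGATORS[score_mode]
    return {root: aggregate(scores) for root, scores in scores_by_root.items() if scores}


print(nested_scores(chapter_scores, "avg"))  # {'Elasticsearch': 0.5, 'Another book': 0.5}
print(nested_scores(chapter_scores, "max"))  # {'Elasticsearch': 0.75, 'Another book': 0.5}
```

With “avg” both books tie, while “max” favours the book that has one strongly matching chapter. The choice of mode decides which of these behaviours you want.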
Nested objects
[Diagram: root documents bitset, e.g. X X X X X]
• X: a set bit, representing a root document.
• Nested Lucene documents that match the inner query have their scores aggregated and pushed to their root document.
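The bitset makes this join cheap. Nested documents are stored right before their root, so the root of any nested document is the next set bit at or after its doc id. A sketch under that assumption follows. Lucene's block join performs a roughly similar next-set-bit lookup, but this is not its code, and the doc ids and scores are hypothetical.

```python
def next_set_bit(bits, start):
    """First index >= start whose bit is set, or -1 if there is none."""
    for index in range(start, len(bits)):
        if bits[index]:
            return index
    return -1


def join_to_roots(root_bits, nested_hits):
    """Map matching nested doc ids to their root doc ids and average their scores."""
    grouped = {}
    for doc_id, score in sorted(nested_hits.items()):
        # Nested docs precede their root inside a block, so the root is the next set bit.
        root_id = next_set_bit(root_bits, doc_id)
        grouped.setdefault(root_id, []).append(score)
    return {root_id: sum(scores) / len(scores) for root_id, scores in grouped.items()}


# Two blocks: nested docs 0-1 belong to root 2, nested doc 3 belongs to root 4.
root_bits = [False, False, True, False, True]
nested_hits = {0: 0.75, 1: 0.25, 3: 0.5}  # hypothetical inner-query scores
print(join_to_roots(root_bits, nested_hits))  # {2: 0.5, 4: 0.5}
```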
But first, questions!
Extra slides
Nested objects - Nested sorting
curl -XGET 'localhost:9200/books/book/_search' -d '{
  "query" : {
    "match" : {
      "summary" : {
        "query" : "guide"
      }
    }
  },
  "sort" : [
    {
      "chapters.number_of_pages" : {
        "sort_mode" : "avg",
        "nested_filter" : {
          "range" : {
            "chapters.number_of_pages" : { "lte" : 15 }
          }
        }
      }
    }
  ]
}'
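Sort mode: how the page counts of the chapters that pass the nested_filter are combined into one sort value per book.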
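A sketch of what the sort part of the request above computes. Only chapters that pass the `nested_filter` contribute, and their page counts are averaged into one sort key per book. The second book and its page counts are hypothetical, the match query is ignored, and placing books without a matching chapter last is an assumption of this sketch.

```python
from statistics import mean

books = [
    {"title": "Elasticsearch", "chapters": [{"number_of_pages": 12}, {"number_of_pages": 39}]},
    # Hypothetical second book.
    {"title": "Another book",
     "chapters": [{"number_of_pages": 3}, {"number_of_pages": 40}, {"number_of_pages": 9}]},
]


def short_chapter(chapter):
    # Same condition as the nested_filter on the slide: number_of_pages <= 15.
    return chapter["number_of_pages"] <= 15


def nested_sort_value(book, nested_filter, mode="avg"):
    """Combine the page counts of the chapters passing the filter into one sort value."""
    values = [c["number_of_pages"] for c in book["chapters"] if nested_filter(c)]
    if not values:
        return None
    return {"avg": mean, "min": min, "max": max, "sum": sum}[mode](values)


def sort_key(book):
    value = nested_sort_value(book, short_chapter)
    # Assumption for this sketch: books without a matching chapter sort last.
    return (value is None, value if value is not None else 0)


ranked = sorted(books, key=sort_key)
print([(book["title"], nested_sort_value(book, short_chapter)) for book in ranked])
# [('Another book', 6), ('Elasticsearch', 12)]
```

The 40-page chapter is excluded by the filter, which is why “Another book” sorts first with an average of 6.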
Parent child - sorting
• Parent/child sorting isn’t possible at the
moment.
• But there is a workaround using the “custom_score” query.
• Downsides:
• It forces a script to be executed for each matching child document.
• The child sort value is converted into a float value.
"has_child" : {
  "type" : "offer",
  "score_mode" : "avg",
  "query" : {
    "custom_score" : {
      "query" : { ... },
      "script" : "doc['price'].value"
    }
  }
}
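The effect of the workaround can be sketched in Python. Each matching offer's price becomes its score, “avg” combines those scores per product, and products come back ordered by score, highest first. The data is hypothetical. `to_float32` mimics the conversion to a 32-bit float score mentioned in the downsides.

```python
import struct
from statistics import mean


def to_float32(value):
    """Round-trip a Python float through a 32-bit float, like a Lucene score."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Hypothetical offers (children) and the products (parents) they point to.
offers = [
    {"parent": "p2345", "price": 26.5},
    {"parent": "p2345", "price": 30.5},
    {"parent": "p6789", "price": 19.75},
]


def script_score(offer):
    # The custom_score script returns the price, which then becomes a float score.
    return to_float32(offer["price"])


def parents_ordered_by_score(children):
    """Aggregate child scores per parent with 'avg' and order parents by score, highest first."""
    per_parent = {}
    for child in children:
        per_parent.setdefault(child["parent"], []).append(script_score(child))
    scores = {parent: mean(values) for parent, values in per_parent.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(parents_ordered_by_score(offers))  # [('p2345', 28.5), ('p6789', 19.75)]
print(to_float32(26.87) == 26.87)        # False: precision is lost
```

This ordering is by descending average price. Sorting by ascending price would need a script that inverts the value.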
Document relations - Berlin Buzzwords 2013
