SlideShare ist ein Scribd-Unternehmen logo
1 von 35
Elasticsearch Meetup | Amsterdam | April 7, 2016
Maarten Roosendaal & Anne Veling
introduction
• Anne Veling
– Elasticsearch consultancy and custom
training
– Performance and Stability Troubleshooting
– Software Architect, Team Lead
• Hierarchical data model, multiple levels
• High volume
– searches
– data changes
• Complex query requirements
– Both Product and Offer fields in query
– Facet on both levels
bol.com challenge
Products and Offers
faster indexing
faster searching
Test Data Creation
• Node.js Script creating random data
– Product
• Title: two random nouns from noun list
• Category: pick one out 26 nouns
• Half have no offer, half between 1-4
– Offer
• Random price between 1-20
• Seller: pick one out of 10k
• Stream in memory, flush out to disk in 3 flavors
– Each flavor keeping its own bulk size of 100k
– For 1M, 10M and 100M products
Document
{
"seller": "seller1203",
"price": 7,
"stock": 2,
"deliveryCode": 1,
"product": {
"id": "product95826",
"familyId": "family56744",
"title": "lunchroom representative",
"category": "crime"
}
}
Nested
Nested
{
"_id": "product95826",
"familyId": "family56744",
"title": "lunchroom representative",
"category": "crime",
"offers": [
{
"seller": "seller1203",
"price": 7,
"stock": 2,
"deliveryCode": 1
}
]
}
Parent/Child
{
"_id": "product95826",
"familyId": "family56744",
"title": "lunchroom representative",
"category": "crime”
}
{
"_parent": "product95826"
"seller": "seller1203",
"price": 7,
"stock": 2,
"deliveryCode": 1
}
• Zipped data files
– 1M: 86Mb
– 10M: 860Mb
– 100M: 8.6Gg
Getting it there
Indexing?
Indexing
• 1M product set, local naive
– 80s Document
– 41s Nested
– 64s Parent/Child
• ES index bottleneck:
– Your source system and latency
it can slurp it up faster than you can serve it
Let’s take a break
Use Cases
Use Case A Use Case B Use Case C
Product Search Word in Title Word in Title
∃ DeliveryC = 0
Word in Title
∃ Price < P
Order By Relevance Relevance (Lowest) Price
Display for top N
products
Product Fields
Cheapest Offer
fields
Product Fields
Correct Cheapest
Offer fields
Product Fields
Cheapest Offer
fields
Aggregate On Category Category Category
∀ Offer SellerId ∀ Correct Offers
SellerId
∀ Correct Offers
SellerId
∀ Offer Price ∀ Correct Offers
Price
∀ Correct Offers
Price
∀ Offer
DeliveryCode
∀ Correct Offers
DeliveryCode
∀ Correct Offers
DeliveryCode
• Product
• Offer
Use Cases
D: query B, roll up by family
• Families (with products with
offers)
– with product.title:lunchroom
– filter by
product.offer.deliveryCode:tom
orrow
Searching for a lunchroom
How hard can it be?
Let’s search
POST /boltest1m_doc/_search -> 3046
{
"query": {
"term": {
"product.title": {
"value": "lunchroom"
}
}
}
}
POST /boltest1m_nested/_search -> 2026
{
"query": {
"term": {
"title": {
"value": "lunchroom"
}
}
}
}
POST /boltest1m_parentchild/_search -> 2022
{
"query": {
"has_parent": {
"parent_type": "product",
"query": {
"term": {
"title": {
"value": "lunchroom"
}
}
}
}
}
}
ElasticSearch docs (and Lucene docs)
Product with Doc Nested
Parent/
Child
no offer 1 1 (1) 1
1 offer 1 1 (2) 2
2 offers 2 1 (3) 3
Real Queries
• Add Details, Sorting
• Product Facets
– Category
• Offer Facets
– Seller ID
– Price Buckets
– Delivery Code
Compare the numbers…
Explain the differences...
A: Doc
A: Nested
A: Parent/Child
Query Tips
• Use aggregations
– Cardinality
– top_hits ♥️ (with top_score)
• Smart Grouping & Field Collapsing
• Slooooow 😢
– inner_hits
• Don’t forget post-filtering or result page
lookup
Ice Cream Bounty
for making top_hits aggregation fast
Testing
Results
0
20
40
60
80
100
120
140
160
180
200
a b c d
1m tun 30102015 32 GB new queries
doc
nested
parentchild
0
500
1000
1500
2000
2500
3000
3500
a b c d
10m tun 30102015 32 GB new queries
doc
nested
parentchild
Conclusions
• Parent/Child has limitations
– Combining cross-level queries with
aggregations in one go
• Doc not as fast as we’d expected
– Because we needed top_hits aggregation
• Elasticsearch scales predictably
Conclusions
• For us, nested was the best solution
• What is yours?
• What are you searching for?
– What are the rows?
– What are the facets about?
Lessons Learned
• Testing the scalability of your data model
– Fast iterations early on
– Valuable insight in indexing and search
requirements
• Data Modeling is hard
– Do it early
– Make it fun
Tech Lessons Learned
• Don’t forget to tune the ES cluster
– Configure memory ;)
• If bulk file last line has no n, gets ignored!
– count the differences
• 100k bulk files with .000 suffixes ought to
be enough for everyone, right?
• Do not underestimate Sneakernet
Thank You
@anneveling anne@beyondtrees.com

Weitere ähnliche Inhalte

Was ist angesagt?

Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...
Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...
Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...DataWorks Summit
 
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.Lucidworks
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to ElasticsearchClifford James
 
Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...Neo4j
 
Performance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jPerformance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jArangoDB Database
 
OLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AG
OLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AGOLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AG
OLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AGLucidworks
 
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Edureka!
 
Elasticsearch in 15 minutes
Elasticsearch in 15 minutesElasticsearch in 15 minutes
Elasticsearch in 15 minutesDavid Pilato
 
Real-Time Analytics with Solr: Presented by Yonik Seeley, Cloudera
Real-Time Analytics with Solr: Presented by Yonik Seeley, ClouderaReal-Time Analytics with Solr: Presented by Yonik Seeley, Cloudera
Real-Time Analytics with Solr: Presented by Yonik Seeley, ClouderaLucidworks
 
Евгений Бобров "Powered by OSS. Масштабируемая потоковая обработка и анализ б...
Евгений Бобров "Powered by OSS. Масштабируемая потоковая обработка и анализ б...Евгений Бобров "Powered by OSS. Масштабируемая потоковая обработка и анализ б...
Евгений Бобров "Powered by OSS. Масштабируемая потоковая обработка и анализ б...Fwdays
 
SAS integration with NoSQL data
SAS integration with NoSQL dataSAS integration with NoSQL data
SAS integration with NoSQL dataKevin Lee
 
Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5Burak TUNGUT
 
ElasticSearch - index server used as a document database
ElasticSearch - index server used as a document databaseElasticSearch - index server used as a document database
ElasticSearch - index server used as a document databaseRobert Lujo
 
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...NoSQLmatters
 
Elasticsearch From the Bottom Up
Elasticsearch From the Bottom UpElasticsearch From the Bottom Up
Elasticsearch From the Bottom Upfoundsearch
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneRahul Jain
 

Was ist angesagt? (19)

Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...
Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...
Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...
 
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to Elasticsearch
 
Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...
 
Performance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jPerformance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4j
 
Elasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetupElasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetup
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
OLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AG
OLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AGOLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AG
OLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AG
 
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
 
elasticsearch
elasticsearchelasticsearch
elasticsearch
 
Elasticsearch in 15 minutes
Elasticsearch in 15 minutesElasticsearch in 15 minutes
Elasticsearch in 15 minutes
 
Real-Time Analytics with Solr: Presented by Yonik Seeley, Cloudera
Real-Time Analytics with Solr: Presented by Yonik Seeley, ClouderaReal-Time Analytics with Solr: Presented by Yonik Seeley, Cloudera
Real-Time Analytics with Solr: Presented by Yonik Seeley, Cloudera
 
Евгений Бобров "Powered by OSS. Масштабируемая потоковая обработка и анализ б...
Евгений Бобров "Powered by OSS. Масштабируемая потоковая обработка и анализ б...Евгений Бобров "Powered by OSS. Масштабируемая потоковая обработка и анализ б...
Евгений Бобров "Powered by OSS. Масштабируемая потоковая обработка и анализ б...
 
SAS integration with NoSQL data
SAS integration with NoSQL dataSAS integration with NoSQL data
SAS integration with NoSQL data
 
Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5
 
ElasticSearch - index server used as a document database
ElasticSearch - index server used as a document databaseElasticSearch - index server used as a document database
ElasticSearch - index server used as a document database
 
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
 
Elasticsearch From the Bottom Up
Elasticsearch From the Bottom UpElasticsearch From the Bottom Up
Elasticsearch From the Bottom Up
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of Lucene
 

Andere mochten auch

Elasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational databaseElasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational databaseKristijan Duvnjak
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedBeyondTrees
 
Data modeling for Elasticsearch
Data modeling for ElasticsearchData modeling for Elasticsearch
Data modeling for ElasticsearchFlorian Hopf
 
Getting the Most Out of Your NoSQL DB
Getting the Most Out of Your NoSQL DBGetting the Most Out of Your NoSQL DB
Getting the Most Out of Your NoSQL DBBigstep
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Sematext Group, Inc.
 
ElasticSearch for data mining
ElasticSearch for data mining ElasticSearch for data mining
ElasticSearch for data mining William Simms
 
Nested and Parent/Child Docs in ElasticSearch
Nested and Parent/Child Docs in ElasticSearchNested and Parent/Child Docs in ElasticSearch
Nested and Parent/Child Docs in ElasticSearchBeyondTrees
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in NetflixDanny Yuan
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life琛琳 饶
 
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data AnalyticsAmazon Web Services
 
Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaPrajal Kulkarni
 

Andere mochten auch (11)

Elasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational databaseElasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational database
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learned
 
Data modeling for Elasticsearch
Data modeling for ElasticsearchData modeling for Elasticsearch
Data modeling for Elasticsearch
 
Getting the Most Out of Your NoSQL DB
Getting the Most Out of Your NoSQL DBGetting the Most Out of Your NoSQL DB
Getting the Most Out of Your NoSQL DB
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2
 
ElasticSearch for data mining
ElasticSearch for data mining ElasticSearch for data mining
ElasticSearch for data mining
 
Nested and Parent/Child Docs in ElasticSearch
Nested and Parent/Child Docs in ElasticSearchNested and Parent/Child Docs in ElasticSearch
Nested and Parent/Child Docs in ElasticSearch
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in Netflix
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life
 
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
 
Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and Kibana
 

Ähnlich wie Scalable Data Models with Elasticsearch

Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Using Deep Learning and Customized Solr Components to Improve search Relevanc...Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Using Deep Learning and Customized Solr Components to Improve search Relevanc...Lucidworks
 
Creating AnswerBot with Keras and TensorFlow (TensorBeat)
Creating AnswerBot with Keras and TensorFlow (TensorBeat)Creating AnswerBot with Keras and TensorFlow (TensorBeat)
Creating AnswerBot with Keras and TensorFlow (TensorBeat)Avkash Chauhan
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemPierre Gutierrez
 
Data Mining and Recommendation Systems
Data Mining and Recommendation SystemsData Mining and Recommendation Systems
Data Mining and Recommendation SystemsSalil Navgire
 
Behemoth SEO: Search Strategy for Huge Websites
Behemoth SEO: Search Strategy for Huge WebsitesBehemoth SEO: Search Strategy for Huge Websites
Behemoth SEO: Search Strategy for Huge WebsitesPhilipp Klöckner
 
TopNotch: Systematically Quality Controlling Big Data by David Durst
TopNotch: Systematically Quality Controlling Big Data by David DurstTopNotch: Systematically Quality Controlling Big Data by David Durst
TopNotch: Systematically Quality Controlling Big Data by David DurstSpark Summit
 
Prepare for Peak Holiday Season with MongoDB
Prepare for Peak Holiday Season with MongoDBPrepare for Peak Holiday Season with MongoDB
Prepare for Peak Holiday Season with MongoDBMongoDB
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrTrey Grainger
 
Introduction to Recommendation System
Introduction to Recommendation SystemIntroduction to Recommendation System
Introduction to Recommendation SystemMinha Hwang
 
Query Processing Innovations for data intensive, modern applications
Query Processing Innovations for data intensive, modern applicationsQuery Processing Innovations for data intensive, modern applications
Query Processing Innovations for data intensive, modern applicationsMicrosoft Tech Community
 
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseRelevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseLucidworks
 
Building a Recommendation Engine - A Balancing act
Building a Recommendation Engine - A Balancing actBuilding a Recommendation Engine - A Balancing act
Building a Recommendation Engine - A Balancing actElad Rosenheim
 
DataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at InstacartDataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at InstacartHakka Labs
 
How did it go? The first large enterprise search project in Europe using Shar...
How did it go? The first large enterprise search project in Europe using Shar...How did it go? The first large enterprise search project in Europe using Shar...
How did it go? The first large enterprise search project in Europe using Shar...Petter Skodvin-Hvammen
 
WrangleConf 2017 - Lessons from Integrating ML models into Data Products
WrangleConf 2017 - Lessons from Integrating ML models into Data ProductsWrangleConf 2017 - Lessons from Integrating ML models into Data Products
WrangleConf 2017 - Lessons from Integrating ML models into Data ProductsSharath Rao
 
Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...
Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...
Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...MongoDB
 
Retail Reference Architecture
Retail Reference ArchitectureRetail Reference Architecture
Retail Reference ArchitectureMongoDB
 
Introduction to Enterprise Search
Introduction to Enterprise SearchIntroduction to Enterprise Search
Introduction to Enterprise SearchFindwise
 

Ähnlich wie Scalable Data Models with Elasticsearch (20)

Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Using Deep Learning and Customized Solr Components to Improve search Relevanc...Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Using Deep Learning and Customized Solr Components to Improve search Relevanc...
 
Creating AnswerBot with Keras and TensorFlow (TensorBeat)
Creating AnswerBot with Keras and TensorFlow (TensorBeat)Creating AnswerBot with Keras and TensorFlow (TensorBeat)
Creating AnswerBot with Keras and TensorFlow (TensorBeat)
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender system
 
Data Mining and Recommendation Systems
Data Mining and Recommendation SystemsData Mining and Recommendation Systems
Data Mining and Recommendation Systems
 
Behemoth SEO: Search Strategy for Huge Websites
Behemoth SEO: Search Strategy for Huge WebsitesBehemoth SEO: Search Strategy for Huge Websites
Behemoth SEO: Search Strategy for Huge Websites
 
TopNotch: Systematically Quality Controlling Big Data by David Durst
TopNotch: Systematically Quality Controlling Big Data by David DurstTopNotch: Systematically Quality Controlling Big Data by David Durst
TopNotch: Systematically Quality Controlling Big Data by David Durst
 
Prepare for Peak Holiday Season with MongoDB
Prepare for Peak Holiday Season with MongoDBPrepare for Peak Holiday Season with MongoDB
Prepare for Peak Holiday Season with MongoDB
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
 
Introduction to Recommendation System
Introduction to Recommendation SystemIntroduction to Recommendation System
Introduction to Recommendation System
 
Query Processing Innovations for data intensive, modern applications
Query Processing Innovations for data intensive, modern applicationsQuery Processing Innovations for data intensive, modern applications
Query Processing Innovations for data intensive, modern applications
 
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseRelevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
 
Building a Recommendation Engine - A Balancing act
Building a Recommendation Engine - A Balancing actBuilding a Recommendation Engine - A Balancing act
Building a Recommendation Engine - A Balancing act
 
DataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at InstacartDataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at Instacart
 
How did it go? The first large enterprise search project in Europe using Shar...
How did it go? The first large enterprise search project in Europe using Shar...How did it go? The first large enterprise search project in Europe using Shar...
How did it go? The first large enterprise search project in Europe using Shar...
 
Search Analytics - Comperio
Search Analytics - ComperioSearch Analytics - Comperio
Search Analytics - Comperio
 
WrangleConf 2017 - Lessons from Integrating ML models into Data Products
WrangleConf 2017 - Lessons from Integrating ML models into Data ProductsWrangleConf 2017 - Lessons from Integrating ML models into Data Products
WrangleConf 2017 - Lessons from Integrating ML models into Data Products
 
Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...
Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...
Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...
 
Retail Reference Architecture
Retail Reference ArchitectureRetail Reference Architecture
Retail Reference Architecture
 
Everything You Wish You Knew About Search
Everything You Wish You Knew About SearchEverything You Wish You Knew About Search
Everything You Wish You Knew About Search
 
Introduction to Enterprise Search
Introduction to Enterprise SearchIntroduction to Enterprise Search
Introduction to Enterprise Search
 

Kürzlich hochgeladen

SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITmanoharjgpsolutions
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolsosttopstonverter
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorTier1 app
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingShane Coughlan
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesKrzysztofKkol1
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...Bert Jan Schrijver
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 

Kürzlich hochgeladen (20)

SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh IT
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 

Scalable Data Models with Elasticsearch

  • 1. Elasticsearch Meetup | Amsterdam | April 7, 2016 Maarten Roosendaal & Anne Veling
  • 2. introduction • Anne Veling – Elasticsearch consultancy and custom training – Performance and Stability Troubleshooting – Software Architect, Team Lead
  • 3. • Hierarchical data model, multiple levels • High volume – searches – data changes • Complex query requirements – Both Product and Offer fields in query – Facet on both levels bol.com challenge
  • 4. Products and Offers faster indexing faster searching
  • 5.
  • 6. Test Data Creation • Node.js Script creating random data – Product • Title: two random nouns from noun list • Category: pick one out 26 nouns • Half have no offer, half between 1-4 – Offer • Random price between 1-20 • Seller: pick one out of 10k • Stream in memory, flush out to disk in 3 flavors – Each flavor keeping its own bulk size of 100k – For 1M, 10M and 100M products
  • 7. Document { "seller": "seller1203", "price": 7, "stock": 2, "deliveryCode": 1, "product": { "id": "product95826", "familyId": "family56744", "title": "lunchroom representative", "category": "crime" } }
  • 9. Nested { "_id": "product95826", "familyId": "family56744", "title": "lunchroom representative", "category": "crime", "offers": [ { "seller": "seller1203", "price": 7, "stock": 2, "deliveryCode": 1 } ] }
  • 10. Parent/Child { "_id": "product95826", "familyId": "family56744", "title": "lunchroom representative", "category": "crime” } { "_parent": "product95826" "seller": "seller1203", "price": 7, "stock": 2, "deliveryCode": 1 }
  • 11. • Zipped data files – 1M: 86Mb – 10M: 860Mb – 100M: 8.6Gg Getting it there
  • 13.
  • 14. Indexing • 1M product set, local naive – 80s Document – 41s Nested – 64s Parent/Child • ES index bottleneck: – Your source system and latency it can slurp it up faster than you can serve it
  • 15. Let’s take a break
  • 16. Use Cases Use Case A Use Case B Use Case C Product Search Word in Title Word in Title ∃ DeliveryC = 0 Word in Title ∃ Price < P Order By Relevance Relevance (Lowest) Price Display for top N products Product Fields Cheapest Offer fields Product Fields Correct Cheapest Offer fields Product Fields Cheapest Offer fields Aggregate On Category Category Category ∀ Offer SellerId ∀ Correct Offers SellerId ∀ Correct Offers SellerId ∀ Offer Price ∀ Correct Offers Price ∀ Correct Offers Price ∀ Offer DeliveryCode ∀ Correct Offers DeliveryCode ∀ Correct Offers DeliveryCode • Product • Offer
  • 17. Use Cases D: query B, roll up by family • Families (with products with offers) – with product.title:lunchroom – filter by product.offer.deliveryCode:tom orrow
  • 18. Searching for a lunchroom How hard can it be?
  • 19. Let’s search POST /boltest1m_doc/_search -> 3046 { "query": { "term": { "product.title": { "value": "lunchroom" } } } } POST /boltest1m_nested/_search -> 2026 { "query": { "term": { "title": { "value": "lunchroom" } } } } POST /boltest1m_parentchild/_search -> 2022 { "query": { "has_parent": { "parent_type": "product", "query": { "term": { "title": { "value": "lunchroom" } } } } } }
  • 20.
  • 21. ElasticSearch docs (and Lucene docs) Product with Doc Nested Parent/ Child no offer 1 1 (1) 1 1 offer 1 1 (2) 2 2 offers 2 1 (3) 3
  • 22. Real Queries • Add Details, Sorting • Product Facets – Category • Offer Facets – Seller ID – Price Buckets – Delivery Code Compare the numbers… Explain the differences...
  • 26.
  • 27. Query Tips • Use aggregations – Cardinality – top_hits ♥️ (with top_score) • Smart Grouping & Field Collapsing • Slooooow 😢 – inner_hits • Don’t forget post-filtering or result page lookup
  • 28. Ice Cream Bounty for making top_hits aggregation fast
  • 30. Results 0 20 40 60 80 100 120 140 160 180 200 a b c d 1m tun 30102015 32 GB new queries doc nested parentchild 0 500 1000 1500 2000 2500 3000 3500 a b c d 10m tun 30102015 32 GB new queries doc nested parentchild
  • 31. Conclusions • Parent/Child has limitations – Combining cross-level queries with aggregations in one go • Doc not as fast as we’d expected – Because we needed top_hits aggregation • Elasticsearch scales predictably
  • 32. Conclusions • For us, nested was the best solution • What is yours? • What are you searching for? – What are the rows? – What are the facets about?
  • 33. Lessons Learned • Testing the scalability of your data model – Fast iterations early on – Valuable insight in indexing and search requirements • Data Modeling is hard – Do it early – Make it fun
  • 34. Tech Lessons Learned • Don’t forget to tune the ES cluster – Configure memory ;) • If bulk file last line has no n, gets ignored! – count the differences • 100k bulk files with .000 suffixes ought to be enough for everyone, right? • Do not underestimate Sneakernet