Elasticsearch at Dailymotion
June 10, 2014
Meetup Elasticsearch France #7
Cédric Hourcade
Core developer at Dailymotion
twitter.com/hced
> video search @ Dailymotion
1 > cluster overview and indexation
2 > query and score
3 > sharding
4 > benchmarks, tuning
5 > questions?
elasticsearch cluster
cluster and indexation
search cluster
cluster and indexation
> 10 nodes for video search
_ one main video index
_ 5 shards with 1 replica
_ nodes: 32 cores, 48 GB RAM, 15k RPM disks
> 600 to 1000 search requests per second
> end-to-end response time < 40 ms
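The layout above works out to one shard copy per node: 5 primaries × (1 + 1 replica) = 10 copies on 10 nodes, and since a search reads only one copy of each shard, it touches at most 5 nodes. A minimal Python sketch of that arithmetic; `number_of_shards` and `number_of_replicas` are the standard Elasticsearch index settings, while the helper functions are hypothetical.

```python
def index_settings(shards: int, replicas: int) -> dict:
    """Request body for creating an index with the given shard layout."""
    return {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}


def shard_copies(shards: int, replicas: int) -> int:
    """Primaries plus replicas that the cluster has to host."""
    return shards * (1 + replicas)


def nodes_per_query(shards: int, nodes: int) -> int:
    """Upper bound on nodes touched by one search: it reads one copy of each shard."""
    return min(shards, nodes)


if __name__ == "__main__":
    print(index_settings(5, 1))
    print(shard_copies(5, 1) / 10)   # copies per node on 10 nodes: 1.0
    print(nodes_per_query(5, 10))    # 5
```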
indexation pipeline (diagram)
> mysql farm → elasticsearch indexer (2 nodes) → elasticsearch cluster (10 nodes), indexing constantly
_ filter data for search
_ hash data
_ update if hash changed
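The indexer steps above can be sketched in Python as follows, assuming the indexer keeps the last hash it sent for each document id (here an in-memory dict; the real storage is not described) and that SHA-1 over canonical JSON is an acceptable fingerprint. The column names in `SEARCH_FIELDS` are hypothetical.

```python
import hashlib
import json

# Hypothetical list of the MySQL columns that search actually uses.
SEARCH_FIELDS = ("id", "title", "tags", "description")


def filter_for_search(row: dict) -> dict:
    """Drop every column the search index does not need (rows must have an id)."""
    return {k: row[k] for k in SEARCH_FIELDS if k in row}


def doc_hash(doc: dict) -> str:
    """SHA-1 of the canonical JSON form (sorted keys), stable across runs."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def docs_to_update(rows, known_hashes: dict) -> list:
    """Return the filtered docs whose hash changed and record their new hashes."""
    changed = []
    for row in rows:
        doc = filter_for_search(row)
        digest = doc_hash(doc)
        if known_hashes.get(doc["id"]) != digest:
            known_hashes[doc["id"]] = digest
            changed.append(doc)
    return changed


if __name__ == "__main__":
    hashes = {}
    rows = [{"id": 1, "title": "cat video", "tags": ["cat"], "updated_at": "x"}]
    print(len(docs_to_update(rows, hashes)))  # 1: first time this doc is seen
    rows[0]["updated_at"] = "y"               # column not indexed
    print(len(docs_to_update(rows, hashes)))  # 0: nothing to send
    rows[0]["title"] = "funny cat video"
    print(len(docs_to_update(rows, hashes)))  # 1: the title changed
```

Because only the filtered fields are hashed, changes to columns that search never reads do not trigger an update. A real indexer would record the hash only once Elasticsearch has acknowledged the update.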
search
query and score
{
  "query": {
    "function_score": {
      "query": {
        "custom_common": { … }
      },
      "script_score": {
        "script": "custom_scorer",
        "lang": "native"
      },
      "boost_mode": "multiply"
    }
  }
}
> custom query score × custom scorer = final score
Scorer
> custom scorer
_ only slightly alters the query score
_ takes into account recency, popularity, etc.
> boosted filters and scripts while testing
> native Java for performance
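A minimal Python sketch of a scorer that only slightly alters the query score, assuming a multiplicative combination (as with Elasticsearch's `boost_mode: multiply`) and a factor kept in a narrow band around 1.0. The 30-day half-life, the log-scaled popularity and the ±10% band are hypothetical choices; the actual Dailymotion scorer is a native Java script whose formula is not shown.

```python
import math


def recency_factor(age_days: float, half_life_days: float = 30.0) -> float:
    """1.0 for a brand-new video, halving every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def popularity_factor(views: int, max_views: int = 10_000_000) -> float:
    """Log-scaled view count, clamped to [0, 1]."""
    views = max(0, min(views, max_views))
    return math.log1p(views) / math.log1p(max_views)


def scorer(age_days: float, views: int, weight: float = 0.1) -> float:
    """Multiplier kept within [1 - weight, 1 + weight] so text relevance dominates."""
    signal = 0.5 * recency_factor(age_days) + 0.5 * popularity_factor(views)  # in [0, 1]
    return 1.0 - weight + 2.0 * weight * signal


def final_score(query_score: float, age_days: float, views: int) -> float:
    """Combine like boost_mode "multiply": query score times scorer output."""
    return query_score * scorer(age_days, views)


if __name__ == "__main__":
    print(final_score(2.0, age_days=0, views=10_000_000))  # about 2.2
    print(final_score(2.0, age_days=3650, views=0))        # about 1.8
```

Bounding the multiplier is what keeps the text score in control: a fresh, very popular video can gain at most 10% over an old, unseen one with the same query score.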
Query
> we need to keep control of the query base score
> the problem is that our text content is thin
_ short title, a few tags
_ a more or less relevant description
> bare-bones TF-IDF may not be suitable
_ term frequency (TF) is not that relevant to us
> BM25: reduce the importance of document length
> why a common terms query?
_ increase performance
_ ignore popular terms when matching
_ but still use them for scoring
_ like a real-time, specialized stop-word list
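A toy Python model of the common terms idea, assuming the `cutoff_frequency` semantics of Elasticsearch's `common` query (a value below 1 is a fraction of the document count, otherwise an absolute count). It ignores the query's `low_freq_operator` and `high_freq_operator` options and uses a made-up IDF-sum score; function names are hypothetical.

```python
def split_terms(terms, doc_freq, num_docs, cutoff_frequency=0.001):
    """Split query terms into (low_freq, high_freq) groups.

    A cutoff_frequency below 1 is a fraction of num_docs; 1 or more is an
    absolute document count.
    """
    cutoff = cutoff_frequency * num_docs if cutoff_frequency < 1 else cutoff_frequency
    low, high = [], []
    for term in terms:
        (high if doc_freq.get(term, 0) > cutoff else low).append(term)
    return low, high


def matches(doc_terms, low, high) -> bool:
    """Select documents with the rare terms only (any of them); fall back to
    the popular terms when the query has no rare term."""
    required = low if low else high
    return any(term in doc_terms for term in required)


def toy_score(doc_terms, low, high, idf) -> float:
    """Score with all terms, popular ones included (a made-up IDF sum)."""
    return sum(idf[t] for t in low + high if t in doc_terms)


if __name__ == "__main__":
    low, high = split_terms(["the", "cat"], {"the": 900_000, "cat": 500}, 1_000_000)
    print(low, high)                              # ['cat'] ['the']
    print(matches({"the", "dog"}, low, high))     # False: "the" alone does not match
    idf = {"the": 0.1, "cat": 7.6}
    print(toy_score({"the", "cat"}, low, high, idf) > toy_score({"cat"}, low, high, idf))  # True
```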
similarity:
  my_bm25:
    type: BM25
    b: 0.001
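Why such a tiny `b`: in BM25, `b` controls length normalization, and with `b` close to 0 the document-length term almost vanishes, so a short title and a longer text with the same term frequency score nearly the same. A sketch of the standard per-term formula, using Lucene's BM25 IDF and the usual default `k1 = 1.2`; it ignores the lossy way Lucene stores document length, and the example lengths are hypothetical.

```python
import math


def bm25_idf(num_docs: int, doc_freq: int) -> float:
    """Lucene's BM25 IDF: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(tf, doc_len, avg_doc_len, num_docs, doc_freq, k1=1.2, b=0.75):
    """BM25 contribution of one term to one document's score."""
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return bm25_idf(num_docs, doc_freq) * tf * (k1 + 1) / (tf + k1 * length_norm)


def short_vs_long(b: float) -> float:
    """Score ratio of a 3-term doc vs a 30-term doc, same tf, average length 10."""
    args = dict(tf=1, avg_doc_len=10, num_docs=1_000_000, doc_freq=1_000, b=b)
    return bm25_term_score(doc_len=3, **args) / bm25_term_score(doc_len=30, **args)


if __name__ == "__main__":
    print(round(short_vs_long(0.75), 2))   # ~2.55: length dominates
    print(round(short_vs_long(0.001), 4))  # ~1.0015: length barely matters
```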
> ignore non-existent terms in the query
> boost repeated terms (TF) only if they are repeated in the query
_ a doc titled “A A A game” scores better than “A game” only when explicitly searching for “A A A”
> boost terms by their position in the query and in the documents
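One way to read the two term rules above, as a hedged Python sketch: drop query terms that do not exist in the index, and cap a term's document frequency at the number of times it appears in the query. The vocabulary set and the summed score are hypothetical simplifications, not the Dailymotion query.

```python
from collections import Counter


def capped_tf(query_terms, doc_terms, vocabulary) -> dict:
    """Term frequencies used for scoring under the two rules (hypothetical).

    - query terms that do not exist in the index vocabulary are dropped
    - a term counts at most as many times as it is repeated in the query
    """
    query = Counter(t for t in query_terms if t in vocabulary)
    doc = Counter(doc_terms)
    return {t: min(n, doc[t]) for t, n in query.items() if doc[t] > 0}


def toy_score(query_terms, doc_terms, vocabulary) -> int:
    """Toy score: the sum of the capped frequencies."""
    return sum(capped_tf(query_terms, doc_terms, vocabulary).values())


if __name__ == "__main__":
    vocab = {"a", "game"}
    aaa, plain = "a a a game".split(), "a game".split()
    print(toy_score("a game".split(), aaa, vocab), toy_score("a game".split(), plain, vocab))  # 2 2
    print(toy_score("a a a".split(), aaa, vocab), toy_score("a a a".split(), plain, vocab))    # 3 1
    print(toy_score("a game zerzer".split(), plain, vocab))  # 2: "zerzer" is ignored
```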
example: query “brown fox zerzer” against the document “the quick brown fox jumps.”, with position boosts ^1.1 ^1.07 ^1.05 ^1.03 ^1.02
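A hypothetical reading of the position boosts: the slide gives boost values for the first five positions but not how query and document positions are combined. The Python sketch below multiplies the two and falls back to 1.0 after position 5; treat both choices as assumptions.

```python
# Position boosts shown on the slide for positions 1 to 5 (1-based).
POSITION_BOOSTS = [1.1, 1.07, 1.05, 1.03, 1.02]


def position_boost(position: int) -> float:
    """Boost for a 1-based position; assumed to be 1.0 past the last listed one."""
    if 1 <= position <= len(POSITION_BOOSTS):
        return POSITION_BOOSTS[position - 1]
    return 1.0


def boosted_matches(query_tokens, doc_tokens) -> dict:
    """Hypothetical combination: query-position boost times document-position boost,
    using the first occurrence of each term in the document."""
    first_pos = {}
    for pos, token in enumerate(doc_tokens, start=1):
        first_pos.setdefault(token, pos)
    return {
        term: position_boost(qpos) * position_boost(first_pos[term])
        for qpos, term in enumerate(query_tokens, start=1)
        if term in first_pos  # terms absent from the document are skipped
    }


if __name__ == "__main__":
    doc = "the quick brown fox jumps".split()
    print(boosted_matches("brown fox zerzer".split(), doc))
    # brown: 1.1 * 1.05, fox: 1.07 * 1.03; "zerzer" matches nothing
```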
> keep both stemmed and original terms
> score them with dis_max (tie_breaker = 0)
> disable coord factor for consistent scoring
I like dogs
token:    i   like   dogs, dog
position: 1   2      3
"field": "dogs"
becomes
"dis_max": {
  "tie_breaker": 0,
  "queries": [
    { "term": { "field": "dogs" } },
    { "term": { "field": "dog" } } ...
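With the original and stemmed forms both indexed at the same position, a document containing “dogs” matches both term queries. Lucene's `DisjunctionMaxQuery` (behind `dis_max`) scores a document as its best clause score plus `tie_breaker` times the sum of the other matching clauses' scores, so `tie_breaker: 0` keeps only the best one. The classic coord factor multiplies a score by the fraction of query clauses that matched; `disable_coord` on a `bool` query turns it off. A minimal Python sketch of both formulas, with made-up clause scores:

```python
def dis_max(clause_scores, tie_breaker: float = 0.0) -> float:
    """DisjunctionMax scoring: best clause score plus tie_breaker times the others."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)


def coord(matched_clauses: int, total_clauses: int) -> float:
    """Classic Lucene coord factor: the fraction of query clauses that matched."""
    return matched_clauses / total_clauses if total_clauses else 0.0


if __name__ == "__main__":
    dogs, dog = 1.3, 0.9  # hypothetical scores of the original and stemmed terms
    print(dis_max([dogs, dog]))       # 1.3: one word is not rewarded twice
    print(dis_max([dogs, dog], 0.3))  # about 1.57: part of the second match leaks in
    print(dogs + dog)                 # about 2.2: summing both clauses instead
    print(coord(1, 2))                # 0.5: penalty when one of two clauses matches
```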
sharding
what suits us
> fewer shards make a query slower
> but not 16 times slower (112 ms vs 12 ms)
index / 16 shards:
_ 1 ms request handling
_ 10 ms: shards 0 to 15 in parallel (9, 10, 6, 9, 10, 10, 9, 10, 10, 8, 5, 10, 7, 7, 10, 10 ms)
_ 1 ms return result
_ total: 12 ms
index / 1 shard:
_ 1 ms request handling
_ 110 ms: shard 0
_ 1 ms return result
_ total: 112 ms
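The latency arithmetic of the example above, as a Python sketch: with shards searched in parallel, latency is request handling plus the slowest shard plus returning the result. This assumes every shard runs on its own core with no contention, the idealized case the example draws.

```python
# Per-shard search times (ms) from the 16-shard example above.
SHARD_TIMES_MS = [9, 10, 6, 9, 10, 10, 9, 10, 10, 8, 5, 10, 7, 7, 10, 10]


def query_latency_ms(shard_times_ms, handling_ms: int = 1, return_ms: int = 1) -> int:
    """Shards are searched in parallel, so the slowest one bounds the latency."""
    return handling_ms + max(shard_times_ms) + return_ms


if __name__ == "__main__":
    sixteen = query_latency_ms(SHARD_TIMES_MS)
    one = query_latency_ms([110])
    print(sixteen, one, round(one / sixteen, 1))  # 12 112 9.3
```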
> more shards take more resources
> everything runs at 100% for each query
> fewer requests per second for the same hardware
16 shards: 9 ms + 10 ms + 6 ms + (…) = 140 ms spent by the shards
1 shard: 110 ms spent
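The resource side of the same example, as a Python sketch: the shards run in parallel but all of their time is spent, so the 16-shard layout costs 140 ms of work per query against 110 ms. The throughput ceiling below assumes every core does nothing but shard work at 100%; the core count is hypothetical and the ratio does not depend on it.

```python
# Per-shard search times (ms) from the 16-shard example.
SHARD_TIMES_MS = [9, 10, 6, 9, 10, 10, 9, 10, 10, 8, 5, 10, 7, 7, 10, 10]


def total_work_ms(shard_times_ms) -> int:
    """Time spent by all shards on one query, whether or not they ran in parallel."""
    return sum(shard_times_ms)


def max_queries_per_second(work_ms: float, cores: int) -> float:
    """Idealized ceiling: every core busy 100% of the time with shard work only."""
    return cores * 1000 / work_ms


if __name__ == "__main__":
    sixteen, one = total_work_ms(SHARD_TIMES_MS), total_work_ms([110])
    print(sixteen, one)  # 140 110
    cores = 32  # hypothetical core count
    print(max_queries_per_second(one, cores) / max_queries_per_second(sixteen, cores))
    # ~1.27: the 1-shard layout fits about 27% more queries on the same cores
```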
Before
> we used to have 40 shards on 18 nodes
_ ~2 million docs per shard
_ 3 GB per shard
_ ~120 GB total index size
> the cluster was heavily loaded
_ every single query hit all the nodes
_ response times could have been better
After
> we now have 5 shards on 10 nodes
> the cluster runs smoother, with less load
_ only 5 nodes are involved in each query
_ it handles many times more requests
less data!
_ ~10 million docs per shard
_ 4 GB per shard
_ ~25 GB total index size
> index only the data we need right now
_ { "_source": { "enabled": false } }
_ round numbers and dates
_ { "precision_step": 2147483647 }
> fewer updates, faster indexation, rebalancing, merges...
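Rounding pays off twice: fewer distinct values to index and, combined with a hash check in the indexer, fewer documents whose content actually changes. A hypothetical Python sketch; two significant digits for counters and day precision for dates are assumptions, since the slides do not say which precision Dailymotion uses.

```python
from datetime import datetime, timezone


def round_significant(value: int, digits: int = 2) -> int:
    """Round a non-negative integer to `digits` significant digits, halves rounded up."""
    if value < 10 ** digits:
        return value
    factor = 10 ** (len(str(value)) - digits)
    return (value + factor // 2) // factor * factor


def round_to_day(ts: datetime) -> str:
    """Keep only the calendar date, so the value changes at most once a day."""
    return ts.date().isoformat()


if __name__ == "__main__":
    # 1,543 new views do not change the indexed value, so no update is needed.
    print(round_significant(123_456), round_significant(124_999))  # 120000 120000
    print(round_to_day(datetime(2014, 6, 10, 14, 30, tzinfo=timezone.utc)))  # 2014-06-10
```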
drawbacks
> individual queries are slower…
> but only marginally
_ e.g. 7 ms instead of 5 ms
> but some slow queries became more noticeable
how do we test
benchmarks, tuning
load test
_ benchmark with Tsung
_ dedicated test cluster
_ run real queries, lots of them
_ aim for our expected load
_ monitor everything
_ reshard, change the schema
_ set up dedicated master and data-only nodes...
repeat
use warmers
> warm segments after each merge
_ prevent slow first queries
> set them up to build the cache for the filters we use
> there is no reason not to use them
"constant_score": {
  "filter": {
    "term": {
      "visible": "yes"
    }
  }
}

"constant_score": {
  "filter": {
    "bool": {
      "_cache": true,
      "must": [
        { "term": { "visible": "yes" } },
        { "range": { "age": { "from": 18, "to": 30 } } }
      ] ...
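Filter caches are kept per segment, which is why a newly merged segment answers its first queries slowly and why warmers help. A toy Python model of that behaviour, not Elasticsearch's implementation (which caches bitsets): the class, its methods and the string filter keys are hypothetical.

```python
class SegmentFilterCache:
    """Toy per-segment filter cache mapping (segment, filter) to matching doc ids."""

    def __init__(self):
        self._entries = {}
        self.misses = 0

    def get(self, segment_id, filter_key, compute):
        """Return cached doc ids, computing them on the first request (a miss)."""
        key = (segment_id, filter_key)
        if key not in self._entries:
            self.misses += 1  # the slow path: evaluate the filter on the segment
            self._entries[key] = frozenset(compute())
        return self._entries[key]

    def warm(self, segment_id, filters):
        """Pre-compute every filter we use for a segment, e.g. right after a merge."""
        for filter_key, compute in filters.items():
            self.get(segment_id, filter_key, compute)


if __name__ == "__main__":
    docs = {0: {"visible": "yes"}, 1: {"visible": "no"}, 2: {"visible": "yes"}}

    def visible():
        return [doc_id for doc_id, doc in docs.items() if doc["visible"] == "yes"]

    cache = SegmentFilterCache()
    cache.warm("merged_segment", {"visible:yes": visible})      # done by the warmer
    hits = cache.get("merged_segment", "visible:yes", visible)  # first real query
    print(sorted(hits), cache.misses)  # [0, 2] 1: the query itself hit the cache
```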
query testing
> to test a particular query's raw performance
_ one index, one shard
_ millions of simple documents
_ merged into one segment
_ with some deletes
?
{
  "query": {
    "filtered": {
      "query": {
        "match": { "title": "some very popular terms" }
      },
      "filter": {
        "term": { "user": "cedric" }
      }
    }
  }
}
!
{
  "query": {
    "filtered": {
      "strategy": "leap_frog",
      "query": {
        "match": { "title": "some very popular terms" }
      },
      "filter": {
        "term": { "user": "cedric" }
      }
    }
  }
}
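Why the strategy matters: the `match` on popular terms hits a large share of the index while the `user` filter matches only a few documents, and a leap-frog intersection lets each side skip straight to the other's next candidate instead of walking every query match. A simplified Python sketch over sorted doc id lists; Lucene works on posting iterators with skip data, approximated here with `bisect_left`.

```python
from bisect import bisect_left


def leap_frog(query_docs, filter_docs):
    """Intersect two sorted doc id lists, letting each side jump to the other's doc.

    bisect_left stands in for a posting iterator's advance(target).
    """
    i = j = 0
    hits = []
    while i < len(query_docs) and j < len(filter_docs):
        if query_docs[i] == filter_docs[j]:
            hits.append(query_docs[i])
            i += 1
            j += 1
        elif query_docs[i] < filter_docs[j]:
            i = bisect_left(query_docs, filter_docs[j], i)  # skip query matches
        else:
            j = bisect_left(filter_docs, query_docs[i], j)  # skip filter matches
    return hits


if __name__ == "__main__":
    popular_terms = list(range(0, 1_000_000, 2))  # matches half of the index
    user_cedric = [10, 500_001, 777_776]          # a handful of docs
    print(leap_frog(popular_terms, user_cedric))  # [10, 777776]
```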
> we also use Elasticsearch just to filter and sort
> these queries match millions of documents
_ they are slow
_ even when terms are cached
_ iterating over, scoring and sorting that many hits is expensive
query
{
  "sort": { "created": "desc" },
  "query": {
    "bool": {
      "must": [
        { "term": { "public": true } }
      ]
    }
  }
}
query result
"took": 695,   // ms
"hits": {
  "total": 79582599
}
> we know our data
> we can help our query
query
{
  "sort": { "created": "desc" },
  "query": {
    "bool": {
      "must": [
        { "term": { "public": true } },
        { "range": { "created": { "from": "2014-06-03" } } }
      ]
  ...
> with a range filter on the sorted field
query result
"took": 15,   // ms
"hits": {
  "total": 92312
}
// same top docs returned
> what if there are not enough hits?
_ re-run the query without the filter
> we use a custom query to do just that!
_ it stops as soon as it has matched enough hits
_ it runs at the segment level
_ no round-trips
{
  "sort": { "created": "desc" },
  "query": {
    "break_once": {
      "minimum_hits": 100,
      "query": { "term": { "public": true } },
      "filters": [
        { "range": { "created": { "from": "2014-06-03" } } },
        { "range": { "created": { "from": "2014-05-03" } } },
        { "range": { "created": { "from": "2014-01-03" } } }
      ]
    }
  }
}
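A client-side Python model of the `break_once` logic, for illustration only: the real `break_once` is Dailymotion's custom query, running per segment inside Elasticsearch with no round-trips, whereas this sketch loops over an in-memory list, and all names in it are hypothetical. It also shows why the top documents do not change: every filter is a lower bound on `created`, the sort field, so once a filter keeps at least `minimum_hits` documents and the page size does not exceed that, the newest documents are all inside it.

```python
from datetime import date, timedelta


def created_since(iso_day):
    """Range filter on the sort field: created >= iso_day (ISO dates sort as strings)."""
    return lambda doc: doc["created"] >= iso_day


def break_once(docs, base_query, filters, minimum_hits, size):
    """Try progressively wider filters and keep the first with enough hits.

    Falls back to the base query alone when no filter reaches minimum_hits,
    then returns the `size` newest documents (sort on created, descending).
    """
    base = [doc for doc in docs if base_query(doc)]
    for keep in filters:
        hits = [doc for doc in base if keep(doc)]
        if len(hits) >= minimum_hits:
            break
    else:  # no filter matched enough documents, or there were no filters
        hits = base
    return sorted(hits, key=lambda doc: doc["created"], reverse=True)[:size]


if __name__ == "__main__":
    # 200 hypothetical videos, one per day from 2014-01-01; every third one is private.
    docs = [
        {"id": i, "public": i % 3 != 0,
         "created": (date(2014, 1, 1) + timedelta(days=i)).isoformat()}
        for i in range(200)
    ]
    filters = [created_since(d) for d in ("2014-06-03", "2014-05-03", "2014-01-03")]
    fast = break_once(docs, lambda doc: doc["public"], filters, minimum_hits=10, size=10)
    full = sorted((doc for doc in docs if doc["public"]),
                  key=lambda doc: doc["created"], reverse=True)[:10]
    print(fast == full)  # True: same top docs, far fewer hits to sort
```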
stop once there are enough hits
thank you
> index only what you need now
> shard today, reshard tomorrow
> benchmark to find what suits you best
> test and optimize your queries
> questions?

Weitere ähnliche Inhalte

Was ist angesagt?

Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
[2C6]SQLite DB 의 입출력 특성분석 : Android 와 Tizen 사례
[2C6]SQLite DB 의 입출력 특성분석 : Android 와 Tizen 사례[2C6]SQLite DB 의 입출력 특성분석 : Android 와 Tizen 사례
[2C6]SQLite DB 의 입출력 특성분석 : Android 와 Tizen 사례NAVER D2
 
OSDC 2012 | Scaling with MongoDB by Ross Lawley
OSDC 2012 | Scaling with MongoDB by Ross LawleyOSDC 2012 | Scaling with MongoDB by Ross Lawley
OSDC 2012 | Scaling with MongoDB by Ross LawleyNETWAYS
 
Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.Mydbops
 
Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101MongoDB
 
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQLPGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQLPGConf APAC
 
Talking Geckos (Question and Answering)
Talking Geckos (Question and Answering)Talking Geckos (Question and Answering)
Talking Geckos (Question and Answering)jie cao
 
Solr's Search Relevancy (Understand Solr's query debug)
Solr's Search Relevancy (Understand Solr's query debug)Solr's Search Relevancy (Understand Solr's query debug)
Solr's Search Relevancy (Understand Solr's query debug)Wongnai
 
Mythbusting: Understanding How We Measure the Performance of MongoDB
Mythbusting: Understanding How We Measure the Performance of MongoDBMythbusting: Understanding How We Measure the Performance of MongoDB
Mythbusting: Understanding How We Measure the Performance of MongoDBMongoDB
 
Cassandra for Python Developers
Cassandra for Python DevelopersCassandra for Python Developers
Cassandra for Python DevelopersTyler Hobbs
 
MongoDB Database Replication
MongoDB Database ReplicationMongoDB Database Replication
MongoDB Database ReplicationMehdi Valikhani
 
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case StudyMongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case StudyMongoDB
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Webinar: Replication and Replica Sets
Webinar: Replication and Replica SetsWebinar: Replication and Replica Sets
Webinar: Replication and Replica SetsMongoDB
 
テスト用のプレゼンテーション
テスト用のプレゼンテーションテスト用のプレゼンテーション
テスト用のプレゼンテーションgooseboi
 
Building DSLs with Groovy
Building DSLs with GroovyBuilding DSLs with Groovy
Building DSLs with GroovySten Anderson
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 

Was ist angesagt? (20)

Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
[2C6]SQLite DB 의 입출력 특성분석 : Android 와 Tizen 사례
[2C6]SQLite DB 의 입출력 특성분석 : Android 와 Tizen 사례[2C6]SQLite DB 의 입출력 특성분석 : Android 와 Tizen 사례
[2C6]SQLite DB 의 입출력 특성분석 : Android 와 Tizen 사례
 
MySQL under the siege
MySQL under the siegeMySQL under the siege
MySQL under the siege
 
OSDC 2012 | Scaling with MongoDB by Ross Lawley
OSDC 2012 | Scaling with MongoDB by Ross LawleyOSDC 2012 | Scaling with MongoDB by Ross Lawley
OSDC 2012 | Scaling with MongoDB by Ross Lawley
 
Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.
 
Mongodb replication
Mongodb replicationMongodb replication
Mongodb replication
 
Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101
 
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQLPGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
 
Talking Geckos (Question and Answering)
Talking Geckos (Question and Answering)Talking Geckos (Question and Answering)
Talking Geckos (Question and Answering)
 
Solr's Search Relevancy (Understand Solr's query debug)
Solr's Search Relevancy (Understand Solr's query debug)Solr's Search Relevancy (Understand Solr's query debug)
Solr's Search Relevancy (Understand Solr's query debug)
 
Mythbusting: Understanding How We Measure the Performance of MongoDB
Mythbusting: Understanding How We Measure the Performance of MongoDBMythbusting: Understanding How We Measure the Performance of MongoDB
Mythbusting: Understanding How We Measure the Performance of MongoDB
 
Cassandra for Python Developers
Cassandra for Python DevelopersCassandra for Python Developers
Cassandra for Python Developers
 
MongoDB Database Replication
MongoDB Database ReplicationMongoDB Database Replication
MongoDB Database Replication
 
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case StudyMongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Elastic search 검색
Elastic search 검색Elastic search 검색
Elastic search 검색
 
Webinar: Replication and Replica Sets
Webinar: Replication and Replica SetsWebinar: Replication and Replica Sets
Webinar: Replication and Replica Sets
 
テスト用のプレゼンテーション
テスト用のプレゼンテーションテスト用のプレゼンテーション
テスト用のプレゼンテーション
 
Building DSLs with Groovy
Building DSLs with GroovyBuilding DSLs with Groovy
Building DSLs with Groovy
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 

Ähnlich wie Elasticsearch at Dailymotion

Lessons Learned While Scaling Elasticsearch at Vinted
Lessons Learned While Scaling Elasticsearch at VintedLessons Learned While Scaling Elasticsearch at Vinted
Lessons Learned While Scaling Elasticsearch at VintedDainius Jocas
 
About elasticsearch
About elasticsearchAbout elasticsearch
About elasticsearchMinsoo Jun
 
performance vamos dormir mais?
performance vamos dormir mais?performance vamos dormir mais?
performance vamos dormir mais?tdc-globalcode
 
Presto at Tivo, Boston Hadoop Meetup
Presto at Tivo, Boston Hadoop MeetupPresto at Tivo, Boston Hadoop Meetup
Presto at Tivo, Boston Hadoop MeetupJustin Borgman
 
Performance Optimization of Rails Applications
Performance Optimization of Rails ApplicationsPerformance Optimization of Rails Applications
Performance Optimization of Rails ApplicationsSerge Smetana
 
MongoDB.local DC 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local DC 2018: Tips and Tricks for Avoiding Common Query PitfallsMongoDB.local DC 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local DC 2018: Tips and Tricks for Avoiding Common Query PitfallsMongoDB
 
Fazendo mágica com ElasticSearch
Fazendo mágica com ElasticSearchFazendo mágica com ElasticSearch
Fazendo mágica com ElasticSearchPedro Franceschi
 
Neo4j after 1 year in production
Neo4j after 1 year in productionNeo4j after 1 year in production
Neo4j after 1 year in productionAndrew Nikishaev
 
How to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBHow to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBMongoDB
 
Elastic Relevance Presentation feb4 2020
Elastic Relevance Presentation feb4 2020Elastic Relevance Presentation feb4 2020
Elastic Relevance Presentation feb4 2020Brian Nauheimer
 
Performance Tipping Points - Hitting Hardware Bottlenecks
Performance Tipping Points - Hitting Hardware BottlenecksPerformance Tipping Points - Hitting Hardware Bottlenecks
Performance Tipping Points - Hitting Hardware BottlenecksMongoDB
 
[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화NAVER D2
 
E-Commerce search with Elasticsearch
E-Commerce search with ElasticsearchE-Commerce search with Elasticsearch
E-Commerce search with ElasticsearchYevhen Shyshkin
 
Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017Codemotion
 
Solr Black Belt Pre-conference
Solr Black Belt Pre-conferenceSolr Black Belt Pre-conference
Solr Black Belt Pre-conferenceErik Hatcher
 
[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화Henry Jeong
 
Query DSL In Elasticsearch
Query DSL In ElasticsearchQuery DSL In Elasticsearch
Query DSL In ElasticsearchKnoldus Inc.
 
10 Key MongoDB Performance Indicators
10 Key MongoDB Performance Indicators  10 Key MongoDB Performance Indicators
10 Key MongoDB Performance Indicators iammutex
 
Replication MongoDB Days 2013
Replication MongoDB Days 2013Replication MongoDB Days 2013
Replication MongoDB Days 2013Randall Hunt
 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB
 

Ähnlich wie Elasticsearch at Dailymotion (20)

Lessons Learned While Scaling Elasticsearch at Vinted
Lessons Learned While Scaling Elasticsearch at VintedLessons Learned While Scaling Elasticsearch at Vinted
Lessons Learned While Scaling Elasticsearch at Vinted
 
About elasticsearch
About elasticsearchAbout elasticsearch
About elasticsearch
 
performance vamos dormir mais?
performance vamos dormir mais?performance vamos dormir mais?
performance vamos dormir mais?
 
Presto at Tivo, Boston Hadoop Meetup
Presto at Tivo, Boston Hadoop MeetupPresto at Tivo, Boston Hadoop Meetup
Presto at Tivo, Boston Hadoop Meetup
 
Performance Optimization of Rails Applications
Performance Optimization of Rails ApplicationsPerformance Optimization of Rails Applications
Performance Optimization of Rails Applications
 
MongoDB.local DC 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local DC 2018: Tips and Tricks for Avoiding Common Query PitfallsMongoDB.local DC 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local DC 2018: Tips and Tricks for Avoiding Common Query Pitfalls
 
Fazendo mágica com ElasticSearch
Fazendo mágica com ElasticSearchFazendo mágica com ElasticSearch
Fazendo mágica com ElasticSearch
 
Neo4j after 1 year in production
Neo4j after 1 year in productionNeo4j after 1 year in production
Neo4j after 1 year in production
 
How to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBHow to Achieve Scale with MongoDB
How to Achieve Scale with MongoDB
 
Elastic Relevance Presentation feb4 2020
Elastic Relevance Presentation feb4 2020Elastic Relevance Presentation feb4 2020
Elastic Relevance Presentation feb4 2020
 
Performance Tipping Points - Hitting Hardware Bottlenecks
Performance Tipping Points - Hitting Hardware BottlenecksPerformance Tipping Points - Hitting Hardware Bottlenecks
Performance Tipping Points - Hitting Hardware Bottlenecks
 
[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화
 
E-Commerce search with Elasticsearch
E-Commerce search with ElasticsearchE-Commerce search with Elasticsearch
E-Commerce search with Elasticsearch
 
Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
 
Solr Black Belt Pre-conference
Solr Black Belt Pre-conferenceSolr Black Belt Pre-conference
Solr Black Belt Pre-conference
 
[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화
 
Query DSL In Elasticsearch
Query DSL In ElasticsearchQuery DSL In Elasticsearch
Query DSL In Elasticsearch
 
10 Key MongoDB Performance Indicators
10 Key MongoDB Performance Indicators  10 Key MongoDB Performance Indicators
10 Key MongoDB Performance Indicators
 
Replication MongoDB Days 2013
Replication MongoDB Days 2013Replication MongoDB Days 2013
Replication MongoDB Days 2013
 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: Sharding
 

Kürzlich hochgeladen

Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024eCommerce Institute
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Pooja Nehwal
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceDelhi Call girls
 
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxNikitaBankoti2
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Salam Al-Karadaghi
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyPooja Nehwal
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfhenrik385807
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Delhi Call girls
 
Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Chameera Dedduwage
 
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesPooja Nehwal
 
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )Pooja Nehwal
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...NETWAYS
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Hasting Chen
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrsaastr
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AITatiana Gurgel
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfhenrik385807
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Kayode Fayemi
 
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝soniya singh
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...Sheetaleventcompany
 

Kürzlich hochgeladen (20)

Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
 
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
 
Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)
 
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
 
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )

Elasticsearch at Dailymotion

  • 1. Dailymotion / Elasticsearch, Meetup Elasticsearch France #7, June 10, 2014. Cédric Hourcade, core developer at Dailymotion (twitter.com/hced)
  • 2. Video search @ Dailymotion: 1 > cluster overview and indexation, 2 > query and score, 3 > sharding, 4 > benchmarks and tuning, 5 > questions?
  • 4. Search cluster (cluster and indexation): > 10 nodes for video search _ one main video index _ 5 shards with 1 replica _ nodes: 32 cores, 48 GB RAM, 15k RPM disks > 600 to 1,000 search requests per second > end-to-end response time under 40 ms
  • 5. [diagram: MySQL farm → index constantly → Elasticsearch cluster, 10 nodes]
  • 6. [diagram: MySQL farm → Elasticsearch indexer, 2 nodes (filter data for search, hash the data, update only if the hash changed) → index constantly → Elasticsearch cluster, 10 nodes]
  • 8. Query and score: "query": { "function_score": { "query": { "custom_common": { … } }, "script_score": { "script": "custom_scorer", "lang": "native" }, "boost_mode": "multiply" } } > custom query x custom scorer = score
  • 9. Scorer: > the custom scorer _ only slightly alters the query score _ takes into account recency, popularity, etc. > boosted filters and scripts when testing > native Java for performance
  • 10. Query: > we need to keep control of the query base score > the problem is that our text content is thin _ a short title, a few tags _ a more or less relevant description > bare-bones TF-IDF may not be suitable _ TF is not that relevant to us
  • 11. > BM25 to reduce the importance of document length, with the similarity setting similarity: { my_bm25: { type: BM25, b: 0.001 } } > why the common terms query? _ it increases performance _ it ignores popular terms when matching _ but still uses them for scoring _ like a real-time, specialized stop-word list
  • 12. > ignore nonexistent terms in the query > boost repeated terms (TF) only if they are repeated in the query: a doc titled "A A A game" scores better than "A game" only when explicitly searching for "A A A" > boost terms by position in the query and in the documents [diagram: query "brown fox zerzer" matched against "the quick brown fox jumps.", with position boosts ^1.1, ^1.07, ^1.05, ^1.03, ^1.02]
  • 13. > keep both stemmed and original terms > score them with dis_max (tie_breaker = 0) > disable the coord factor for consistent scoring [diagram: "I like dogs" → tokens i, like, dogs + dog at positions 1, 2, 3] "field": "dogs" → "dis_max": { "tie_breaker": 0, "queries": [ { "term": { "field": "dogs" } }, { "term": { "field": "dog" } } ...
  • 15. Sharding, what suits us: > fewer shards make a query slower > but not 16 times slower (112 ms vs 12 ms) [diagram: index with 16 shards: 1 ms request handling, shards 0 to 15 in parallel (9, 10, 6, 9, 10, 10, 9, 10, 10, 8, 5, 10, 7, 7, 10, 10 ms) so 10 ms, 1 ms to return the result, 12 ms total; index with 1 shard: 1 ms request handling, shard 0 takes 110 ms, 1 ms to return the result, 112 ms total]
  • 16. > it takes more resources > everything runs at 100% for each query > fewer requests per second on the same hardware [diagram: same 16-shard vs 1-shard layout; 9 ms + 10 ms + 6 ms + (…) = 140 ms spent by the 16 shards vs 110 ms spent by the single shard]
  • 17. Before: > we used to have 40 shards on 18 nodes _ ~2 million docs per shard _ 3 GB per shard _ ~120 GB total index size > the cluster was very loaded _ every single query hit all the nodes _ response times could have been better
  • 18. After: > we now have 5 shards on 10 nodes > the cluster runs smoother, with less load _ only 5 nodes are involved per query _ it handles many times more requests
  • 19. Less data! _ ~10 million docs per shard _ 4 GB per shard _ ~25 GB total index size > index only the data we need right now _ { "_source": false } _ round numbers and dates _ { "precision_step": 2147483647 } > fewer updates, faster indexing, rebalancing, merges...
  • 20. Drawbacks: > queries taken individually are slower… > but only marginally slower _ e.g. 7 ms instead of 5 ms > but some slower queries became more noticeable
  • 21. How do we test? Benchmarks and tuning
  • 22. Load test: _ benchmark with Tsung _ on a dedicated test cluster _ run real queries, lots of them _ aim for our expected load _ monitor everything _ reshard, change the schema _ set up master nodes, data-only nodes... then repeat
  • 23. Use warmers: > warm segments after each merge _ this prevents slow first queries > set them up to build the cache for the filters we use > there is no reason not to use them
  • 24. "constant_score": { "filter": { "term": { "visible": "yes" } } } and "constant_score": { "filter": { "bool": { "_cache": true, "must": [ { "term": { "visible": "yes" } }, { "range": { "age": { "from": 18, "to": 30 } } } ] ...
  • 25. Query testing: > to test a particular query's raw performance _ one index, one shard _ millions of simple documents _ merged into one segment _ with some deletes
  • 26. ? { "query": { "filtered": { "query": { "match": { "title": "some very popular terms" } }, "filter": { "term": { "user": "cedric" } } } } }
  • 27. ! { "query": { "filtered": { "strategy": "leap_frog", "query": { "match": { "title": "some very popular terms" } }, "filter": { "term": { "user": "cedric" } } } } }
  • 28. > we also use Elasticsearch just to filter and sort > these queries match millions of documents _ they are slow _ even when the terms are cached _ iterating, scoring and sorting is tedious
  • 29. Query: "sort": { "created": "desc" }, "query": { "bool": { "must": [ { "term": { "public": true } } ] } }
  • 30. Query and result: "sort": { "created": "desc" }, "query": { "bool": { "must": [ { "term": { "public": true } } ] } } → "took": 695, "hits": { "total": 79582599 }
  • 31. > we know our data > we can help our query
  • 32. Query, with a range filter on the sorted field: "sort": { "created": "desc" }, "query": { "bool": { "must": [ { "term": { "public": true } }, { "range": { "created": { "from": "2014-06-03" } } } ] ...
  • 33. Query and result: "sort": { "created": "desc" }, "query": { "bool": { "must": [ { "term": { "public": true } }, { "range": { "created": { "from": "2014-06-03" } } } ] ... → "took": 15, "hits": { "total": 92312 } // same top docs returned
  • 34. > what if there are not enough hits? _ re-run the query without the filter > we use a custom query to do just that! _ it stops once it matches enough hits _ it runs at the segment level _ no round-trips
  • 35. Stop once there are enough hits: "sort": { "created": "desc" }, "query": { "break_once": { "minimum_hits": 100, "query": { "term": { "public": true } }, "filters": [ { "range": { "created": { "from": "2014-06-03" } } }, { "range": { "created": { "from": "2014-05-03" } } }, { "range": { "created": { "from": "2014-01-03" } } } ] } }
  • 37. Thank you! > index only what you need now > shard today, reshard tomorrow > benchmark to find what suits you best > test and optimize your queries
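Slide 6's indexer keeps only the fields search needs, hashes them, and pushes a document to Elasticsearch only when that hash changes. A minimal sketch of that check follows. It assumes documents arrive as Python dicts, uses SHA-1 over canonical JSON as the hash, and keeps previous hashes in an in-memory dict. `SEARCH_FIELDS`, `filter_for_search` and `needs_reindex` are hypothetical names, not Dailymotion's code.

```python
import hashlib
import json

# Hypothetical: the subset of MySQL columns the search index needs.
SEARCH_FIELDS = ("title", "tags", "description", "views")


def filter_for_search(row: dict) -> dict:
    """Keep only the fields used by search (field list is an assumption)."""
    return {k: row[k] for k in SEARCH_FIELDS if k in row}


def content_hash(doc: dict) -> str:
    """Stable hash: SHA-1 of canonical JSON (sorted keys, no extra spaces)."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def needs_reindex(doc_id: str, row: dict, known_hashes: dict) -> bool:
    """True (and remember the new hash) only if the search-relevant data changed."""
    new_hash = content_hash(filter_for_search(row))
    if known_hashes.get(doc_id) == new_hash:
        return False
    known_hashes[doc_id] = new_hash
    return True


hashes = {}
row = {"title": "A game", "tags": ["games"], "views": 10, "owner_ip": "10.0.0.1"}
print(needs_reindex("x1", row, hashes))  # True: never indexed before
row["owner_ip"] = "10.0.0.2"             # not a search field
print(needs_reindex("x1", row, hashes))  # False: nothing to push
row["views"] = 11
print(needs_reindex("x1", row, hashes))  # True
```

The hash covers the filtered document, not the raw row, so changes to columns that search never sees do not trigger an update.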
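Slides 8 and 9 multiply a text score by a custom scorer that only nudges it, based on recency and popularity. The real scorer is a native Java script, and the slides do not give its formula. The sketch below is purely illustrative. It assumes an exponential recency decay with a 30-day half-life, a log-scaled popularity term and small weights, so the multiplier stays between 1.0 and 1.15. All of these choices are hypothetical.

```python
import math


def custom_multiplier(age_days: float, views: int,
                      half_life_days: float = 30.0,
                      recency_weight: float = 0.1,
                      popularity_weight: float = 0.05) -> float:
    """Hypothetical scorer output in [1.0, 1.15): it nudges, never dominates."""
    # Recency: 1.0 for a brand-new video, halving every half_life_days.
    recency = 0.5 ** (age_days / half_life_days)
    # Popularity: log-scaled views squashed into [0, 1), so huge counts saturate.
    popularity = 1.0 - 1.0 / (1.0 + math.log10(1 + views))
    return 1.0 + recency_weight * recency + popularity_weight * popularity


def final_score(query_score: float, age_days: float, views: int) -> float:
    """Slide 8's "custom query x custom scorer = score"."""
    return query_score * custom_multiplier(age_days, views)


# A much better text match still beats a fresh, very popular but weaker match.
print(final_score(2.0, age_days=400, views=50) > final_score(1.5, age_days=0, views=10**8))  # True
```

Bounding the multiplier is the design point. It keeps "control of the query base score" (slide 10) while still letting recency and popularity break near-ties.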
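To see why slide 11's `b: 0.001` almost removes document-length normalization, here is the textbook BM25 term weight with a Lucene-style IDF. k1 = 1.2 and b = 0.75 are Lucene's defaults. The title lengths and document counts in the usage lines are made up. Lucene also stores field lengths lossily in norms, which this sketch ignores.

```python
import math


def bm25_term_score(tf: float, doc_len: float, avg_doc_len: float,
                    doc_freq: int, doc_count: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Textbook BM25 weight of one term in one field (Lucene-style IDF)."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # b controls how much a longer-than-average field is penalized:
    # b = 1 applies full length normalization, b = 0 disables it.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


# Same term, 3-token title vs 12-token title, average title length 6 (made-up numbers).
for b in (0.75, 0.001):
    short = bm25_term_score(1, 3, 6, doc_freq=1000, doc_count=1_000_000, b=b)
    longer = bm25_term_score(1, 12, 6, doc_freq=1000, doc_count=1_000_000, b=b)
    print(f"b={b}: short/long score ratio = {short / longer:.3f}")
```

With b = 0.75, the short title scores about 1.77 times the long one. With b = 0.001, the ratio drops to about 1.001, so title length barely matters any more.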
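Slide 12 states two rules. First, a repeated term counts only as often as the query repeats it. Second, terms are weighted by position. The sketch below combines both. The boosts ^1.1, ^1.07, ^1.05, ^1.03, ^1.02 come from the slide. Several details are assumptions:
- positions after the fifth get 1.0;
- query and document boosts multiply and are summed per term;
- skipping query terms a document lacks stands in for "ignore nonexistent terms".

`toy_score` is a hypothetical name, not the actual custom query.

```python
from collections import Counter

# Position boosts from slide 12 for 0-based positions 0..4;
# later positions are assumed to get no boost (1.0).
POSITION_BOOSTS = (1.1, 1.07, 1.05, 1.03, 1.02)


def position_boost(pos: int) -> float:
    """Boost for a 0-based token position."""
    return POSITION_BOOSTS[pos] if pos < len(POSITION_BOOSTS) else 1.0


def toy_score(query_tokens: list, doc_tokens: list) -> float:
    """Hypothetical: capped TF x query-position boost x doc-position boost, summed."""
    q_counts, d_counts = Counter(query_tokens), Counter(doc_tokens)
    score = 0.0
    for term in dict.fromkeys(query_tokens):  # unique terms, query order kept
        if d_counts[term] == 0:
            continue  # terms the doc lacks are ignored, not penalized
        # Repeats in the doc only count as far as the query repeats the term.
        tf = min(d_counts[term], q_counts[term])
        score += (tf * position_boost(query_tokens.index(term))
                  * position_boost(doc_tokens.index(term)))
    return score


a_game, aaa_game = "a game".split(), "a a a game".split()
print(toy_score(["a", "game"], aaa_game) < toy_score(["a", "game"], a_game))  # True
print(toy_score(["a", "a", "a"], aaa_game) > toy_score(["a", "a", "a"], a_game))  # True
```

This reproduces the slide's example. "A A A game" wins only when the query itself is "A A A". For the query "A game", the shorter title wins because "game" sits at an earlier position.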
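Slide 13 uses dis_max with tie_breaker = 0 rather than summing. A document containing "dogs" matches both the original token and the stemmed "dog", and a sum would count that one word twice. Below is the dis_max combination rule as Lucene's DisjunctionMaxQuery defines it: the best clause, plus tie_breaker times the others. The clause scores are made-up numbers.

```python
def dis_max(sub_scores: list, tie_breaker: float = 0.0) -> float:
    """Disjunction max: best clause score plus tie_breaker times the other clauses."""
    if not sub_scores:
        return 0.0
    best = max(sub_scores)
    return best + tie_breaker * (sum(sub_scores) - best)


def summed(sub_scores: list) -> float:
    """What a plain should/sum of the clauses would give (coord ignored)."""
    return sum(sub_scores)


# Hypothetical clause scores: the original term "dogs" is rarer, so it scores higher.
doc_with_dogs = [1.4, 1.0]  # matches both the original "dogs" and the stem "dog"
doc_with_dog = [1.0]        # matches only "dog"

print(dis_max(doc_with_dogs), dis_max(doc_with_dog))  # 1.4 1.0
print(summed(doc_with_dogs))                          # about 2.4: one word counted twice
```

With tie_breaker = 0, an exact form still ranks above a stem-only match, but the gap is bounded by the difference between the two term scores.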
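Slides 15 and 16 make two different points with the same numbers. Response time is bounded by the slowest shard, while the work the cluster does is the sum over all shards. The small model below uses the per-shard times read off the 16-shard diagram and the 1 ms request-handling and 1 ms return steps shown there. It assumes all shards run fully in parallel and ignores queueing.

```python
# Per-shard times read off slide 15's 16-shard diagram (ms).
SHARD_TIMES_MS = [9, 10, 6, 9, 10, 10, 9, 10, 10, 8, 5, 10, 7, 7, 10, 10]
HANDLING_MS = 1  # request handling, as on the slide
RETURN_MS = 1    # returning the result, as on the slide


def wall_clock_ms(shard_times):
    """Shards run in parallel, so the slowest one bounds the response time."""
    return HANDLING_MS + max(shard_times) + RETURN_MS


def shard_work_ms(shard_times):
    """Total time spent by the shards, which is what limits throughput."""
    return sum(shard_times)


print(wall_clock_ms(SHARD_TIMES_MS))  # 12
print(wall_clock_ms([110]))           # 112: the single-shard index
print(shard_work_ms(SHARD_TIMES_MS))  # 140, vs 110 for the single shard
```

The 16-shard index answers in 12 ms instead of 112 ms. It also spends 140 ms of shard time per query instead of 110 ms, hence fewer requests per second on the same hardware.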
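Slide 19 advises rounding numbers and dates. Rounding means fewer distinct values. It can also mean fewer updates if the indexer compares documents before pushing them: a view counter going from 12,345 to 12,346 no longer changes anything. The sketch below assumes two significant digits for counters and day granularity for timestamps; both are illustrative choices, not values from the talk. `to_indexed` and its field names are hypothetical.

```python
from datetime import datetime


def round_significant(n: int, digits: int = 2) -> int:
    """Round a non-negative integer half-up to `digits` significant digits."""
    exponent = len(str(n)) - digits
    if exponent <= 0:
        return n  # already short enough
    magnitude = 10 ** exponent
    return (n + magnitude // 2) // magnitude * magnitude


def truncate_to_day(ts: datetime) -> datetime:
    """Drop the time of day, keeping only the date."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def to_indexed(views: int, ts: datetime) -> dict:
    """Hypothetical indexed form of two fields: a view counter and a timestamp."""
    return {"views": round_significant(views), "day": truncate_to_day(ts).isoformat()}


a = to_indexed(12345, datetime(2014, 6, 10, 14, 3, 7))
b = to_indexed(12399, datetime(2014, 6, 10, 18, 45, 0))
print(a)       # {'views': 12000, 'day': '2014-06-10T00:00:00'}
print(a == b)  # True: nothing to re-index
```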
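Slide 22's load tests produce latency samples that need summarizing against a target. Tsung produces its own reports; the code below is only an illustration of that summarizing step. It computes a nearest-rank percentile summary, assuming latencies in milliseconds. It also reuses slide 4's 40 ms end-to-end figure as a p99 target, which is an assumption: the slides do not say which percentile the 40 ms refers to.

```python
import math


def percentile(latencies_ms: list, p: float) -> float:
    """Nearest-rank percentile, p in (0, 100], of a non-empty sample."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms: list, target_ms: float = 40.0) -> dict:
    """Latency summary of one load-test run against an assumed p99 target."""
    p99 = percentile(latencies_ms, 99)
    return {
        "p50": percentile(latencies_ms, 50),
        "p99": p99,
        "max": max(latencies_ms),
        "meets_target": p99 < target_ms,
    }


# Made-up samples: mostly fast, with a slow tail.
run = [12] * 90 + [35] * 9 + [180]
print(summarize(run))  # {'p50': 12, 'p99': 35, 'max': 180, 'meets_target': True}
```

Comparing these summaries across runs shows whether a reshard or schema change actually helped.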
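Slides 23 and 24 rely on filter caching. A filter's matching doc ids are computed once per segment and then reused, and a warmer fills that cache before user queries arrive, for example right after a merge. The toy model below assumes segments are lists of dicts and that each filter has a string key. `FilterCache` is a hypothetical class, not Elasticsearch's implementation, which caches bitsets per Lucene segment.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Segment = List[dict]
Predicate = Callable[[dict], bool]


class FilterCache:
    """Toy cache of filter results (sets of local doc ids), one entry per segment."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, str], FrozenSet[int]] = {}
        self.misses = 0

    def doc_ids(self, seg_id: int, segment: Segment, key: str,
                predicate: Predicate) -> FrozenSet[int]:
        # Segments are immutable, so a cached entry stays valid until a merge
        # replaces the segment with a new one (new seg_id, cold cache).
        cache_key = (seg_id, key)
        if cache_key not in self._entries:
            self.misses += 1
            self._entries[cache_key] = frozenset(
                i for i, doc in enumerate(segment) if predicate(doc))
        return self._entries[cache_key]

    def warm(self, segments: Dict[int, Segment], key: str,
             predicate: Predicate) -> None:
        """What a warmer does: fill the cache before user queries arrive."""
        for seg_id, segment in segments.items():
            self.doc_ids(seg_id, segment, key, predicate)


def visible(doc: dict) -> bool:
    return doc["visible"] == "yes"


segments = {0: [{"visible": "yes"}, {"visible": "no"}], 1: [{"visible": "yes"}]}
cache = FilterCache()
cache.warm(segments, "visible:yes", visible)  # e.g. right after a merge
print(cache.doc_ids(0, segments[0], "visible:yes", visible), cache.misses)  # frozenset({0}) 2
```

After warming, the user query is served from the cache (no new miss), which is the "prevent slow first queries" effect slide 23 describes.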
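Slide 27's `leap_frog` strategy intersects the query's and the filter's doc-id iterators. It repeatedly advances whichever iterator is behind to the other's current doc. With a selective filter (one user) and popular query terms, this presumably lets the query skip most of its matches instead of visiting each one. In the simplified sketch below, sorted Python lists and `bisect` stand in for Lucene postings and skip lists, and scoring is left out. `PostingIterator` and `leap_frog` are hypothetical names.

```python
from bisect import bisect_left
from typing import List, Optional


class PostingIterator:
    """Iterator over a sorted list of doc ids with a Lucene-style advance()."""

    def __init__(self, doc_ids: List[int]) -> None:
        self.doc_ids = doc_ids
        self.pos = 0

    def doc(self) -> Optional[int]:
        return self.doc_ids[self.pos] if self.pos < len(self.doc_ids) else None

    def advance(self, target: int) -> Optional[int]:
        """Jump to the first doc id >= target (binary search stands in for skip lists)."""
        self.pos = bisect_left(self.doc_ids, target, self.pos)
        return self.doc()


def leap_frog(query_docs: List[int], filter_docs: List[int]) -> List[int]:
    """Intersect two sorted doc-id lists by leap-frogging, query iterator leading."""
    q, f = PostingIterator(query_docs), PostingIterator(filter_docs)
    hits = []
    doc = q.doc()
    while doc is not None:
        other = f.advance(doc)      # filter catches up to the query
        if other is None:
            break                   # filter exhausted: no more matches
        if other == doc:
            hits.append(doc)
            doc = q.advance(doc + 1)
        else:
            doc = q.advance(other)  # query leaps over docs the filter rejects
    return hits


print(leap_frog([1, 3, 5, 7, 9, 11], [2, 3, 9, 20]))  # [3, 9]
```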
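Slides 29 to 33 rely on a simple property. The query sorts by `created` descending, so every document the range filter drops is older than every document it keeps. As long as the filtered query still has at least as many hits as the page size, its top hits are exactly the unfiltered top hits: "same top docs returned". The small check below runs on synthetic data; the documents and the page size of 10 are made up, and `newest_public` is a hypothetical helper.

```python
import heapq
from datetime import date, timedelta


def newest_public(docs, k, since=None):
    """Top-k public docs by `created` desc, optionally only those created >= since.

    Returns (top_k_docs, total_hits), loosely mirroring "hits" and "total".
    """
    matching = [d for d in docs
                if d["public"] and (since is None or d["created"] >= since)]
    top = heapq.nlargest(k, matching, key=lambda d: d["created"])
    return top, len(matching)


# Synthetic data: one doc per day from 2014-01-01, every third one private.
docs = [{"id": i, "public": i % 3 != 0, "created": date(2014, 1, 1) + timedelta(days=i)}
        for i in range(200)]

full, full_total = newest_public(docs, 10)
recent, recent_total = newest_public(docs, 10, since=date(2014, 6, 3))
print(full == recent, full_total, recent_total)  # True 133 31
```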
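Slides 34 and 35 describe the `break_once` rule. Try the narrowest date window first, widen it until there are at least `minimum_hits` hits, and fall back to the unfiltered query if none is enough. The sketch below is the client-side loop, which pays one round-trip per attempt. The slide's custom query avoids exactly those round-trips by applying the same stop rule inside Elasticsearch, at segment level. The cutoff dates come from slide 35; `make_runner` and the in-memory data are hypothetical stand-ins for running the query.

```python
from datetime import date

# Cutoffs from slide 35, narrowest window first.
CUTOFFS = [date(2014, 6, 3), date(2014, 5, 3), date(2014, 1, 3)]


def make_runner(docs):
    """Hypothetical stand-in for running the query with an optional created >= cutoff."""
    calls = []

    def run_query(since):
        calls.append(since)
        return [d for d in docs
                if d["public"] and (since is None or d["created"] >= since)]

    return run_query, calls


def break_once(run_query, filters, minimum_hits):
    """Try each filter in turn and stop at the first one with enough hits;
    otherwise fall back to the unfiltered query (since=None)."""
    for since in filters:
        hits = run_query(since)
        if len(hits) >= minimum_hits:
            return hits
    return run_query(None)


docs = [{"public": True, "created": date(2014, 5, 10)} for _ in range(150)]
run_query, calls = make_runner(docs)
hits = break_once(run_query, CUTOFFS, minimum_hits=100)
print(len(hits), len(calls))  # 150 2: stopped after the second cutoff
```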