Made to Measure: Ranking Evaluation using Elasticsearch
Daniel Schneiter
Elastic{Meetup} #41, Zürich, April 9, 2019
Original author: Christoph Büscher

1. Daniel Schneiter, Elastic{Meetup} #41, Zürich, April 9, 2019. Original author: Christoph Büscher. Made to Measure: Ranking Evaluation using Elasticsearch

2. "If you cannot measure it, you cannot improve it!" AlmostAnActualQuoteTM by Lord Kelvin. Image: https://commons.wikimedia.org/wiki/File:Portrait_of_William_Thomson,_Baron_Kelvin.jpg

3. How good is your search? Image by Kecko, https://www.flickr.com/photos/kecko/18146364972 (CC BY 2.0)

4. Image by Muff Wiggler, https://www.flickr.com/photos/muffwiggler/5605240619 (CC BY 2.0)

5. Ranking Evaluation: a repeatable way to quickly measure the quality of search results over a wide range of user needs.

6. REPEATABILITY
• Automate: don't make people look at screens
• No gut-feeling / "management-driven" ad-hoc search ranking

7. SPEED
• Fast iterations instead of long waits (e.g. in A/B testing)

8. MEASURE QUALITY
• Numeric output
• Support for different metrics
• Define "quality" in your domain

9. USER NEEDS
• Optimize across a wider range of use cases (aka "information needs")
• Think about what the majority of your users want
• Collect data to discover what is important for your use case

10. Prerequisites for Ranking Evaluation
1. Define a set of typical information needs.
2. For each search case, rate your documents for those information needs, either binary relevant/non-relevant or on some graded scale (a minimal example of such a judgment set is sketched after this slide).
3. If full labelling is not feasible, choose a small subset instead (often the case because the document set is too large).
4. Choose a metric to calculate. Some good metrics are already defined in Information Retrieval research: Precision@K, (N)DCG, ERR, Reciprocal Rank, etc.

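As an illustration of steps 1 and 2, here is a minimal sketch of what such a judgment set could look like in Python; the information needs, query strings, document IDs, and grades are all invented for this example, not taken from the talk.

    # Hypothetical judgment set: each information need pairs a query with graded
    # ratings (0 = irrelevant ... 3 = highly relevant) for a small subset of documents.
    judgments = {
        "cheap_hotel_amsterdam": {
            "query": "hotel amsterdam",
            "ratings": {"doc_101": 3, "doc_205": 1, "doc_307": 0},
        },
        "jfk_airport_info": {
            "query": "JFK airport",
            "ratings": {"doc_412": 2, "doc_413": 2, "doc_099": 0},
        },
    }
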
11. Search Evaluation Continuum (diagram): evaluation approaches (some sort of unit test, QA assisted by scripts, Ranking Evaluation, user studies, A/B testing) positioned along two axes: speed (slow to fast) and preparation time / people looking at screens (little to lots).

12. Where Ranking Evaluation can help
• Development: guiding design decisions; enabling quick iteration
• Production: monitor changes; spot degradations
• Communication tool: helps defining "search quality" more clearly; forces stakeholders to "get real" about their expectations

13. Elasticsearch 'rank_eval' API

14. Ranking Evaluation API
• Introduced in 6.2 (still an experimental API)
• Joint work between Christoph Büscher (@dalatangi) and Isabel Drost-Fromm (@MaineC)
• Inputs:
  • a set of search requests ("information needs")
  • document ratings for each request
  • a metric definition; currently available: Precision@K, Discounted Cumulative Gain / (N)DCG, Expected Reciprocal Rank / ERR, MRR, …

    GET /my_index/_rank_eval
    {
      "metric": {
        "mean_reciprocal_rank": {
          [...]
        }
      },
      "templates": [{
        [...]
      }],
      "requests": [{
        "template_id": "my_query_template",
        "ratings": [...],
        "params": {
          "query_string": "hotel amsterdam",
          "field": "text"
        }
        [...]
      }]
    }

15. Ranking Evaluation API Details

metric:

    "metric": {
      "precision": {
        "relevant_rating_threshold": "2",
        "k": 5
      }
    }

requests:

    "requests": [{
      "id": "JFK_query",
      "request": { "query": { [...] } },
      "ratings": [...]
    },
    ... other use cases ...]

ratings:

    "ratings": [
      { "_id": "3054546", "rating": 3 },
      { "_id": "5119376", "rating": 1 },
      [...]
    ]

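To make the request structure from the last two slides concrete, here is a sketch of assembling and sending such a call from Python with the requests library. The host (localhost:9200), index name, query, and document IDs are assumptions for the sketch, not part of the talk.

    import requests  # plain HTTP; the official Elasticsearch client works as well

    # Request body following the structure shown above: one metric plus one rated request.
    body = {
        "metric": {"precision": {"relevant_rating_threshold": 2, "k": 5}},
        "requests": [
            {
                "id": "amsterdam_query",
                "request": {"query": {"match": {"text": "hotel amsterdam"}}},
                "ratings": [
                    {"_index": "my_index", "_id": "3054546", "rating": 3},
                    {"_index": "my_index", "_id": "5119376", "rating": 1},
                ],
            }
        ],
    }

    response = requests.get("http://localhost:9200/my_index/_rank_eval", json=body)
    response.raise_for_status()
    print(response.json())  # see the next slide for the response layout
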
16. _rank_eval response

    {
      "rank_eval": {
        "metric_score": 0.431,
        "details": {
          "my_query_id1": {
            "metric_score": 0.6,
            "unrated_docs": [
              { "_index": "idx", "_id": "1960795" },
              [...]
            ],
            "hits": [...],
            "metric_details": {
              "precision": {
                "relevant_docs_retrieved": 6,
                "docs_retrieved": 10
              }
            }
          },
          "my_query_id2": { [...] }
        }
      }
    }

Callouts from the slide: metric_score at the top level is the overall score; details holds the per-query results; unrated_docs lists hits you may still want to rate; metric_details shows how the metric score was computed.

17. How to get document ratings?
1. Define a set of typical information needs of your users (e.g. analyze logs, ask product management / customers, etc.)
2. For each case, get a small set of candidate documents, e.g. by a very broad query (a sketch of this step follows this slide).
3. Rate those documents with respect to the underlying information need. This can initially be done by you or other stakeholders; later you can maybe outsource it, e.g. via Mechanical Turk.
4. Iterate!

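One way to mechanize step 2 is to run a deliberately broad query per information need and dump the top hits into a ratings skeleton that a human can fill in. Index name, field, host, and the placeholder rating value are assumptions of this sketch.

    import json
    import requests

    # Broad candidate query for one information need; index and field are assumptions.
    search_body = {"size": 20, "query": {"match": {"text": "hotel amsterdam"}}}
    hits = requests.get(
        "http://localhost:9200/my_index/_search", json=search_body
    ).json()["hits"]["hits"]

    # Ratings skeleton with rating = -1 meaning "not yet judged", to be filled in manually.
    skeleton = [
        {"_index": h["_index"], "_id": h["_id"], "rating": -1}
        for h in hits
    ]
    print(json.dumps(skeleton, indent=2))
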
18. Metrics currently available

Metric | Description | Ratings
Precision At K | Set-based metric; ratio of relevant docs in the top K results | binary
Reciprocal Rank (RR) | Positional metric; inverse of the rank of the first relevant document | binary
Discounted Cumulative Gain (DCG) | Takes order into account; highly relevant docs score more if they appear earlier in the result list | graded
Expected Reciprocal Rank (ERR) | Motivated by the "cascade model" of search; models the dependency of results on their predecessors | graded

19. Precision At K
• In short: "How many good results appear in the first K results?" (e.g. the first few pages in the UI)
• Supports only boolean relevance judgements
• PROS: easy to understand and communicate
• CONS: least stable across different user needs, e.g. the total number of relevant documents for a query influences precision at k

prec@k = \frac{|\{\text{relevant docs in top } k\}|}{|\{\text{all results at } k\}|}

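For intuition, the prec@k formula above written as a small stand-alone Python function, assuming binary judgments given as a set of relevant document IDs (this is just the arithmetic, not the API's implementation):

    def precision_at_k(relevant_ids, retrieved_ids, k):
        """prec@k: fraction of the top-k retrieved documents that are relevant."""
        top_k = retrieved_ids[:k]
        if not top_k:
            return 0.0
        return sum(1 for doc_id in top_k if doc_id in relevant_ids) / len(top_k)

    # Invented IDs: 2 of the first 5 results are relevant -> 0.4
    print(precision_at_k({"d1", "d4"}, ["d1", "d2", "d3", "d4", "d9"], k=5))
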
20. Reciprocal Rank
• Supports only boolean relevance judgements
• PROS: easy to understand and communicate
• CONS: limited to cases where the amount of good results doesn't matter
• If averaged over a sample of queries Q it is often called MRR (mean reciprocal rank)

RR = \frac{1}{\text{position of first relevant document}} \qquad MRR = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\text{rank}_i}

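The same formulas in code, again with binary judgments and invented example data:

    def reciprocal_rank(relevant_ids, retrieved_ids):
        """1 / rank of the first relevant document, or 0 if none is retrieved."""
        for rank, doc_id in enumerate(retrieved_ids, start=1):
            if doc_id in relevant_ids:
                return 1.0 / rank
        return 0.0

    def mean_reciprocal_rank(queries):
        """Average reciprocal rank over (relevant_ids, retrieved_ids) pairs."""
        return sum(reciprocal_rank(rel, ret) for rel, ret in queries) / len(queries)

    # First relevant hit at rank 2 for one query, rank 1 for the other -> (0.5 + 1.0) / 2
    print(mean_reciprocal_rank([({"d4"}, ["d1", "d4", "d3"]), ({"d7"}, ["d7", "d8"])]))
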
21. Discounted Cumulative Gain (DCG)
• Predecessor: Cumulative Gain (CG), which sums the relevance judgements over the top k results: CG = \sum_{i=1}^{k} rel_i
• DCG takes position into account by dividing by \log_2 at each position: DCG = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i+1)}
• NDCG (Normalized DCG) divides by the "ideal" DCG for a query (IDCG): NDCG = \frac{DCG}{IDCG}

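A direct transcription of the DCG and NDCG formulas above, taking a list of graded relevance judgments in result order (the example grades are made up):

    import math

    def dcg(grades):
        """DCG: sum of rel_i / log2(i + 1) over the result list, with i starting at 1."""
        return sum(rel / math.log2(i + 1) for i, rel in enumerate(grades, start=1))

    def ndcg(grades):
        """Normalize by the ideal DCG, i.e. the same grades sorted best-first."""
        ideal = dcg(sorted(grades, reverse=True))
        return dcg(grades) / ideal if ideal > 0 else 0.0

    grades = [3, 2, 3, 0, 1]          # relevance grades of the top 5 results, in order
    print(dcg(grades), ndcg(grades))  # NDCG reaches 1.0 only for a perfectly ordered list
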
22. Expected Reciprocal Rank (ERR)
• Cascade-based metric
• Supports graded relevance judgements
• The model assumes the user goes through the result list in order and is satisfied with the first relevant document
• R_i is the probability that the user stops at position i
• ERR is high when relevant documents appear early

ERR = \sum_{r=1}^{k} \frac{1}{r} \left( \prod_{i=1}^{r-1} (1 - R_i) \right) R_r \qquad R_i = \frac{2^{rel_i} - 1}{2^{rel_{\max}}}

(rel_i: relevance at position i; rel_{\max}: maximal relevance grade)

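The ERR formula above written out as a loop: each rank r contributes 1/r, weighted by the probability that the user was not satisfied earlier and stops here. Grades and the maximum grade are invented for the example.

    def err(grades, max_grade):
        """Expected Reciprocal Rank for graded judgments, following the cascade model above."""
        score = 0.0
        p_not_satisfied_yet = 1.0                      # running product of (1 - R_i)
        for rank, rel in enumerate(grades, start=1):
            r_stop = (2 ** rel - 1) / 2 ** max_grade   # R_i = (2^rel_i - 1) / 2^rel_max
            score += p_not_satisfied_yet * r_stop / rank
            p_not_satisfied_yet *= (1 - r_stop)
        return score

    print(err([3, 2, 3, 0, 1], max_grade=3))  # high because relevant documents appear early
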
23. DEMO TIME

24. Demo project and Data
• The demo uses approx. 1800 documents from the English Wikipedia
• Wikipedia's Discovery department collects and publishes relevance judgements with their Discernatron project
• Bulk data and all query examples are available at https://github.com/cbuescher/rankEvalDemo

  25. 25. !25 Q&A
26. Some questions I have for you…
• How do you measure search relevance currently?
• Did you find anything useful about the ranking evaluation approach?
• Feedback about usability of the API (ping me on GitHub or our Discuss forum, @cbuescher)

27. Further reading
• Manning, Raghavan & Schütze: Introduction to Information Retrieval, Cambridge University Press, 2008.
• Chapelle, O., Metzler, D., Zhang, Y., & Grinspan, P. (2009). Expected reciprocal rank for graded relevance. Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM '09), 621.
• Blog: https://www.elastic.co/blog/made-to-measure-how-to-use-the-ranking-evaluation-api-in-elasticsearch
• Docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-rank-eval.html
• Discuss: https://discuss.elastic.co/c/elasticsearch (cbuescher)
• GitHub: :Search/Ranking label (cbuescher)