SlideShare ist ein Scribd-Unternehmen logo
1 von 41
Downloaden Sie, um offline zu lesen
Efficient Diversification of Web
        Search Results
    G. Capannini, F. M. Nardini, R. Perego, and F. Silvestri
                    ISTI - CNR, Pisa, Italy
Introduction: SE Results
             Diversification

• Query: “Vinci”, what’s the user’s intent?
   • Information on Leonardo da Vinci?
   • Information on Vinci the small village in Tuscany?
   • Information on Vinci the company?
   • Others?

           F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   2
Introduction: SE Results
             Diversification

• Query: “Vinci”, what’s the user’s intent?
   • Information on Leonardo da Vinci?
   • Information on Vinci the small village in Tuscany?
   • Information on Vinci the company?
   • Others?

           F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   2
Introduction: SE Results
             Diversification

• Query: “Vinci”, what’s the user’s intent?
   • Information on Leonardo da Vinci?
   • Information on Vinci the small village in Tuscany?
   • Information on Vinci the company?
   • Others?

           F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   2
Query Diversification as a
            Coverage Problem
• Hypothesis:
 • For each user’s query I can tell what’s the set of all possible intents
 • For each document in the collection I can tell what are all the possible user’s
    intents it represents
    • each intent for each document is, possibly, weighted by a value representing how
      much that intent is represented by that document (e.g., 1/2 of document D is
      related to the intent of “digital photography techniques”)
• Goal:
 • Select the set of k documents in the collection covering the maximum amount of
    intent weight. I.e., maximize the number of satisfied users.


              F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   3
State-of-the-Art Methods


•   IASelect:
 •   Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, and Samuel Ieong. 2009. Diversifying search results. In
     Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM '09), Ricardo Baeza-
     Yates, Paolo Boldi, Berthier Ribeiro-Neto, and B. Barla Cambazoglu (Eds.). ACM, New York, NY, USA, 5-14.


• xQuAD:
 •   Rodrygo L. T. Santos, Craig Macdonald, and Iadh Ounis. Exploiting query reformulations for Web search
     result diversification. In Proceedings of the 19th International Conference on World Wide Web, pages 881-890, Raleigh,
     NC, USA, 2010. ACM.




                  F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow     4
Diversify (k)




F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   5
Diversify (k)
                                                                       intents




F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   5
Diversify (k)
                                                                                                         the weight
                                                                       intents




F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow                5
Diversify (k)
                                                                                                         the weight
                                                                       intents




                                                                               the weight is the probability of
                                                                                  being relative to intent c




F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow                5
Diversify (k)
                                                                                                         the weight
                                                                       intents




                                                                               the weight is the probability of
                                                                                  being relative to intent c




                                                                   d is not
                                                                pertinent to c




F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow                5
Diversify (k)
                                                                                                         the weight
                                                                       intents




                                                                               the weight is the probability of
                                                                                  being relative to intent c




                                                                   d is not
                                                                pertinent to c
                                                   no doc is
                                                 pertinent to c



F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow                5
Diversify (k)
                                                                                                         the weight
                                                                       intents




                                                                               the weight is the probability of
                                                                                  being relative to intent c




                                                                   d is not
                                                                pertinent to c

                at least one doc is                no doc is
                  pertinent to c                 pertinent to c



F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow                5
Known Results
• Diversify(k) is NP-hard:
 • Reduction from max-weight coverage
• Diversify(k)’s objective function is sub-modular:
 • Admits a (1-1/e)-approx. algorithm.
 • The algorithm works by inserting one result at a time, we insert the
   result with the max marginal utility.
 • Quadratic complexity in the number of results to consider:
  • at each iteration scan the complete list of not-yet-inserted results.
            F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   6
Known Results
• Diversify(k) is NP-hard:
 • Reduction from max-weight coverage
• Diversify(k)’s objective function is sub-modular:
 • Admits a (1-1/e)-approx. algorithm.
 • The algorithm works by inserting one result at a time, we insert the
   result with the max marginal utility.
 • Quadratic complexity in the number of results to consider:
  • at each iteration scan the complete list of not-yet-inserted results.
            F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   6
It looks reasonable, but...
•   ... we might not diversify, at all!
•   Consider a query returning a set Rd={a,b,c} of documents and two possible categories g,h.
•   The query is pertaining to each document with the same probability, i.e., P(g|q) = P(h|q) =
    1/2.

                                     dV                     V(x|q,g)                     V(x|q,h)
                                      a                           1                            0
                                      b                           1                            0
                                      c                          1/2                          1/2


•   The optimal selection is S={a,b}, replacing either a or b with c will make the objective
    function decrease its value.


                  F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   7
It looks reasonable, but...
•   ... we might not diversify, at all!
•   Consider a query returning a set Rd={a,b,c} of documents and two possible categories g,h.
•   The query is pertaining to each document with the same probability, i.e., P(g|q) = P(h|q) =
    1/2.

                                     dV                     V(x|q,g)                     V(x|q,h)
                                      a                           1                            0
                                      b                           1                            0
                                      c                          1/2                          1/2


•   The optimal selection is S={a,b}, replacing either a or b with c will make the objective
    function decrease its value.


                  F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   7
xQuAD_Diversify(k)




F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   8
xQuAD_Diversify(k)




F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   8
xQuAD_Diversify(k)




                                                                       Same problem as before...
                                                                       It may not diversify, at all.
F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   8
Our Proposal:
                   MaxUtility




F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   9
Vinci                     Our Proposal:
                           MaxUtility




        F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   9
Leonardo da Vinci
Vinci      Vinci Town                      Our Proposal:
           Vinci Group                      MaxUtility




                         F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   9
Leonardo da Vinci
Vinci      Vinci Town
                    1/3
                          5/12
                                            Our Proposal:
           Vinci Group
                    1/4
                                             MaxUtility




                          F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   9
Leonardo da Vinci
Vinci      Vinci Town
                    1/3
                          5/12
                                            Our Proposal:
           Vinci Group
                    1/4
                                             MaxUtility



                     Rq                                                                                                     S




                          F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   9
Leonardo da Vinci
Vinci      Vinci Town
                    1/3
                          5/12
                                            Our Proposal:
           Vinci Group
                    1/4
                                             MaxUtility



                     Rq                                                                                                     S




                          F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   9
MaxUtility_Diversify(k)




F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   10
MaxUtility_Diversify(k)



                                                                                                         Probability of query q’ being a
                                                                                                           specialization for query q




F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow                                 10
MaxUtility_Diversify(k)



                                                                                                         Probability of query q’ being a
                                                                                                           specialization for query q


                                            Set of possible query
                                               specializations




F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow                                 10
Why it is Efficient?

• By using a simple arithmetic argument we can show that:


• Therefore we can find the optimal set S of diversified
 documents by using a sort-based approach.


          F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   11
OptSelect




F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   12
OptSelect




F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   12
The Specialization Set Sq
• It is crucial for OptSelect to
  have the set of specialization
  available for each query.
• Our method is, thus, query log-
  based.
 • we use a query recommender system
   to obtain a set of queries from which Sq
   is built by including the most popular
   (i.e., freq. in query log > f(q) / s)
   recommendations:


                    F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   13
Probability Estimation




F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   14
Usefulness of a Result




F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   15
Usefulness of a Result




F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   15
Experiments: Settings

• TREC 2009 Web track's Diversity Task framework:
 • ClueWeb-B, the subset of the TREC ClueWeb09 dataset
 • The 50 topics (i.e., queries) provided by TREC
 • We evaluate α-NDCG and IA-P
• All the tests were conducted on a Intel Core 2 Quad PC with
 8Gb of RAM and Ubuntu Linux 9.10 (kernel 2.6.31-22).


          F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   16
Experiments: Quality




F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   17
Experiments: Efficiency




F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   18
Conclusions and Future Work
• We studied the problem of search results diversification from an efficiency point of
  view
• We derived a diversification method (OptSelect):
  •   same (or better) quality of the state of the art

  •   up to 100 times faster

• Future work:
  •   the exploitation of users' search history for personalizing result diversification

  •   the use of click-through data to improve our effectiveness results, and

  •   the study of a search architecture performing the diversification task in parallel with the
      document scoring phase (Done! See DDR2011 paper)


                 F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   19
Question Time




                                     Fabrizio Silvestri
                                   ISTI-CNR, Pisa Italy
                          http://hpc.isti.cnr.it/~fabriziosilvestri
                                   f.silvestri@isti.cnr.it
F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   20

Weitere ähnliche Inhalte

Mehr von yaevents

Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...
Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...
Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...yaevents
 
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...yaevents
 
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...yaevents
 
Мониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, ЯндексМониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, Яндексyaevents
 
Истории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, ЯндексИстории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, Яндексyaevents
 
Разработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, ShturmannРазработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, Shturmannyaevents
 
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...yaevents
 
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...yaevents
 
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, ЯндексСканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндексyaevents
 
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, FacebookМасштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebookyaevents
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...yaevents
 
Юнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, GoogleЮнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, Googleyaevents
 
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...yaevents
 
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...yaevents
 
В поисках математики. Михаил Денисенко, Нигма
В поисках математики. Михаил Денисенко, НигмаВ поисках математики. Михаил Денисенко, Нигма
В поисках математики. Михаил Денисенко, Нигмаyaevents
 
Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...yaevents
 
Поисковая технология "Спектр". Андрей Плахов, Яндекс
Поисковая технология "Спектр". Андрей Плахов, ЯндексПоисковая технология "Спектр". Андрей Плахов, Яндекс
Поисковая технология "Спектр". Андрей Плахов, Яндексyaevents
 
Julia Stoyanovich - Making interval-based clustering rank-aware
Julia Stoyanovich - Making interval-based clustering rank-awareJulia Stoyanovich - Making interval-based clustering rank-aware
Julia Stoyanovich - Making interval-based clustering rank-awareyaevents
 
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...Mike Thelwall - Sentiment strength detection for the social web: From YouTube...
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...yaevents
 
Evangelos Kanoulas — Advances in Information Retrieval Evaluation
Evangelos Kanoulas — Advances in Information Retrieval EvaluationEvangelos Kanoulas — Advances in Information Retrieval Evaluation
Evangelos Kanoulas — Advances in Information Retrieval Evaluationyaevents
 

Mehr von yaevents (20)

Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...
Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...
Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...
 
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
 
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
 
Мониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, ЯндексМониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, Яндекс
 
Истории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, ЯндексИстории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, Яндекс
 
Разработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, ShturmannРазработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, Shturmann
 
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
 
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
 
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, ЯндексСканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
 
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, FacebookМасштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
 
Юнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, GoogleЮнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, Google
 
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
 
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
 
В поисках математики. Михаил Денисенко, Нигма
В поисках математики. Михаил Денисенко, НигмаВ поисках математики. Михаил Денисенко, Нигма
В поисках математики. Михаил Денисенко, Нигма
 
Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...
 
Поисковая технология "Спектр". Андрей Плахов, Яндекс
Поисковая технология "Спектр". Андрей Плахов, ЯндексПоисковая технология "Спектр". Андрей Плахов, Яндекс
Поисковая технология "Спектр". Андрей Плахов, Яндекс
 
Julia Stoyanovich - Making interval-based clustering rank-aware
Julia Stoyanovich - Making interval-based clustering rank-awareJulia Stoyanovich - Making interval-based clustering rank-aware
Julia Stoyanovich - Making interval-based clustering rank-aware
 
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...Mike Thelwall - Sentiment strength detection for the social web: From YouTube...
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...
 
Evangelos Kanoulas — Advances in Information Retrieval Evaluation
Evangelos Kanoulas — Advances in Information Retrieval EvaluationEvangelos Kanoulas — Advances in Information Retrieval Evaluation
Evangelos Kanoulas — Advances in Information Retrieval Evaluation
 

Kürzlich hochgeladen

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Kürzlich hochgeladen (20)

DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

"Efficient Diversification of Web Search Results"

  • 1. Efficient Diversification of Web Search Results G. Capannini, F. M. Nardini, R. Perego, and F. Silvestri ISTI - CNR, Pisa, Italy
  • 2. Introduction: SE Results Diversification • Query: “Vinci”, what’s the user’s intent? • Information on Leonardo da Vinci? • Information on Vinci the small village in Tuscany? • Information on Vinci the company? • Others? F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 2
  • 3. Introduction: SE Results Diversification • Query: “Vinci”, what’s the user’s intent? • Information on Leonardo da Vinci? • Information on Vinci the small village in Tuscany? • Information on Vinci the company? • Others? F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 2
  • 4. Introduction: SE Results Diversification • Query: “Vinci”, what’s the user’s intent? • Information on Leonardo da Vinci? • Information on Vinci the small village in Tuscany? • Information on Vinci the company? • Others? F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 2
  • 5. Query Diversification as a Coverage Problem • Hypothesis: • For each user’s query I can tell what’s the set of all possible intents • For each document in the collection I can tell what are all the possible user’s intents it represents • each intent for each document is, possibly, weighted by a value representing how much that intent is represented by that document (e.g., 1/2 of document D is related to the intent of “digital photography techniques”) • Goal: • Select the set of k documents in the collection covering the maximum amount of intent weight. I.e., maximize the number of satisfied users. F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 3
  • 6. State-of-the-Art Methods • IASelect: • Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, and Samuel Ieong. 2009. Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM '09), Ricardo Baeza- Yates, Paolo Boldi, Berthier Ribeiro-Neto, and B. Barla Cambazoglu (Eds.). ACM, New York, NY, USA, 5-14. • xQuAD: • Rodrygo L. T. Santos, Craig Macdonald, and Iadh Ounis. Exploiting query reformulations for Web search result diversification. In Proceedings of the 19th International Conference on World Wide Web, pages 881-890, Raleigh, NC, USA, 2010. ACM. F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 4
  • 7. Diversify (k) F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 5
  • 8. Diversify (k) intents F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 5
  • 9. Diversify (k) the weight intents F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 5
  • 10. Diversify (k) the weight intents the weight is the probability of being relative to intent c F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 5
  • 11. Diversify (k) the weight intents the weight is the probability of being relative to intent c d is not pertinent to c F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 5
  • 12. Diversify (k) the weight intents the weight is the probability of being relative to intent c d is not pertinent to c no doc is pertinent to c F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 5
  • 13. Diversify (k) the weight intents the weight is the probability of being relative to intent c d is not pertinent to c at least one doc is no doc is pertinent to c pertinent to c F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 5
  • 14. Known Results • Diversify(k) is NP-hard: • Reduction from max-weight coverage • Diversify(k)’s objective function is sub-modular: • Admits a (1-1/e)-approx. algorithm. • The algorithm works by inserting one result at a time, we insert the result with the max marginal utility. • Quadratic complexity in the number of results to consider: • at each iteration scan the complete list of not-yet-inserted results. F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 6
  • 15. Known Results • Diversify(k) is NP-hard: • Reduction from max-weight coverage • Diversify(k)’s objective function is sub-modular: • Admits a (1-1/e)-approx. algorithm. • The algorithm works by inserting one result at a time, we insert the result with the max marginal utility. • Quadratic complexity in the number of results to consider: • at each iteration scan the complete list of not-yet-inserted results. F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 6
  • 16. It looks reasonable, but... • ... we might not diversify, at all! • Consider a query returning a set Rd={a,b,c} of documents and two possible categories g,h. • The query is pertaining to each document with the same probability, i.e., P(g|q) = P(h|q) = 1/2. dV V(x|q,g) V(x|q,h) a 1 0 b 1 0 c 1/2 1/2 • The optimal selection is S={a,b}, replacing either a or b with c will make the objective function decrease its value. F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 7
  • 17. It looks reasonable, but... • ... we might not diversify, at all! • Consider a query returning a set Rd={a,b,c} of documents and two possible categories g,h. • The query is pertaining to each document with the same probability, i.e., P(g|q) = P(h|q) = 1/2. dV V(x|q,g) V(x|q,h) a 1 0 b 1 0 c 1/2 1/2 • The optimal selection is S={a,b}, replacing either a or b with c will make the objective function decrease its value. F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 7
  • 18. xQuAD_Diversify(k) F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 8
  • 19. xQuAD_Diversify(k) F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 8
  • 20. xQuAD_Diversify(k) Same problem as before... It may not diversify, at all. F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 8
  • 21. Our Proposal: MaxUtility F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 9
  • 22. Vinci Our Proposal: MaxUtility F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 9
  • 23. Leonardo da Vinci Vinci Vinci Town Our Proposal: Vinci Group MaxUtility F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 9
  • 24. Leonardo da Vinci Vinci Vinci Town 1/3 5/12 Our Proposal: Vinci Group 1/4 MaxUtility F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 9
  • 25. Leonardo da Vinci Vinci Vinci Town 1/3 5/12 Our Proposal: Vinci Group 1/4 MaxUtility Rq S F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 9
  • 26. Leonardo da Vinci Vinci Vinci Town 1/3 5/12 Our Proposal: Vinci Group 1/4 MaxUtility Rq S F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 9
  • 27. MaxUtility_Diversify(k) F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 10
  • 28. MaxUtility_Diversify(k) Probability of query q’ being a specialization for query q F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 10
  • 29. MaxUtility_Diversify(k) Probability of query q’ being a specialization for query q Set of possible query specializations F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 10
  • 30. Why it is Efficient? • By using a simple arithmetic argument we can show that: • Therefore we can find the optimal set S of diversified documents by using a sort-based approach. F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 11
  • 31. OptSelect F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 12
  • 32. OptSelect F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 12
  • 33. The Specialization Set Sq • It is crucial for OptSelect to have the set of specialization available for each query. • Our method is, thus, query log- based. • we use a query recommender system to obtain a set of queries from which Sq is built by including the most popular (i.e., freq. in query log > f(q) / s) recommendations: F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 13
  • 34. Probability Estimation F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 14
  • 35. Usefulness of a Result F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 15
  • 36. Usefulness of a Result F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 15
  • 37. Experiments: Settings • TREC 2009 Web track's Diversity Task framework: • ClueWeb-B, the subset of the TREC ClueWeb09 dataset • The 50 topics (i.e., queries) provided by TREC • We evaluate α-NDCG and IA-P • All the tests were conducted on a Intel Core 2 Quad PC with 8Gb of RAM and Ubuntu Linux 9.10 (kernel 2.6.31-22). F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 16
  • 38. Experiments: Quality F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 17
  • 39. Experiments: Efficiency F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 18
  • 40. Conclusions and Future Work • We studied the problem of search results diversification from an efficiency point of view • We derived a diversification method (OptSelect): • same (or better) quality of the state of the art • up to 100 times faster • Future work: • the exploitation of users' search history for personalizing result diversification • the use of click-through data to improve our effectiveness results, and • the study of a search architecture performing the diversification task in parallel with the document scoring phase (Done! See DDR2011 paper) F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 19
  • 41. Question Time Fabrizio Silvestri ISTI-CNR, Pisa Italy http://hpc.isti.cnr.it/~fabriziosilvestri f.silvestri@isti.cnr.it F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 20