The task definition of the Social Book Search Lab describes complex goal-oriented and non-goal tasks. To satisfy the resulting information needs, the user can utilise and combine different sources of evidence, for instance metadata (e.g. abstract, title, author) as well as reviews and ratings provided by other users. The challenge is to support the user in this endeavour in order to create an effective search experience. To this end, in this talk I will discuss how this challenge relates to the well-known principle of polyrepresentation. I will then introduce a probabilistic logic-based framework called POLAR, which is capable of handling complex queries based on the graph induced by user-generated content. Subsequently I will provide a brief outlook on two further formal models that try to support the user beyond the typical query-and-result paradigm. The first is based on quantum probabilities, neatly combining geometry and probability theory to support different forms of user interaction and polyrepresentation. The second combines polyrepresentation with probabilistic clustering and the idea of a simulated user.
Polyrepresentation in Complex (Book) Search Tasks - How can we use what the others said?
Ingo Frommholz
University of Bedfordshire
ingo.frommholz@beds.ac.uk
Twitter: @iFromm
CLEF 2015 Social Book Search Workshop
September 10, 2015
IN Facets and Polyrepresentation
“Good introduction to quantum mechanics”
▶ Relevance decision goes beyond topicality
▶ Collections like Amazon/LT/BritishLibrary
▶ Rich pool of potentially useful information (metadata,
user-generated content)
▶ Different views on documents, relevant for different aspects of the
information need (IN)
▶ Combine the evidence (e.g. metadata and user-generated
content) to get a more accurate estimation of
relevance/usefulness
▶ [Koolen, 2014] puts user-generated content into the index – it
worked!
▶ Reviews and tags are complementary to each other and to
professional metadata
▶ Polyrepresentation a key principle (exploits different
contexts [Ingwersen and Järvelin, 2005])
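The idea of combining evidence from several representations can be illustrated with a minimal sketch. It assumes that each representation (metadata, reviews, tags) already yields a relevance score in [0, 1] and fuses them with a weighted average. The function name `combine_evidence`, the scores and the weights are hypothetical, and weighted averaging is only one simple fusion choice, not the method used in the talk.

```python
from typing import Dict


def combine_evidence(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-representation relevance scores.

    Assumption: every score lies in [0, 1] and every representation
    named in `scores` has a weight in `weights`.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(weights[name] * score for name, score in scores.items())
    return weighted_sum / total_weight


# Hypothetical scores for a single book, one per representation.
book_scores = {"metadata": 0.4, "reviews": 0.8, "tags": 0.6}
# Hypothetical weights: reviews count twice as much as the other sources.
weights = {"metadata": 1.0, "reviews": 2.0, "tags": 1.0}

combined = combine_evidence(book_scores, weights)
print(f"combined relevance estimate: {combined:.2f}")
```

In this toy setting a book that scores poorly on professional metadata can still rank well if reviews and tags support it, which is the intuition behind using several views on a document.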
Abstraction for Information Retrieval
▶ Provide a task-oriented solution for knowledge engineers
▶ Should not have to bother with the underlying retrieval model/data
sources/data storage and organization
▶ Instead focus on the task at hand
▶ Support complex retrieval strategies and information needs
▶ Allows for exploiting task-crossovers and synergies as well as
reusing concepts defined for similar tasks
Properties of the Framework
▶ Each user interaction triggers an observation and thus a change
of state
▶ Our evaluation shows that the framework can compete with
standard models in ad hoc IR tasks
▶ Different IR tasks can be formulated in this framework
(filtering [Piwowarski et al., 2010b], query
sessions [Frommholz et al., 2011],
summarisation [Piwowarski et al., 2012])
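The first property, an observation that changes the state, can be illustrated with a generic projective measurement on a real-valued state vector. This is a minimal sketch of textbook quantum probability and not the framework's actual implementation: the helper names `project` and `observe`, the three-dimensional space and the example vectors are all assumptions, and the subspace basis is assumed to be orthonormal.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two real vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def project(state: Vector, basis: List[Vector]) -> Vector:
    """Project `state` onto the subspace spanned by an orthonormal `basis`."""
    projected = [0.0] * len(state)
    for b in basis:
        coeff = dot(state, b)
        projected = [p + coeff * x for p, x in zip(projected, b)]
    return projected


def observe(state: Vector, basis: List[Vector]) -> Tuple[float, Vector]:
    """Return the probability of observing the subspace and the updated state.

    The probability is the squared length of the projection; the new state
    is the normalised projection. If the probability is zero, the state is
    returned unchanged.
    """
    projected = project(state, basis)
    probability = dot(projected, projected)
    if probability == 0.0:
        return 0.0, state
    norm = math.sqrt(probability)
    return probability, [x / norm for x in projected]


# Hypothetical unit-length state, spread equally over the first two dimensions.
state = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]
# Hypothetical one-dimensional subspace standing in for an observed interaction.
subspace = [[1.0, 0.0, 0.0]]

probability, new_state = observe(state, subspace)
print(probability, new_state)
```

After the observation the state lies entirely inside the observed subspace, so a later observation is evaluated against the updated state rather than the original one.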
Information Need-based Vector
▶ Let $REP_{in}$ be the set of representations of an information need $in$
(search terms, work task, ideal answer, current info need, background knowledge)
▶ Motivated by the Optimum Clustering Framework (OCF), which is
based on the probability of relevance [Fuhr et al., 2011]
▶ $\Pr(R \mid d, r_i)$ is computed for each document $d$ and $r_i \in REP_{in}$
$$\vec{\tau}_{in}(d) = \begin{pmatrix} \Pr(R \mid d, r_1) \\ \vdots \\ \Pr(R \mid d, r_n) \end{pmatrix} \qquad (1)$$
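The information need-based vector of Equation (1) can be sketched as follows. The slide does not say how Pr(R|d, r_i) is estimated, so the function `estimate_relevance` below is a hypothetical placeholder that uses simple term overlap; any probabilistic retrieval model could take its place. The representations and the document text are made-up examples.

```python
from typing import List


def estimate_relevance(document: str, representation: str) -> float:
    """Placeholder for Pr(R|d, r_i).

    Assumption: the fraction of representation terms that occur in the
    document stands in for a real probability-of-relevance estimate.
    """
    rep_terms = set(representation.lower().split())
    doc_terms = set(document.lower().split())
    if not rep_terms:
        return 0.0
    return len(rep_terms & doc_terms) / len(rep_terms)


def tau(document: str, representations: List[str]) -> List[float]:
    """Build the vector of Equation (1): one component per representation."""
    return [estimate_relevance(document, r) for r in representations]


# Hypothetical representations of one information need
# (e.g. search terms and work task).
rep_in = ["quantum mechanics introduction", "prepare physics exam"]
doc = "a gentle introduction to quantum mechanics for physics students"

vector = tau(doc, rep_in)
print(vector)
```

Each document thus becomes a point in a space with one dimension per representation, which is what allows documents to be clustered by how they relate to the different facets of the information need.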
Some Findings (using iSearch)
▶ Some statistically significant improvements over a BM25 baseline
(NDCG@30) using the ranking created by a simple simulated
user strategy when concatenating the IN and Document
representations [Abbasi and Frommholz, 2015b]
▶ Statistically significant improvements (NDCG) when using
document and IN representations separately and assuming an
ideal (oracle-based) cluster ranking
[Abbasi and Frommholz, 2015a]
▶ This shows us our idea is basically promising!
▶ Finding the total cognitive overlap (TOC) using cluster ranking is
challenging [Frommholz and Abbasi, 2014]
▶ Different interpretations of the TOC: The one with the highest
precision? The one with the highest pairwise precision? The one
where all representations get a high value?
▶ The latter one could be identified more easily (MRR = 0.575
compared to around 0.3 for the others)
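For readers unfamiliar with the MRR figures quoted above, the following minimal sketch shows how mean reciprocal rank is commonly computed. It assumes that for each topic we know the 1-based rank at which the target cluster appears in the cluster ranking, or `None` if it does not appear. The example ranks are made up and are not the data behind the reported values.

```python
from typing import List, Optional


def mean_reciprocal_rank(ranks: List[Optional[int]]) -> float:
    """Average of 1/rank over all topics.

    `ranks` holds the 1-based rank of the target cluster per topic;
    a topic where the target was not found (None) contributes 0.
    """
    if not ranks:
        return 0.0
    reciprocal_sum = sum(1.0 / r for r in ranks if r is not None)
    return reciprocal_sum / len(ranks)


# Hypothetical ranks of the target cluster for four topics.
example_ranks = [1, 2, 4, None]

mrr = mean_reciprocal_rank(example_ranks)
print(f"MRR = {mrr:.4f}")
```

An MRR close to 1 means the target cluster is usually ranked first, so the higher value for the third interpretation indicates that this cluster tends to appear nearer the top of the ranking.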
Conclusion
▶ The rich source of evidence in SBS should be combined to tackle
complex information needs
▶ Probabilistic models for expressing complex information needs
and interactive search
▶ POLAR (abstraction for annotation-based search)
▶ Quantum Information Access
▶ Probabilistic polyrepresentative clustering (simulated user)
▶ It seems polyrepresentation can successfully be applied
▶ Good idea to integrate different sources
▶ Need to do it wisely
Bibliography I
Abbasi, M. K. and Frommholz, I. (2015a).
Cluster-based Polyrepresentation as Science Modelling Approach
for Information Retrieval.
Scientometrics, 102(3):2301–2322.
Abbasi, M. K. and Frommholz, I. (2015b).
Polyrepresentative Clustering: A Study of Simulated User
Strategies and Representations.
In Mayr, P., Frommholz, I., and Mutschke, P., editors, Proc. of the
2nd Workshop on Bibliometric-enhanced Information Retrieval
(BIR2015), pages 47–54, Vienna, Austria. CEUR-WS.org.
Agosti, M., Ferro, N., Frommholz, I., and Thiel, U. (2004).
Annotations in Digital Libraries and Collaboratories – Facets,
Models and Usage.
In Heery, R. and Lyon, L., editors, Research and Advanced
Technology for Digital Libraries. Proc. European Conference on
Digital Libraries (ECDL 2004), Lecture Notes in Computer
Science, pages 244–255, Heidelberg et al. Springer.
Frommholz, I. and Abbasi, M. K. (2014).
On Clustering and Polyrepresentation.
In de Rijke, M., Kenter, T., de Vries, A. P., Zhai, C., de Jong, F.,
Radinsky, K., and Hofmann, K., editors, Proceedings of the
European Conference on Information Retrieval (ECIR 2014),
volume 1, pages 618–623. Springer.
Frommholz, I. and Fuhr, N. (2006a).
Evaluation of Relevance and Knowledge Augmentation in
Discussion Search.
In Gonzalo, J., Thanos, C., Verdejo, M. F., and Carrasco, R. C.,
editors, Research and Advanced Technology for Digital Libraries.
Proc. of the 10th European Conference on Digital Libraries (ECDL
2006), Lecture Notes in Computer Science, pages 279–290,
Heidelberg et al. Springer.
Frommholz, I. and Fuhr, N. (2006b).
Probabilistic, Object-oriented Logics for Annotation-based
Retrieval in Digital Libraries.
In Nelson, M., Marshall, C., and Marchionini, G., editors, Proc. of
the 6th ACM/IEEE Joint Conference on Digital Libraries (JCDL
2006), pages 55–64, New York. ACM.
Frommholz, I., Larsen, B., Piwowarski, B., Lalmas, M., Ingwersen,
P., and van Rijsbergen, K. (2010).
Supporting Polyrepresentation in a Quantum-inspired Geometrical
Retrieval Framework.
In Proceedings of the 2010 Information Interaction in Context
Symposium, pages 115–124, New Brunswick. ACM.
Frommholz, I., Piwowarski, B., Lalmas, M., and van Rijsbergen, K.
(2011).
Processing Queries in Session in a Quantum-Inspired IR
Framework.
In Clough, P., Foley, C., Gurrin, C., Jones, G. J. F., Kraaij, W., Lee,
H., and Murdock, V., editors, Proceedings ECIR 2011, volume
6611 of Lecture Notes in Computer Science, pages 751–754.
Springer.
Fuhr, N., Lechtenfeld, M., Stein, B., and Gollub, T. (2011).
The Optimum Clustering Framework: Implementing the Cluster
Hypothesis.
Information Retrieval, 14.
Ingwersen, P. and Järvelin, K. (2005).
The turn: integration of information seeking and retrieval in
context.
Springer-Verlag New York, Inc., Secaucus, NJ, USA.
Koolen, M. (2014).
"User Reviews in the Search Index? That’ll Never Work!".
In Proceedings ECIR 2014, pages 323–334.
Lykke, M., Larsen, B., Lund, H., and Ingwersen, P. (2010).
Developing a Test Collection for the Evaluation of Integrated
Search.
In Proceedings ECIR 2010, pages 627–630.
Piwowarski, B., Amini, M.-R., and Lalmas, M. (2012).
On using a Quantum Physics formalism for Multi-document
Summarisation.
Journal of the American Society for Information Science and
Technology (JASIST).
Piwowarski, B., Frommholz, I., Lalmas, M., and Van Rijsbergen, K.
(2010a).
What can Quantum Theory Bring to Information Retrieval?
In Proc. 19th International Conference on Information and
Knowledge Management, pages 59–68.
Piwowarski, B., Frommholz, I., Moshfeghi, Y., Lalmas, M., and van
Rijsbergen, K. (2010b).
Filtering documents with subspaces.
In Proceedings of the 32nd European Conference on Information
Retrieval (ECIR 2010), pages 615–618.
van Rijsbergen, C. J. (2004).
The Geometry of Information Retrieval.
Cambridge University Press, New York, NY, USA.