This poster presents our preliminary research on how information can be extracted from user browsing behavior to identify understudied works: works that are relevant but have too few viewers. We investigate how to apply two types of analysis (a formula called Effective Collection Size and ‘multi-armed bandit’ analysis) to extracted user data, in order to develop alternative methods of retrieving materials from collections that are collated by richer factors of relevancy. We anticipate that these analyses will enable the development of an information retrieval system that presents a broad range of content in a user’s search results.
Bandits and Browsing: Effective Collection Size as Way of Quantifying Search Efficiency
Harriett E. Green, Kirk Hess, and Richard D. Hislop University of Illinois at Urbana-Champaign
green19@illinois.edu kirkhess@illinois.edu rhislop2@illinois.edu
EFFECTIVE COLLECTION SIZE
Effective Collection Size quantifies how efficiently a library uses its collection. It focuses on highlighting understudied works and aims to prevent the omission of useful materials in a collection.
ANALYSIS AND INITIAL RESULTS
• Ran statistical analysis on the English collection.
• Found books and topics with unusually high use and quantified this statistically.
• Identified improbably understudied items.
• Found topics of interest for digital collection development.
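The poster does not state the Effective Collection Size formula, so the sketch below is only an illustration of how such a measure could behave. It assumes an inverse Simpson index over per-title checkout counts, a common "effective number" measure that is swapped in here as a stand-in and is not confirmed as the authors' formula. The function name and the checkout counts are hypothetical.

```python
from typing import Sequence


def effective_collection_size(checkouts: Sequence[int]) -> float:
    """Inverse Simpson index of the checkout distribution.

    Assumption: this is a stand-in measure, not the poster's own formula.
    It equals the number of titles when use is perfectly even and
    approaches 1 when a single title receives nearly all checkouts.
    """
    total = sum(checkouts)
    if total == 0:
        return 0.0
    return 1.0 / sum((count / total) ** 2 for count in checkouts)


# Hypothetical checkout counts per title, invented for illustration only.
even_use = [25, 25, 25, 25]
skewed_use = [97, 1, 1, 1]

for label, counts in (("even", even_use), ("skewed", skewed_use)):
    size = effective_collection_size(counts)
    # Share of the catalog that is "effectively" in use under this measure.
    print(label, round(size, 3), round(size / len(counts), 3))
```

Under this assumed measure, both toy collections hold four titles, but the skewed one behaves as if it held barely more than one, which is the kind of inefficiency the panel above describes.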
WHY?
Biases in traditional search algorithms send most users to the same high-ranking materials. Digital libraries can adapt to user behavior, identify useful material, and send users to relevant but understudied sources.
Figure: Circulation of all titles with threshold of 100 checkouts
NEXT STEPS
• Analyze the broader University of Illinois catalog.
• Incorporate analysis into Illinois Harvest digital library search results.
• Produce a set of tools to help highlight understudied materials during reference and digitization projects.
• Use results to quantify increases in efficiency of collection use.
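The abstract names ‘multi-armed bandit’ analysis as one way to steer users toward relevant but understudied sources, but the poster does not describe a specific algorithm. The following is a minimal epsilon-greedy sketch of the general idea, not the authors' method: most of the time it shows the item with the best observed click rate, and occasionally it shows a random item so that low-exposure items still get a chance. The class name, item identifiers, and click probabilities are all hypothetical.

```python
import random
from typing import Dict, List


class EpsilonGreedyRanker:
    """Epsilon-greedy bandit over candidate items (illustrative only)."""

    def __init__(self, item_ids: List[str], epsilon: float = 0.1, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows: Dict[str, int] = {item: 0 for item in item_ids}
        self.clicks: Dict[str, int] = {item: 0 for item in item_ids}

    def click_rate(self, item_id: str) -> float:
        shown = self.shows[item_id]
        return self.clicks[item_id] / shown if shown else 0.0

    def choose(self) -> str:
        items = list(self.shows)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(items)  # explore: pick any item at random
        return max(items, key=self.click_rate)  # exploit: best observed rate

    def record(self, item_id: str, clicked: bool) -> None:
        self.shows[item_id] += 1
        if clicked:
            self.clicks[item_id] += 1


# Hypothetical click probabilities, invented for illustration only.
true_click_prob = {"popular_title": 0.30, "understudied_title": 0.45, "other_title": 0.10}

ranker = EpsilonGreedyRanker(list(true_click_prob), epsilon=0.2, seed=42)
simulated_user = random.Random(7)
for _ in range(2000):
    shown = ranker.choose()
    ranker.record(shown, simulated_user.random() < true_click_prob[shown])

print(ranker.shows)
```

The exploration rate `epsilon` is the design choice that matters here: at zero the ranker keeps sending users to whatever already looks best, which is the bias described under WHY?, while larger values trade some short-term relevance for exposure of less-viewed items.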
OUR PROJECT
• Prototype data: University of Illinois Library catalog circulation statistics
• Use our physical catalog to learn about collection use
• Apply this to improve search and recommendations in digital collections
Figure: Circulation of all titles with more than 100 checkouts
SELECT REFERENCES
Zhou, T., Kuscsik, Z., Liu, J.G., Medo, M., Wakeling, J.R., & Zhang, Y.C. (2010). Solving the apparent diversity-accuracy dilemma of recommender systems. Proceedings of the National Academy of Sciences of the United States of America, 107, 4511-4515.
Li, L., Chu, W., Langford, J., & Schapire, R. E. (2010). A contextual-bandit approach to personalized news article recommendation. Proceedings of the Nineteenth International Conference on World Wide Web, 661-670. doi:10.1145/1772690.1772758
Xie, I. & Cool, C. (2009). Understanding help seeking within the context of searching digital libraries. Journal of the American Society for Information Science and Technology, 60, 477-494.
green19@illinois.edu Twitter: @greenharr 2011 DLF Forum October 31-November 1, 2011