AN IN-DEPTH ANALYSIS OF TAGS AND CONTROLLED
METADATA FOR BOOK SEARCH
TOINE BOGERS
VIVIEN PETRAS
MARCH 23, 2017
iCONFERENCE 2017
OUTLINE
▸ Introduction
▸ Methodology & Experimental Setup
▸ Analysis
– Tags vs. Controlled Vocabularies
– Book Search Requests
– Failure Analysis
▸ Conclusions & Future Work
INTRODUCTION
MOTIVATION
▸ Readers often struggle to discover new books with existing
systems (e.g., library catalogs, Amazon, eBook sellers)
– Information needs are contextual, personal & complex
– Book metadata does not contain the necessary information
EARLIER WORK
▸ iConference 2015
– Tags outperform controlled vocabularies for search, but
sometimes controlled vocabularies are better.
– Controlled vocabularies contain more unique terms; tags show
more repetition of terms.
▸ Why?
– Terminology
– Popularity / frequency
– Type of request
STUDY OBJECTIVES
▸ Why are tags better than controlled vocabularies for book
search?
– Which types of book search requests are better addressed
using tags and which using CV?
– Which book search requests fail completely and what
characterizes such requests?
METHODOLOGY &
EXPERIMENTAL SETUP
EXPERIMENTAL SETUP
▸ Controlled Vocabulary content (CV)
– DDC class labels
– Subjects
– Geographic names
– Category labels
– LCSH terms
▸ Tags
– Each tag occurs as many times as it has been assigned by
the users
▸ Unique tags
– Each tag occurs only once
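The difference between the Tags and Unique tags representations can be illustrated with a minimal Python sketch. The function name `build_representations` and the sample tag assignments are hypothetical; the slides do not show how the indexed text was actually produced.

```python
from collections import Counter


def build_representations(tag_assignments):
    """Return (tags_text, unique_tags_text) for one book record.

    tag_assignments holds one entry per user assignment, so a tag
    assigned by three users appears three times in the input.
    """
    counts = Counter(tag_assignments)
    # "Tags": every tag is repeated once per assignment (frequency kept).
    tags_text = " ".join(tag for tag, n in counts.items() for _ in range(n))
    # "Unique tags": every distinct tag appears exactly once.
    unique_tags_text = " ".join(counts)
    return tags_text, unique_tags_text


# Hypothetical assignments for a single record.
assignments = ["fantasy", "dragons", "fantasy", "ya", "fantasy"]
tags_text, unique_tags_text = build_representations(assignments)
```

In this sketch the Tags text carries "fantasy" three times, while the Unique tags text carries it once, which is the only difference between the two collections as described above.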
AMAZON/LIBRARYTHING COLLECTION
[Figure: collection record annotated with its Tags, Controlled Vocabulary content (CV: DDC class labels, subjects, geographic names, category labels, LCSH terms), and Unique Tags per record]
ANNOTATED LT TOPIC
[Figure: LibraryThing forum topic annotated with its topic title, narrative, and recommended books]
EXPERIMENTAL SETUP
▸ Amazon / LibraryThing collection of book records
– 2 million records
▸ LibraryThing forum topics for search requests
– 334 search requests for testing
▸ Relevance judgements
– Recommendations from LT members with graded relevance scoring
(highest relevance if book is added by searcher)
▸ Evaluation metric
– Normalized Discounted Cumulated Gain (NDCG@10)
▸ IR system
– Indri 5.4 toolkit
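As a rough illustration of the evaluation metric, NDCG@10 can be sketched as follows. This assumes linear gains and a log2 rank discount, which is one common formulation; the slides do not state which gain function was used, and the book IDs and relevance grades below are hypothetical.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulated gain over the top-k gains (rank starts at 1)."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_ids, judgments, k=10):
    """NDCG@k for one request.

    ranked_ids: book IDs in the order returned by the system.
    judgments:  dict mapping book ID to a graded relevance value.
    """
    gains = [judgments.get(book_id, 0) for book_id in ranked_ids]
    ideal_gains = sorted(judgments.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    # A request without any relevant book gets a score of 0.
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical graded judgments for one forum request.
judgments = {"b1": 4, "b2": 1, "b3": 2}
perfect = ndcg_at_k(["b1", "b3", "b2"], judgments)
imperfect = ndcg_at_k(["b2", "b1", "b3"], judgments)
missed = ndcg_at_k(["x", "y"], judgments)
```

A request whose top 10 contains none of the recommended books scores 0 under this definition, which is what "fail completely" means in the later analysis if the same convention is assumed.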
ANALYSIS
TAGS vs. CONTROLLED VOCABULARIES
▸ Question 1: Is there a difference in performance between
CV and Tags in retrieval?
▸ Answer
– Tags perform significantly better than CV
– The combination of both results in even better performance than tags alone, but not significantly so
– Losing tag frequency information helps rather than hurts performance (also not significantly)
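The slides do not name the significance test behind these statements. As one common option in IR evaluation, a paired sign-flip randomization test over per-request NDCG@10 scores could look like the sketch below; the function name, the number of permutations, and the sample scores are all assumptions made for illustration.

```python
import random


def paired_randomization_test(scores_a, scores_b, n_permutations=10000, seed=0):
    """Two-sided paired randomization (sign-flip) test.

    scores_a and scores_b are per-request scores of two runs, aligned
    by request. Returns the estimated p-value for the mean difference.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the sign of each difference is arbitrary.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted / len(diffs)) >= observed - 1e-12:
            extreme += 1
    return extreme / n_permutations


# Hypothetical per-request scores for two runs.
p_same = paired_randomization_test([0.3, 0.2, 0.5], [0.3, 0.2, 0.5])
p_diff = paired_randomization_test([0.5] * 20, [0.1] * 20)
```

A small p-value would correspond to "significantly better" in the bullets above; a large one corresponds to the "not significantly" cases.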
TAGS vs. CONTROLLED VOCABULARIES
▸ Question 2: Do tags outperform CV because of the so-called
popularity effect?
▸ Answer
– No, there does not seem to be a popularity effect
– Types = unique words in a record
– Tokens = all instances of words in a record
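The two definitions can be made concrete with a short sketch. Lowercasing and whitespace tokenization are simplifying assumptions here, not the preprocessing reported by the authors, and the sample record text is hypothetical.

```python
def type_token_counts(record_text):
    """Return (number of types, number of tokens) for one record."""
    tokens = record_text.lower().split()  # all word instances
    types = set(tokens)                   # unique words only
    return len(types), len(tokens)


# Hypothetical tag text of a single record.
n_types, n_tokens = type_token_counts("fantasy dragons Fantasy magic fantasy")
```

Here the record has 3 types and 5 tokens. A popularity effect would mean that the repeated tokens, rather than the number of distinct types, drive retrieval performance.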
TAGS vs. CONTROLLED VOCABULARIES
▸ Question 3: Do Tags and CV complement or cancel each other out?
▸ Answer
– Tags and CV complement each other: they are successful on different sets of requests
– But most zero-difference requests (74.0%) actually fail completely! When and why?
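A per-request comparison of this kind can be sketched as follows. The function name, the tolerance `eps`, and the scores are hypothetical; the sketch only shows how wins, ties, and complete failures (both runs scoring zero) could be counted from per-request NDCG@10 values.

```python
def compare_per_request(tag_scores, cv_scores, eps=1e-9):
    """Count where Tags win, where CV wins, and where both are tied.

    Both arguments map request IDs to NDCG@10. A tie where both scores
    are zero is counted as a complete failure for that request.
    """
    summary = {"tags_better": 0, "cv_better": 0, "tie": 0, "tie_both_zero": 0}
    for request_id, tag_score in tag_scores.items():
        cv_score = cv_scores[request_id]
        diff = tag_score - cv_score
        if diff > eps:
            summary["tags_better"] += 1
        elif diff < -eps:
            summary["cv_better"] += 1
        else:
            summary["tie"] += 1
            if tag_score <= eps and cv_score <= eps:
                summary["tie_both_zero"] += 1
    return summary


# Hypothetical scores for four requests.
tag_scores = {"r1": 0.5, "r2": 0.0, "r3": 0.0, "r4": 0.2}
cv_scores = {"r1": 0.1, "r2": 0.3, "r3": 0.0, "r4": 0.2}
summary = compare_per_request(tag_scores, cv_scores)
```

The share `tie_both_zero / tie` is the kind of figure the 74.0% above refers to: zero-difference requests on which neither collection retrieves anything relevant.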
REQUESTS – RELEVANCE ASPECTS
▸ What makes a suggested book relevant to the user?
– Distinguish between eight relevance aspects (Reuter, 2007;
Koolen et al., 2015)
REQUESTS – RELEVANCE ASPECTS
Aspect | Description | % of requests (N = 87)
Accessibility | Language, length, or level of difficulty of a book | 9.2 %
Content | Topic, plot, genre, style, or comprehensiveness | 79.3 %
Engagement | Fit a certain mood or interest, are considered high quality, or provide a certain reading experience | 25.3 %
Familiarity | Similar to known books or related to a previous experience | 47.1 %
Known-item | The user is trying to identify a known book, but cannot remember the metadata that would locate it | 12.6 %
Metadata | With a certain title or by a certain author or publisher, in a particular format, or certain year | 23.0 %
Novelty | Unusual or quirky, or containing novel content | 3.4 %
Socio-cultural | Related to the user's socio-cultural background or values; popular or obscure | 13.8 %
REQUESTS – RELEVANCE ASPECTS
▸ Question 4: What types of book requests are best served
by the Unique tags and CV collections?
▸ Answer
– CV terms show a tendency to work best for requests that
touch upon aspects of engagement
– Other requests are best served by Unique tags
REQUESTS – RELEVANCE ASPECTS
Performance grouped by relevance aspect (NDCG@10)
Aspect | N | Unique tags | CV
Socio-cultural | 10 | 0.1127 | 0.0428
Novelty | 2 | 0.5304 | 0.0000
Metadata | 17 | 0.2454 | 0.1259
Known-item | 11 | 0.3593 | 0.1818
Familiarity | 36 | 0.1833 | 0.0701
Engagement | 21 | 0.1121 | 0.1425
Content | 63 | 0.1965 | 0.0821
Accessibility | 7 | 0.1235 | 0.0749
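The per-aspect NDCG@10 values from the chart above can be checked against the answer to Question 4 with a few lines of Python. The assignment of the first value in each pair to Unique tags and the second to CV follows the legend order and is an assumption of this sketch.

```python
# (N, Unique tags NDCG@10, CV NDCG@10) per relevance aspect, as reported
# in the chart. The column order (Unique tags first, CV second) is assumed.
reported = {
    "Socio-cultural": (10, 0.1127, 0.0428),
    "Novelty": (2, 0.5304, 0.0000),
    "Metadata": (17, 0.2454, 0.1259),
    "Known-item": (11, 0.3593, 0.1818),
    "Familiarity": (36, 0.1833, 0.0701),
    "Engagement": (21, 0.1121, 0.1425),
    "Content": (63, 0.1965, 0.0821),
    "Accessibility": (7, 0.1235, 0.0749),
}

# Difference in favour of Unique tags; negative means CV scores higher.
differences = {aspect: round(unique - cv, 4)
               for aspect, (_, unique, cv) in reported.items()}

cv_wins = [aspect for aspect, diff in differences.items() if diff < 0]
largest_gap = max(differences, key=differences.get)
```

Under that assumption Engagement is the only aspect where CV scores higher, matching the stated answer. The largest gap is for Novelty, but that group contains only two requests, so it should be read with caution.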
REQUESTS – TYPE OF BOOK
▸ Question 5: What types of book requests (fiction or non-fiction)
are best served by Unique tags or CV?
▸ Answer
– Unique tags work significantly better for fiction
– CV work better for non-fiction (but not significantly so)
FAILURE ANALYSIS
▸ Question 6: Do failed book search requests fail because of
data sparsity, a lower recall base, or a lack of examples?
▸ Answer
– Neither sparsity nor the size of the recall base is the
reason for retrieval failure
– The number of examples provided by the requester has a
significant positive influence on performance
[Figure: request groups of N = 247, N = 87, and N = 334]
FAILURE ANALYSIS
▸ Question 7: Do book search requests fail because of their
relevance aspects?
▸ Answer
– No, relevance aspects are distributed equally for successful & failed requests
– Only Accessibility- and Metadata-related search requests seem to fail more often
FAILURE ANALYSIS
▸ Question 8: Does the type of book that is being requested
(fiction vs. non-fiction) have an influence on whether
requests succeed or fail?
▸ Answer
– Requests for works of fiction fail significantly more often
CONCLUSIONS &
FUTURE WORK
FINDINGS
▸ Tags outperform CV...
– ...probably because their terminology is closer to the user's
language (not because of the popularity effect)
▸ Sometimes CV are better, for example, for non-fiction books...
– ...whereas tags are better for fiction and for content-related,
familiarity or known-item searches
▸ We believe that tags are simply better able to match the user's
language when looking for books
– Although they are still not that great at it!
– Book search is still hard, especially for fiction books
OPEN QUESTIONS
▸ How can book metadata be adapted to be closer to the
vocabulary used in real-world book search requests?
▸ What other aspects (besides type of requested book or
relevance aspect of search request) contribute to request
difficulty?
▸ Our question to you:
– What other questions can we ask of this data?
QUESTIONS?
Paper URL: http://bit.ly/iconf2017
