Penguins in Sweaters,
or Serendipitous Entity Search
on User-generated Content
Ilaria Bordino, Yelena Mejova, and Mounia Lalmas
(Yahoo Labs)
ACM International Conference on Information and Knowledge Management (CIKM 2013)
October 29th, 2013
Why/when do penguins wear sweaters?

Serendipity: finding something good or useful while not specifically looking for it. Serendipitous search systems provide relevant and interesting results.

Entity search: we build an entity-driven serendipitous search system based on enriched entity networks extracted from Wikipedia and Yahoo! Answers.

WHAT
1. What connections between entities do web community knowledge portals offer?

WHY
2. How do they contribute to an interesting, serendipitous browsing experience?

Yahoo Answers vs Wikipedia

Yahoo Answers: community-driven question & answer portal
• 67M questions & 262M answers
• 2 years [2010/2011]
• English-language
minimally curated
opinions, gossip, personal info
variety of points of view

Wikipedia: community-driven encyclopedia
• 3,795,865 articles
• from end of December 2011
• English Wikipedia
curated
high-quality knowledge
variety of niche topics
Entity & Relationship Extraction
• Entity: any concept having a Wikipedia page. Use an internal tool to
  (1) identify surface forms,
  (2) resolve to Wikipedia entities,
  (3) rank entities using aboutness score.
• Relationship: cosine similarity of tf-idf vectors (each entity's vector is built from the concatenation of the documents where the entity appears)
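The relationship score above can be sketched in a few lines. This is only an illustration under stated assumptions: the tokenization, the `log(N/df)` idf variant, and the toy documents are hypothetical and are not taken from the slides, which do not say which tf-idf weighting the internal tool uses.

```python
import math
from collections import Counter

def tfidf_vectors(entity_docs):
    """entity_docs maps an entity to the token list of its concatenated documents."""
    n = len(entity_docs)
    df = Counter()
    for tokens in entity_docs.values():
        df.update(set(tokens))  # document frequency: one count per entity document
    vectors = {}
    for entity, tokens in entity_docs.items():
        tf = Counter(tokens)
        # Assumed weighting: raw term frequency times log(N / df).
        vectors[entity] = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy documents (hypothetical, for illustration only).
docs = {
    "Penguin": "penguin oil spill sweater rescue".split(),
    "Oil spill": "oil spill tanker rescue sea".split(),
    "Eiffel Tower": "paris tower iron landmark".split(),
}
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs["Penguin"], vecs["Oil spill"])
sim_unrelated = cosine(vecs["Penguin"], vecs["Eiffel Tower"])
print(sim_related, sim_unrelated)
```

An edge between two entities would then carry this similarity as its weight; entities with no shared terms get similarity 0 and no edge.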
W. Zhao, J. Jiang, J. Weng, J. He, E.P. Lim, H. Yan, and X. Li. Comparing twitter and traditional
media using topic models. ECIR 2011.
D. Paranjpe. Learning document aboutness from implicit user feedback and document structure.
CIKM 2009.
Dataset Features

Dataset          # Nodes     # Edges       # Isolated
Yahoo! Answers   896,799     112,595,138   69,856
Wikipedia        1,754,069   237,058,218   82,381

• Sentiment
  › using SentiStrength, compute positive & negative scores
  › compute attitude and sentimentality [Kucuktunc’12]
  › entity-level scores
• Topical Category
  › Yahoo Content Taxonomy
• Quality
  › Flesch Reading Ease score
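For the quality feature, the Flesch Reading Ease score is 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words). A minimal sketch follows; the vowel-group syllable counter and the regex-based sentence splitter are rough assumptions made for illustration, not the tooling used in the talk.

```python
import re

def count_syllables(word):
    """Rough heuristic (assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text):
    """Flesch Reading Ease; higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

easy = flesch_reading_ease("The cat sat. The dog ran.")
hard = flesch_reading_ease(
    "Serendipitous information retrieval necessitates comprehensive evaluation methodologies."
)
print(easy, hard)
```

Short sentences with short words score high, while long, polysyllabic sentences score low, so the score can act as a readability filter on results.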
[Figure panels: Attitude (Polarity), Sentimentality (Strength), Readability]

[Figure panels: Wikipedia, Yahoo Answers]
Retrieval
• Algorithm: lazy random walk with restart
• Queries: Justin Bieber, Nicki Minaj, Katy Perry, Shakira, Eminem, Lady Gaga,
Jose Mourinho, Selena Gomez, Kim Kardashian, Miley Cyrus, Robert
Pattinson, Adele (singer), Steve Jobs, Osama bin Laden, Ron Paul,
Twitter, Facebook, Netflix, IPad, IPhone, Touchpad, Kindle, Olympic
Games, Cricket, FIFA, Tennis, Mount Everest, Eiffel Tower, Oxford
Street, Nürburgring, Haiti, Chile, Libya, Egypt, Middle East,
Earthquake, Oil spill, Tsunami, Subprime mortgage crisis, Bailout,
Terrorism, Asperger syndrome, McDonald's, Vitamin D, Appendicitis,
Cholera, Influenza, Pertussis, Vaccine, Childbirth
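A lazy random walk with restart from a query entity can be sketched as a power iteration. The restart probability, the laziness (self-loop) probability, the iteration count, and the toy graph weights below are all assumptions chosen for illustration; the slides do not give the parameters used.

```python
def lazy_rwr(graph, seed, restart=0.15, laziness=0.5, iterations=100):
    """graph maps node -> {neighbor: positive weight}; every neighbor must also be a key.

    At each step the walker jumps back to the seed with probability `restart`;
    otherwise it stays put with probability `laziness` or follows an edge
    chosen proportionally to its weight.
    """
    scores = {node: 0.0 for node in graph}
    scores[seed] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, mass in scores.items():
            if mass == 0.0:
                continue
            out = graph[node]
            total = sum(out.values())
            if total == 0:
                # Isolated node: the walker can only stay where it is.
                nxt[node] += (1 - restart) * mass
                continue
            nxt[node] += (1 - restart) * laziness * mass
            for neighbor, weight in out.items():
                nxt[neighbor] += (1 - restart) * (1 - laziness) * mass * weight / total
        nxt[seed] += restart * sum(scores.values())
        scores = nxt
    return scores

# Toy entity graph; the weights are made up for this example.
graph = {
    "Steve Jobs": {"Steve Wozniak": 0.8, "Timothy Cook": 0.6},
    "Steve Wozniak": {"Steve Jobs": 0.8},
    "Timothy Cook": {"Steve Jobs": 0.6, "Netflix": 0.1},
    "Netflix": {"Timothy Cook": 0.1},
}
scores = lazy_rwr(graph, "Steve Jobs")
ranking = sorted((n for n in scores if n != "Steve Jobs"), key=scores.get, reverse=True)
print(ranking)
```

The top entries of `ranking` would be the results returned for the query entity; the seed itself is excluded.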

3 labels per query-result pair

                Wikipedia   Yahoo! Answers   Combined
Precision @ 5   0.668       0.724            0.744
MAP             0.716       0.762            0.782

• Annotator agreement (overlap): 0.85
• Average overlap in top 5 results: 12%
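Precision @ 5 and MAP, as reported above, can be computed from binary relevance labels per ranked result. This sketch assumes the three labels per pair have already been collapsed into one 0/1 flag, and it normalizes average precision by the number of relevant results retrieved, which is one common convention and an assumption here.

```python
def precision_at_k(relevant_flags, k=5):
    """Fraction of the top-k results that are relevant."""
    return sum(relevant_flags[:k]) / k

def average_precision(relevant_flags):
    """Mean of precision values at the ranks where a relevant result appears."""
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(runs):
    """MAP: average precision averaged over all queries."""
    return sum(average_precision(flags) for flags in runs) / len(runs)

# Hypothetical relevance flags for two queries (1 = relevant), in ranked order.
runs = [[1, 0, 1, 1, 0], [0, 1, 0, 0, 1]]
p5 = precision_at_k(runs[0])
ap = average_precision(runs[0])
map_score = mean_average_precision(runs)
print(p5, ap, map_score)
```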

Query: Steve Jobs
• Yahoo! Answers: Jon Rubinstein, Timothy Cook, Kane Kramer, Steve Wozniak, Jerry York
• Wikipedia: System 7, PowerPC G4, SuperDrive, Power Macintosh, Power Computing Corp.
Serendipity
“making fortunate discoveries by accident”
• Serendipity = unexpectedness + relevance
  › “Expected” result baselines from web search
• Serendipity = interestingness + relevance
  › Result interestingness given the query
  › Personal interest in result

M. Ge, C. Delgado-Battenfeld, and D. Jannach. Beyond accuracy: evaluating recommender systems
by coverage and serendipity. RecSys 2010.
P. Andre, J. Teevan, and S. T. Dumais. From x-rays to silly putty via uranus: Serendipity and its role in
web search. SIGCHI 2009.
Baselines
• Top: 5 entities that occur most frequently in the top 5 search results provided by Bing and Google
• Top –WP: same as above, but excluding the Wikipedia page from the set of results
• Rel: top 5 entities in the related query suggestions provided by Bing and Google
• Rel + Top: union of Top and Rel

Baseline    Data   General       High Read.
Top         WP     0.63 (0.58)   0.56 (0.53)
Top         YA     0.69 (0.63)   0.71 (0.65)
Top         Comb   0.70 (0.61)   0.68 (0.61)
Top –WP     WP     0.63 (0.58)   0.56 (0.54)
Top –WP     YA     0.70 (0.64)   0.71 (0.66)
Top –WP     Comb   0.71 (0.64)   0.68 (0.63)
Rel         WP     0.64 (0.61)   0.57 (0.56)
Rel         YA     0.70 (0.65)   0.71 (0.66)
Rel         Comb   0.72 (0.67)   0.69 (0.65)
Rel + Top   WP     0.61 (0.54)   0.55 (0.51)
Rel + Top   YA     0.68 (0.57)   0.69 (0.59)
Rel + Top   Comb   0.68 (0.55)   0.66 (0.56)

Each cell gives two ratios, the second in parentheses:
• | relevant & unexpected | / | unexpected |: number of serendipitous results out of all of the unexpected results retrieved
• | relevant & unexpected | / | retrieved |: serendipitous out of all retrieved
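The two ratios in the legend above reduce to simple set operations. In this sketch the `expected` set stands in for a baseline's results and the `relevant` set for the annotators' judgments; both sets are made up for illustration and are not taken from the study's labels.

```python
def serendipity_ratios(retrieved, relevant, expected):
    """Return (|rel & unexp| / |unexp|, |rel & unexp| / |retrieved|)."""
    retrieved = list(retrieved)
    unexpected = [r for r in retrieved if r not in expected]
    serendipitous = [r for r in unexpected if r in relevant]
    over_unexpected = len(serendipitous) / len(unexpected) if unexpected else 0.0
    over_retrieved = len(serendipitous) / len(retrieved) if retrieved else 0.0
    return over_unexpected, over_retrieved

# Results for the query "Steve Jobs" from the Yahoo! Answers network.
retrieved = ["Jon Rubinstein", "Timothy Cook", "Kane Kramer", "Steve Wozniak", "Jerry York"]
# Hypothetical baseline output and hypothetical relevance labels.
expected = {"Steve Wozniak", "Apple Inc."}
relevant = {"Jon Rubinstein", "Timothy Cook", "Steve Wozniak", "Jerry York"}

ratio_unexpected, ratio_retrieved = serendipity_ratios(retrieved, relevant, expected)
print(ratio_unexpected, ratio_retrieved)
```

Because the unexpected results are a subset of the retrieved ones, the second ratio can never exceed the first.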
User-perceived Quality

1. Which result is more relevant to the query?
2. If someone is interested in the query, would they
also be interested in these results?
3. Even if you are not interested in the query, are
these results interesting to you personally?
4. Would you learn anything new about the query?
Interestingness
• Labelers provide pairwise comparisons between results; combine these into a reference ranking and compare the result ranking to the optimal one (Kendall’s tau-b)
• Agreement: Relevance (83%), Query interest (81%), Personal interest (76%), Learning something new (81%)

• Interesting > Relevant
Oil Spill → Sweaters for Penguins
Robert Pattinson → Water for Elephants
WP

• Relevant > Interesting
Egypt → Ptolemaic Kingdom
WP
WP & YA
Egypt → Cairo Conference (WP)
Netflix → Blu-ray Disc (YA)

J. Arguello, F. Diaz, J. Callan, and B. Carterette. A methodology for evaluating
aggregated search results. ECIR 2011.
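Kendall's tau-b, used above to compare a system ranking with the reference ranking, can be sketched directly from its definition. The two input lists hold a score or rank position per result, in the same order. How the pairwise labels are merged into the reference ranking is not covered here, and the example ranks are hypothetical.

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall's tau-b between two equally long lists of ranks or scores."""
    if len(x) != len(y):
        raise ValueError("both lists must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both lists: drops out of both denominator factors
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied_in_x = concordant + discordant + ties_y_only
    untied_in_y = concordant + discordant + ties_x_only
    denom = math.sqrt(untied_in_x * untied_in_y)
    return (concordant - discordant) / denom if denom else 0.0

reference = [1, 2, 3, 4, 5]      # hypothetical reference rank of each result
system = [1, 3, 2, 4, 5]         # hypothetical system rank of the same results
tau = kendall_tau_b(reference, system)
tau_ties = kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 4])
print(tau, tau_ties)
```

A value of 1 means the two rankings agree on every pair, and −1 means one is the reverse of the other.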

Similarity (Kendall’s tau-b) between result sets and reference ranking

Data   General   +Topic

Which result is more relevant to the query?
WP     0.162     0.194
YA     0.336     0.374
Comb   0.201     0.222

If someone is interested in the query, would they also be interested in the result?
WP     0.162     0.176
YA     0.312     0.343
Comb   0.184     0.222

Even if you are not interested in the query, is the result interesting to you personally?
WP     0.139     0.144
YA     0.324     0.359
Comb   0.168     0.198

Would you learn anything new about the query from this result?
WP     0.167     0.164
YA     0.307     0.346
Comb   0.184     0.203

• Topical category constraint promotes results of the same topic as the query entity
• Sentiment and Readability constraints hurt performance

What did we learn?

1. What connections between entities
do web community knowledge
portals offer?

2. How do they contribute to
an interesting, serendipitous
browsing experience?

≠
ANSWERS

>
ANSWERS

Penguins in Sweaters, or Serendipitous Entity Search on User-generated Content

  • 1. Penguins in Sweaters, or Serendipitous Entity Search on User-generated Content. Ilaria Bordino, Yelena Mejova, and Mounia Lalmas (Yahoo Labs). ACM International Conference on Information and Knowledge Management (CIKM 2013), October 29th, 2013.
  • 2. Why/when do penguins wear sweaters?
      Serendipity: finding something good or useful while not specifically looking for it. Serendipitous search systems provide relevant and interesting results.
      Entity search: we build an entity-driven serendipitous search system based on enriched entity networks extracted from Wikipedia and Yahoo! Answers.
  • 3. 1. What connections between entities do web community knowledge portals offer? (WHAT) 2. How do they contribute to an interesting, serendipitous browsing experience? (WHY)
  • 4. Yahoo! Answers vs Wikipedia
      Yahoo! Answers: community-driven question & answer portal; 67M questions & 262M answers; 2 years [2010/2011]; English-language; minimally curated; opinions, gossip, personal info; variety of points of view.
      Wikipedia: community-driven encyclopedia; 3,795,865 articles; from end of December 2011; English Wikipedia; curated; high-quality knowledge; variety of niche topics.
  • 5. Entity & Relationship Extraction
      Entity: any concept having a Wikipedia page. Use an internal tool to (1) identify surface forms, (2) resolve to Wikipedia entities, (3) rank entities using an aboutness score.
      Relationship: cosine similarity of tf-idf vectors (each entity is represented by the concatenation of the documents where it appears).
      References: W. Zhao, J. Jiang, J. Weng, J. He, E.P. Lim, H. Yan, and X. Li. Comparing twitter and traditional media using topic models. ECIR 2011. D. Paranjpe. Learning document aboutness from implicit user feedback and document structure. CIKM 2009.
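The relationship score on this slide can be illustrated with a minimal sketch. It assumes each entity has already been mapped to a token list (the concatenation of the documents mentioning it). The function names, the raw-count tf, and the `log(N/df)` idf are illustrative choices, not the internal tool the authors used.

```python
import math
from collections import Counter

def tfidf_vectors(entity_docs):
    """entity_docs: dict mapping entity -> list of tokens (hypothetical input format).

    Returns a sparse tf-idf vector (dict term -> weight) per entity.
    """
    n = len(entity_docs)
    df = Counter()
    for tokens in entity_docs.values():
        df.update(set(tokens))  # document frequency: one count per entity "document"
    vectors = {}
    for entity, tokens in entity_docs.items():
        tf = Counter(tokens)
        # Assumed weighting: raw term frequency times log(N / df)
        vectors[entity] = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

An edge between two entities would then carry `cosine(vectors[a], vectors[b])` as its weight; entities with no shared terms get weight 0 and no edge.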
  • 6. Dataset Features
      Dataset          # Nodes     # Edges       # Isolated
      Yahoo! Answers   896,799     112,595,138   69,856
      Wikipedia        1,754,069   237,058,218   82,381
      Sentiment: using SentiStrength, compute positive & negative scores; compute attitude and sentimentality [Kucuktunc’12]; entity-level scores.
      Topical category: Yahoo Content Taxonomy.
      Quality: Flesch Reading Ease score.
      Feature dimensions: Attitude (polarity), Sentimentality (strength), Readability.
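For reference, the Flesch Reading Ease score named on this slide follows the standard formula 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words). The sketch below takes pre-computed counts; how the authors tokenized text and counted syllables is not stated in the slides, so that part is left out.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch Reading Ease from raw counts (higher = easier to read).

    Counting words, sentences, and syllables is left to the caller;
    the slides do not say which tokenizer or syllable counter was used.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
```

For example, a passage with 100 words, 5 sentences, and 150 syllables has 20 words per sentence and 1.5 syllables per word.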
  • 8. Retrieval
      Algorithm: lazy random walk with restart.
      Query entities: Justin Bieber, Nicki Minaj, Katy Perry, Shakira, Eminem, Lady Gaga, Jose Mourinho, Selena Gomez, Kim Kardashian, Miley Cyrus, Robert Pattinson, Adele (singer), Steve Jobs, Osama bin Laden, Ron Paul, Twitter, Facebook, Netflix, IPad, IPhone, Touchpad, Kindle, Olympic Games, Cricket, FIFA, Tennis, Mount Everest, Eiffel Tower, Oxford Street, Nürburgring, Haiti, Chile, Libya, Egypt, Middle East, Earthquake, Oil spill, Tsunami, Subprime mortgage crisis, Bailout, Terrorism, Asperger syndrome, McDonald's, Vitamin D, Appendicitis, Cholera, Influenza, Pertussis, Vaccine, Childbirth.
      3 labels per query-result pair.
                        Wikipedia   Yahoo! Answers   Combined
      Precision @ 5     0.668       0.724            0.744
      MAP               0.716       0.762            0.782
      Annotator agreement (overlap): 0.85
      Average overlap in top 5 results: 12%
      Example results for Steve Jobs:
        Yahoo! Answers: Jon Rubinstein, Timothy Cook, Kane Kramer, Steve Wozniak, Jerry York
        Wikipedia: System 7, PowerPC G4, SuperDrive, Power Macintosh, Power Computing Corp.
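A lazy random walk with restart over the entity network might look like the sketch below. The slides name the algorithm but give no parameters, so the restart probability, the laziness (self-loop) probability, the iteration count, and the dict-of-dicts graph format are all assumptions made for illustration.

```python
def lazy_rwr(graph, query, restart=0.15, laziness=0.5, iters=100):
    """Score nodes by a lazy random walk with restart from `query`.

    graph: dict node -> dict neighbor -> edge weight (assumed symmetric,
           every neighbor must also be a key). Parameter values are
           illustrative defaults, not taken from the slides.
    """
    p = {v: 0.0 for v in graph}
    p[query] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in graph}
        for u, mass in p.items():
            if mass == 0.0:
                continue
            total = sum(graph[u].values())
            if total == 0:
                # No outgoing edges: the walker can only stay put
                nxt[u] += (1 - restart) * mass
                continue
            # Lazy step: stay at u with probability `laziness`
            nxt[u] += (1 - restart) * laziness * mass
            # Otherwise move to a neighbor in proportion to edge weight
            for v, w in graph[u].items():
                nxt[v] += (1 - restart) * (1 - laziness) * mass * w / total
        nxt[query] += restart  # teleport back to the query entity
        p = nxt
    return p

def top_results(graph, query, k=5):
    """Return the k highest-scoring entities other than the query itself."""
    scores = lazy_rwr(graph, query)
    ranked = sorted((v for v in scores if v != query),
                    key=lambda v: scores[v], reverse=True)
    return ranked[:k]
```

With edge weights set to the cosine similarities between entities, `top_results(graph, "Steve Jobs")` would return the five entities the walk visits most often when repeatedly restarting from the query.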
  • 9. Serendipity: “making fortunate discoveries by accident”
      Serendipity = unexpectedness + relevance: “expected” result baselines from web search.
      Serendipity = interestingness + relevance: result interestingness given the query; personal interest in result.
      References: M. Ge, C. Delgado-Battenfeld, and D. Jannach. Beyond accuracy: evaluating recommender systems by coverage and serendipity. RecSys 2010. P. Andre, J. Teevan, and S. T. Dumais. From x-rays to silly putty via uranus: Serendipity and its role in web search. SIGCHI 2009.
  • 10. Baselines for “expected” results
      Top: the 5 entities that occur most frequently in the top 5 search results provided by Bing and Google.
      Top –WP: same as above, but excluding the Wikipedia page from the set of results.
      Rel: top 5 entities in the related query suggestions provided by Bing and Google.
      Rel + Top: union of Top and Rel.
      Baseline    Data   General       High Read.
      Top         WP     0.63 (0.58)   0.56 (0.53)
                  YA     0.69 (0.63)   0.71 (0.65)
                  Comb   0.70 (0.61)   0.68 (0.61)
      Top –WP     WP     0.63 (0.58)   0.56 (0.54)
                  YA     0.70 (0.64)   0.71 (0.66)
                  Comb   0.71 (0.64)   0.68 (0.63)
      Rel         WP     0.64 (0.61)   0.57 (0.56)
                  YA     0.70 (0.65)   0.71 (0.66)
                  Comb   0.72 (0.67)   0.69 (0.65)
      Rel + Top   WP     0.61 (0.54)   0.55 (0.51)
                  YA     0.68 (0.57)   0.69 (0.59)
                  Comb   0.68 (0.55)   0.66 (0.56)
      Measures:
        |relevant & unexpected| / |unexpected|: number of serendipitous results out of all of the unexpected results retrieved.
        |relevant & unexpected| / |retrieved|: serendipitous results out of all retrieved.
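The two ratios on this slide can be computed as in the sketch below, where a result counts as unexpected if it does not appear in the chosen baseline set. The function name and argument layout are illustrative.

```python
def serendipity_scores(retrieved, relevant, expected):
    """Return (|rel & unexp| / |unexp|, |rel & unexp| / |retrieved|).

    retrieved: ranked list of results from the system
    relevant:  set of results judged relevant
    expected:  set of baseline ("expected") results, e.g. the Top or Rel set
    """
    retrieved = list(retrieved)
    unexpected = [r for r in retrieved if r not in expected]
    serendipitous = [r for r in unexpected if r in relevant]
    over_unexpected = len(serendipitous) / len(unexpected) if unexpected else 0.0
    over_retrieved = len(serendipitous) / len(retrieved) if retrieved else 0.0
    return over_unexpected, over_retrieved
```

For five retrieved results of which two are in the baseline and two of the remaining three are relevant, the scores are 2/3 and 2/5.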
  • 11. User-perceived Quality
      1. Which result is more relevant to the query?
      2. If someone is interested in the query, would they also be interested in these results?
      3. Even if you are not interested in the query, are these results interesting to you personally?
      4. Would you learn anything new about the query?
  • 12. Interestingness
      Labelers provide pairwise comparisons between results; these are combined into a reference ranking, and each result ranking is compared to the optimal one (Kendall’s tau-b).
      Agreement: Relevance (83%), Query interest (81%), Personal interest (76%), Learning something new (81%).
      Interesting > Relevant: Oil Spill → Sweaters for Penguins; Robert Pattinson → Water for Elephants WP
      Relevant > Interesting: Egypt → Ptolemaic Kingdom WP WP & YA; Egypt → Cairo Conference WP; Netflix → Blu-ray Disc YA
      Reference: J. Arguello, F. Diaz, J. Callan, and B. Carterette. A methodology for evaluating aggregated search results. ECIR 2011.
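Kendall's tau-b, used here to compare a system ranking with the reference ranking, can be computed directly from pair counts. The sketch below is a plain O(n²) implementation for illustration; the two inputs are assumed to be numeric scores or rank positions for the same items in the same order.

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length score lists (handles ties)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1  # pair tied in x (may also be tied in y)
        if dy == 0:
            ties_y += 1  # pair tied in y
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    return (concordant - discordant) / denom if denom else 0.0
```

Identical orderings give 1.0, fully reversed orderings give -1.0, and ties in either list shrink the denominator rather than counting as agreement or disagreement.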
  • 13. Similarity (Kendall’s tau-b) between result sets and reference ranking
      Question / Data                                   General   +Topic
      Which result is more relevant to the query?
        WP                                              0.162     0.194
        YA                                              0.336     0.374
        Comb                                            0.201     0.222
      If someone is interested in the query, would they also be interested in the result?
        WP                                              0.162     0.176
        YA                                              0.312     0.343
        Comb                                            0.184     0.222
      Even if you are not interested in the query, is the result interesting to you personally?
        WP                                              0.139     0.144
        YA                                              0.324     0.359
        Comb                                            0.168     0.198
      Would you learn anything new about the query from this result?
        WP                                              0.167     0.164
        YA                                              0.307     0.346
        Comb                                            0.184     0.203
      The topical category constraint promotes results of the same topic as the query entity.
      Sentiment and readability constraints hurt performance.
  • 14. What did we learn? 1. What connections between entities do web community knowledge portals offer? 2. How do they contribute to an interesting, serendipitous browsing experience? ≠ ANSWERS > ANSWERS