Data Analysis to Support Decision-Making in Everyday Work

To mark our 15th company anniversary, an exclusive B-S-S Digital Workplace Summit took place on September 17, 2015, featuring practice-oriented reports and high-profile speakers on the topic "Digital Workplace: Changing Values and Intelligent Systems".
http://www.b-s-s.de/unternehmen/digitaler-arbeitsplatz-wandelnde-werte-und-intelligente-systeme

Published in: Data & Analytics

  1. Known-Item Search. Matthias Hagen, Bauhaus-Universität Weimar, matthias.hagen@uni-weimar.de, @matthias_hagen. B-S-S Anniversary, Eisenach, September 16, 2015.
  2. The scenario
  3. This is not just a problem of philosoraptor!
  4. Known-item search: re-finding previously seen or heard items such as documents, websites, emails, tweets, movies, music, books, and TV. Remarks: Users have some knowledge about their need. Only very few relevant documents exist.
  5. Problem: How do users search for known items?
  6. Studies on re-finding known items. Web search: [Sadeghi et al., ECIR 2015], [Tyler and Teevan, WSDM 2010], [Adar et al., CHI 2008], [Azzopardi et al., SIGIR 2007], [Teevan, TOIS 2008, UIST 2007], [Beitzel et al., SIGIR 2003]. Twitter search: [Meier and Elsweiler, IIiX 2014]. Email search: [Elsweiler et al., SIGIR 2011, ECIR 2011, TOIS 2008]. PIM: [Kim and Croft, SIGIR 2010, CIKM 2009], [Kelly et al., IIiX 2008], [Blanc-Brude and Scapin, IUI 2007], [Boardman and Sasse, CHI 2004], [Dumais et al., SIGIR 2003], [Barreau and Nardi, SIGCHI Bulletin 1995]. Problem: Most corpora and queries are not freely available.
  7. Exceptions: Known-item query generation. Automatic extraction: (1) select some document, (2) draw the most discriminative terms, (3) add random noise. Used for Web [Azzopardi et al., SIGIR 2007], PIM [Kim and Croft, CIKM 2009], and email [Elsweiler et al., SIGIR 2011]. Human computation game: (1) select some document, (2) show it to a user for some time, (3) ask for a query that retrieves it top-ranked. Used for PIM [Kim and Croft, SIGIR 2010]. Problem: Not really "natural" settings.
  8. Human memory: Not perfect but also not random
  9. Reasons for memory failure? Psychology!
  10. Our goal: A large corpus of difficult and realistic known-item needs. Remark: Freely available!
  11. The general idea [Hauff et al., IIiX 2012]: (1) Fetch known-item questions from Yahoo! Answers to ensure realistic human information needs (websites, movies, music, books, TV series). (2) Link the questions to a large static web crawl as an environment for repeatable research; ClueWeb09 chosen. (3) Construct queries from the questions, maybe via crowdsourcing; not part of this paper.
  12. Question acquisition. Querying the Yahoo! Answers API with queries such as: forgot AND name AND film; forgot AND title AND song; remember AND title AND movie; forgot AND url AND (website OR (web site)); (remember OR forgot) AND (name OR title) AND book. 37 such queries in total; 24,765 answered questions returned. Problems: Not all questions are really "answered." Not all questions are known-item intents. Not all questions are linkable to the ClueWeb09.
  13. Corpus cleansing. Answered status: keep a question when the best answer was selected by the asker; 8,825 questions remain (only about 36% of the original crawl). Known-item status and ClueWeb linkage need manual assessment: two independent annotators, about 400 hours of work. 3,406 questions have a known-item information need; 2,755 of them can be linked to ClueWeb09 documents. Only these form our dataset. Problem: Hardly any website questions remained.
  14. ClueWeb09 coverage over the years:

          Question from     2006   2007   2008   2009   2010   2011   2012
          Our dataset         68    176    369    701    578    477    364
          Coverage         89.5%  92.2%  86.0%  86.2%  79.6%  77.3%  71.9%

      Type of associated URL: 95% Wikipedia, 5% other.
  15. Corpus analysis: Initial observation
  16. False memories hinder total recall
  17. False memories in questions
  18. Movie: "... starts off with a box full of free puppies ..." (Question vs. actual known item.) Note a difference?!
  19. False memories in questions
  20. Movie: "... Morgan Freeman offers him a job to kill ..." (Question vs. actual known item.) Note a difference?!
  21. Funny! But these are just a few outliers?!
  22. False memories statistics. At least 240 questions (9% of corpus) contain false memories. Most frequent false memories: person names! Remark: Makes me think ... Does my mail search take this into account?
  23. Potential usage of the corpus. Observation: False memories hinder good results and might even yield zero-result lists! Retrieval systems should detect false memory situations and "repair" the query: leave out the false memory or replace it with a correction. Our corpus might be a starting point in that direction.
  24. Other fields: False memory implantation. Remark: We are not working on that!
  25. A little scary?!
  26. Let's finish the talk in a better mood!
  27. You know this song?!
  28. One more hint needed?!
  29. Yes, the Bee Gees! "Ah, ha, ha, ha, steak and a knife, steak and a knife"
  30. Some funny false memories really are mondegreens, that is, misheard lyrics.
  31. Almost the end: The take-home messages!
  32. What we have (not) done. Results: 2,755 known-item questions, posted by real human users, linked to the ClueWeb09; false memories annotated (they often refer to persons or song lyrics). Future work: enlarge the corpus (website known-items especially), web queries for the questions, false memory detection. Thank you.
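
The three "Automatic extraction" steps on the "Exceptions: Known-item query generation" slide (select a document, draw its most discriminative terms, add random noise) can be sketched roughly as follows. This is only an illustration: the slides do not say how discriminativeness is measured, so TF-IDF is swapped in here as an assumption, and all function names and parameters (`discriminative_terms`, `generate_known_item_query`, `noise_prob`) are hypothetical rather than taken from the cited papers.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def discriminative_terms(docs, doc_index, k):
    """Return the k terms of docs[doc_index] with the highest TF-IDF score.

    TF-IDF is an assumption standing in for "most discriminative".
    Ties are broken alphabetically so the result is deterministic.
    """
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    term_freq = Counter(tokenized[doc_index])
    n_docs = len(docs)
    scores = {t: c * math.log(n_docs / doc_freq[t]) for t, c in term_freq.items()}
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def generate_known_item_query(docs, doc_index, k=3, noise_prob=0.2, rng=None):
    """Simulate a known-item query for the selected document.

    Each drawn term is replaced by a random vocabulary term with
    probability noise_prob, which models an imperfect memory.
    """
    rng = rng or random.Random()
    vocabulary = sorted({t for d in docs for t in tokenize(d)})
    terms = discriminative_terms(docs, doc_index, k)
    return [rng.choice(vocabulary) if rng.random() < noise_prob else t
            for t in terms]
```

Calling `generate_known_item_query(docs, doc_index, noise_prob=0.0)` returns the undisturbed top terms; raising `noise_prob` produces noisier queries. The noise is uniformly random, which is exactly the "not really natural" property the slide criticizes.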
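
To make the boolean semantics of the queries on the "Question acquisition" slide concrete, here is a small local matcher. It is a sketch under assumptions: the real queries were evaluated by the Yahoo! Answers API, whose behavior is not described in the slides. The nested-tuple query representation and the names `matches`, `Q_FILM`, `Q_WEBSITE`, and `Q_BOOK` are hypothetical.

```python
import re

# Three of the slide's queries, written as nested (operator, operand, ...) tuples.
Q_FILM = ("AND", "forgot", "name", "film")
Q_WEBSITE = ("AND", "forgot", "url", ("OR", "website", "web site"))
Q_BOOK = ("AND", ("OR", "remember", "forgot"), ("OR", "name", "title"), "book")


def _evaluate(query, tokens):
    """Recursively evaluate a query against a list of lowercase tokens."""
    if isinstance(query, str):
        # A string is a term or a multi-word phrase; match it as a
        # contiguous token sequence.
        words = query.lower().split()
        n = len(words)
        return any(tokens[i:i + n] == words
                   for i in range(len(tokens) - n + 1))
    operator, *operands = query
    results = (_evaluate(operand, tokens) for operand in operands)
    if operator == "AND":
        return all(results)
    if operator == "OR":
        return any(results)
    raise ValueError(f"unknown operator: {operator!r}")


def matches(query, text):
    """Return True if the question text satisfies the boolean query."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return _evaluate(query, tokens)
```

Matching is on exact tokens, so "films" would not satisfy the term "film". Whether the original API applied stemming is not stated in the slides.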
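
The percentages on the "Corpus cleansing" and "False memories statistics" slides can be rechecked from the reported counts. The short script below uses only numbers stated in the slides; the names `FUNNEL`, `FALSE_MEMORY_QUESTIONS`, and `retention` are hypothetical.

```python
# Counts as reported on the slides, in filtering order.
FUNNEL = [
    ("answered questions returned", 24765),
    ("best answer selected by asker", 8825),
    ("known-item information need", 3406),
    ("linked to ClueWeb09", 2755),
]
FALSE_MEMORY_QUESTIONS = 240  # "at least 240 questions"


def retention(funnel):
    """Return (label, count, share of previous step, share of first step)."""
    total = funnel[0][1]
    previous = total
    rows = []
    for label, count in funnel:
        rows.append((label, count, count / previous, count / total))
        previous = count
    return rows


if __name__ == "__main__":
    for label, count, step_share, total_share in retention(FUNNEL):
        print(f"{label:32s} {count:6d}  {step_share:6.1%}  {total_share:6.1%}")
    share = FALSE_MEMORY_QUESTIONS / FUNNEL[-1][1]
    print(f"questions with false memories: {share:.1%} of the final dataset")
```

The second row confirms the "about 36%" figure (8,825 of 24,765), and 240 of 2,755 gives roughly 9%. The final dataset is about 11% of the originally returned questions.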
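
The "leave out the false memory" idea on the "Potential usage of the corpus" slide could look like the following toy sketch. It is not a method from the talk, which only states what retrieval systems should do. It assumes conjunctive (AND) retrieval over a tiny in-memory index and a leave-one-out relaxation when a query returns nothing. The names `build_index`, `search`, and `repair_query`, as well as the example documents, are hypothetical.

```python
def build_index(docs):
    """Build an inverted index {term: set of doc ids} from {doc_id: text}."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, terms):
    """Conjunctive retrieval: ids of documents that contain every term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def repair_query(index, terms):
    """Return (terms, hits), dropping one suspect term if needed.

    If the full query has results it is returned unchanged. Otherwise
    each term is left out in turn, and the relaxed query with the
    smallest non-empty result list is kept, since known items have
    only very few relevant documents.
    """
    terms = list(terms)
    hits = search(index, terms)
    if hits:
        return terms, hits
    best_terms, best_hits = terms, set()
    for i in range(len(terms)):
        relaxed = terms[:i] + terms[i + 1:]
        relaxed_hits = search(index, relaxed)
        if relaxed_hits and (not best_hits or len(relaxed_hits) < len(best_hits)):
            best_terms, best_hits = relaxed, relaxed_hits
    return best_terms, best_hits
```

For example, with an index built from `{"d1": "thriller hitman job actor_a", "d2": "comedy puppies box"}`, the query `["hitman", "job", "actor_b"]` has no results, and `repair_query` falls back to `["hitman", "job"]`, which finds `d1`. Replacing the wrong term with a correction, the slide's second option, would need additional knowledge such as the annotated false memories in the corpus.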
