Using SKOS Vocabularies for Improving Web Search

WoLE 2013 talk

Published in: Technology, Education

  1. Using SKOS Vocabularies for Improving Web Search
     Web of Linked Entities Workshop (WoLE 2013), WWW 2013, Rio de Janeiro, May 13th 2013
     Bernhard Haslhofer | University of Vienna
     Flávio Martins | Universidade Nova de Lisboa
     João Magalhães | Universidade Nova de Lisboa
  2. Overview
     • A brief intro to SKOS
     • SKOS-based term expansion
     • Lucene-SKOS
     • Evaluation
     • Conclusions
  3. What is SKOS?
     • A language for describing Web vocabularies (taxonomies, classification schemes, thesauri)
     • Builds on Linked Data principles
       – Concepts have URIs
       – Concepts are interlinked
       – Vocabularies are expressed in RDF
  4. Who is using SKOS?
     http://www.w3.org/2001/sw/wiki/SKOS/Datasets
  5. SKOS Example - UKAT
     • ukat:859 (skos:Concept), skos:prefLabel "Weapons"
     • ukat:5060 (skos:Concept), skos:prefLabel "Military Equipment"
     • Relations between the two concepts: skos:broader, skos:narrower
     • skos:altLabel values shown: "Ordnance", "Armaments", "Arms"
     • Prefixes: ukat: http://www.ukat.org.uk/thesaurus/concept/ and skos: http://www.w3.org/2004/02/skos/core#
  6. Overview
     • A brief intro to SKOS
     • SKOS-based term expansion
     • Lucene-SKOS
     • Evaluation
     • Conclusions
  7. The big picture
     Diagram: Query and Documents each pass through Analysis, yielding a Query Representation and a Document Representation; the Retrieval Model scores these to produce Results. Diagram label: SKOS-based term expansion.
  8. Label-based (Query) Expansion
     • Query: roman arms
     • Document:
       – Title: Spearhead
       – Description: Roman iron spearhead. The spearhead was attached to one end of a wooden shaft. . .
       – Subject: Weapons
  9. Label-based (Query) Expansion
     • Query: roman arms
     • Expansions of "arms": skos:prefLabel weapons, skos:altLabel armaments, skos:broader ordnance, skos:broader military equipment
     • Document:
       – Title: Spearhead
       – Description: Roman iron spearhead. The spearhead was attached to one end of a wooden shaft. . .
       – Subject: Weapons (matches)
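
The label-based expansion on slides 8 and 9 can be sketched in a few lines of Python. This is an illustration only, not the lucene-skos implementation: `VOCAB`, `labels`, and `expand_term` are hypothetical names, and the assignment of "Ordnance" to ukat:5060 is an assumption inferred from slide 9, where it is reached through skos:broader.

```python
# Toy in-memory SKOS vocabulary holding the two UKAT concepts from the example.
# Assumption: "ordnance" is an altLabel of ukat:5060, as slide 9 suggests.
VOCAB = {
    "ukat:859": {
        "prefLabel": "weapons",
        "altLabel": ["arms", "armaments"],
        "broader": ["ukat:5060"],
    },
    "ukat:5060": {
        "prefLabel": "military equipment",
        "altLabel": ["ordnance"],
        "broader": [],
    },
}


def labels(concept_id):
    """All labels (preferred first) of one concept."""
    concept = VOCAB[concept_id]
    return [concept["prefLabel"], *concept["altLabel"]]


def expand_term(term):
    """Return (expansion_term, expansion_type) pairs for one query term."""
    expansions = []
    for concept_id, concept in VOCAB.items():
        if term not in labels(concept_id):
            continue  # the term does not name this concept
        if term != concept["prefLabel"]:
            expansions.append((concept["prefLabel"], "prefLabel"))
        for alt in concept["altLabel"]:
            if alt != term:
                expansions.append((alt, "altLabel"))
        for broader_id in concept["broader"]:
            for label in labels(broader_id):
                expansions.append((label, "broader"))
    return expansions


def expand_query(query):
    """Expand every whitespace-separated term of a lowercased query."""
    return {term: expand_term(term) for term in query.lower().split()}


if __name__ == "__main__":
    print(expand_query("roman arms"))
```

Keeping the expansion type next to each term is what later allows type-specific boosts; "roman" has no matching concept and is left unexpanded.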
  10. URI-based Expansion
     • Query: roman arms
     • Document:
       – Title: Spearhead
       – Description: Roman iron spearhead. The spearhead was attached to one end of a wooden shaft. . .
       – Subject: http://www.ukat.org.uk/thesaurus/concept/859
  11. URI-based Expansion
  12. URI-based Expansion
     • Document:
       – Title: Spearhead
       – Description: Roman iron spearhead. The spearhead was attached to one end of a wooden shaft. . .
       – Subject: http://www.ukat.org.uk/thesaurus/concept/859, expanded to: arms, weapons, armaments, ...
     • Query: roman arms (matches)
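
URI-based expansion moves the work to the document side: the concept URI in the subject field is resolved to its labels when the document is indexed. The following is a minimal sketch under that reading, using a toy inverted index; `LABELS_BY_URI`, `build_index`, and `search` are hypothetical names and do not reflect the lucene-skos API.

```python
from collections import defaultdict

# Assumption: labels for the one concept URI used on the slide.
LABELS_BY_URI = {
    "http://www.ukat.org.uk/thesaurus/concept/859": ["arms", "weapons", "armaments"],
}


def index_terms(doc):
    """Collect index terms from text fields plus labels of subject URIs."""
    terms = []
    for field in ("title", "description"):
        terms += doc[field].lower().replace(".", " ").split()
    for uri in doc["subject"]:
        # URI-based expansion: replace the opaque URI by its SKOS labels.
        for label in LABELS_BY_URI.get(uri, []):
            terms += label.split()
    return terms


def build_index(docs):
    """Build a term -> set of document ids inverted index."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in index_terms(doc):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


DOCS = {
    "d1": {
        "title": "Spearhead",
        "description": "Roman iron spearhead.",
        "subject": ["http://www.ukat.org.uk/thesaurus/concept/859"],
    }
}

if __name__ == "__main__":
    print(search(build_index(DOCS), "roman arms"))
```

The query is left untouched here; the match on "arms" comes entirely from the labels added at indexing time.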
  13. Scoring
     • Apply regular text retrieval functions
     • boost_ctype: leverages explicit declaration of SKOS expansion types in term representations
     • coord_q,d: ensures that a document with more matching terms will score higher
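
The two factors named on the scoring slide can be illustrated with a toy function. The slide does not state how the factors are combined, so the sketch below assumes a simple form: a coordination factor (fraction of query terms found in the document) multiplied by a sum of term frequencies weighted by a per-type boost. The boost values in `BOOST` are hypothetical.

```python
# Hypothetical boost weights per expansion type; original query terms get 1.0.
BOOST = {"original": 1.0, "prefLabel": 0.5, "altLabel": 0.5, "broader": 0.25}


def score(query_terms, doc_term_freqs):
    """Toy score: coord(q, d) * sum(boost(type) * tf(term, d)).

    query_terms: list of (term, expansion_type) pairs
    doc_term_freqs: dict mapping term -> frequency in the document
    """
    matched = [(t, typ) for t, typ in query_terms if doc_term_freqs.get(t, 0) > 0]
    if not matched:
        return 0.0
    # coord: documents matching more of the query terms score higher.
    coord = len(matched) / len(query_terms)
    weighted = sum(BOOST[typ] * doc_term_freqs[t] for t, typ in matched)
    return coord * weighted


if __name__ == "__main__":
    query = [
        ("roman", "original"),
        ("arms", "original"),
        ("weapons", "prefLabel"),
        ("armaments", "altLabel"),
    ]
    print(score(query, {"roman": 1, "weapons": 1}))
    print(score(query, {"roman": 1}))
```

A real retrieval function such as TF-IDF or BM25 would replace the raw term frequency; the point here is only that expansion terms contribute less than original terms and that broader coverage of the query is rewarded.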
  14. Overview
     • A brief intro to SKOS
     • SKOS-based term expansion
     • Implementation: Lucene-SKOS
     • Evaluation
     • Conclusions
  15. https://github.com/behas/lucene-skos
  16. Overview
     • A brief intro to SKOS
     • SKOS-based term expansion
     • Implementation: Lucene-SKOS
     • Evaluation
     • Conclusions
  17. Dataset 1
     • OHSUMED:
       – 350K PubMed metadata records from 270 journals
       – Title, author, abstract, …
       – 3-level relevance judgments for information needs
     • Medical Subject Headings (MeSH) in SKOS
       – Maintained by the US National Library of Medicine
       – Used to index millions of articles in PubMed
  18. Dataset 2
     • 8,905 MODS metadata records harvested from the Library of Congress (LoC)
     • 10 queries from the 2009/2010 query collection
     • Binary relevance judgments for queries
     • Library of Congress Subject Headings (LCSH) in SKOS
  19. Method
     • Focus on label-based expansion at query time
     • Normalized queries and SKOS vocabularies
     • Expanded each query term by collecting SKOS concepts that have that term in any of their labels (prefLabel, altLabel, hiddenLabel)
     • Applied pre-defined boost weights that maximized query performance
  20. Two Baselines
     • No term expansion (NoExp):
       – Queries are executed over documents without SKOS-based term expansion
     • Pseudo Relevance Feedback (PRF):
       – Perform initial search with original query
       – Collect terms from k retrieved documents
       – Resubmit query + collected terms
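
The three PRF steps above can be sketched as follows. The ranking function and the term selection rule are deliberate simplifications (raw term counts, most frequent new terms) and are assumptions; the slides do not say which variants were used in the experiments. All names are hypothetical.

```python
from collections import Counter


def search(query_terms, docs):
    """Toy ranking: order documents by total query-term occurrences."""
    scored = []
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        hits = sum(tokens.count(term) for term in query_terms)
        if hits > 0:
            scored.append((hits, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


def prf_expand(query_terms, docs, k=2, n_terms=2):
    """Pseudo relevance feedback: add frequent terms from the top-k results."""
    top_k = search(query_terms, docs)[:k]  # step 1: initial search
    counts = Counter()
    for doc_id in top_k:  # step 2: collect terms from k retrieved documents
        counts.update(
            token for token in docs[doc_id].lower().split()
            if token not in query_terms
        )
    new_terms = [term for term, _ in counts.most_common(n_terms)]
    return list(query_terms) + new_terms  # step 3: query + collected terms


DOCS = {
    "d1": "fibromyalgia pain therapy",
    "d2": "fibromyalgia pain syndrome",
    "d3": "heart surgery",
}

if __name__ == "__main__":
    expanded = prf_expand(["fibromyalgia"], DOCS, k=2, n_terms=1)
    print(expanded, search(expanded, DOCS))
```

Unlike SKOS-based expansion, the added terms come from the corpus itself, so their quality depends on how relevant the top-k documents of the initial search actually are.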
  21. OHSUMED query expansion example
     Query: fibromyalgia fibrositis, diagnosis and treatment
     Expanded Query:
       (fibromyalgia rheumatism muscular^0.5 diffuse myofascial pain syndrome^0.5 fibromyalgia fibromyositis syndrome^0.5 myofascial pain syndrome diffuse^0.5 fibromyositis fibromyalgia syndrome^0.5 fibrositis^0.5)
       (fibrositis rheumatism muscular^0.5 diffuse myofascial pain syndrome^0.5 fibromyalgia fibromyositis syndrome^0.5 myofascial pain syndrome diffuse^0.5 fibromyositis fibromyalgia syndrome^0.5 fibromyalgia^0.5)
       (diagnosis examinations diagnoses^0.5)
       (treatment disease management^0.5 therapy^0.5)
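
An expanded query of this shape can be assembled from a term-to-labels mapping. The sketch below is an assumption about how such a string might be built, not the lucene-skos code: it emits one parenthesized clause per original term and appends a `^0.5` boost to every expansion label. Quoting multi-word labels as phrases is also an assumption; the slide shows them without quotes.

```python
def to_clause(term, expansions, boost=0.5):
    """Build one clause: the original term plus boosted expansion labels."""
    parts = [term]
    for label in expansions:
        # Assumption: multi-word labels are treated as quoted phrases.
        text = f'"{label}"' if " " in label else label
        parts.append(f"{text}^{boost}")
    return "(" + " ".join(parts) + ")"


def to_query(expanded, boost=0.5):
    """Join the clauses of all original query terms."""
    return " ".join(to_clause(t, labels, boost) for t, labels in expanded.items())


if __name__ == "__main__":
    expanded = {
        "diagnosis": ["examinations diagnoses"],
        "treatment": ["disease management", "therapy"],
    }
    print(to_query(expanded))
```

Boosting expansions below 1.0 keeps the original query terms dominant, which matches the 0.5 weights visible in the example.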
  22. Results OHSUMED/MESH
     Table 1: Precision and nDCG results on the OHSUMED dataset.

     LTC TF-IDF
              P@1    P@3    P@10   nDCG@1  nDCG@3  nDCG@10
     NoExp    0.333  0.302  0.260  0.407   0.376   0.356
     PRF      0.322  0.282  0.302  0.379   0.360   0.393
     SKOS     0.419  0.366  0.276  0.484   0.429   0.379

     BM25
              P@1    P@3    P@10   nDCG@1  nDCG@3  nDCG@10
     NoExp    0.381  0.344  0.265  0.450   0.414   0.374
     PRF      0.377  0.317  0.275  0.443   0.397   0.369
     SKOS     0.500  0.366  0.282  0.548   0.435   0.397

     … query set provided with each dataset (63 OHSUMED/MESH, 10 LoC/LCSH). We focused on two measures, precision at rank n (P@n) for both dataset bundles and nDCG at rank n (nDCG@n) for the OHSUMED/MESH bundle, which provides ordinal relevance judgments. Two retrieval models were used for ranking documents: LTC TF-IDF and BM25.
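
For readers unfamiliar with the two measures in Table 1, here is a minimal sketch of P@n and nDCG@n. It uses one common nDCG formulation (graded relevance as gain, log2 rank discount); the slides do not say which variant the authors used, so treat the exact formula as an assumption. Function names are hypothetical.

```python
import math


def precision_at(n, ranked, relevance):
    """Fraction of the top-n results with relevance grade > 0."""
    return sum(1 for doc in ranked[:n] if relevance.get(doc, 0) > 0) / n


def dcg_at(n, gains):
    """Discounted cumulative gain over the first n gains."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:n]))


def ndcg_at(n, ranked, relevance):
    """DCG of the ranking divided by the DCG of an ideal ordering."""
    gains = [relevance.get(doc, 0) for doc in ranked]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at(n, ideal)
    return dcg_at(n, gains) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    ranked = ["a", "b", "c"]
    relevance = {"a": 2, "b": 0, "c": 1}  # graded judgments, as in OHSUMED
    print(precision_at(3, ranked, relevance))
    print(ndcg_at(3, ranked, relevance))
```

P@n only needs binary judgments, which is why it is reported for both datasets, while nDCG@n makes use of the ordinal judgments available for OHSUMED.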
  23. Initial Results LoC/LCSH
     PRF might lead to better recall, i.e., return more relevant documents, but hurt precision at top ranks. On the other hand, with SKOS-expansion the terms used in expansion are always from a SKOS vocabulary and not from the corpus. Therefore, we are already only expanding using relevant terms, since the terms' existence in the vocabulary is a good hint that the term is important.

     Table 2: Precision with LTC TF-IDF results on the Library of Congress dataset.

     LTC TF-IDF
              P@1    P@3    P@10
     NoExp    0.500  0.500  0.370
     PRF      0.300  0.433  0.430
     SKOS     0.600  0.533  0.370

     Table 2 shows the results we obtained from early experiments with the Library of Congress Subject Headings. The results are similar to the results with OHSUMED. The PRF …
  24. Overview
     • A brief intro to SKOS
     • SKOS-based term expansion
     • Implementation: Lucene-SKOS
     • Experiments
     • Conclusions
  25. Conclusions
     • Our experiments indicated gains in retrieval effectiveness compared to no-expansion or pseudo relevance feedback
     • Our solution can easily be adopted by loading lucene-skos with Apache Lucene and Solr
  26. Future Work
     • More thorough evaluation using LoC/LCSH
     • Use corpora, queries, and vocabularies from other domains
     • Apply this technique for general concept-based and URI-identified Web data sources
  27. Thanks!
     @bhaslhofer, @flaviomartins
     http://slideshare.net/bhaslhofer
     https://github.com/behas/lucene-skos
