Simple Yet Effective Methods for Large-Scale Scholarly Publication Ranking

KMi and Mendeley (team BletchleyPark) at WSDM Cup 2016

Drahomira Herrmannova & Petr Knoth
The Open University & Mendeley
WSDM Cup 2016, February 2016
Our approach

• Hypothesis: the importance of a publication can be determined by a mixture of factors evidencing its impact and the importance of the entities which participated in the publication's creation.
Our approach

• Method:
  1. Separately score each of the entity types in the graph.
  2. Use the separate scores to provide a publication score.
  3. This yields several different scores for each publication entity.
  4. The final score, which defines the publication's rank, is calculated as a linear combination of these scores.
• Weights were obtained experimentally.
• The final equation:

  score(p) = 2.5 \cdot s_{pub} + 0.1 \cdot s_{age} + 1.0 \cdot s_{pr} + 1.0 \cdot s_{auth} + 0.1 \cdot s_{venue} + 0.01 \cdot s_{inst}    (1)
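A minimal Python sketch of equation (1), assuming the six component scores have already been computed; the weights are from the slide, but the function and variable names are illustrative and not taken from the team's code:

    # Weights taken from equation (1); one entry per component score.
    WEIGHTS = {"pub": 2.5, "age": 0.1, "pr": 1.0,
               "auth": 1.0, "venue": 0.1, "inst": 0.01}

    def final_score(scores):
        """Final rank score of a publication: linear combination of the
        six precomputed component scores (a dict keyed like WEIGHTS)."""
        return sum(w * scores[name] for name, w in WEIGHTS.items())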
Publication-based scoring functions

  score(p) = 2.5 \cdot s_{pub} + 0.1 \cdot s_{age} + 1.0 \cdot s_{pr} + 1.0 \cdot s_{auth} + 0.1 \cdot s_{venue} + 0.01 \cdot s_{inst}
Publication-based scoring functions

• Scoring publication entities directly (without considering the importance of authors or venues).
• We have experimented with several options for normalising and weighting publication citations:
  • applying a time decay to citations,
  • applying a decay function to total citation counts,
  • using mean citation counts.
• Final scoring function:

  s_{pub}(p) = \begin{cases} c(p) / |A_p| & \text{for } c(p) \le t \\ t / |A_p| & \text{for } c(p) > t \end{cases}    (2)

  where c(p) is the citation count of p, A_p the set of authors of p, and t a maximum citation threshold.
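A minimal Python sketch of equation (2); the threshold value below is a placeholder, as the slides do not state which t the team used:

    def s_pub(citations, num_authors, t=100):
        """Publication score (eq. 2): citation count capped at threshold t,
        normalised by the number of authors. t=100 is illustrative only."""
        return min(citations, t) / num_authors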
Publication-based scoring functions

• To account for publication age we added a score based on age:

  s_{age}(p) = y_p    (3)

  where y_p is the publication year of p.

• In the second phase of the challenge we included PageRank as an additional feature:

  s_{pr}(p) = PR(p)    (4)
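The slides do not say which PageRank implementation was used; below is a sketch of equation (4) on a toy citation graph using networkx, where an edge u -> v means u cites v (the damping factor 0.85 is the library default, not a value from the slides):

    import networkx as nx

    # Toy citation graph: edge u -> v means publication u cites publication v.
    G = nx.DiGraph([("p1", "p3"), ("p2", "p3"), ("p2", "p1"), ("p4", "p3")])

    # s_pr(p) = PR(p), eq. (4).
    pr = nx.pagerank(G, alpha=0.85)
    print(pr["p3"])  # the most-cited publication receives the highest score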
Author-based score

  score(p) = 2.5 \cdot s_{pub} + 0.1 \cdot s_{age} + 1.0 \cdot s_{pr} + 1.0 \cdot s_{auth} + 0.1 \cdot s_{venue} + 0.01 \cdot s_{inst}    (5)
Author-based score

• We have experimented with some commonly used methods for evaluating author performance (number of citations, h-index).
• We calculated the given value for each of the authors of a publication and tested scoring publications using the maximum, total and mean of these values.
• The final scoring function uses the mean citation score per publication and author:

  s_{auth}(p) = \frac{1}{|A_p|} \sum_{a \in A_p} \frac{\sum_{x \in P_a} c(x)}{|P_a|}    (6)

  where P_a is the set of publications of author a.
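A minimal Python sketch of equation (6); the argument names and data layout are illustrative:

    def s_auth(pub_authors, author_pubs, citations):
        """Author score (eq. 6): for each author of the publication, take the
        mean citation count over that author's publications, then average
        those means over all of the publication's authors.
          pub_authors: list of author ids of the publication
          author_pubs: dict author id -> list of that author's publication ids
          citations:   dict publication id -> citation count
        """
        per_author = [sum(citations[x] for x in author_pubs[a]) / len(author_pubs[a])
                      for a in pub_authors]
        return sum(per_author) / len(per_author)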
Venue-based score

  score(p) = 2.5 \cdot s_{pub} + 0.1 \cdot s_{age} + 1.0 \cdot s_{pr} + 1.0 \cdot s_{auth} + 0.1 \cdot s_{venue} + 0.01 \cdot s_{inst}    (7)
Venue-based score

• The standard metric in this area is the JIF; alternatives include the SCImago Journal Rank and the Eigenfactor.
• We have experimented with a few simple scoring functions (JIF, total citation counts, ...).
• Final venue-based score:

  s_{venue}(p) = \sum_{x \in P_v, x \neq p} c(x)    (8)

  where P_v is the set of publications of p's venue v.
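A minimal Python sketch of equation (8); argument names are illustrative:

    def s_venue(venue_pubs, citations, p):
        """Venue score (eq. 8): total citations of all publications at the
        same venue as p, excluding p itself."""
        return sum(citations[x] for x in venue_pubs if x != p)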
Institution-based score

  score(p) = 2.5 \cdot s_{pub} + 0.1 \cdot s_{age} + 1.0 \cdot s_{pr} + 1.0 \cdot s_{auth} + 0.1 \cdot s_{venue} + 0.01 \cdot s_{inst}    (9)
Institution-based score

• Simple approach similar to the author- and venue-based scores:

  s_{inst}(p) = \frac{1}{|I_p|} \sum_{i \in I_p} \sum_{x \in P_i, x \neq p} c(x)    (10)

  where I_p is the set of institutions affiliated with p and P_i the set of publications of institution i.
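A minimal Python sketch of equation (10); argument names are illustrative:

    def s_inst(pub_insts, inst_pubs, citations, p):
        """Institution score (eq. 10): for each institution affiliated with p,
        sum the citations of its publications other than p, then average the
        totals over the institutions.
          pub_insts: list of institution ids affiliated with p
          inst_pubs: dict institution id -> list of its publication ids
          citations: dict publication id -> citation count
        """
        totals = [sum(citations[x] for x in inst_pubs[i] if x != p)
                  for i in pub_insts]
        return sum(totals) / len(totals)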
Potential improvements

• Better utilisation of the citation network
• Inclusion of additional data sources
• Possibility to analyse the evaluation data and metric
• Revise the maximum citation threshold t used in the s_{pub} score
What have we learned?

• We found simple citation counts to perform best, but (!):
  • in order to develop a better ranking method, it is crucial to better understand the evaluation data and method;
  • citation counting does not account for many characteristics of citations (differences in their meaning, the popularity of certain topics and types of research papers, ...).
Alternative ranking methods

• We have explored several external data sources.
• Motivation: utilising new altmetric and webometric data sources offers
  • earlier availability of the data compared to citations,
  • a broader view of a publication's impact.
Alternative ranking methods

• Our main interest is in full text and the set of metrics referred to as Semantometrics.
• Semantometrics build on the premise that the manuscript of a publication is needed to assess its value (in contrast to utilising external data).
• Biggest problem: obtaining the full texts, due to copyright restrictions and paywalls.
• We are experimenting with enriching the MAG (Microsoft Academic Graph) with publication full texts.
• Enriching the MAG with altmetric, webometric and semantometric data would enable developing and testing fundamentally new metrics.
Thank you for listening!

• Sources
  • https://github.com/damirah/wsdm_cup
• Workshop on Mining Scientific Publications
  • http://wosp.core.ac.uk/jcdl2016/
  • Submission deadline: 17th April