This study explores Mendeley readerships across the 5 major fields of science of the Leiden Ranking 2013 for a data set of 1,107,917 Web of Science (WoS) publications (reviews and articles) from all disciplines, published in 2011 and with a DOI available. The main objective is to determine whether patterns of readership and citation impact differ according to the ‘Academic Status’ of Mendeley readers. If such differences exist, the different user types could be considered as potential predictors of citations.
The current study builds on a previous analysis of Mendeley users and focuses on the different types of Mendeley users (known users), exploring their patterns of saving publications in terms of subject field, citation impact and readership impact. Particular attention is paid to the extent to which the readerships of publications saved by the different user types correlate with citation indicators across the 5 major fields of science in the Leiden Ranking (LR); the potential of the different user types for identifying highly cited papers is also investigated. To this end, we present an exploratory analysis of the reading patterns of the different types of Mendeley users and study their relationship with citations across LR fields.
Broad altmetric analysis of Mendeley readerships through the ‘academic status’ of the readers of scientific publications
1. Broad altmetric analysis of Mendeley readerships through the ‘academic status’ of the readers of scientific publications
Zohreh Zahedi, Rodrigo Costas & Paul Wouters
Centre for Science and Technology Studies (CWTS)
Leiden University, The Netherlands
STI Conference, 3-5 September 2014, Leiden University, The Netherlands
2. Outline
• Introduction
• Objectives & Research Questions
• Data & Method
• Conclusions & Discussions
• Limitations
3. Mendeley
• Free online reference management tool
• More than 2.8 million users
• Rich source of readership data for scholarly outputs
• Usage statistics (discipline, academic status & country)
• Open API
4. Calculating academic status
37*41/100 = 15 librarian
17*41/100 = 7 PhD
10*41/100 = 4 researcher
15+7+4 = 26 (64%) known
41-26 = 15 (36%) unknown
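The arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes that the percentages (37, 17, 10) are the shares of the most frequent academic statuses reported for a publication with 41 readers, that counts are obtained by rounding share × readers / 100, and that all remaining readers count as ‘unknown’. The function name `estimate_status_counts` is hypothetical.

```python
def estimate_status_counts(total_readers, status_shares):
    """Estimate reader counts per academic status from percentage shares.

    total_readers: total number of Mendeley readers of one publication.
    status_shares: dict mapping status -> percentage share (assumed to be
                   the most frequent statuses reported for the publication).
    Returns (counts per status, number of known readers, number of unknown).
    """
    counts = {
        status: round(share * total_readers / 100)
        for status, share in status_shares.items()
    }
    known = sum(counts.values())
    unknown = total_readers - known  # readers not covered by the reported shares
    return counts, known, unknown


# Values taken from the slide: 41 readers, shares of 37%, 17% and 10%.
counts, known, unknown = estimate_status_counts(
    41, {"librarian": 37, "PhD": 17, "researcher": 10}
)
print(counts, known, unknown)
```

The 64% and 36% on the slide correspond to the sum of the reported shares (37 + 17 + 10) and its complement.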
5. Objectives & Research Questions:
• General distribution of Mendeley readerships over WoS publications
Q. What is the distribution of Mendeley readerships across fields and by different users?
• Relationship of Mendeley readerships with citation indicators
Q. Are there any differences in correlation by the different users and across the 5 LR fields?
• To test the potential of identifying highly cited papers by different user types in Mendeley
Q. To what extent can highly cited papers be identified by the different types of users in Mendeley?
7. Data & Method
Data:
1,107,917 WoS publications (reviews and articles) from all disciplines published in 2011 with DOI available
Readerships from the Mendeley REST API
Citations from the CWTS in-house WoS database
Methods:
1. Correlation analysis
2. Precision-recall analysis
8. Table 1. General description of publications with and without Mendeley readerships

                                 Pubs        %      TCS         CPP    TRS         RPP    JCS
Pubs with some unknown users     408,752     54.6   1,447,191   3.5    6,755,772   16.5   3
Pubs with only known users       339,789     45.4   405,854     1.1    905,742     2.6    1.6
Total pubs with readerships      748,541     67.6   1,853,045   2.4    7,661,514   10.2   2.4
Total pubs without readerships   359,376     32.4   545,411     1.5    -           -      1.6
Total                            1,107,917   100    2,398,456   2.16   7,661,514   6.9    2.1

TCS: Total Citation Score; TRS: Total Readership Score; JCS: Journal Citation Score
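The per-publication columns of Table 1 can be checked with a short calculation. The sketch below assumes, since the slide does not define them, that CPP and RPP are simple averages: TCS divided by the number of publications and TRS divided by the number of publications. The helper name `per_publication` is hypothetical; the input figures are the ones reported in the table.

```python
def per_publication(total_score, n_pubs):
    """Average score per publication (assumed definition of CPP and RPP)."""
    return total_score / n_pubs


# (publications, TCS, TRS) as reported in Table 1
table_rows = {
    "pubs with some unknown users": (408_752, 1_447_191, 6_755_772),
    "total pubs with readerships": (748_541, 1_853_045, 7_661_514),
    "all publications": (1_107_917, 2_398_456, 7_661_514),
}

results = {}
for label, (pubs, tcs, trs) in table_rows.items():
    cpp = per_publication(tcs, pubs)
    rpp = per_publication(trs, pubs)
    results[label] = (cpp, rpp)
    print(f"{label}: CPP={cpp:.2f}, RPP={rpp:.2f}")
```

Under this assumption the computed averages agree with the reported CPP and RPP values up to rounding.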
9. Table 2. Distribution of Mendeley readerships across 5 major fields of science (Leiden Ranking 2013)

Publications with only known users

       Life & earth   Biomedical &      Natural sciences   Social sciences   Mathematics &      Total %
       sciences       health sciences   & engineering      & humanities      computer science
p      56,325         133,957           136,594            28,995            34,482
%      14.4%          34.3%             35%                7.4%              8.8%               100%
tcs    62,617         192,181           177,989            13,878            21,593
%      13.4%          41%               38%                3%                4.6%               100%
trs    159,911        35,393            372,183            82,148            82,034
%      15.2%          33.7%             35.4%              7.8%              7.8%               100%
cpp    1.1            1.4               1.3                0.5               0.6
rpp    2.8            2.6               2.7                2.8               2.4
10. Figure 1. Distribution of Mendeley readerships by the different academic status across LR fields (pubs with only known users)
[Figure: bar chart with a percentage axis from 0% to 60%. Groups: Biomedical & health sciences; Life & earth sciences; Mathematics & computer science; Natural sciences & engineering; Social sciences & humanities. Series: PhD, Students, PostDocs, Professors, Researchers, Other Professional, Lecturers, Librarians.]
11. Table 3. Correlation analysis of citations and Mendeley readerships by types of known users

Spearman's rho, pubs with known users: citations vs. readerships
Correlation coefficient   .165
Sig. (2-tailed)           .000
N                         390,353

Spearman correlation   Citations   Professors   PhD     Lecturer   Student   Researcher   Librarian   Other Prof.   PostDoc
Citations              1           .005         .107    -.007      .039      .036         -.012       .026          .073
Professors                         1            -.116   -.037      -.150     -.102        -.039       -.060         -.074
PhD                                             1       -.052      -.058     -.082        -.070       -.118         -.029
Lecturer                                                1          -.050     -.038        -.011       -.016         -.038
Student                                                            1         -.119        -.039       -.063         -.116
Researcher                                                                   1            -.025       -.016         -.051
Librarian                                                                                 1           -.005         -.034
Other Prof.                                                                                           1             -.050
PostDoc                                                                                                             1

Correlation is significant at the 0.01 level (2-tailed)
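For readers unfamiliar with the statistic in Table 3, the following is a minimal pure-Python sketch of Spearman's rho, computed as the Pearson correlation of rank-transformed values with tied values given their average rank. It is an illustration only, not the authors' procedure (the table layout suggests a statistics package was used), and the toy data and function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y.

    Assumes both inputs have the same length and are not constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data, not taken from the study.
citations = [0, 2, 5, 1, 9, 3]
readerships = [1, 4, 10, 0, 30, 2]
rho = spearman_rho(citations, readerships)
print(round(rho, 3))
```

Because the statistic uses ranks only, it is insensitive to the strong skew that citation and readership counts usually show.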
12. Table 4. Correlation analysis of citations and Mendeley readerships by types of users across 5 LR fields (known users)

Spearman correlation between citations and readerships across 5 LR fields (2013)

LR field                                     Librarian   Other Prof.   PostDoc   Professors   PhD    Lecturer   Student   Researchers
Biomedical & health sciences (n=133,957)     -.019       .023          .060      .028         .088   -.005      .009      .034
Life & earth sciences (n=56,325)             -.006       .006          .039      -.003        .125   .012       .047      .014
Mathematics & computer science (n=34,482)    -.008       -.017         .029      -.003        .109   .016       .043      .005
Natural sciences & engineering (n=136,594)   -.012       -.003         .077      .022         .157   -.003      .064      .026
Social sciences & humanities (n=28,995)      .013*       .025          .043      -.015        .085   -.006      .040      .033

Correlation is significant at the 0.01 level (2-tailed)
13. Precision-recall analysis
• Precision is defined as the number of highly cited publications in the selection divided by the total number of publications in the selection.
• Recall is defined as the number of highly cited publications in the selection divided by the total number of highly cited publications.
(an approach developed by Waltman & Costas, 2013)
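The two definitions above can be written out directly. The sketch below is a minimal illustration, not the authors' implementation: it assumes that a "selection" is the top-k publications when ranked by an indicator (for example readerships or JCS), which is one common way to trace a precision-recall curve. All identifiers and the toy data are hypothetical.

```python
def precision_recall(selected, highly_cited):
    """Precision and recall of a selection, following the slide's definitions."""
    selected, highly_cited = set(selected), set(highly_cited)
    hits = len(selected & highly_cited)  # highly cited publications in the selection
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(highly_cited) if highly_cited else 0.0
    return precision, recall


def precision_recall_curve(scores, highly_cited):
    """One (precision, recall) point per top-k selection.

    scores: dict mapping publication id -> indicator value.
    Assumption: selections are the k highest-scoring publications, k = 1..n.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [
        precision_recall(ranked[:k], highly_cited)
        for k in range(1, len(ranked) + 1)
    ]


# Hypothetical toy data: five publications, two of them highly cited.
readerships = {"a": 30, "b": 12, "c": 7, "d": 3, "e": 0}
top_cited = {"a", "c"}
curve = precision_recall_curve(readerships, top_cited)
for k, (p, r) in enumerate(curve, start=1):
    print(f"top {k}: precision={p:.2f}, recall={r:.2f}")
```

An indicator filters highly cited publications well when precision stays high as recall grows, that is, when its curve lies above the curve of a competing indicator.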
14. Figure 2. General precision-recall curves for JCS (blue line) and total readerships (green line) for identifying the 1% most highly cited publications (extended approach on the left and tight approach on the right)
[Figure: two precision-recall plots, with Recall and Precision axes each running from 0 to 1. Left panel: all publications in the sample. Right panel: publications with at least one Mendeley readership.]
15. Figure 3. Precision-recall curves for JCS (blue line) and PhD readerships (green line) for identifying the 1% most highly cited publications (extended approach on the left and tight approach on the right)
[Figure: two precision-recall plots, with Recall and Precision axes each running from 0 to 1. Left panel: all publications in the sample. Right panel: publications with at least one Mendeley readership.]
16. Conclusions & Discussions (1)
• Mendeley readerships as a new source of ‘impact’ assessment compared to citations
• Publications with Mendeley readerships show a higher readership impact than citation impact per publication (RPP 10.2 vs. CPP 2.4 in Table 1), and a higher citation impact than publications without readerships (CPP 2.4 vs. 1.5)
• This suggests that Mendeley readerships accumulate faster than citations
• In terms of readership density across the 5 major LR fields, on average, all fields show higher RPP scores than CPP scores, and some disciplinary differences among fields have been observed
17. Conclusions & Discussions (2)
• Regarding academic status, the most common types of users in Mendeley are PhDs and Students (besides the ‘unknown’ users), and similar proportions are observed for all the LR fields
• The correlation analysis shows relatively low relationships among users with different academic status, which suggests that the different types of Mendeley users may be reading different publications and that the ‘academic status’ categories could therefore help to detect different typologies of impact
18. Conclusions & Discussions (3)
• One of the remarkable results is that the overall filtering capacity of Mendeley readerships for detecting highly cited publications tends to outperform (or at least is quite similar to) that of the JCS indicator, something that has not been observed for other altmetric sources (Costas et al., 2014)
19. Limitations
• Data collection (time consuming, speed of the APIs, no perfect data matching)
• Mendeley only reports the three most frequent user types per publication, and the uncertainty introduced by the ‘unknown’ users limits the interpretation of the results
• It is not clear whether the academic status of the users is updated regularly, or how to distinguish users who could belong to more than one category (e.g. a librarian who is also a PhD student)
• User categories are not clearly distinct (PhD student, Doctoral student, Graduate student, Post Doc) and need to be better defined
20. Final words:
Disclosing the ‘unknown’ users would allow a better and more transparent understanding of the different users in Mendeley
Still more research needed…
21. Thanks for your attention!
For any questions please contact us at:
{z.zahedi.2, rcostas, p.f.woutes}@cwts.leidenuniv.nl