Research Data Explored: Citations versus Altmetrics
1. www.tugraz.at
WISSEN ■ TECHNIK ■ LEIDENSCHAFT
Isabella Peters (ZBW), Peter Kraker (Know-Center), Elisabeth Lex (TU Graz), Christian Gumpenberger (Uni Wien), Juan Gorraiz (Uni Wien)
32nd Austrian Librarian Day, Sept 17th 2015, Vienna
2. Motivation
• Data citations have gained momentum
• Citations: Publish or Perish
• Altmetrics: social-media based metrics
• Societal impact of research data
Our Goal: Investigate research data with respect to their bibliometric characteristics: citations as well as altmetrics
3. Our study – Dataset
• Thomson Reuters Data Citation Index (DCI)
  • high-quality research data from various repositories
  • enables search, exploration and bibliometric analysis of research data through Web of Science
• We did a basic analysis for all items published in DCI between 1960 and 2014
• Plus: altmetrics collected from three big altmetrics data providers: ImpactStory, Altmetric.com, PlumX
4. Research Questions
1. How often are research data cited? Which and how many of these have a DOI? From which repositories do research data originate?
2. What are the characteristics of the most cited research data? Which data types and disciplines are the most cited? How does citedness evolve over time?
3. To what extent are cited research data visible on various altmetrics channels? Are there any differences between the tools used for altmetrics scores aggregation?
5. ImpactStory
• Targeted at individual researchers
• Works with individually assigned permanent identifiers (e.g. DOIs, URLs, PubMed IDs) or links to ORCID, Figshare, Publons, Slideshare, or GitHub to auto-import new research outputs such as papers, data sets, slides
• Features altmetric scores (Twitter, Facebook, Mendeley, Figshare, Google+, and Wikipedia mentions)
6. Altmetric.com
• Targeted towards institutions and organizations
• Provides an altmetrics score plus the underlying data
• Searches a variety of social media platforms (e.g., Twitter, Facebook, Google+, blogs) for keywords and for permanent identifiers
  • e.g. DOIs, arXiv IDs, PubMed IDs
7. PlumX
• Article-level metrics for “artifacts”
  • articles, audio, videos, book chapters, trials
• Works with ORCID and other user IDs (e.g., from YouTube, Slideshare) as well as with DOIs, ISBNs, PubMed IDs, patent numbers, and URLs
• Statistics on usage of articles and artifacts
  • e.g., views or downloads of HTML pages or PDFs, Mendeley readers, GitHub forks, Facebook comments, YouTube subscribers
8. Methodology
• DCI used to retrieve records of cited research data
• Items published in the last decades (1960-9, 1970-9, 1980-9, 1990-9, 2000-9, 2010-4)
• Metadata fields: DOI/URL, document type, source, research area, publication year, data type, number of citations, ORCID
• Citedness investigated for each decade
• Distribution of document types, data types, sources and research areas for items
  • with >= 2 citations (Sample 1, n = 10,934 records)
  • with >= 2 citations and at least 1 altmetric score (Sample 2, n = 301)
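The sampling steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`year`, `citations`, `altmetric_events`) and the toy values are hypothetical and do not reflect the actual DCI export format, and "citedness" is assumed here to mean the share of items with at least one citation.

```python
from collections import defaultdict

# Hypothetical toy records; field names are assumptions, not the DCI schema.
records = [
    {"id": "a", "year": 1965, "citations": 3, "altmetric_events": 0},
    {"id": "b", "year": 1998, "citations": 0, "altmetric_events": 0},
    {"id": "c", "year": 2004, "citations": 2, "altmetric_events": 5},
    {"id": "d", "year": 2012, "citations": 1, "altmetric_events": 2},
    {"id": "e", "year": 2013, "citations": 7, "altmetric_events": 0},
]


def decade_label(year):
    """Map a publication year to the decade bins listed on the slide."""
    start = (year // 10) * 10
    end = 4 if start == 2010 else 9  # the last bin stops at 2014
    return f"{start}-{end}"


def citedness_by_decade(items):
    """Share of items per decade with at least one citation (assumed definition)."""
    total = defaultdict(int)
    cited = defaultdict(int)
    for item in items:
        label = decade_label(item["year"])
        total[label] += 1
        if item["citations"] >= 1:
            cited[label] += 1
    return {label: cited[label] / total[label] for label in total}


# Sample 1: at least 2 citations.
sample_1 = [r for r in records if r["citations"] >= 2]
# Sample 2: Sample 1 items that also have at least 1 altmetric score.
sample_2 = [r for r in sample_1 if r["altmetric_events"] >= 1]
```

Because Sample 2 is filtered from Sample 1, it is a subset by construction, which matches the much smaller n reported for it.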
10. Results for Sample 1
Citedness is comparatively higher for research data published more recently
→ interest in younger research data and increase in social media activity
11. Citation Distribution for Sample 1
• Almost half of the data studies have a DOI (48.9%), but only a few data sets do
• Data studies are on average cited more often than data sets
• Data studies with a DOI receive more citations than those with a URL
• Only a few repositories (51), but these attract the most citations
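Comparisons like these come down to grouping items by document type and identifier type and averaging their citation counts. The following is a minimal sketch, assuming each item is reduced to a (document type, identifier type, citations) tuple; the rows are invented toy values, not figures from the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows: (document type, identifier type, citations).
rows = [
    ("data study", "DOI", 12),
    ("data study", "DOI", 8),
    ("data study", "URL", 4),
    ("data set", "URL", 2),
    ("data set", "URL", 3),
    ("repository", "URL", 40),
]


def mean_citations(items):
    """Average citation count per (document type, identifier type) pair."""
    groups = defaultdict(list)
    for doc_type, id_type, citations in items:
        groups[(doc_type, id_type)].append(citations)
    return {key: mean(values) for key, values in groups.items()}


def doi_share(items, doc_type):
    """Fraction of items of one document type that carry a DOI."""
    id_types = [id_type for d, id_type, _ in items if d == doc_type]
    return sum(1 for i in id_types if i == "DOI") / len(id_types)
```

Calling `mean_citations(rows)` returns one average per group, and `doi_share(rows, "data study")` gives the kind of DOI percentage quoted in the first bullet.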
12. Citation Distribution for Sample 1
Almost half of the research data (4,974 items; 45.5%) → only 2 citations
6 items (2 repositories and 4 data studies): > 1,000 citations
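The 45.5% figure follows from the Sample 1 size given on the Methodology slide (n = 10,934). A quick check, with variable names chosen only for illustration:

```python
sample_1_size = 10_934  # Sample 1, from the Methodology slide
exactly_two = 4_974     # items with only 2 citations

share = exactly_two / sample_1_size
print(f"{share:.1%}")   # prints 45.5%
```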
13. Citation Distribution for Sample 1
• The most cited data types differ depending on whether research data have a DOI or a URL
14. Citation Distribution for Sample 1
• Referring to data studies via DOIs is more common in the Social Sciences than in the Natural and Life Sciences
Disciplinary differences: DOIs vs URLs, document types
15. Results for Sample 2
• Total altmetrics scores are lower than the number of citations for all document types, with or without a DOI
• Mean altmetrics score is higher for data studies than for data sets
20. Details on Altmetrics Analysis in PlumX
• DOIs for data sets seem to be important in order to get captures (Mendeley)
• A URL is sufficient for inclusion in social media (e.g. Facebook, Twitter)
21. More Altmetrics Results...
• Top 10 research data DOIs with >= 2 citations and at least 1 entry in PlumX
• Cited research data attracts more citations than altmetrics scores
• No correlation between highly cited and highly scored research data
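One common way to test for a relationship between citation counts and altmetrics scores is a rank correlation. The slides do not state which measure was used, so the following is a generic sketch of Spearman's rho in plain Python, run on invented toy values. It assumes both input lists have the same length and are not constant.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values, not data from the study.
citations = [1200, 300, 45, 10, 2]
altmetric_scores = [0, 3, 0, 25, 1]
rho = spearman(citations, altmetric_scores)
```

A rho near 0 would indicate no monotonic relationship; values near +1 or -1 would indicate that highly cited items are consistently the highest or lowest scored ones.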
22. Conclusions
• Low percentage of altmetrics scores for research data with two or more citations
  • Research data not so often published/shared?
  • Reliability of altmetrics aggregation tools?
• We didn't observe a correlation between citation and altmetrics scores
• Neither the most cited research data nor the most cited sources (repositories) received the highest scores in PlumX
• Interestingly, although “figshare” accounts for almost 25% of the DCI, no item from “figshare” was cited at least twice in DCI → see our follow-up work presented at STI 2015!
23. Conclusions
• Growing trend in citing research data since 2008 – bias towards more recent research data → in general, research data are mostly uncited
• Availability of cited research data with a DOI rather low in DCI, but increasing
• Data studies with a DOI attract more citations than those with a URL
• DOIs for cited research data have so far been embraced more in the Social Sciences than in the Natural Sciences
• DOIs/identifiers are important for increasing altmetrics scores, as aggregators rely on them
24. Future Work
• Investigate data citations in more detail
• Different from “paper citations”
• E.g. we found that entire repositories are proportionally cited more often than single data sets
• Meaning of data citations
• Influence of the structure of the underlying data
• Data curation, identifiers, ...