Contributor identification is a core challenge in data publication. As in scholarly communication more generally, non-unique person names and the current lack of a global identification infrastructure for producers of scholarly content make it difficult to establish the identity of authors and other contributors. This in turn makes it difficult to accurately attribute datasets published via online digital repositories to their creators – one of several key requirements for including these important outputs in the scholarly record.
In the GEN2PHEN project (http://www.gen2phen.org) we are developing a series of novel web-based systems and processes for online dissemination of genetic variation and other research data. The core aim is to ensure that data creators are recognized and rewarded for publishing data. This work builds on and integrates with recently launched international initiatives to (i) extend and adapt the existing DOI infrastructure for identifying, locating and citing online datasets (DataCite: http://www.datacite.org), and (ii) create a global registry of unique identifiers for authors and other contributors (ORCID: http://www.orcid.org).
The technical approach we are exploring in this pilot project utilizes this emerging global data citation and contributor identification framework, in order to allow published datasets to be discovered, cited in a scholarly context and unambiguously attributed. We argue that, along with other measures, such an incentive-based approach is key to motivating the sharing of data and other types of digital research outputs in the life sciences.
This document is published under the CC-BY license (http://creativecommons.org/licenses/by/3.0/). This means that you can copy, redistribute and adapt the content, as long as you attribute the original work.
Poster presented at ISMB2011 in Vienna, July 18
Gudmundur A. Thorisson, Owen Lancaster and Anthony J. Brookes
Department of Genetics, University of Leicester, Leicester, UK
As in scholarly communication more generally, non-unique person names and the current lack of a global identification infrastructure for producers of scholarly content make it difficult to establish the identity of authors and other contributors. This in turn creates challenges in attributing credit for contributions to science, as well as in tracking use/reuse and assessing impact of research outputs.
We are developing a series of novel web-based systems and processes for online dissemination of genetic variation and other research data. The technical approach we are exploring utilizes emerging frameworks for data identification and citation (Box 1) and for contributor identification (Box 2), in order to allow published datasets to be discovered, cited in a scholarly context and unambiguously attributed. The core aim is that of ensuring that data creators are recognized and rewarded for publishing their data. We argue that, along with other measures, such an incentive-based approach is key to motivating the sharing of data and other types of digital research outputs in the life sciences.

Box 1: Identifying digital research outputs
Challenges
Current methods for monitoring data use/reuse and assessing impact rely on various referencing standards and conventions. Tracking reuse is difficult, time-consuming and inaccurate, not least due to difficulties in identifying the datasets in question.
Existing and emerging solutions
Assigning digital identifiers (IDs) to published works allows them to be reliably identified and cited. In order to fulfill the requirements of the scholarly record, IDs should be persistent, globally unique and citable. Together with unique IDs for contributors (Box 2), this forms the basis of unambiguous attribution.
Digital Object Identifiers (DOIs) are widely used for identifying and citing STM publications, via the not-for-profit CrossRef publishers' association (http://www.crossref.org).
DOIs for scientific datasets issued via DataCite (http://datacite.org) are increasingly used for scientific data published in online digital repositories.

Box 2: Identifying contributors
Challenges
Approx. 2/3 of ~6M authors in PubMed share a last name + first initial with at least one other author. This name ambiguity creates difficulties in identifying and attributing creators of published works, including datasets published via online digital repositories. Solving the contributor identification challenge is key to including these important outputs in the scholarly record.
Emerging solutions
With contributions from GEN2PHEN, the international ORCID initiative (http://www.orcid.org) is creating a global infrastructure to "support the creation of a permanent, clear and unambiguous record of scholarly communication".
ORCID will enable identification of contributors via unique IDs and reliably linking them with their published works, including but not limited to:
- Peer-reviewed publications (CrossRef DOIs)
- Datasets (DataCite DOIs)
- Publications in the 'grey' literature
The new infrastructure will help solve many current identification-related problems and create new opportunities, such as:
Discovery:
- Which other papers were published by co-authors of this paper?
- Which datasets were made available by this research project?
Evaluation:
- What is the scholarly record of this job applicant?
- How often was the paper we published cited in the last 2 years?
- What is the total no. of citations and other references to papers, datasets and other outputs of the project we funded?
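
The name-ambiguity problem described in Box 2 can be illustrated with a short sketch. It is not part of the poster: the author records and contributor IDs below are made-up placeholders (not real ORCID identifiers), and the helper names name_key and ambiguous_groups are hypothetical. The sketch reduces each name to last name plus first initial and reports the keys shared by more than one distinct contributor, which is the situation a unique contributor ID is meant to resolve.

```python
from collections import defaultdict


def name_key(full_name: str) -> str:
    """Reduce 'First [Middle] Last' to 'last f' (last name + first initial)."""
    parts = full_name.split()
    return f"{parts[-1].lower()} {parts[0][0].lower()}"


# Hypothetical records: (display name, contributor ID).
# The IDs are illustrative placeholders only.
authors = [
    ("John Smith", "id-0001"),
    ("Jane Smith", "id-0002"),
    ("Anna Jones", "id-0003"),
]


def ambiguous_groups(records):
    """Return name keys that map to more than one distinct contributor ID."""
    ids_by_key = defaultdict(set)
    for name, contributor_id in records:
        ids_by_key[name_key(name)].add(contributor_id)
    return {key: sorted(ids) for key, ids in ids_by_key.items() if len(ids) > 1}


print(ambiguous_groups(authors))
# prints {'smith j': ['id-0001', 'id-0002']}
```

In this toy data the key "smith j" cannot distinguish two people, whereas the contributor IDs can; attribution keyed on IDs rather than names avoids the collision.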
Pilot project: Cafe Variome - facilitating exchange of genetic variation data and attributing data creators
[Figure: Cafe Variome data flow. Diagnostic laboratories publish data to a central 'clearinghouse'; end-users (e.g. LSDB curators) retrieve Atom feeds.]
Submitting mutations from diagnostic labs using "Café Variome enabled" software via simple button click.
Data are shared with diverse 3rd parties via manual retrieval or automated feed-based monitoring/retrieval.
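
As a rough illustration of automated feed-based monitoring, the sketch below parses an Atom document with Python's standard library and returns only the entries not seen before. It is an assumption-laden example, not the Cafe Variome implementation: the sample feed, its entry IDs and titles, and the function name new_entries are all invented for illustration, and a real client would fetch the feed over HTTP rather than read an embedded string.

```python
import xml.etree.ElementTree as ET

# Namespace of the Atom Syndication Format (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample feed; IDs and titles are placeholders.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example mutation feed</title>
  <entry>
    <id>urn:example:mutation:1</id>
    <title>Example variant 1</title>
    <updated>2011-01-21T00:00:00Z</updated>
  </entry>
  <entry>
    <id>urn:example:mutation:2</id>
    <title>Example variant 2</title>
    <updated>2011-01-22T00:00:00Z</updated>
  </entry>
</feed>"""


def new_entries(feed_xml: str, seen_ids: set) -> list:
    """Return entries whose <id> is not in seen_ids, and record them as seen."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.findall("atom:entry", ATOM_NS):
        entry_id = entry.findtext("atom:id", namespaces=ATOM_NS)
        if entry_id in seen_ids:
            continue
        fresh.append({
            "id": entry_id,
            "title": entry.findtext("atom:title", namespaces=ATOM_NS),
            "updated": entry.findtext("atom:updated", namespaces=ATOM_NS),
        })
        seen_ids.add(entry_id)
    return fresh


# A monitor that already knows entry 1 only reports entry 2.
already_seen = {"urn:example:mutation:1"}
for item in new_entries(SAMPLE_FEED, already_seen):
    print(item["id"], item["title"])
```

Tracking entry IDs between polls is one simple way for a third party to pick up newly published records without re-processing the whole feed.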
Unique identifier for contributor in ORCID:
G. A. Thorisson, Univ. Leicester
gthorisson@gmail.com
ORCID:35-883-3523
Data citation: G. A. Thorisson (ORCID:35-883-3523) and O. Lancaster (ORCID:35-992-3523). 4x variants in BRCA2 gene. Published online via Cafe Variome. 21 January (2011) doi:10.1255/cafevariome.BRCA2-2352354
Unique DOI name for dataset in DataCite, located at:
http://api.caferouge.org/atomserver/v1/caferouge/mutations/2352354
IRISC2011 - Identity in Research Infrastructure and Scientific Communication
The 2-day IRISC2011 international workshop will be held September 12-13 in Helsinki, Finland. This event will bring together key stakeholders and experts and help foster collaboration, coordination and awareness in this area, not only in biomedicine and bioinformatics but in all areas of scientific research.
Agenda and other info at http://irisc-workshop.org
For further information please contact gt50@le.ac.uk
GEN2PHEN is funded by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement 200754.
www.gen2phen.org