Citing data in research articles: principles, implementation, challenges - and the benefits of changing our ways.

Prepared and presented by Jo McEntyre (EMBL-EBI) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany, September 14-16, 2015.

Published in: Science

  1. Citing data in research articles: principles, implementation, challenges - and the benefits of changing our ways. Jo McEntyre, Europe PMC, EMBL-EBI. www.ebi.ac.uk
  2. Life Science Data
  3. Familiar Complexity! [Diagram adapted from Thomas Lemberger, EMBO: an article ‘package’ and the external resources it points to.] Diagram labels: “Recognized” data repos: file|structured record, Accession|DOI|API+Accession. Institutional repos: file|structured record, URL|DOI|API+Accession. Author database|‘website’: file|structured record, URL|DOI|API+Accession. Supp info tables/data: file, URL|DOI. Fig source data: file, URL|DOI. Fig (caption + graphic). Dataset list. Reference list. Cross-reference. Ref to external resource.
  4. Europe PMC literature database. Europe PMC: • Abstracts: 30 million • Full-text articles: 3 million • Article citation counts • Grants • ORCIDs • Semantic annotation • Data citations • Data integration. Europe PMC is a member of the PMC International Collaboration. Funded by 28 European funders of life science research.
  5. About EMBL-EBI: • Part of the European Molecular Biology Laboratory • International, non-profit research institute • Europe’s hub for biological data services and research
  6. Making data discoverable. Labs around the world deposit data and we… • Archive it • Classify it • Share it with other data providers • Analyse, add value and integrate it • …provide tools to help researchers use it. A collaborative enterprise.
  7. Journal Data Publishing
  8. Data Citation in Europe PMC full text. [Chart; categories: Literature*, Added-Value, Submitted; labelled value: 260K.] *OMIM, Clinical trials, GO. Submission statements vs reuse?
  9. Data Citation Principles Engender Two Big Ideas: "sound, reproducible scholarship rests upon a foundation of robust, accessible data" and "data should be considered legitimate, citable products of research". These slides are adapted from: http://www.slideshare.net/joanstarr/data-citation-a-joint-declaration-
  10. Joint Declaration on Data Citation Principles: 1 Importance; 2 Credit and Attribution; 3 Evidence; 4 Unique Identification; 5 Access; 6 Persistence; 7 Specificity and Verifiability; 8 Interoperability and flexibility. Full Principles: https://www.force11.org/datacitation
  11. Joint Declaration, 1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
  12. Joint Declaration, 2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
  13. Joint Declaration, 3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
  14. Joint Declaration, 4. Unique identification etc. !!!: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
  15. Joint Declaration, 5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials, as are necessary for both humans and machines to make informed use of the referenced data.
  16. Joint Declaration, 6. Persistence: Unique identifiers, and metadata describing the data, and its disposition, should persist -- even beyond the lifespan of the data they describe.
  17. Joint Declaration, 7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited.
  18. Joint Declaration, 8. Interoperability and flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
  19. Many organizational endorsements
  20. An implementation example (slide from Mercè Crosas, Ph.D., Harvard University). Citation elements: Creators, Year, Dataset Title, DOI, Data Repository, version (resolves to landing page with access to metadata, docs, and data). Principles labelled: Principle 2: Credit and Attribution. Principles 4, 5, 6: Unique ID, Access, Persistence. Principle 7: Specificity and Verifiability. Principle 8: Interoperability and flexibility.
  21. Large dataset: http://europepmc.org/articles/PMC3089613
  22. http://europepmc.org/articles/PMC3535838
  23. http://europepmc.org/articles/PMC3766260
  24. http://europepmc.org/articles/PMC3704603
  25. http://europepmc.org/articles/PMC3710810 (Fig. 2)
  26. !! 2469 references !! http://europepmc.org/articles/PMC2672098
  27. Examples of Implementations of Data Citations in Reference Lists
  28. http://europepmc.org/articles/PMC3661987 Slide labels: occurrence in reference list; occurrence in text; tagged in reference list as <mixed-citation publication-type="other">
  29. http://europepmc.org/articles/PMC3646594 Slide labels: occurrence in text; occurrence in reference list; tagged in reference list as <mixed-citation publication-type="thesis">
  30. http://europepmc.org/articles/PMC3722494 Slide labels: occurrence in text; occurrence in reference list; tagged in reference list as <mixed-citation publication-type="webpage">. Also in this reference list: a non-DOI data citation.
  31. http://europepmc.org/articles/PMC3626513 Slide labels: occurrence in text; occurrence in reference list; tagged in reference list as <mixed-citation publication-type="journal">. Cite data generated in the course of the work described?
  32. JATS support for data citation: <mixed-citation publication-type='data'> <name><surname>Heinz</surname><given-names>D.W.</given-names></name>, <name><surname>Baase</surname><given-names>W.A.</given-names></name>, <etal>et. al.</etal> <data-title>How amino-acid insertions are allowed in an alpha-helix of T4 lysozyme</data-title>. <source>PDB Europe</source>, accession <pub-id pub-id-type='accession' assigning-authority='pdb' xlink:href='http://www.ebi.ac.uk/pdbe/entry/search/index?text:102L'>102l</pub-id>. <pub-id pub-id-type='doi' xlink:href='http://dx.doi.org/10.2210/pdb102l/pdb'>10.2210/pdb102l/pdb</pub-id> </mixed-citation>
  33. Minimal, maximal & extensible citation (Thomas Lemberger, EMBO). Citation forms shown: Resource name + ID | Resource name + Resolution ‘template’ + ID | Author list + Resource name + Resolution ‘template’ + ID + Time? | Author list + Resource name + Resolution ‘template’ + ID + Time? For example: new data vs pre-existing data. For example: version.
  34. Integrated Research: Articles, Data, People, Institutions, Funders. Images reused from: seier+seier, Flickr; Images Money, Flickr.
  35. Joint Declaration, 4. Unique identification etc.: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
  36. 1. Discoverability through accessibility • Deposit in a public/open database • Where possible, structured archive (e.g. PDB, ENA) >> unstructured archive (e.g. Zenodo, Figshare) • Uniquely identify it: PID, Accession number, DOI, ROI • Give it context: metadata (and more) • All of the above = citable =
  37. 2. Discoverability through structured data. Structured data is one of the true enablers of life science: - Discovery of homology between genes across species - Predicting function based on protein folds • Structured data can be cross-analysed, compared by algorithm, and encourages development of new products and tools
  38. Structured data is good value for money. [Chart comparing: annual cost of generating new protein structure data in labs around the world; annual cost of maintaining it in a central database.]
  39. Degrees of Data. Categories: Unstructured/semi-structured; Structured; Added Value; Metadata. Examples: a picture of a graph; a spreadsheet of my results; a record in a DNA sequence database; a graphical display of a genome; a narrative with citations, pictures and attachments. Article.
  40. Metadata – critical to discoverability. Generic: title, submitters, date, file format, version (citation, basic search). Example: Wagner F.F., 23-APR-2002, TPA: Homo sapiens SMP1 gene, RHD gene and RHCE gene, INSDC, 14-NOV-2006 (Rel. 89, Last updated, Version 7). BN000065. Specific: organism, tissue, assay, page number … (deep search, analysis, computation).
  41. BioStudy database for unstructured data (BioStudyEBI). Diagram labels: Study, Publications, Ontologies, Data files, Metadata, Other DBs.
  42. Elixir: An international distributed infrastructure for • Data • Standards • Tools • Compute • Training • Industry
  43. THE END
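
As a rough illustration of why the machine-actionable identifiers called for in principle 4 matter, the JATS data citation from slide 32 can be read programmatically. The following is a minimal sketch in Python using only the standard library; it is not part of the original deck. The function name `parse_data_citation` and the shape of the returned dictionary are hypothetical, and the `xmlns:xlink` declaration has been added to the sample so that the fragment parses on its own (in a full JATS article it would be declared on an enclosing element).

```python
import xml.etree.ElementTree as ET

# Standard XLink namespace; declared on the sample below (an addition for
# this sketch) so that the xlink:href attributes can be parsed standalone.
XLINK = "http://www.w3.org/1999/xlink"

# The citation from slide 32, with line-wrap breaks repaired.
SAMPLE = """<mixed-citation xmlns:xlink="http://www.w3.org/1999/xlink" publication-type="data">
<name><surname>Heinz</surname><given-names>D.W.</given-names></name>,
<name><surname>Baase</surname><given-names>W.A.</given-names></name>,
<etal>et. al.</etal>
<data-title>How amino-acid insertions are allowed in an alpha-helix of T4 lysozyme</data-title>.
<source>PDB Europe</source>, accession
<pub-id pub-id-type="accession" assigning-authority="pdb" xlink:href="http://www.ebi.ac.uk/pdbe/entry/search/index?text:102L">102l</pub-id>.
<pub-id pub-id-type="doi" xlink:href="http://dx.doi.org/10.2210/pdb102l/pdb">10.2210/pdb102l/pdb</pub-id>
</mixed-citation>"""


def parse_data_citation(xml_text):
    """Extract authors, title, source and identifiers from one data citation.

    Hypothetical helper: only handles the element layout shown on slide 32.
    """
    root = ET.fromstring(xml_text)
    if root.get("publication-type") != "data":
        raise ValueError("not tagged as a data citation")

    # Each <name> holds a <surname> and <given-names> child.
    authors = [
        f"{name.findtext('surname', '')} {name.findtext('given-names', '')}".strip()
        for name in root.findall("name")
    ]

    # Collect every <pub-id>, keyed by its pub-id-type (accession, doi, ...).
    ids = {}
    for pub_id in root.findall("pub-id"):
        ids[pub_id.get("pub-id-type")] = {
            "value": (pub_id.text or "").strip(),
            "href": pub_id.get(f"{{{XLINK}}}href"),
        }

    return {
        "authors": authors,
        "et_al": root.find("etal") is not None,
        "title": root.findtext("data-title"),
        "source": root.findtext("source"),
        "ids": ids,
    }


if __name__ == "__main__":
    record = parse_data_citation(SAMPLE)
    print(record["authors"], record["source"], record["ids"]["doi"]["value"])
```

Because the citation is tagged with `publication-type='data'` and carries typed `pub-id` elements, a script like this can tell a data citation apart from the "other", "thesis", "webpage" and "journal" taggings shown on slides 28 to 31, where the same kind of extraction would have to guess.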
