Elsevier’s RDM Program: Habits of Effective Data and the Bourne Ultimatum
1. | 1
Anita de Waard, VP Research Data Collaborations
Elsevier RDM Services
a.dewaard@elsevier.com
December 19, 2016
Elsevier’s RDM Program:
Ten Habits of Highly Effective Data
5. | 5
Store, Access: Mendeley Data
https://data.mendeley.com/
• Linked to published papers – or not
• Linked to Github – or not
• Versioning and provenance tracking
• Different licenses: GNU GPL, CC-BY, CC0, etc.
6. | 6
Access, Cite: Data Linking
• Integrated in paper submission process
• Supplementary data is never behind a firewall
• Closely integrated with > 150 databases:
7. | 7
Access, Discover: Scholix/DLIs
• ICSU-WDS/RDA Publishing Data Service Working group,
merged with National Data Service pilot
• Cross-stakeholder – with input from CrossRef, DataCite, OpenAIRE, Europe
PubMed Central, ANDS, PANGAEA, Thomson Reuters, Elsevier, and others
• Proposed long-term architecture and interoperability framework: www.scholix.org
• Operational prototype at http://dliservice.research-infrastructures.eu/#/api
(including 1.4 Million links from various sources)
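A data-literature link of the kind such a service exchanges can be pictured as a small record connecting two identified objects. The sketch below is illustrative only: the class and field names (`LinkedObject`, `DataLiteratureLink`, `relationship`, `provider`) are assumptions made for this example and are not taken from the Scholix framework or the DLI service API, and the DOIs are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkedObject:
    identifier: str  # e.g. a DOI; only placeholder values are used below
    kind: str        # "literature" or "dataset" (hypothetical vocabulary)


@dataclass(frozen=True)
class DataLiteratureLink:
    source: LinkedObject
    target: LinkedObject
    relationship: str  # hypothetical label, e.g. "references"
    provider: str      # who contributed the link


def datasets_by_article(links):
    """Group dataset identifiers under the article that links to them."""
    index = defaultdict(set)
    for link in links:
        if link.source.kind == "literature" and link.target.kind == "dataset":
            index[link.source.identifier].add(link.target.identifier)
    return {article: sorted(ids) for article, ids in index.items()}


article = LinkedObject("10.1234/example-article", "literature")
dataset_a = LinkedObject("10.1234/example-dataset-a", "dataset")
dataset_b = LinkedObject("10.1234/example-dataset-b", "dataset")

links = [
    DataLiteratureLink(article, dataset_b, "references", "publisher"),
    DataLiteratureLink(article, dataset_a, "references", "repository"),
    # The same link reported by a second provider is only counted once.
    DataLiteratureLink(article, dataset_a, "references", "publisher"),
]

index = datasets_by_article(links)
print(index)
```

Collecting links from several providers into one index is the point of a cross-stakeholder service: the same article-dataset pair may be reported by a publisher and by a repository, and consumers usually want it once.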
10. | 10
Article types (alongside the full research paper): Data articles, Software articles, Method articles, Protocols, Video articles, Hardware articles, Lab resources
• Brief article types designed to
communicate a specific element of
the research cycle
• Complementary to full research
papers
• Easy to prepare and submit
• Peer-reviewed and indexed
• Receive a DOI and are fully citable
• Allow citable post-publication
updates
• Primarily Open Access (CC-BY)
• Published in Multidisciplinary and
domain-specific journals
https://www.elsevier.com/books-and-journals/research-elements
Review: Research Elements
11. | 11
• Cortex Registered Reports:
• Method and proposed analysis are submitted for pre-registration
• Paper is conditionally accepted
• Research is executed
• Full paper submitted, accepted provided that protocol is followed
• Reproducibility Papers:
• Describes all the software and data used to derive the published results, as
well as provides instructions on how to reproduce and validate such results.
• Using Mendeley Data, authors also submit their code, data, and optionally a
ReproZip package or a Docker container to make the review process easier.
• Reviewers not only review the reproducibility paper, but also validate the
results and claims published in the original manuscript.
• Once the paper is accepted, (non-blind) reviewers also become co-authors
and are encouraged to add a section in the paper that states the extent to
which the software is portable, is robust to changes, and is likely to be usable.
Reproduce: Some Journal Efforts:
12. | 12
Research data life cycle (diagram labels):
• Initial inquiry
• Share, publish and link data
• Monitor progress and provide guidance
• Generate reports
• Research article published
What?
• Service for Research Institutes (esp. librarians) to
engage with researchers throughout the research
data life cycle.
How?
Offer a service for librarians to interact with researchers
regarding the RDM process to:
• Offer solutions to store, share, link and publish data
• Monitor progress and report on posting, citation and
downloads of datasets
• Provide monthly reporting
Metrics for Institutions: Data Lighthouse
13. | 13
10. Integrate upstream and downstream – make metadata to serve use.
Save
Share
Use
9. Re-usable
8. Reproducible
7. Trusted
6. Comprehensible
5. Citable
4. Discoverable
3. Accessible
1. Stored
2. Preserved
https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
Data at Risk
Reproducibility Initiative
Data Lighthouse
In summary:
Elsevier Efforts Collaborative Efforts
14. | 14
“Now show me how all of this works
together… on one of my papers!”
• Phil Bourne, August 2016
See Demo
15. | 15
A Tale of (Ir)reproducibility
There once was a computational biology paper…
Kinnings et al. 2010, http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000976
16. | 16
A Tale of (Ir)reproducibility
... that couldn’t be (easily) reproduced.
17. | 17
A Tale of (Ir)reproducibility
Some brave souls did reproduce it …
Daniel Garijo, Sarah Kinnings, Li Xie, Lei Xie, Yinliang Zhang, Philip E. Bourne,
Yolanda Gil (2013). Quantifying Reproducibility in Computational Biology: The Case of the
Tuberculosis Drugome, http://dx.doi.org/10.1371/journal.pone.0080278
18. | 18
A Tale of (Ir)reproducibility
… but it was a lot of work.
19. | 19
Some tools to improve this:
1. Store protocols in an Electronic Lab Notebook.
• Keep collection of protocols online
• Edit, export, share
20. | 20
Some tools to improve this:
2. Run experiments from this Lab Notebook.
• Edit, export, share
• Base on saved protocols
• Save and export outputs
21. | 21
Some tools to improve this:
3. Export results to a trusted data repository.
• Describe how experiment can be reproduced
• Keep track of versions of dataset
• Create DOI for citation
• Link back to protocols
• Store up to 5 GB of data in many formats
22. | 22
Some tools to improve this:
4. Publish in a data journal & link back.
• Journal focuses on method reproduction
• Link to protocols
• Link to data
• Fully OA
23. | 23
The Moral of this Story:
• How are we improving the ‘old way of working’?
- Methods and data can be stored by researchers directly during the
experiment, so the 270 hours spent on reproduction could drop toward 0
(given that the protocol is stored for reuse during the experiment)
- Better reproducibility because tools and methods are stored innately, no
need to recap, rebuild, and recover
- More accurate workflow representation because progress is tracked
while it happens, not just afterwards
• Are we there yet?
- We’re getting somewhere: “Your tools […], layer a UI on top of a whole
set of disjointed components; this is ultimately what people want!”
Phil Bourne, ADDS NIH
- But we’re not quite there:
o Running code from the tools: planned
o Even easier exporting/publishing workflows: planned
o Integration with other tools (ELNs, (institutional) repositories, journals, sharing
platforms): planned
24. | 24
A development partnership proposal:
1. You try out our tools:
- Institutional install for Hivebench
- Installation of Mendeley Data (in the cloud, later on local service)
- If interested: Data Lighthouse pilot
2. In return, you help us explore what these tools will look like:
- Connect to PIs/Postdocs/Grad Students who are interested in trying out
Hivebench/Mendeley Data
- We ask them for feedback on the tools, help with any issues
- You explore Data Lighthouse, tell us what you would like to see in terms of
reporting/emails etc.
3. Timeframe:
- Start by signing an MoU (no money changes hands; we provide
services/software/support, you help connect us to researchers, provide
feedback)
- We evaluate collaboration after 6 months, see if anything needs to change
- Tools are free for 24 months, no other obligations.
25. | 25
Hivebench Features:
Fully-fledged electronic online notebook.
Allows researchers to manage:
• Experiments,
• Protocols,
• Reagents,
• Research Data (integrated with Mendeley Data, or not).
Collaborative and confidential:
• Researchers can keep results private, collaborate with their group, or publish
protocols to the world
• Secure location in the cloud
Institutional edition (planned):
• Hivebench installed locally, on institutional server in secure offline environment
• Log-in with institutional credentials
• Tracking and reporting of metrics at group/individual level
26. | 26
Mendeley Data Features (today and tomorrow)
Trusted Data Repository
• Publish data under embargo: full control of visibility of datasets before and after publication
• Once published, DOI is assigned
• Published datasets stored (and accessible) in perpetuity in the DANS archive
• Data Seal of Approval certification
Flexible and Easy to Use
• Simple and intuitive user interface (à la Dropbox, Google Docs)
• Version management for longitudinal studies: new DOI for each version, enable version citation
• Customised metadata schemas for each research project
• Upload data directly from university file systems, other electronic lab notebooks, Dropbox etc.
• Automatic tagging of datasets with keywords using Elsevier Fingerprint Engine
Integrated into Research Ecosystem
• Integrated with Mendeley reference manager and social network used by over 3 million researchers
• Integrated with Github, versioning can be updated with software version
• Integrated with Hivebench ELN for end to end research lifecycle management
• Integrated with Elsevier publishing platform (Evise) used by over 1,000 scientific journals
• Link datasets with other research outputs (articles, datasets, software etc.) to increase findability and re-use
• Files can be stored in the cloud
27. | 27
Mendeley Data Institutional Features (mostly tomorrow)
Customized for Institutions:
• Seamless integration with Pure to link research data to people, departments, publications and
projects
• Customised workflows that fit the way each research project team works and the rules of your
institution
• Files can be stored on institutional network file system
• Provide DOI minting using institutional prefix
• Showcase research datasets externally on a web page with institutional branding
• Provide single sign-on for researchers using existing institutional credentials
Reporting and Analysis Tools:
• Reporting on impact of datasets including views, downloads and citations
• Reporting on compliance with funder data mandates by Grant ID
• Reporting on storage space used by person, project and department to ensure operation
within assigned quotas
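The storage report in the last bullet amounts to summing dataset sizes per person, project or department and comparing the totals with assigned quotas. What follows is a minimal sketch under stated assumptions: the record fields, the `usage_report` function and the quota figures are invented for illustration and do not describe how Mendeley Data actually implements its reporting.

```python
from collections import defaultdict


def usage_report(datasets, quotas_gb, group_by):
    """Sum dataset sizes per group and flag groups above their quota.

    `datasets` is a list of dicts with the hypothetical keys
    'person', 'project', 'department' and 'size_gb'.
    `quotas_gb` maps a group name to its assigned quota in GB.
    """
    totals = defaultdict(float)
    for ds in datasets:
        totals[ds[group_by]] += ds["size_gb"]

    report = {}
    for group, used in sorted(totals.items()):
        quota = quotas_gb.get(group)
        report[group] = {
            "used_gb": used,
            "quota_gb": quota,
            # Groups without an assigned quota are never flagged.
            "over_quota": quota is not None and used > quota,
        }
    return report


# Invented example records; sizes and quotas are placeholders.
datasets = [
    {"person": "a", "project": "p1", "department": "Biology", "size_gb": 3.0},
    {"person": "b", "project": "p1", "department": "Biology", "size_gb": 4.5},
    {"person": "c", "project": "p2", "department": "Chemistry", "size_gb": 1.0},
]
quotas = {"Biology": 5.0, "Chemistry": 5.0}

report = usage_report(datasets, quotas, group_by="department")
for group, row in report.items():
    print(group, row)
```

Passing `group_by="person"` or `group_by="project"` produces the same report at a different level, which matches the three levels named in the bullet.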
28. | 28
Data Lighthouse pilot, some questions:
General Research Data Management questions:
1. How does RDM work in your institution?
2. What role do libraries, research office, researchers play, respectively?
3. Do you have an institutional data policy?
4. Which departments are the higher/lower adopters?
5. What are the RDM tools available for your researchers? How well are they used?
6. Are you aware of negative/positive factors that may influence adoption rates?
Engagement questions:
1. How do you currently engage with researchers in the RDM space?
2. What additional services do you need?
3. Does the Data Lighthouse project resonate with your needs?
4. Are there any use cases/scenarios and metrics that we haven’t thought of?
5. Can we work together to improve adoption rates of RDM tools by your researchers?
6. Where would information re RDM processes come from, what format should it have?
Pilot questions: would you be interested in e.g.:
1. Organizing a joint workshop between Research Data Management key personnel of your
institution and the Elsevier RDM team to refine the current Data Lighthouse project scope and
requirements?
2. Running a test emailing campaign within 1-2 departments/labs followed by phone interviews with
a few librarians and active researchers?
29. | 29
Support for Research Data Management
with Data Lighthouse (mockups)
Data Lighthouse Dashboard
• Datasets shared
• Datasets linked
• Datasets curated
• Data articles submitted
• Data articles published
• Datasets viewed
• Datasets cited
30. | 30
Links:
• RDM Projects:
• https://www.hivebench.com
• https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences
• http://www.journals.elsevier.com/softwarex/
• https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking
• https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
• https://rd-alliance.org/bof-data-search.html
• https://data.mendeley.com/
• https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
• https://www.force11.org/
• http://www.nationaldataservice.org/
• https://rd-alliance.org/
• https://www.elsevier.com/about/open-science/research-data
• Bourne Demo: Original Materials:
- The original research paper: Kinnings et al, 2010
- The paper describing the earlier reproducibility effort: Garijo et al., 2013
- A wiki with the reproduction attempt: Gil/Garijo, 2012
- Background materials on the reproduction efforts: Garijo, 2012
- SMAP Tool: Xie, 2010
- Protocol in Hivebench: https://www.hivebench.com/protocols/16483
- Experiment in Hivebench: https://www.hivebench.com/notebooks/8524/experiments/20562
- Data in Mendeley Data: https://data.mendeley.com/datasets/r69mvkckmn/draft?preview=1
- MethodsX Paper, with links to protocols and data:
http://www.articleofthefuture.com/methodsx.html
Editor's notes
IUPAC has recommendations for what word you should use to describe a given property, but the vocabulary itself isn’t very accessible or usable, and thus is not universally implemented. Each site decides how it wants to label a given property, which hinders indexing and reuse of the data across silos. Structured capture of information using an ELN such as Hivebench enables the researcher to report data using a consistent vocabulary without extra effort.
Chemistry data are retrievable from NIST, but only by going to their page in a browser and using their search tools. What about access from within other applications, or from assistive devices for those with vision impairment? What guarantee do we have that the data will remain accessible in case of government funding problems?