SlideShare ist ein Scribd-Unternehmen logo
1 von 69
Downloaden Sie, um offline zu lesen
THE OPEN RESEARCH CHALLENGE: PEER
REVIEW AND PUBLICATION OF RESEARCH DATA
Dr Jonathan Tedds jat26@le.ac.uk @jtedds
Senior Research Fellow,
D2K Data to Knowledge Research Group
(University of Leicester)
PI #PREPARDE http://www.le.ac.uk/projects/preparde
http://www.astrogrid.org
(April 2008 1st public release)
Science as an Open Enterprise Report
Why open?
• As a first step towards this intelligent
openness, data that underpin a journal article
should be made concurrently available in an
accessible database
• We are now on the brink of an achievable aim:
for all science literature to be online, for all of
the data to be online and for the two to be
interoperable. [p.7]
• Royal Society June 2012, Science as an Open
Enterprise,
http://royalsociety.org/policy/projects/science
-public-enterprise/report/
• Issues linking data to the scientific record:
– Data persistence
– Data and metadata quality
– Attribution and credit for data producers
• Geoffrey Boulton (Edinburgh), Lead author:
– “Science has been sleepwalking into crisis of
replicability...and of the credibility of science”
– “Publishing articles without making the data
available is scientific malpractice”
Data Reuse: asking new questions
Hubble Space Telescope
• Papers based upon reuse of archived observations now exceed those based on the
use described in the original proposal.
– http://archive.stsci.edu/hst/bibliography/pubstat.html
• See also work by Piwowar & Vision re life sciences: “Data reuse and the open data
citation advantage”
– http://peerj.com/preprints/1/
Oh, and…. says so :P
We are committed to openness in scientific research data to speed up the progress of
scientific discovery, create innovation, ensure that the results of scientific research are
as widely available as practical, enable transparency in science and engage the public
in the scientific process.
• To the greatest extent and with the fewest constraints possible publicly funded
scientific research data should be open, while at the same time respecting
concerns in relation to privacy, safety, security and commercial interests, whilst
acknowledging the legitimate concerns of private partners.
• Open scientific research data should be easily discoverable, accessible, assessable,
intelligible, useable, and wherever possible interoperable to specific quality
standards.
• To ensure successful adoption by scientific communities, open scientific research
data principles will need to be underpinned by an appropriate policy environment,
including recognition of researchers fulfilling these principles, and appropriate
digital infrastructure.
Scale of the problem:
who, what, when
where….?
http://blogs.scientificamerican.com/absolutely-maybe/2013/09/10/opening-a-can-of-data-
sharing-worms/
• Timothy Vines and colleagues studied reproducibility of data sets in zoology and changes
through time
– gathered 516 papers published between 1991 and 2011
– then they tried to track the data down…
• Even tracking down the authors was a challenge
– Over time a dwindling minority of papers were accompanied by author email addresses that still
functioned
• only 37% of the data - even from papers in 2011 - were still findable and retrievable
– proportion dropped each earlier year
• For papers published in 1991
– only 7% of the data could be determined to truly still be in existence and retrievable
– few authors could be found, and most of them were reporting that their data were lost or
inaccessible
This isn’t new…
Henry Oldenburg
– inveterate correspondent
– now think of as scientist
• Had idea to publish Philosophical
Transactions (1665):
o Should be written in vernacular not Latin
o Underlying evidence must be concurrently
published
o Helped propel Europe at the time
o Concept of scientific self correction
• able to write it's errors
• Wrote: “thought fit to employ the [printing]
press……..Universal Good of Mankind“
o How do we achieve these ends in the post-
Gutenburg era?
Science ecosystems (Peter Fox, Rensselaer)
• These elements are what enable scientists to
explore/ confirm/ deny their research ideas
and collaborate!
• Abduction as well as induction and deduction
Accountability
ProofExplanation Justification Verifiability
‘Transparency’ -> Translucency
Trust
IdentityCitability
Integrateability
Data as a “public good” (2011)
• Public good
• Preservation
• Discovery
• Confidentiality
• First use
• Recognition
• Public funding
http://osc.universityofcalifornia.edu/openaccesspolicy/
So what do we mean by publishing data?
• The familiar:
– Supplementary tables via journal or
– Archived raw or calibrated facility data
– Discipline specific and institutional / national archives
• Data under the graph?
– In order to reproduce and adapt article analysis
• “Research ready” open data
– In order to reuse and repurpose
– for interdisciplinary researchers, community, business
– Ideally peer reviewed?
Research data example - level 1:
• A typical example from physical
sciences (astronomy) distinguishes
between broad categories within
the research data spectrum:
• raw/initially auto-processed data
produced at a research facility such
as an observatory
• typically made publically
available in this format after an
embargo period of e.g. 1 year
• in some cases available
immediately - e.g. Swift
Gamma Ray Burst satellite
"research ready" processed data which has been fully calibrated,
combined and cleaned/annotated
• often produced by individuals or collaborations
• rarely available to anyone outside the collaboration except upon
request/collaboration
• needed for re-analysis or reuse for science unless you have detailed
sub domain specific knowledge and detailed contextual information to
reproduce from raw
• considered to enable a competitive advantage for producers
• may be produced by dedicated data scientists on behalf of the
community for major survey/missions e.g. ESA XMM-Survey Science
Centre (Leicester), NASA, NCAR…
Research data example – level 2
output dataset – following
detailed analysis of research
ready datasets
• forms the data under the
graph in a journal publication
following analysis of research
ready datasets
• Might be available as a Table
via journal, CDS etc
• May not be available outside
the collaboration except upon
request/collaboration
• may well generate future
additional samples and papers
for the owning collaboration
on top of the original
• other researchers may request
the data for their own research
but may not get it!
Research data example – level 3
….and STOP!
• Next project
– Proposal long since written
– Probably already underway…
• Feel free to email ME if you would like to work
on an idea using this dataset or code
– As long as I’m a co-author on the paper!
– You have to go through me to find out what you
really need to know to reuse the data/code
• published catalogue type representation of published output dataset
• NOT a “data paper”….but could be
• optional in many cases, mandatory for most major surveys
• usually made available via project specific online resource
• may be provided as table of parameters based on research
ready dataset, usually linked from and associated with a journal
• specifically produced in order for the wider community to reuse
(cite!) and repurpose if wanted
• The well-known Sloan Digital Sky Survey is a classic example or
more recently the 2XMMi X-ray catalogue I have a close
involvement with (largest X-ray survey of the sky).
Research data example – level 4a
http://adsabs.harvard.edu/abs/2013arXiv1302.5329E
e.g.
data paper describing and linking to output dataset(s)
Research data example – level 4b
Live Data
paper!
Dataset citation is
first thing in the
paper and is also
included in reference
list (to take
advantage of citation
count systems)
DOI: 10.1002/gdj3.2
Data Publications
• Initiatives for open publication of data,
datasets, data papers and open peer
review
• RDMF8 ‘Engaging with the publishers’:
http://www.dcc.ac.uk/events/research-
data-management-forum-rdmf/rdmf8-
engaging-publishers
• Particularly Rebecca Lawrence on peer
review policies
• Earth System Science Data:
http://www.earth-system-science-
data.net/
• Pensoft Data Publishing Policies and
Guidelines for Biodiversity Data
http://www.pensoft.net/news.php?n=59
Slide: Simon Hodson (Jisc / CODATA)
21
ODE Data Publication Pyramid:
Pubs
Supps
Data Archives
Data on Disks
and in Drawers
(1) Top of the
pyramid is stable
but small
(2) Risk that
supplements to
articles turn into
Data Dumping
places
(3) Too many
disciplines lack a
community
endorsed data
archive
(4) Estimates are
that at least 75 %
of research data
is never made
openly avaiable
21
From Mayernik et al (in prep) – PREPARDE project - Most cited
Bulletin of the American Meteorological Society (BAMS) articles.
Data from Web of Science, gathered on June 11, 2013
Article Data paper? Citations Article details
1 Yes 10,113 Kalnay, E; et al. The NCEP/NCAR 40-year reanalysis project,
1996.
2 No 3,201 Torrence, C; Compo, GP. A practical guide to wavelet
analysis, 1998.
3 No 2,367 Mantua, NJ; et al. A Pacific interdecadal climate oscillation
with impacts on salmon production, 1997.
4 Yes 1,987 Kistler, R; et al. The NCEP-NCAR 50-year reanalysis:
Monthly means CD-ROM and documentation, 2001.
5 Yes 1,791 Xie, PP; Arkin, PA. Global precipitation: A 17-year monthly
analysis based on gauge observations, satellite estimates, and
numerical model outputs, 1997.
6 Yes 1,448 Kanamitsu, M; et al. NCEP-DOE AMIP-II reanalysis (R-2),
2002.
7 No 1,014 Baldocchi, D; et al. FLUXNET: A new tool to study the
temporal and spatial variability of ecosystem-scale carbon
dioxide, water vapor, and energy flux densities, 2001.
8 Yes 902 Rossow, WB; Schiffer, RA. Advances in understanding clouds
from ISCCP, 1999.
9 Yes 900 Rossow, WB; Schiffer, RA. ISCCP cloud data products, 1991.
10 No 877 Hess, M; Koepke, P; Schult, I. Optical properties of aerosols
and clouds: The software package OPAC, 1998.
11 No 815 Willmott, CJ. Some comments on the evaluation of model
performance, 1982.
12 No 815 Trenberth, KE. The definition of El Nino, 1997.
13 Yes 785 Woodruff, SD; Slutz, RJ; et al. A comprehensive ocean-
atmosphere data set, 1987.
14 Yes 776 Meehl, G.A.; et al. The WCRP CMIP3 multimodel dataset - A
It’s a long road….
What do researchers need to make this all possible?
– Incentives - citations, promotion, support
• long way to go
– Institutional and funder policy framework
• mostly there now?
– Appropriate discipline specific community centres of expertise
• rare, mostly limited to big science niches or very broad but may be
poorly sustained
– Institutional support services for the basics
• pilots to date
– Software tools that are open and can be adapted
• on the way
PREPARDE: Peer REview for
Publication & Accreditation of Research
Data in the Earth sciences
Jonathan Tedds (Leicester), Sarah Callaghan (BADC), Fiona Murphy
(Wiley), Rebecca Lawrence (F1000R), Geraldine Stoneham (MRC),
Elizabeth Newbold (BL), Rachel Kotarski (BL), Matthew Mayernik
(NCAR), John Kunze, Carly Strasser (CDL), Angus Whyte (DCC), Becca
Wilson (Leicester), Simon Hodson (Jisc) and #PREPARDE project team
+ Geraldine Clement Stoneham (MRC), Elizabeth Newbold, Rachel
Kotarski (BL) on data peer review
http://www.le.ac.uk/projects/preparde
• Partnership formed between Royal
Meteorological Society and academic
publishers Wiley Blackwell to develop a
mechanism for the formal publication of data in
the Open Access Geoscience Data Journal
• GDJ publishes short data articles cross-linked
to, and citing, datasets that have been
deposited in approved data centres and
awarded DOIs (or other permanent identifier).
• A data article describes a dataset, giving details
of its collection, processing, software, file
formats, etc., without the requirement of novel
analyses or ground breaking conclusions.
• the when, how and why data was collected
and what the data-product is.
http://www.geosciencedata.com/
PREPARDE key use case: Geoscience Data Journal,
Wiley-Blackwell and the Royal Meteorological Society
• capture the processes and procedures required to publish a scientific dataset
– ingestion into a data repository
– formal publication in a data journal
• address key issues in data publication
– how to peer-review a dataset?
– what criteria are needed for a repository to be considered objectively
trustworthy?
– how can datasets and journal publications be effectively cross-linked for the
benefit of the wider research community?
• PREPARDE team includes key expertise in
– Research
– academic publishing
– data management
• Earth Sciences focus but produce general guidelines applicable to a wide range of
scientific disciplines and data publication types incl life sciences (F1000R)
PREPARDE: Peer REview for Publication & Accreditation of Research
Data in the Earth sciences http://www.le.ac.uk/projects/preparde
BADC
Data Data
BODC
DataData
A Journal
(Any online
journal system)
PDF PDF PDF PDF PDF
Word processing software
with journal template
Data Journal
(Geoscience Data Journal)
html html html html
1) Author prepares the
paper using word
processing software.
3) Reviewer reviews the
PDF file against the
journal’s acceptance
criteria.
2) Author submits
the paper as a
PDF/Word file.
Word processing software
with journal template
1) Author prepares the
data paper using word
processing software and
the dataset using
appropriate tools.
2a) Author submits
the data paper to
the journal.
3) Reviewer reviews
the data paper and
the dataset it points
to against the
journals acceptance
criteria.
The traditional online journal model
Overlay journal model for publishing data
2b) Author submits
the dataset to a
repository.
Data
How: to publish data in GDJ
Live Data paper!
Dataset citation is first thing in
the paper and is also included
in reference list (to take
advantage of citation count
systems)
DOI: 10.1002/gdj3.2
Data
Centre
Trust: Repository accreditation
• Link between data paper and dataset is crucial!
• How do data journal editors know a repository
is trustworthy?
• How can repositories prove they’re
trustworthy?
• What makes a repository trustworthy?
• Many things: mission, processes, expertise,
workflows, history, systems, documentation, …
• Assessing trustworthiness requires assessing
the entire repository workflow
• PREPARDE / IDCC13 Workshop – report out soon!
• Peer review of data is implicitly peer review of
repository
And what does
“trustworthy” mean,
when you get right
down to it?
DataCite Repository List
• working document
• initated via a collaboration
between the British Library,
BioMed Central and the Digital
Curation Centre
• aims to capture growing number
of repositories for research data
• provided for information purposes
only:
• DataCite provides no
endorsements of quality or
suitability of the repositories
• encourage community
participation in developing
this resource
http://www.datacite.org/repolist/
Dryad Data Repository
JDAP: Joint Data Archiving Policy
 Joint Data Archiving Policy: http://datadryad.org/jdap
 Joint declarations, Feb 2010, in American Naturalist, Evolution, the Journal of Evolutionary
Biology, Molecular Ecology, Heredity, and other key journals in evolution and ecology:
http://www.journals.uchicago.edu/doi/full/10.1086/650340
 This journal requires, as a condition for publication, that data supporting the results
in the paper should be archived in an appropriate public archive, such as GenBank,
TreeBASE, Dryad, or the Knowledge Network for Biocomplexity.
 Allows embargos of up to one year;
allows exceptions for, e.g., sensitive
information such as human subject data
or the location of endangered species.
 ‘Data that have an established standard
repository, such as DNA sequences,
should continue to be archived in the
appropriate repository, such as GenBank.
For more idiosyncratic data, the data
can be placed in a more flexible digital
data library such as the National Science
Foundation-sponsored Dryad archive at
http://datadryad.org.'
Slide: Simon Hodson (Dryad / Jisc / CODATA)
PREPARDE and Bi-Directional Data Linking
• already have a link from the GDJ data
article to the data repository via DOI
• GDJ can also pull the standard DOI
metadata attached to that DOI from the
DataCite metadata store
• need to figure out a way so GDJ can
inform the repository that their dataset
has been cited/published!
• At this time, we have a manual work-
around (i.e. email)
• Workshop on cross-linking between
data centres and publishers 30th April
2013 at RAL, UK
• Report out soon!
BADC NCAR
GDJ
Standardised
metadata
DataCite Metadata Store
Standardised
metadata
Peer review of data: the Perfect Disaster?
• Support for the peer review process
– scholars contributing peer reviews with little formal
reward
– opportunity to polish and refine understanding of the
cutting edge of research
• But peer review system under stress
– exploding number of journals, conferences, and grant
applications
– self-publication tools - blogs and wikis - allow scholars to
disseminate their research results and products
• Faster and more directly
• Now adding research data into the publication and peer
review queues …
Peer-review of data
• Technical
– author guidelines for GDJ
– Funder Data Value Checklist
– implicit peer review of repository?
• Scientific
– pre-publication?
– post-publication? E.g. F1000R
– guidelines on uncertainty e.g. IPCC
– discipline specific?
– EU Inspire spatial formatting
• Societal
– contribution to human knowledge
– reliability http://libguides.luc.edu/content.php?pid=5464&sid=164619
Open Peer Review of Data
ESSD peer review ensures that the datasets are:
Plausible with no immediately detectable problems;
Sufficient high quality and their limitations clearly
stated;
Well annotated by standard metadata and available
from a certified data center/repository;
Customary with regard to their format(s) and/or
access protocol, and expected to be useable for the
foreseeable future;
Openly accessible (toll free)
Earth System Science Data journal:
http://www.earth-system-science-data.net/
Rebecca Lawrence, Data Publishing: peer review,
shared standards and collaboration,
http://www.dcc.ac.uk/events/research-data-
management-forum-rdmf/rdmf8-engaging-
publishers
Faculty 1000 Open Peer Review
Sanity check:
Format and suitable basic structure adherence
A standard basic protocol structure is adhered to
Data stored in the most appropriate and stable
location
Open Peer Review:
Is the method used appropriate for the scientific
question being asked?
Has enough information been provided to be able to
replicate the experiment?
Have appropriate controls been conducted, and the
data presented?
Is the data in a useable format/structure?
Are stated data limitations and possible sources of
error appropriately described
Does the data ‘look’ ok (optional; e.g. Microarray
data)
Draft Recommendations on Peer-
review of data
• Summary Recommendations from
Workshop at British Library, 11 March
2013
• Workshop attendees included funders,
publishers, repository managers,
researchers ….
• Draft recommendations put up for
discussion and feedback captured
• Feedback from the community still
welcome
• 2nd workshop 24 June: put
recommendations to peer reviewers!
http://libguides.luc.edu/content.php?pid=5464&sid=164619
Document at: http://bit.ly/DataPRforComment
Feedback to: https://www.jiscmail.ac.uk/DATA-
PUBLICATION
Draft Recommendations on data peer review
Summary Recommendations from Workshop at the British Library, 11 March 2013
• Connecting data review with data management planning
• Connecting scientific, technical review and curation
• Connecting data review with article review
• 4-5 draft recommendations in each of above
• Assist Researchers, Publishers, Journal Editors, Reviewers,
Data Centres, Institutional Repositories to map requirements
for data peer review
• Matrix of stakeholders vs processes
– Assist in assigning responsibilities for given context
– New for most disciplines
– Learn from disciplines where this already happens
Connecting data review with data management planning
1. All research funders should at least require a “data sharing plan” as part of
all funding proposals, and if a submitted data sharing plan is inadequate,
appropriate amendments should be proposed.
2. Research organisations should manage research data according to
recognised standards, providing relevant assurance to funders so that
additional technical requirements do not need to be assessed as part of the
funding application peer review. (Additional note: Research organisations
need to provide adequate technical capacity to support the management of
the data that the researchers generate.)
3. Research organisations and funders should ensure that adequate funding is
available within an award to encourage good data management practice.
4. Data sharing plans should indicate how the data can and will be shared and
publishers should refuse to publish papers which do not clearly indicate how
underlying data can be accessed, where appropriate.
Connecting scientific, technical review and curation
1. Articles and their underlying data or metadata (by the same or other
authors) should be multi-directionally linked, with appropriate
management for data versioning.
2. Journal editors should check data repository ingest policies to avoid
duplication of effort , but provide further technical review of important
aspects of the data where needed. (Additional note: A map of
ingest/curation policies of the different repositories should be generated.)
3. If there is a practical/technical issue with data access (e.g. files don’t open
or exist), then the journal should inform the repository of the issue. If
there is a scientific issue with the data, then the journal should inform the
author in the first instance; if the author does not respond adequately to
serious issues, then the journal should inform the institution who should
take the appropriate action. Repositories should have a clear policy in
place to deal with any feedback.
Connecting data review with article review
1. For all articles where the underlying data is being submitted, authors need to
provide adequate methods and software/infrastructure information as part of
their article. Publishers of these articles should have a clear data peer review
process for authors and referees.
2. Publishers should provide simple and, where appropriate, discipline-specific data
review (technical and scientific) checklists as basic guidance for reviewers.
3. Authors should clearly state the location of the underlying data. Publishers should
provide a list of known trusted repositories or, if necessary, provide advice to
authors and reviewers of alternative suitable repositories for the storage of their
data.
4. For data peer review, the authors (and journal) should ensure that the data
underpinning the publication, and any tools required to view it, should be fully
accessible to the referee. The referees and the journal need to then ensure
appropriate access is in place following publication.
5. Repositories need to provide clear terms and conditions for access, and ensure
that datasets have permanent and unique identifiers.
Publishing research data
• Research is heavily context specific
=> So keep it context specific?
• Publishers and Professional & Learned Societies can help
galvanise agreement of researchers
• To define how they want their data represented and preserved for reuse &
citation => your researchers need you
• Along with funder appointed peer review committees often stronger
connection than to institution
• Institutional managers/services cannot cover wide range of discipline specific
expertise
• Like publishers don’t cover all fields
• Relatively constant as career progresses
• Change host institution
• Change field(s)
• Change technique
• In UK RCUK funding for APCs strictly for articles only….
– How to fund APCs for depositing data in repositories?
– Will publishers charge APCs to publish data papers?
2012-02-07
DCC roadshow East Midlands - CC-BY-SA
42
PDB
GenBank
UniProt
Pfam
Spreadsheets, Notebooks
Local, Lost
High throughput experimental methods
Industrial scale
Commons based production
Public data sets
Cherry picked results
Preserved
CATH, SCOP
(Protein Structure
Classification)
ChemSpider
Research and the long tail
Slide: Carole Goble
Enabling Open Data Publishing
• Active Data Management Planning
– built in at proposal stage
• Local institutional tweaks of funder and local templates
– Implemented and evolved in project
• Data Management Plan as a live, evolving object
• Annotate data on the fly – lab notebook approach
– Curated & preserved using permanent identifiers
• Appropriate repository and data collection descriptors
Active Data Storage: Identifying the Holy Grail ?
• “what is needed is a tool to transparently sync local and
network storage” (Marieke Guy, JISCMRD2 Nottingham)
• CKAN (Orbital, Lincoln)?
• Research Data Toolkit (Herts) – hybrid solutions
• UC3 suite at CDL…
• DropBox-like functionality a must
• Usability
• Technical interoperability
• aim to help create databases for research data so
• facilitates collaboration and data sharing
• enables the subsequent publication of datasets
• challenge is to ensure data are
• documented
• Preserved
• service is sustainable
http://halogen.le.ac.uk
 Portable Antiquities
Scheme (British Museum)
 Place-names
(Nottingham)
 Surnames
 Genetics
 IT hosting and GIS
 Best practice:
#JISCMRD, UKRDS,
DCC, international
Halogen as template for research data management #jiscmrd
 Requirements Analysis – must be iterative!
 Data Management Plan – use DMPonline (UK Digital Curation Centre)
 Scalable research data management infrastructure
 pilot phase to nationally available resource
 LAMP stack IT infrastructure: host research database – work with JISC/DCC
 A model for the long term delivery of a data management service within the
institution including
 support, maintenance, governance & charging policies
 Include researchers, IT services, research support office, library services etc.
BENEFITS
 New research opportunities
 Cross database work – seed new research samples
 Scholarly communication/access to national resources
 Key to English Place Names (Nottingham)
 Portable Antiquities Scheme (British Museum)
 Verification, re-purposing, re-use of data
 Cleaning & enhancing private research datasets for reuse & correlation
 No re-creation of data
 Increased transparency
 excellent training for best practice in research data management
 Increasing research productivity
 Build in cleaning, annotation, enhancement into normal research workflows
 research datasets may immediately be reusable and interoperable
 Impact & Knowledge Transfer
 Reuse IT infrastructure
 Increasing skills base of researchers/students/staff
Reward = Leverhulme Trust funding £1.3m!
CHALLENGES
 interdisciplinary research database
 ingest each input dataset in form such that sufficient information is carried forward
to enable interoperation
 Cultural differences
 versioning & provenance for input datasets
 which software tools, infrastructure , Query interface?
 suitable for multi disciplinary researchers
 Requirements upon the institution for sustaining the research assets & skills
 Requirements upon the researchers
 Annotating
 Refreshing
 Maintainence of datasets
No Response
63%
Response
Received
37%
Researcher Responses to Contacts Made
Suggested timeline for implementing
institutional research data management
From Whyte & Tedds (2011), DCC Briefing
http://www.dcc.ac.uk/resources/briefing-papers/making-case-rdm
Challenge for institutions
– Rise to scientific and research challenge
• Not just a management challenge
• Responsibility for the knowledge they create
– Library
• “Doing the wrong things through the wrong people”?
• Challenge for library to enable:
• curation of data and publications
• active support from data scientists
• from centralised to dispersed support
• Expert centres such as D-Lab essential!
– IT Service
• Provide research data platforms for researchers:
– Active storage
– Enable collaboration
– Connect to preservation services through Library
But that’s not all…
 What about the software underpinning data driven
research?
 If we’re going to publish as open data:
 How do we help researchers to store, annotate and
discover the datasets they create?
 How do you sustain and reuse that?
Biomedical Research Infrastructure Software Service Kit
A vision for cloud-based open source research applications
#BRISSKit
http://www.brisskit.le.ac.uk
BRISSKit context:
The I4Health goal of applying knowledge engineering to close the
‘ICT gap’ between research and healthcare (Beck, T. et al 2012)
www.brisskit.le.ac.uk
Email: brisskit@le.ac.uk
http://www.brisskit.le.ac.uk
The semantic bridge
?
OBiBa Onyx
Records participant
consent, questionnaire
data and primary
specimen IDs
i2b2
Cohort selection and
data querying
Bio-ontology!
Research Software Sustainability
• OS community engagement
• standards compliance
• consortium approach
• work with grain of researchers
• discipline specific forks?
• Github versioning an example for research data?
• OS Community Engagement Charter
• defining engagement with existing & new OS
communities
• including adoption & code commitments
See Rob Baxter blog: “The research software engineer”
• http://dirkgorissen.com/2012/09/13/the-research-
software-engineer/
Lessons for institutions?
 Can’t do it all in house!
 But many disciplines don’t have data centres
 Build coalition of institutional actors
 Essential to have high level support
 Take and shape
 Identify what you do have in-house
 Access external tools, standards where possible
 Active storage, collaboration, eprints…
 Propose best of breed for (inter)national reuse
 Share benefits (and costs) over acacdemic networks
 Sustainability the key challenge
 As much cultural as technical – needs networks…
But institutions alone aren’t enough –
we need an alliance!
Accepted Research Data Alliance Interest Group
Publishing Data
• http://rd-alliance.org/
• Close coordination with ICSU-WDS working group, CODATA and other
ongoing initiatives in data publication
– WDS under International Council of Science, RDA wider
– Avoid duplication within related RDA and WDS WGs – join up
– For WDS partnerships between publishers and data centres key
• scope the territory – gap analysis
• Use RDA Forum and new http://jiscmail.ac.uk/data-publication 350+ list
• Take findings from RDA / WDS group(s) and trial in other communities /
disciplines / institutional repositories
15-9-2013 Launch meeting discussion 67
“Keep reaching for the stars”
• increase the trustworthiness
and value of individual data
sets
• strengthen the findings based
on cited data sets
• increase the transparency and
traceability of data and
publications
• enable reuse and repurposing
i.e. Problems but extraordinary
opportunities – all hands on deck!
Thank you for listening and thanks to CDL, D-
Lab and the project partners 
Dr Jonathan Tedds jat26@le.ac.uk @jtedds
Senior Research Fellow,
D2K Data to Knowledge Research Group
(University of Leicester)
#PREPARDE http://www.le.ac.uk/projects/preparde
Mailing list: http://jiscmail.ac.uk/DATA-PUBLICATION

Weitere ähnliche Inhalte

Was ist angesagt?

Finding and Accessing Human Genomics Datasets
Finding and Accessing Human Genomics DatasetsFinding and Accessing Human Genomics Datasets
Finding and Accessing Human Genomics DatasetsManuel Corpas
 
Reproducible research: First steps.
Reproducible research: First steps. Reproducible research: First steps.
Reproducible research: First steps. Richard Layton
 
The Chemist's Toolkit 10 9 09
The Chemist's Toolkit 10 9 09The Chemist's Toolkit 10 9 09
The Chemist's Toolkit 10 9 09Elizabeth Brown
 
Genome sharing projects around the world nijmegen oct 29 - 2015
Genome sharing projects around the world   nijmegen oct 29 - 2015Genome sharing projects around the world   nijmegen oct 29 - 2015
Genome sharing projects around the world nijmegen oct 29 - 2015Fiona Nielsen
 
Open Science : Democratizing Access to Science
Open Science : Democratizing Access to ScienceOpen Science : Democratizing Access to Science
Open Science : Democratizing Access to ScienceOkba Bekhelifi
 
Open Data - strategies for research data management & impact of best practices
Open Data - strategies for research data management & impact of best practicesOpen Data - strategies for research data management & impact of best practices
Open Data - strategies for research data management & impact of best practicesMartin Donnelly
 
Introduction to data management
Introduction to data managementIntroduction to data management
Introduction to data managementCunera Buys
 
On community-standards, data curation and scholarly communication - BITS, Ita...
On community-standards, data curation and scholarly communication - BITS, Ita...On community-standards, data curation and scholarly communication - BITS, Ita...
On community-standards, data curation and scholarly communication - BITS, Ita...Susanna-Assunta Sansone
 
RARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research ObjectsRARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research ObjectsCarole Goble
 
Data Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach DataData Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach Datacunera
 
State of the Art Informatics for Research Reproducibility, Reliability, and...
 State of the Art  Informatics for Research Reproducibility, Reliability, and... State of the Art  Informatics for Research Reproducibility, Reliability, and...
State of the Art Informatics for Research Reproducibility, Reliability, and...Micah Altman
 
The world of research data: when should data be closed, shared or open
The world of research data: when should data be closed, shared or openThe world of research data: when should data be closed, shared or open
The world of research data: when should data be closed, shared or openheila1
 
5 steps to using open access in the classroom 11 9 2011
5 steps to using open access in the classroom 11 9 2011 5 steps to using open access in the classroom 11 9 2011
5 steps to using open access in the classroom 11 9 2011 Elizabeth Brown
 
Developing data services: a tale from two Oregon universities
Developing data services: a tale from two Oregon universitiesDeveloping data services: a tale from two Oregon universities
Developing data services: a tale from two Oregon universitiesAmanda Whitmire
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8Scott Edmunds
 
Building Capacity for Open Science
Building Capacity for Open ScienceBuilding Capacity for Open Science
Building Capacity for Open ScienceKaitlin Thaney
 
SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...Carole Goble
 
Benefits and practice of open science
Benefits and practice of open scienceBenefits and practice of open science
Benefits and practice of open scienceSarah Jones
 
Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521Amanda Whitmire
 
Machines are people too
Machines are people tooMachines are people too
Machines are people tooPaul Groth
 

Was ist angesagt? (20)

Finding and Accessing Human Genomics Datasets
Finding and Accessing Human Genomics DatasetsFinding and Accessing Human Genomics Datasets
Finding and Accessing Human Genomics Datasets
 
Reproducible research: First steps.
Reproducible research: First steps. Reproducible research: First steps.
Reproducible research: First steps.
 
The Chemist's Toolkit 10 9 09
The Chemist's Toolkit 10 9 09The Chemist's Toolkit 10 9 09
The Chemist's Toolkit 10 9 09
 
Genome sharing projects around the world nijmegen oct 29 - 2015
Genome sharing projects around the world   nijmegen oct 29 - 2015Genome sharing projects around the world   nijmegen oct 29 - 2015
Genome sharing projects around the world nijmegen oct 29 - 2015
 
Open Science : Democratizing Access to Science
Open Science : Democratizing Access to ScienceOpen Science : Democratizing Access to Science
Open Science : Democratizing Access to Science
 
Open Data - strategies for research data management & impact of best practices
Open Data - strategies for research data management & impact of best practicesOpen Data - strategies for research data management & impact of best practices
Open Data - strategies for research data management & impact of best practices
 
Introduction to data management
Introduction to data managementIntroduction to data management
Introduction to data management
 
On community-standards, data curation and scholarly communication - BITS, Ita...
On community-standards, data curation and scholarly communication - BITS, Ita...On community-standards, data curation and scholarly communication - BITS, Ita...
On community-standards, data curation and scholarly communication - BITS, Ita...
 
RARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research ObjectsRARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research Objects
 
Data Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach DataData Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach Data
 
State of the Art Informatics for Research Reproducibility, Reliability, and...
 State of the Art  Informatics for Research Reproducibility, Reliability, and... State of the Art  Informatics for Research Reproducibility, Reliability, and...
State of the Art Informatics for Research Reproducibility, Reliability, and...
 
The world of research data: when should data be closed, shared or open
The world of research data: when should data be closed, shared or openThe world of research data: when should data be closed, shared or open
The world of research data: when should data be closed, shared or open
 
5 steps to using open access in the classroom 11 9 2011
5 steps to using open access in the classroom 11 9 2011 5 steps to using open access in the classroom 11 9 2011
5 steps to using open access in the classroom 11 9 2011
 
Developing data services: a tale from two Oregon universities
Developing data services: a tale from two Oregon universitiesDeveloping data services: a tale from two Oregon universities
Developing data services: a tale from two Oregon universities
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8
 
Building Capacity for Open Science
Building Capacity for Open ScienceBuilding Capacity for Open Science
Building Capacity for Open Science
 
SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...
 
Benefits and practice of open science
Benefits and practice of open scienceBenefits and practice of open science
Benefits and practice of open science
 
Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521
 
Machines are people too
Machines are people tooMachines are people too
Machines are people too
 

Ähnlich wie Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The Open Research Challenge: Peer Review and Publication of Research Data"

How to overcome obstacles to data publication: Issues, requirements, and good...
How to overcome obstacles to data publication: Issues, requirements, and good...How to overcome obstacles to data publication: Issues, requirements, and good...
How to overcome obstacles to data publication: Issues, requirements, and good...ariadnenetwork
 
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016Jisc
 
Open science: your questions answered
Open science: your questions answeredOpen science: your questions answered
Open science: your questions answeredVarsha Khodiyar
 
Alain Frey Research Data for universities and information producers
Alain Frey Research Data for universities and information producersAlain Frey Research Data for universities and information producers
Alain Frey Research Data for universities and information producersIncisive_Events
 
Dataverse in the Universe of Data by Christine L. Borgman
Dataverse in the Universe of Data by Christine L. BorgmanDataverse in the Universe of Data by Christine L. Borgman
Dataverse in the Universe of Data by Christine L. Borgmandatascienceiqss
 
Data publishing at the UQ Library
Data publishing at the UQ LibraryData publishing at the UQ Library
Data publishing at the UQ LibraryARDC
 
Open Access and Open Data: what do I need to know (and do)?
Open Access and Open Data: what do I need to know (and do)?Open Access and Open Data: what do I need to know (and do)?
Open Access and Open Data: what do I need to know (and do)?Martin Donnelly
 
Data publication: Discover, Explore, Visualise
Data publication: Discover, Explore, VisualiseData publication: Discover, Explore, Visualise
Data publication: Discover, Explore, VisualiseAlejandra Gonzalez-Beltran
 
Transparency and reproducibility in research
Transparency and reproducibility in researchTransparency and reproducibility in research
Transparency and reproducibility in researchLouise Corti
 
Open science as roadmap to better data science research
Open science as roadmap to better data science researchOpen science as roadmap to better data science research
Open science as roadmap to better data science researchBeth Plale
 
The Horizon 2020 Open Data Pilot
The Horizon 2020 Open Data PilotThe Horizon 2020 Open Data Pilot
The Horizon 2020 Open Data PilotMartin Donnelly
 
The Horizon2020 Open Data Pilot - OpenAIRE Webinar
The Horizon2020 Open Data Pilot - OpenAIRE WebinarThe Horizon2020 Open Data Pilot - OpenAIRE Webinar
The Horizon2020 Open Data Pilot - OpenAIRE WebinarMartin Donnelly
 
Minimal viable data reuse
Minimal viable data reuseMinimal viable data reuse
Minimal viable data reusevoginip
 
Upgrading the Scholarly Infrastructure
Upgrading the Scholarly InfrastructureUpgrading the Scholarly Infrastructure
Upgrading the Scholarly InfrastructureBjörn Brembs
 
Fsci 2018 monday30_july_am6
Fsci 2018 monday30_july_am6Fsci 2018 monday30_july_am6
Fsci 2018 monday30_july_am6ARDC
 

Ähnlich wie Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The Open Research Challenge: Peer Review and Publication of Research Data" (20)

How to overcome obstacles to data publication: Issues, requirements, and good...
How to overcome obstacles to data publication: Issues, requirements, and good...How to overcome obstacles to data publication: Issues, requirements, and good...
How to overcome obstacles to data publication: Issues, requirements, and good...
 
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016
 
Open science: your questions answered
Open science: your questions answeredOpen science: your questions answered
Open science: your questions answered
 
Alain Frey Research Data for universities and information producers
Alain Frey Research Data for universities and information producersAlain Frey Research Data for universities and information producers
Alain Frey Research Data for universities and information producers
 
Dataverse in the Universe of Data by Christine L. Borgman
Dataverse in the Universe of Data by Christine L. BorgmanDataverse in the Universe of Data by Christine L. Borgman
Dataverse in the Universe of Data by Christine L. Borgman
 
Data publishing at the UQ Library
Data publishing at the UQ LibraryData publishing at the UQ Library
Data publishing at the UQ Library
 
Open Access and Open Data: what do I need to know (and do)?
Open Access and Open Data: what do I need to know (and do)?Open Access and Open Data: what do I need to know (and do)?
Open Access and Open Data: what do I need to know (and do)?
 
Data publication: Discover, Explore, Visualise
Data publication: Discover, Explore, VisualiseData publication: Discover, Explore, Visualise
Data publication: Discover, Explore, Visualise
 
Transparency and reproducibility in research
Transparency and reproducibility in researchTransparency and reproducibility in research
Transparency and reproducibility in research
 
Open science as roadmap to better data science research
Open science as roadmap to better data science researchOpen science as roadmap to better data science research
Open science as roadmap to better data science research
 
INCLUSION OF DATA ARCHIVES IN DATA MANAGEMENT PLAN
INCLUSION OF DATA ARCHIVES IN DATA MANAGEMENT PLANINCLUSION OF DATA ARCHIVES IN DATA MANAGEMENT PLAN
INCLUSION OF DATA ARCHIVES IN DATA MANAGEMENT PLAN
 
The Horizon 2020 Open Data Pilot
The Horizon 2020 Open Data PilotThe Horizon 2020 Open Data Pilot
The Horizon 2020 Open Data Pilot
 
The Horizon2020 Open Data Pilot - OpenAIRE Webinar
The Horizon2020 Open Data Pilot - OpenAIRE WebinarThe Horizon2020 Open Data Pilot - OpenAIRE Webinar
The Horizon2020 Open Data Pilot - OpenAIRE Webinar
 
Open data in a big data world (Accord ICSU-IAP-ISSC-TWAS)
Open data in a big data world (Accord ICSU-IAP-ISSC-TWAS)Open data in a big data world (Accord ICSU-IAP-ISSC-TWAS)
Open data in a big data world (Accord ICSU-IAP-ISSC-TWAS)
 
Open data in a big data world Accord (ICSU-IAP-ISSC-TWAS)
Open data in a big data world Accord (ICSU-IAP-ISSC-TWAS)Open data in a big data world Accord (ICSU-IAP-ISSC-TWAS)
Open data in a big data world Accord (ICSU-IAP-ISSC-TWAS)
 
Minimal viable data reuse
Minimal viable data reuseMinimal viable data reuse
Minimal viable data reuse
 
Upgrading the Scholarly Infrastructure
Upgrading the Scholarly InfrastructureUpgrading the Scholarly Infrastructure
Upgrading the Scholarly Infrastructure
 
Open Access as a Means to Produce High Quality Data
Open Access as a Means to Produce High Quality DataOpen Access as a Means to Produce High Quality Data
Open Access as a Means to Produce High Quality Data
 
Fsci 2018 monday30_july_am6
Fsci 2018 monday30_july_am6Fsci 2018 monday30_july_am6
Fsci 2018 monday30_july_am6
 
Research data life cycle
Research data life cycleResearch data life cycle
Research data life cycle
 

Kürzlich hochgeladen

Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterMateoGardella
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 

Kürzlich hochgeladen (20)

Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 

Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The Open Research Challenge: Peer Review and Publication of Research Data"

  • 1. THE OPEN RESEARCH CHALLENGE: PEER REVIEW AND PUBLICATION OF RESEARCH DATA Dr Jonathan Tedds jat26@le.ac.uk @jtedds Senior Research Fellow, D2K Data to Knowledge Research Group (University of Leicester) PI #PREPARDE http://www.le.ac.uk/projects/preparde
  • 2.
  • 4. Science as an Open Enterprise Report Why open? • As a first step towards this intelligent openness, data that underpin a journal article should be made concurrently available in an accessible database • We are now on the brink of an achievable aim: for all science literature to be online, for all of the data to be online and for the two to be interoperable. [p.7] • Royal Society June 2012, Science as an Open Enterprise, http://royalsociety.org/policy/projects/science -public-enterprise/report/ • Issues linking data to the scientific record: – Data persistence – Data and metadata quality – Attribution and credit for data producers • Geoffrey Boulton (Edinburgh), Lead author: – “Science has been sleepwalking into crisis of replicability...and of the credibility of science” – “Publishing articles without making the data available is scientific malpractice”
  • 5. Data Reuse: asking new questions Hubble Space Telescope • Papers based upon reuse of archived observations now exceed those based on the use described in the original proposal. – http://archive.stsci.edu/hst/bibliography/pubstat.html • See also work by Piwowar & Vision re life sciences: “Data reuse and the open data citation advantage” – http://peerj.com/preprints/1/
  • 6. Oh, and…. says so :P We are committed to openness in scientific research data to speed up the progress of scientific discovery, create innovation, ensure that the results of scientific research are as widely available as practical, enable transparency in science and engage the public in the scientific process. • To the greatest extent and with the fewest constraints possible publicly funded scientific research data should be open, while at the same time respecting concerns in relation to privacy, safety, security and commercial interests, whilst acknowledging the legitimate concerns of private partners. • Open scientific research data should be easily discoverable, accessible, assessable, intelligible, useable, and wherever possible interoperable to specific quality standards. • To ensure successful adoption by scientific communities, open scientific research data principles will need to be underpinned by an appropriate policy environment, including recognition of researchers fulfilling these principles, and appropriate digital infrastructure.
  • 7. Scale of the problem: who, what, when where….? http://blogs.scientificamerican.com/absolutely-maybe/2013/09/10/opening-a-can-of-data- sharing-worms/ • Timothy Vines and colleagues studied reproducibility of data sets in zoology and changes through time – gathered 516 papers published between 1991 and 2011 – then they tried to track the data down… • Even tracking down the authors was a challenge – Over time a dwindling minority of papers were accompanied by author email addresses that still functioned • only 37% of the data - even from papers in 2011 - were still findable and retrievable – proportion dropped each earlier year • For papers published in 1991 – only 7% of the data could be determined to truly still be in existence and retrievable – few authors could be found, and most of them were reporting that their data were lost or inaccessible
  • 8. This isn’t new… Henry Oldenburg – inveterate correspondent – now think of as scientist • Had idea to publish Philosophical Transactions (1665): o Should be written in vernacular not Latin o Underlying evidence must be concurrently published o Helped propel Europe at the time o Concept of scientific self correction • able to write it's errors • Wrote: “thought fit to employ the [printing] press……..Universal Good of Mankind“ o How do we achieve these ends in the post- Gutenburg era?
  • 9. Science ecosystems (Peter Fox, Rensselaer) • These elements are what enable scientists to explore/ confirm/ deny their research ideas and collaborate! • Abduction as well as induction and deduction Accountability ProofExplanation Justification Verifiability ‘Transparency’ -> Translucency Trust IdentityCitability Integrateability
  • 10. Data as a “public good” (2011) • Public good • Preservation • Discovery • Confidentiality • First use • Recognition • Public funding
  • 12. So what do we mean by publishing data? • The familiar: – Supplementary tables via journal or – Archived raw or calibrated facility data – Discipline specific and institutional / national archives • Data under the graph? – In order to reproduce and adapt article analysis • “Research ready” open data – In order to reuse and repurpose – for interdisciplinary researchers, community, business – Ideally peer reviewed?
  • 13. Research data example - level 1: • A typical example from physical sciences (astronomy) distinguishes between broad categories within the research data spectrum: • raw/initially auto-processed data produced at a research facility such as an observatory • typically made publically available in this format after an embargo period of e.g. 1 year • in some cases available immediately - e.g. Swift Gamma Ray Burst satellite
  • 14. "research ready" processed data which has been fully calibrated, combined and cleaned/annotated • often produced by individuals or collaborations • rarely available to anyone outside the collaboration except upon request/collaboration • needed for re-analysis or reuse for science unless you have detailed sub domain specific knowledge and detailed contextual information to reproduce from raw • considered to enable a competitive advantage for producers • may be produced by dedicated data scientists on behalf of the community for major survey/missions e.g. ESA XMM-Survey Science Centre (Leicester), NASA, NCAR… Research data example – level 2
  • 15. output dataset – following detailed analysis of research ready datasets • forms the data under the graph in a journal publication following analysis of research ready datasets • Might be available as a Table via journal, CDS etc • May not be available outside the collaboration except upon request/collaboration • may well generate future additional samples and papers for the owning collaboration on top of the original • other researchers may request the data for their own research but may not get it! Research data example – level 3
  • 16. ….and STOP! • Next project – Proposal long since written – Probably already underway… • Feel free to email ME if you would like to work on an idea using this dataset or code – As long as I’m a co-author on the paper! – You have to go through me to find out what you really need to know to reuse the data/code
  • 17. • published catalogue type representation of published output dataset • NOT a “data paper”….but could be • optional in many cases, mandatory for most major surveys • usually made available via project specific online resource • may be provided as table of parameters based on research ready dataset, usually linked from and associated with a journal • specifically produced in order for the wider community to reuse (cite!) and repurpose if wanted • The well-known Sloan Digital Sky Survey is a classic example or more recently the 2XMMi X-ray catalogue I have a close involvement with (largest X-ray survey of the sky). Research data example – level 4a
  • 19. data paper describing and linking to output dataset(s) Research data example – level 4b Live Data paper! Dataset citation is first thing in the paper and is also included in reference list (to take advantage of citation count systems) DOI: 10.1002/gdj3.2
  • 20. Data Publications • Initiatives for open publication of data, datasets, data papers and open peer review • RDMF8 ‘Engaging with the publishers’: http://www.dcc.ac.uk/events/research- data-management-forum-rdmf/rdmf8- engaging-publishers • Particularly Rebecca Lawrence on peer review policies • Earth System Science Data: http://www.earth-system-science- data.net/ • Pensoft Data Publishing Policies and Guidelines for Biodiversity Data http://www.pensoft.net/news.php?n=59 Slide: Simon Hodson (Jisc / CODATA)
  • 21. 21 ODE Data Publication Pyramid: Pubs Supps Data Archives Data on Disks and in Drawers (1) Top of the pyramid is stable but small (2) Risk that supplements to articles turn into Data Dumping places (3) Too many disciplines lack a community endorsed data archive (4) Estimates are that at least 75 % of research data is never made openly avaiable 21
  • 22. From Mayernik et al (in prep) – PREPARDE project - Most cited Bulletin of the American Meteorological Society (BAMS) articles. Data from Web of Science, gathered on June 11, 2013 Article Data paper? Citations Article details 1 Yes 10,113 Kalnay, E; et al. The NCEP/NCAR 40-year reanalysis project, 1996. 2 No 3,201 Torrence, C; Compo, GP. A practical guide to wavelet analysis, 1998. 3 No 2,367 Mantua, NJ; et al. A Pacific interdecadal climate oscillation with impacts on salmon production, 1997. 4 Yes 1,987 Kistler, R; et al. The NCEP-NCAR 50-year reanalysis: Monthly means CD-ROM and documentation, 2001. 5 Yes 1,791 Xie, PP; Arkin, PA. Global precipitation: A 17-year monthly analysis based on gauge observations, satellite estimates, and numerical model outputs, 1997. 6 Yes 1,448 Kanamitsu, M; et al. NCEP-DOE AMIP-II reanalysis (R-2), 2002. 7 No 1,014 Baldocchi, D; et al. FLUXNET: A new tool to study the temporal and spatial variability of ecosystem-scale carbon dioxide, water vapor, and energy flux densities, 2001. 8 Yes 902 Rossow, WB; Schiffer, RA. Advances in understanding clouds from ISCCP, 1999. 9 Yes 900 Rossow, WB; Schiffer, RA. ISCCP cloud data products, 1991. 10 No 877 Hess, M; Koepke, P; Schult, I. Optical properties of aerosols and clouds: The software package OPAC, 1998. 11 No 815 Willmott, CJ. Some comments on the evaluation of model performance, 1982. 12 No 815 Trenberth, KE. The definition of El Nino, 1997. 13 Yes 785 Woodruff, SD; Slutz, RJ; et al. A comprehensive ocean- atmosphere data set, 1987. 14 Yes 776 Meehl, G.A.; et al. The WCRP CMIP3 multimodel dataset - A
  • 23. It’s a long road…. What do researchers need to make this all possible? – Incentives - citations, promotion, support • long way to go – Institutional and funder policy framework • mostly there now? – Appropriate discipline specific community centres of expertise • rare, mostly limited to big science niches or very broad but may be poorly sustained – Institutional support services for the basics • pilots to date – Software tools that are open and can be adapted • on the way
  • 24. PREPARDE: Peer REview for Publication & Accreditation of Research Data in the Earth sciences Jonathan Tedds (Leicester), Sarah Callaghan (BADC), Fiona Murphy (Wiley), Rebecca Lawrence (F1000R), Geraldine Stoneham (MRC), Elizabeth Newbold (BL), Rachel Kotarski (BL), Matthew Mayernik (NCAR), John Kunze, Carly Strasser (CDL), Angus Whyte (DCC), Becca Wilson (Leicester), Simon Hodson (Jisc) and #PREPARDE project team + Geraldine Clement Stoneham (MRC), Elizabeth Newbold, Rachel Kotarski (BL) on data peer review http://www.le.ac.uk/projects/preparde
  • 25. • Partnership formed between Royal Meteorological Society and academic publishers Wiley Blackwell to develop a mechanism for the formal publication of data in the Open Access Geoscience Data Journal • GDJ publishes short data articles cross-linked to, and citing, datasets that have been deposited in approved data centres and awarded DOIs (or other permanent identifier). • A data article describes a dataset, giving details of its collection, processing, software, file formats, etc., without the requirement of novel analyses or ground breaking conclusions. • the when, how and why data was collected and what the data-product is. http://www.geosciencedata.com/ PREPARDE key use case: Geoscience Data Journal, Wiley-Blackwell and the Royal Meteorological Society
  • 26. • capture the processes and procedures required to publish a scientific dataset – ingestion into a data repository – formal publication in a data journal • address key issues in data publication – how to peer-review a dataset? – what criteria are needed for a repository to be considered objectively trustworthy? – how can datasets and journal publications be effectively cross-linked for the benefit of the wider research community? • PREPARDE team includes key expertise in – Research – academic publishing – data management • Earth Sciences focus but produce general guidelines applicable to a wide range of scientific disciplines and data publication types incl life sciences (F1000R) PREPARDE: Peer REview for Publication & Accreditation of Research Data in the Earth sciences http://www.le.ac.uk/projects/preparde
  • 27. BADC Data Data BODC DataData A Journal (Any online journal system) PDF PDF PDF PDF PDF Word processing software with journal template Data Journal (Geoscience Data Journal) html html html html 1) Author prepares the paper using word processing software. 3) Reviewer reviews the PDF file against the journal’s acceptance criteria. 2) Author submits the paper as a PDF/Word file. Word processing software with journal template 1) Author prepares the data paper using word processing software and the dataset using appropriate tools. 2a) Author submits the data paper to the journal. 3) Reviewer reviews the data paper and the dataset it points to against the journals acceptance criteria. The traditional online journal model Overlay journal model for publishing data 2b) Author submits the dataset to a repository. Data How: to publish data in GDJ
  • 28. Live Data paper! Dataset citation is first thing in the paper and is also included in reference list (to take advantage of citation count systems) DOI: 10.1002/gdj3.2
  • 29. Data Centre Trust: Repository accreditation • Link between data paper and dataset is crucial! • How do data journal editors know a repository is trustworthy? • How can repositories prove they’re trustworthy? • What makes a repository trustworthy? • Many things: mission, processes, expertise, workflows, history, systems, documentation, … • Assessing trustworthiness requires assessing the entire repository workflow • PREPARDE / IDCC13 Workshop – report out soon! • Peer review of data is implicitly peer review of repository And what does “trustworthy” mean, when you get right down to it?
  • 30. DataCite Repository List • working document • initated via a collaboration between the British Library, BioMed Central and the Digital Curation Centre • aims to capture growing number of repositories for research data • provided for information purposes only: • DataCite provides no endorsements of quality or suitability of the repositories • encourage community participation in developing this resource http://www.datacite.org/repolist/
  • 31. Dryad Data Repository JDAP: Joint Data Archiving Policy  Joint Data Archiving Policy: http://datadryad.org/jdap  Joint declarations, Feb 2010, in American Naturalist, Evolution, the Journal of Evolutionary Biology, Molecular Ecology, Heredity, and other key journals in evolution and ecology: http://www.journals.uchicago.edu/doi/full/10.1086/650340  This journal requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as GenBank, TreeBASE, Dryad, or the Knowledge Network for Biocomplexity.  Allows embargos of up to one year; allows exceptions for, e.g., sensitive information such as human subject data or the location of endangered species.  ‘Data that have an established standard repository, such as DNA sequences, should continue to be archived in the appropriate repository, such as GenBank. For more idiosyncratic data, the data can be placed in a more flexible digital data library such as the National Science Foundation-sponsored Dryad archive at http://datadryad.org.' Slide: Simon Hodson (Dryad / Jisc / CODATA)
  • 32. PREPARDE and Bi-Directional Data Linking • already have a link from the GDJ data article to the data repository via DOI • GDJ can also pull the standard DOI metadata attached to that DOI from the DataCite metadata store • need to figure out a way so GDJ can inform the repository that their dataset has been cited/published! • At this time, we have a manual work- around (i.e. email) • Workshop on cross-linking between data centres and publishers 30th April 2013 at RAL, UK • Report out soon! BADC NCAR GDJ Standardised metadata DataCite Metadata Store Standardised metadata
  • 33. Peer review of data: the Perfect Disaster? • Support for the peer review process – scholars contributing peer reviews with little formal reward – opportunity to polish and refine understanding of the cutting edge of research • But peer review system under stress – exploding number of journals, conferences, and grant applications – self-publication tools - blogs and wikis - allow scholars to disseminate their research results and products • Faster and more directly • Now adding research data into the publication and peer review queues …
  • 34. Peer-review of data • Technical – author guidelines for GDJ – Funder Data Value Checklist – implicit peer review of repository? • Scientific – pre-publication? – post-publication? E.g. F1000R – guidelines on uncertainty e.g. IPCC – discipline specific? – EU Inspire spatial formatting • Societal – contribution to human knowledge – reliability http://libguides.luc.edu/content.php?pid=5464&sid=164619
  • 35. Open Peer Review of Data ESSD peer review ensures that the datasets are: Plausible with no immediately detectable problems; Sufficient high quality and their limitations clearly stated; Well annotated by standard metadata and available from a certified data center/repository; Customary with regard to their format(s) and/or access protocol, and expected to be useable for the foreseeable future; Openly accessible (toll free) Earth System Science Data journal: http://www.earth-system-science-data.net/ Rebecca Lawrence, Data Publishing: peer review, shared standards and collaboration, http://www.dcc.ac.uk/events/research-data- management-forum-rdmf/rdmf8-engaging- publishers Faculty 1000 Open Peer Review Sanity check: Format and suitable basic structure adherence A standard basic protocol structure is adhered to Data stored in the most appropriate and stable location Open Peer Review: Is the method used appropriate for the scientific question being asked? Has enough information been provided to be able to replicate the experiment? Have appropriate controls been conducted, and the data presented? Is the data in a useable format/structure? Are stated data limitations and possible sources of error appropriately described Does the data ‘look’ ok (optional; e.g. Microarray data)
  • 36. Draft Recommendations on Peer- review of data • Summary Recommendations from Workshop at British Library, 11 March 2013 • Workshop attendees included funders, publishers, repository managers, researchers …. • Draft recommendations put up for discussion and feedback captured • Feedback from the community still welcome • 2nd workshop 24 June: put recommendations to peer reviewers! http://libguides.luc.edu/content.php?pid=5464&sid=164619 Document at: http://bit.ly/DataPRforComment Feedback to: https://www.jiscmail.ac.uk/DATA- PUBLICATION
  • 37. Draft Recommendations on data peer review Summary Recommendations from Workshop at the British Library, 11 March 2013 • Connecting data review with data management planning • Connecting scientific, technical review and curation • Connecting data review with article review • 4-5 draft recommendations in each of above • Assist Researchers, Publishers, Journal Editors, Reviewers, Data Centres, Institutional Repositories to map requirements for data peer review • Matrix of stakeholders vs processes – Assist in assigning responsibilities for given context – New for most disciplines – Learn from disciplines where this already happens
  • 38. Connecting data review with data management planning 1. All research funders should at least require a “data sharing plan” as part of all funding proposals, and if a submitted data sharing plan is inadequate, appropriate amendments should be proposed. 2. Research organisations should manage research data according to recognised standards, providing relevant assurance to funders so that additional technical requirements do not need to be assessed as part of the funding application peer review. (Additional note: Research organisations need to provide adequate technical capacity to support the management of the data that the researchers generate.) 3. Research organisations and funders should ensure that adequate funding is available within an award to encourage good data management practice. 4. Data sharing plans should indicate how the data can and will be shared and publishers should refuse to publish papers which do not clearly indicate how underlying data can be accessed, where appropriate.
  • 39. Connecting scientific, technical review and curation 1. Articles and their underlying data or metadata (by the same or other authors) should be multi-directionally linked, with appropriate management for data versioning. 2. Journal editors should check data repository ingest policies to avoid duplication of effort , but provide further technical review of important aspects of the data where needed. (Additional note: A map of ingest/curation policies of the different repositories should be generated.) 3. If there is a practical/technical issue with data access (e.g. files don’t open or exist), then the journal should inform the repository of the issue. If there is a scientific issue with the data, then the journal should inform the author in the first instance; if the author does not respond adequately to serious issues, then the journal should inform the institution who should take the appropriate action. Repositories should have a clear policy in place to deal with any feedback.
  • 40. Connecting data review with article review 1. For all articles where the underlying data is being submitted, authors need to provide adequate methods and software/infrastructure information as part of their article. Publishers of these articles should have a clear data peer review process for authors and referees. 2. Publishers should provide simple and, where appropriate, discipline-specific data review (technical and scientific) checklists as basic guidance for reviewers. 3. Authors should clearly state the location of the underlying data. Publishers should provide a list of known trusted repositories or, if necessary, provide advice to authors and reviewers of alternative suitable repositories for the storage of their data. 4. For data peer review, the authors (and journal) should ensure that the data underpinning the publication, and any tools required to view it, should be fully accessible to the referee. The referees and the journal need to then ensure appropriate access is in place following publication. 5. Repositories need to provide clear terms and conditions for access, and ensure that datasets have permanent and unique identifiers.
  • 41. Publishing research data • Research is heavily context specific => So keep it context specific? • Publishers and Professional & Learned Societies can help galvanise agreement of researchers • To define how they want their data represented and preserved for reuse & citation => your researchers need you • Along with funder appointed peer review committees often stronger connection than to institution • Institutional managers/services cannot cover wide range of discipline specific expertise • Like publishers don’t cover all fields • Relatively constant as career progresses • Change host institution • Change field(s) • Change technique • In UK RCUK funding for APCs strictly for articles only…. – How to fund APCs for depositing data in repositories? – Will publishers charge APCs to publish data papers?
  • 42. 2012-02-07 DCC roadshow East Midlands - CC-BY-SA 42 PDB GenBank UniProt Pfam Spreadsheets, Notebooks Local, Lost High throughput experimental methods Industrial scale Commons based production Public data sets Cherry picked results Preserved CATH, SCOP (Protein Structure Classification) ChemSpider Research and the long tail Slide: Carole Goble
  • 43. Enabling Open Data Publishing • Active Data Management Planning – built in at proposal stage • Local institutional tweaks of funder and local templates – Implemented and evolved in project • Data Management Plan as a live, evolving object • Annotate data on the fly – lab notebook approach – Curated & preserved using permanent identifiers • Appropriate repository and data collection descriptors
  • 44. Active Data Storage: Identifying the Holy Grail ? • “what is needed is a tool to transparently sync local and network storage” (Marieke Guy, JISCMRD2 Nottingham) • CKAN (Orbital, Lincoln)? • Research Data Toolkit (Herts) – hybrid solutions • UC3 suite at CDL… • DropBox-like functionality a must • Usability • Technical interoperability • aim to help create databases for research data so • facilitates collaboration and data sharing • enables the subsequent publication of datasets • challenge is to ensure data are • documented • Preserved • service is sustainable
  • 45. http://halogen.le.ac.uk  Portable Antiquities Scheme (British Museum)  Place-names (Nottingham)  Surnames  Genetics  IT hosting and GIS  Best practice: #JISCMRD, UKRDS, DCC, international
  • 46. Halogen as template for research data management #jiscmrd  Requirements Analysis – must be iterative!  Data Management Plan – use DMPonline (UK Digital Curation Centre)  Scalable research data management infrastructure  pilot phase to nationally available resource  LAMP stack IT infrastructure: host research database – work with JISC/DCC  A model for the long term delivery of a data management service within the institution including  support, maintenance, governance & charging policies  Include researchers, IT services, research support office, library services etc.
  • 47.
  • 48. BENEFITS  New research opportunities  Cross database work – seed new research samples  Scholarly communication/access to national resources  Key to English Place Names (Nottingham)  Portable Antiquities Scheme (British Museum)  Verification, re-purposing, re-use of data  Cleaning & enhancing private research datasets for reuse & correlation  No re-creation of data  Increased transparency  excellent training for best practice in research data management  Increasing research productivity  Build in cleaning, annotation, enhancement into normal research workflows  research datasets may immediately be reusable and interoperable  Impact & Knowledge Transfer  Reuse IT infrastructure  Increasing skills base of researchers/students/staff
  • 49. Reward = Leverhulme Trust funding £1.3m!
  • 50. CHALLENGES  interdisciplinary research database  ingest each input dataset in form such that sufficient information is carried forward to enable interoperation  Cultural differences  versioning & provenance for input datasets  which software tools, infrastructure , Query interface?  suitable for multi disciplinary researchers  Requirements upon the institution for sustaining the research assets & skills  Requirements upon the researchers  Annotating  Refreshing  Maintainence of datasets
  • 51.
  • 53.
  • 54.
  • 55. Suggested timeline for implementing institutional research data management From Whyte & Tedds (2011), DCC Briefing http://www.dcc.ac.uk/resources/briefing-papers/making-case-rdm
  • 56. Challenge for institutions – Rise to scientific and research challenge • Not just a management challenge • Responsibility for the knowledge they create – Library • “Doing the wrong things through the wrong people”? • Challenge for library to enable: • curation of data and publications • active support from data scientists • from centralised to dispersed support • Expert centres such as D-Lab essential! – IT Service • Provide research data platforms for researchers: – Active storage – Enable collaboration – Connect to preservation services through Library
  • 57. But that’s not all…  What about the software underpinning data driven research?  If we’re going to publish as open data:  How do we help researchers to store, annotate and discover the datasets they create?  How do you sustain and reuse that?
  • 58. Biomedical Research Infrastructure Software Service Kit A vision for cloud-based open source research applications #BRISSKit http://www.brisskit.le.ac.uk
  • 59. BRISSKit context: The I4Health goal of applying knowledge engineering to close the ‘ICT gap’ between research and healthcare (Beck, T. et al 2012)
  • 62. The semantic bridge ? OBiBa Onyx Records participant consent, questionnaire data and primary specimen IDs i2b2 Cohort selection and data querying Bio-ontology!
  • 63. Research Software Sustainability • OS community engagement • standards compliance • consortium approach • work with grain of researchers • discipline specific forks? • Github versioning an example for research data? • OS Community Engagement Charter • defining engagement with existing & new OS communities • including adoption & code commitments See Rob Baxter blog: “The research software engineer” • http://dirkgorissen.com/2012/09/13/the-research- software-engineer/
  • 64. Lessons for institutions?  Can’t do it all in house!  But many disciplines don’t have data centres  Build coalition of institutional actors  Essential to have high level support  Take and shape  Identify what you do have in-house  Access external tools, standards where possible  Active storage, collaboration, eprints…  Propose best of breed for (inter)national reuse  Share benefits (and costs) over acacdemic networks  Sustainability the key challenge  As much cultural as technical – needs networks…
  • 65. But institutions alone aren’t enough – we need an alliance!
  • 66. Accepted Research Data Alliance Interest Group Publishing Data • http://rd-alliance.org/ • Close coordination with ICSU-WDS working group, CODATA and other ongoing initiatives in data publication – WDS under International Council of Science, RDA wider – Avoid duplication within related RDA and WDS WGs – join up – For WDS partnerships between publishers and data centres key • scope the territory – gap analysis • Use RDA Forum and new http://jiscmail.ac.uk/data-publication 350+ list • Take findings from RDA / WDS group(s) and trial in other communities / disciplines / institutional repositories
  • 67. 15-9-2013 Launch meeting discussion 67
  • 68. “Keep reaching for the stars” • increase the trustworthiness and value of individual data sets • strengthen the findings based on cited data sets • increase the transparency and traceability of data and publications • enable reuse and repurposing i.e. Problems but extraordinary opportunities – all hands on deck!
  • 69. Thank you for listening and thanks to CDL, D- Lab and the project partners  Dr Jonathan Tedds jat26@le.ac.uk @jtedds Senior Research Fellow, D2K Data to Knowledge Research Group (University of Leicester) #PREPARDE http://www.le.ac.uk/projects/preparde Mailing list: http://jiscmail.ac.uk/DATA-PUBLICATION