Open Research Data:
Licensing | Standards | Future
Ross Mounce (@RMounce)
Natural History Museum, London
British Ecological Society
Open Data & Reproducibility
Workshop, London, 2015-04-21
XKCD 1179 on ISO 8601
bit.ly/opendataintro
These slides are a re-spin of my longer
OpenCon 2014 deck, on Slideshare here:
All my textual content is licensed under the
Creative Commons Attribution License 4.0 (CC BY), unless otherwise indicated
Outline
● What is open data?
● A short history of data sharing
● Supplementary data needs to die
● FAIR data as a 1st-class research output
By sharing data we can see further
Data (& code) are the building
blocks of science
Shared, re-used data allow us to
more rigorously test hypotheses;
“to see further”
...and to do it all more quickly and
easily.
What exactly is open data?
From http://opendefinition.org/,
see http://opendefinition.org/od/ for more detail
Open means anyone can
freely access, use, modify,
and share for any purpose
(subject, at most, to requirements
that preserve provenance and
openness)
Legally, what is open data?
There are many open knowledge definition (OKD) conformant licences,
including (but not limited to):
See here for the comprehensive list: http://opendefinition.org/licenses/
CC0 waiver
http://creativecommons.org/
publicdomain/zero/1.0/
CC BY (Attribution only)
https://creativecommons.org/
licenses/by/4.0/
CC BY-SA (Attribution-ShareAlike)
https://creativecommons.org/licenses/by
-sa/4.0/
CC0 should be the default for data
CC0 is the default for data at:
Hrynaszkiewicz & Cockerill (2012) BMC Research Notes
Open by default: a proposed copyright license and waiver agreement for open access
research and data in peer-reviewed journals
Strongly recommended for data by:
Not all Creative Commons licences are 'open'
NC -- You “may not use this work for
commercial purposes”.
Work under this licence cannot be used for
every purpose, therefore it is not open.
Can have significant, often unexpected
negative impact on potential re-use.
ND -- “No Derivative Works”.
Work under this licence cannot be
adapted if it is re-used. Not very
helpful for research!
NC & ND – An extremely restrictive re-use
licence: neither commercial purposes
nor adaptations are allowed.
KEY PAPER: Hagedorn et al (2011) ZooKeys
Creative commons licenses and the non-commercial condition: Implications for the
re-use of biodiversity information
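The NC/ND rule above can be sketched as a small check. This is only an illustration: the function name and the licence-code format are assumptions made here, and the check only makes sense for Creative Commons codes (the full list of conformant licences is at http://opendefinition.org/licenses/).

```python
# Sketch: flag Creative Commons licence codes that carry an NC or ND
# restriction, following the rule on these slides (NC and ND are not open).
# The function name and input format are illustrative assumptions; the
# input is assumed to be a CC code such as "CC BY-NC 4.0".

def is_open_cc_licence(code: str) -> bool:
    """Return True if the CC licence code has no NC or ND element."""
    # Normalise "cc by-nc-sa 4.0" into elements: ["BY", "NC", "SA", "4.0"].
    elements = code.upper().replace("CC", "", 1).replace("-", " ").split()
    restricted = {"NC", "ND"}
    return not any(element in restricted for element in elements)


if __name__ == "__main__":
    codes = ["CC0", "CC BY", "CC BY-SA", "CC BY-NC", "CC BY-ND", "CC BY-NC-ND"]
    for code in codes:
        label = "open" if is_open_cc_licence(code) else "NOT open"
        print(f"{code:12s} {label}")
```

CC0, CC BY and CC BY-SA pass the check; anything with NC or ND does not.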
Non-open licensing causes real
problems for research & education
The Creative Commons non-commercial (-NC) restriction is poorly defined in
most jurisdictions, and even more poorly understood by many of its users.
“non-commercial” != “non-profit”
A) Non-commercial actually excludes many teaching purposes:
In the UK, university students typically pay expensive tuition fees to attend.
Thus university teaching is often a commercial activity, and -NC restricted
materials cannot be used to teach students in these circumstances.
B) Licence incompatibility – NC licences are not compatible with licences
used on major collaboration platforms like Wikipedia or Wikimedia Commons
C) Non-commercial organizations (e.g. Deutschlandradio)
have been successfully sued for re-using CC BY-NC
content without permission.
Klimpel (2012) Consequences, risks and side-effects of
the license module “non-commercial use only - NC”
Real problems of non-open data:
GBIF & biodiversity data
Desmet, P. (2013) Showing you this map of aggregated bullfrog occurrences
would be illegal http://peterdesmet.com/posts/illegal-bullfrogs.html
Open data in scholarship, and beyond
The open data movement is much broader than just academia/research
It's been successful & popular in areas like open government data:
For transparency, detecting & discouraging corruption
For releasing social & commercial value (governments collect a lot of data
already, why not make wider use of it, at little or no extra cost?)
For participatory governance –
citizens can be more informed, a “read/write” society
Some text adapted from http://opengovernmentdata.org/
Each of these has clear parallels with open
research data: transparency & fraud detection,
extra value through research data re-use,
participatory citizen science
Open data in scholarship, and beyond
Similarly, and with some overlap to open research data,
there's the open GLAM movement
(GLAM = Galleries, Libraries, Archives & Museums)
In this case, their data is typically collections metadata
but also digital images of their collections
See http://openglam.org/ for more
Technical aspects of open data
So, you understand the importance of licensing...
What next?
How best can we make our data openly available?
Where should I upload to?
What format(s) should I make the data available in?
Data Standards & Data File Formats
Adhere to existing standards, if possible!
xkcd 927 on standards
Data Standards & Data File Formats
Take note of community standards:
e.g. the Bermuda Principles for sharing DNA seq. data
● Automatic release of sequence
assemblies larger than 1 kb
(preferably within 24 hours).
● Immediate publication of finished
annotated sequences.
● Aim to make the entire sequence
freely available in the public domain
Data Standards & Data File Formats
If there are no formally agreed community standards,
canvass the community to create/formalise a standard
e.g. Best Practices for Data Sharing in Phylogenetic Research
Cranston et al (2014) PLOS Currents Tree of Life
e.g. The 1st Open Economics International Workshop
(Cambridge, 2013) bringing together academic
economists from around the world to discuss data
sharing in economics research.
Data Standards & Data File Formats
If there are multiple, competing file formats:
Opt for file formats based on open standards
https://en.wikipedia.org/wiki/Open_standard
e.g.
Avoid proprietary formats
https://en.wikipedia.org/wiki/Proprietary_format
e.g.
Data Standards & Data File Formats
A real example: recent creation of a new data
standard for exchange of 3-dimensional reconstructions
of objects from tomographic imaging data
SPIERS software
+ VAXML data standard
Sutton et al (2012) SPIERS and VAXML: A software
toolkit for tomographic visualisation and a format for
virtual specimen interchange.
Palaeontologia Electronica
A super brief, eclectic
history of scientific data
sharing
Centralised Data Centres
for specific data types
The Cambridge Crystallographic Data Centre, est. 1965
It maintains the Cambridge Structural Database **
** Not open data sensu stricto …some types of users/uses are charged
Data Sharing (by snail mail)
e.g. “The full profile listings are on floppy disks
which are available upon request”
Fernholz et al (1989) A survey of measurements and measuring
techniques in rapidly distorted compressible turbulent boundary layers.
Bilofsky & Burks (1988)
Nucleic Acids Research v16 n5
“The author will provide the
accession number to the
PROCEEDINGS [PNAS]
office to be included in a
footnote to the published
paper.”
1989
Supplementary Data (Online)
[ journal-hosted ]
Chen et al (1999)
Fluorescence Polarization in
Homogeneous Nucleic Acid
Analysis. Genome Research
“Numerical values for the
data are available as online
supplementary material at
http://www.genome.org.”
http://treebase.org/, est. 1994
Not all databases succeed.
Build it, and they may not come...
Of phylogenetic analyses published in 2010,
only ~4% of them have data available for re-use
Stoltzfus et al 2012. Sharing and re-use of phylogenetic trees
(and associated data) to facilitate synthesis. BMC Research Notes
“Each custodian of data on plant traits will retain the right to be informed of
any TRY activity that may involve his/her data, and will have the opportunity to
negotiate whether his/her data can be used, and whether general
guidelines of authorship need to be modified in that particular case
Custodians retain the rights to withdraw their data at any time.”
Not all databases provide open data
https://www.try-db.org/TryWeb/Submission.php
Recommended reading:
http://danielfalster.com/blog/2013/08/23/making-a-case-for-a-fully-open-trait-database/
Supp. Data Needs to Die
From the 1990s to 2010s, online supplementary data was used as a way of
dumping data online in an ad hoc manner... It was available *shrugs*
Traditional, journal-hosted supplementary files bury data. Additional files are
bunged online with little or no additional metadata describing them.
Thus, supplementary information (SI) typically isn't searchable. That's a huge problem.
Data should be FAIR:
Findable, Accessible, Interoperable, Re-usable
It should be findable independent of the research article
https://www.force11.org/group/fairgroup
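One way to make data findable independent of the article is to ship a small machine-readable description with it. The sketch below is only illustrative: the field names are assumptions, not a formal metadata schema, and all values are placeholders.

```python
import json

# Sketch: a minimal metadata record to accompany a data file, so the
# dataset is described on its own terms rather than buried as an
# undocumented supplementary file. Field names are hypothetical and do
# not follow any particular formal standard.

def build_metadata(title, creators, licence, data_file, description):
    """Assemble a small human- and machine-readable dataset description."""
    return {
        "title": title,
        "creators": creators,
        "licence": licence,
        "files": [data_file],
        "description": description,
    }


record = build_metadata(
    title="Example dataset (placeholder title)",
    creators=["A. Researcher (placeholder)"],
    licence="CC0-1.0",
    data_file="measurements.csv",
    description="Placeholder description of what each column contains.",
)

# Serialise to JSON; this text could be saved next to the data file.
metadata_json = json.dumps(record, indent=2, sort_keys=True)
print(metadata_json)
```

In practice a repository will usually ask for these details in its own deposit form; the point is only that the description travels with the data.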
Supp. Data Often Neglected
Publisher-neglect (Wiley) meant this
paper was online for a week without
the crucial spreadsheet file the
entire article was describing !!!
Deeply embarrassing.
N.B. This has happened at many other journals too.
Where to upload FAIR open data?
GenBank,
SRA,
1000s more!
http://www.crystallography.net/
'Data paper' journals
http://www.mdpi.com/journal/data/about
Intelligent data papers allow databases
to automatically pull-in your data
Many publishers (e.g. Pensoft) intelligently
mark up data papers so that the data can be
automatically ingested into appropriate databases
on the day of publication!
Data sharing benefits authors & re-users
Piwowar HA, Vision TJ. (2013)
Data reuse and the open data
citation advantage. PeerJ
1:e175
“...open data citation
benefit for this sample
to be 9%”
relative to papers
providing no public
data, for gene
expression microarray
data
10.7717/peerj.175/fig-2
See also previous work by
Piwowar:
10.1371/journal.pone.0000308
Those who share data, do better science
Wicherts, J. M., Bakker, M. & Molenaar, D. (2011)
Willingness to share research data is related to the
strength of the evidence and the quality of reporting of
statistical results. PLoS ONE 6, e26828.
http://dx.doi.org/10.1371/journal.pone.0026828
The authors examined psychological papers for the quality of statistical
reporting & asked the authors of those papers for the full data underlying
the reported results. Generally, those who shared, had more statistically
robust, reproducible results.
“Email the author for data” - doesn't work
Wicherts JM, Borsboom D,
Kats J, Molenaar D (2006)
The poor availability of
psychological research
data for reanalysis.
American Psychologist 61:
726–728
A well-known problem, which
I myself have also faced
many times!!!
Many legacy journals
unfortunately still pretend
that “email the author” is
acceptable.
Best practice open data is time consuming
(but still worth the extra effort!)
Emilio M. Bruna recently provided an estimate of the amount of
time it took him to prepare & upload open data related to a
publication to figshare & Dryad.
http://brunalab.org/blog/2014/09/04/the-opportunity-cost-of-my-openscience-was-35-hours-690/
11 hours & $90 (for Dryad)
Providing open-source code was the most time consuming part (25.5 hours),
and Open Access publication the most expensive ($600).
Not all data should be open!
Intelligent openness
is required
– Royal Society report
Obviously, there are some types of data which
should NOT be made mandatorily open,
e.g. sensitive medical data.
However, with informed consent,
if patients really want to, they should be
allowed to publish their own medical data.
Other exceptions to the open default
Sensitive species conservation data
e.g. exact geocoordinates of home range
Certain species of wild orchids, cacti & carnivorous plants
are highly endangered by illegal harvesting.
Publishing the exact geolocation data of the remaining
populations of commercially-desirable, endangered
species is a really dumb thing to do.
Such data is typically held privately in databases (not
publicly available).
The 5 stars of open data
Most research data would get
ZERO (not available online)
Or just ONE star
http://5stardata.info/
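The star scheme is cumulative, which a short sketch can make concrete. The function and parameter names below are illustrative assumptions; the five levels follow the descriptions at http://5stardata.info/.

```python
# Sketch: count 5-star open data levels. Each star requires all the stars
# below it, so counting stops at the first level that is not met.
# Function and parameter names are hypothetical, chosen for illustration.

def open_data_stars(online_open_licence, structured, open_format,
                    uses_uris, linked_to_other_data):
    """Return the number of stars (0 to 5) a dataset would earn."""
    levels = [
        online_open_licence,    # 1 star: on the web under an open licence
        structured,             # 2 stars: structured data (e.g. a spreadsheet)
        open_format,            # 3 stars: non-proprietary format (e.g. CSV)
        uses_uris,              # 4 stars: URIs used to denote things
        linked_to_other_data,   # 5 stars: linked to other data for context
    ]
    stars = 0
    for achieved in levels:
        if not achieved:
            break
        stars += 1
    return stars


if __name__ == "__main__":
    # Data that is not online under an open licence gets zero stars,
    # however well structured it is.
    print(open_data_stars(False, True, True, False, False))
    # An openly licensed CSV file online reaches three stars.
    print(open_data_stars(True, True, True, False, False))
```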
3-star open research data is achievable
This is what research data publication
should be aiming for in the short term.
Publishing .csv / non-proprietary open data is
NOT actually that hard!
http://5stardata.info/
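To show how little is involved, here is a minimal sketch that writes a table to CSV using only the Python standard library, with dates in ISO 8601 form (YYYY-MM-DD). The column names and rows are made-up placeholders, not real data.

```python
import csv
import io
from datetime import date

# Made-up placeholder rows, purely for illustration.
rows = [
    {"specimen_id": "A1", "collected": date(2015, 4, 21), "length_mm": 12.5},
    {"specimen_id": "A2", "collected": date(2015, 4, 22), "length_mm": 13.1},
]


def to_csv_text(records):
    """Serialise records to CSV text with a header row and ISO 8601 dates."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["specimen_id", "collected", "length_mm"],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        row = dict(record)
        # date.isoformat() gives an unambiguous YYYY-MM-DD string.
        row["collected"] = record["collected"].isoformat()
        writer.writerow(row)
    return buffer.getvalue()


csv_text = to_csv_text(rows)
print(csv_text)
```

Swapping the in-memory buffer for a file opened with open("data.csv", "w", newline="") writes the same content to disk.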
Further Reading
1. Editor’s Introduction - Samuel A. Moore
2. Open Content Mining - Peter Murray-Rust, Jennifer C. Molloy, Diane Cabell
3. The Need to Humanize Open Science - Eric C. Kansa
4. Data Sharing in a Humanitarian Organization: The Experience of Médecins Sans Frontières - Unni Karunakara
5. Why Open Drug Discovery Needs Four Simple Rules for Licensing Data and Models - Antony J. Williams, John Wilbanks, Sean Ekins
6. Open Data in the Earth and Climate Sciences - Sarah Callaghan
7. Open Minded Psychology - Wouter van den Bos, Mirjam Jenny, Dirk Wulff
8. Open Data in Health Sciences - Tom Pollard
9. Open Research Data in Economics - Velichka Dimitrova
10. Open Data and Palaeontology - Ross Mounce
Open Access Book, CC BY
Published by Ubiquity Press
Confirmed speakers include: Michael Eisen & Patrick Brown
( 2 out of 3 of the co-founders of PLOS )
opencon2015.org @open_con #opencon2015
Last year:
Washington DC
Day 3 with advocacy at
NIH and US Senate
This year:
Brussels
Day 3 with advocacy at
European Commission
Further Reading
● The Open Data Handbook - http://opendatahandbook.org/
● 5 star Open Data - http://5stardata.info/
● Science as an open enterprise (2012) A Royal Society report
● Caetano, D. S. & Aisenberg, A. 2014 Forgotten treasures: the fate of data in animal behaviour studies. Animal Behaviour
Data sharing in phylogenetics
● Magee et al 2014 The Dawn of Open Access to Phylogenetic Data. PLOS ONE
● Drew et al 2013 Lost Branches on the Tree of Life. PLOS Biology
● Stoltzfus et al 2012 Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis. BMC Research Notes
On licensing & legal issues with re-use
● Hagedorn et al 2011 Creative commons licenses and the non-commercial condition: Implications for the re-use of biodiversity information. ZooKeys
● Mounce 2012 Life as a palaeontologist: Academia, the Internet and Creative Commons. Palaeontology Online
● Klimpel, P. 2012 Consequences, Risks, and side-effects of the license module Non-Commercial – NC
Further Reading
● Murray-Rust, P. Open data in science. Serials Review 34, 52-64 (2008). http://dx.doi.org/10.1016/j.serrev.2008.01.001
● Leonelli, S., Smirnoff, N., Moore, J., Cook, C. & Bastow, R. Making open data work for plant scientists. Journal of Experimental Botany 64, 4109-4117 (2013). http://dx.doi.org/10.1093/jxb/ert273
● Hrynaszkiewicz, I. & Cockerill, M. Open by default: a proposed copyright license and waiver agreement for open access research and data in peer-reviewed journals. BMC Research Notes 5, 494 (2012). http://dx.doi.org/10.1186/1756-0500-5-494
● Boulton, G., Rawlins, M., Vallance, P. & Walport, M. Science as a public enterprise: the case for open data. The Lancet 377, 1633-1635 (2011). http://dx.doi.org/10.1016/s0140-6736(11)60647-8
● Parr, C. S. Open sourcing ecological data. BioScience 57, 309-310 (2007). http://dx.doi.org/10.1641/b570402
● Poisot, T., Mounce, R. & Gravel, D. Moving toward a sustainable ecological science: don't let data go to waste! Ideas in Ecology and Evolution 6 (2013). http://dx.doi.org/10.4033/iee.2013.6b.14.f

Weitere ähnliche Inhalte

Was ist angesagt?

Text and Data Mining explained at FTDM
Text and Data Mining explained at FTDMText and Data Mining explained at FTDM
Text and Data Mining explained at FTDMpetermurrayrust
 
Sharing re-usable phylogenetic data: we're not there yet
Sharing re-usable phylogenetic data: we're not there yetSharing re-usable phylogenetic data: we're not there yet
Sharing re-usable phylogenetic data: we're not there yetRoss Mounce
 
Workshop 5: Uptake of, and concepts in text and data mining
Workshop 5: Uptake of, and concepts in text and data miningWorkshop 5: Uptake of, and concepts in text and data mining
Workshop 5: Uptake of, and concepts in text and data miningRoss Mounce
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literatureAmanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literaturepetermurrayrust
 
Content Mining at Wellcome Trust
Content Mining at Wellcome TrustContent Mining at Wellcome Trust
Content Mining at Wellcome Trustpetermurrayrust
 
MESUR: Making sense and use of usage data
MESUR: Making sense and use of usage dataMESUR: Making sense and use of usage data
MESUR: Making sense and use of usage dataHerbert Van de Sompel
 
Cochrane workshop 2016
Cochrane workshop 2016Cochrane workshop 2016
Cochrane workshop 2016TheContentMine
 
Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the Literaturepetermurrayrust
 
High throughput mining of the scholarly literature
High throughput mining of the scholarly literature High throughput mining of the scholarly literature
High throughput mining of the scholarly literature TheContentMine
 
Open Access for Early Career Researchers
Open Access for Early Career ResearchersOpen Access for Early Career Researchers
Open Access for Early Career ResearchersRoss Mounce
 
ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!petermurrayrust
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature TheContentMine
 
towards interoperable archives: the Universal Preprint Service initiative
towards interoperable archives:  the Universal Preprint Service initiativetowards interoperable archives:  the Universal Preprint Service initiative
towards interoperable archives: the Universal Preprint Service initiativeHerbert Van de Sompel
 
High throughput mining of the scholarly literature
High throughput mining of the scholarly literatureHigh throughput mining of the scholarly literature
High throughput mining of the scholarly literaturepetermurrayrust
 
ContentMine (TDM) at JISC Digifest
ContentMine (TDM) at JISC DigifestContentMine (TDM) at JISC Digifest
ContentMine (TDM) at JISC Digifestpetermurrayrust
 
Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literatureAutomatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literaturepetermurrayrust
 
ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKpetermurrayrust
 
Content Mining of Science in Europe
Content Mining of Science in EuropeContent Mining of Science in Europe
Content Mining of Science in Europepetermurrayrust
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neurosciencepetermurrayrust
 

Was ist angesagt? (20)

Text and Data Mining explained at FTDM
Text and Data Mining explained at FTDMText and Data Mining explained at FTDM
Text and Data Mining explained at FTDM
 
Sharing re-usable phylogenetic data: we're not there yet
Sharing re-usable phylogenetic data: we're not there yetSharing re-usable phylogenetic data: we're not there yet
Sharing re-usable phylogenetic data: we're not there yet
 
Workshop 5: Uptake of, and concepts in text and data mining
Workshop 5: Uptake of, and concepts in text and data miningWorkshop 5: Uptake of, and concepts in text and data mining
Workshop 5: Uptake of, and concepts in text and data mining
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literatureAmanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature
 
Content Mining at Wellcome Trust
Content Mining at Wellcome TrustContent Mining at Wellcome Trust
Content Mining at Wellcome Trust
 
MESUR: Making sense and use of usage data
MESUR: Making sense and use of usage dataMESUR: Making sense and use of usage data
MESUR: Making sense and use of usage data
 
Cochrane workshop 2016
Cochrane workshop 2016Cochrane workshop 2016
Cochrane workshop 2016
 
Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the Literature
 
Cochrane workshop2016
Cochrane workshop2016Cochrane workshop2016
Cochrane workshop2016
 
High throughput mining of the scholarly literature
High throughput mining of the scholarly literature High throughput mining of the scholarly literature
High throughput mining of the scholarly literature
 
Open Access for Early Career Researchers
Open Access for Early Career ResearchersOpen Access for Early Career Researchers
Open Access for Early Career Researchers
 
ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature
 
towards interoperable archives: the Universal Preprint Service initiative
towards interoperable archives:  the Universal Preprint Service initiativetowards interoperable archives:  the Universal Preprint Service initiative
towards interoperable archives: the Universal Preprint Service initiative
 
High throughput mining of the scholarly literature
High throughput mining of the scholarly literatureHigh throughput mining of the scholarly literature
High throughput mining of the scholarly literature
 
ContentMine (TDM) at JISC Digifest
ContentMine (TDM) at JISC DigifestContentMine (TDM) at JISC Digifest
ContentMine (TDM) at JISC Digifest
 
Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literatureAutomatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literature
 
ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UK
 
Content Mining of Science in Europe
Content Mining of Science in EuropeContent Mining of Science in Europe
Content Mining of Science in Europe
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neuroscience
 

Andere mochten auch

How can repositories support the text-mining of their content and why?
How can repositories support the text-mining of their content and why? How can repositories support the text-mining of their content and why?
How can repositories support the text-mining of their content and why? Nancy Pontika
 
The PLUTo project @iEvoBio 2014
The PLUTo project @iEvoBio 2014The PLUTo project @iEvoBio 2014
The PLUTo project @iEvoBio 2014Ross Mounce
 
Leveraging the power of the web - Rocky Mountain Advanced Computing Conference
Leveraging the power of the web - Rocky Mountain Advanced Computing Conference Leveraging the power of the web - Rocky Mountain Advanced Computing Conference
Leveraging the power of the web - Rocky Mountain Advanced Computing Conference Kaitlin Thaney
 
The OpenCon Intro to Open Data
The OpenCon Intro to Open DataThe OpenCon Intro to Open Data
The OpenCon Intro to Open DataRoss Mounce
 
Subscription costs versus open access costs, & Dissolving journals' boundaries
Subscription costs versus open access costs, & Dissolving journals' boundariesSubscription costs versus open access costs, & Dissolving journals' boundaries
Subscription costs versus open access costs, & Dissolving journals' boundariesAlex Holcombe
 
SocialCite makes its debut at the HighWire Press meeting
SocialCite makes its debut at the HighWire Press meetingSocialCite makes its debut at the HighWire Press meeting
SocialCite makes its debut at the HighWire Press meetingKent Anderson
 
Research publication support for scholars in Brazil: Rising to the challenge
Research publication support for scholars in Brazil: Rising to the challengeResearch publication support for scholars in Brazil: Rising to the challenge
Research publication support for scholars in Brazil: Rising to the challengeRon Martinez
 
Open Access: Which Side Are You On
Open Access: Which Side Are You OnOpen Access: Which Side Are You On
Open Access: Which Side Are You OnJill Cirasella
 
Fifty shades of green and gold: open access to scholarly information
Fifty shades of green and gold: open access to scholarly informationFifty shades of green and gold: open access to scholarly information
Fifty shades of green and gold: open access to scholarly informationhierohiero
 

Andere mochten auch (10)

Open Access Publishing, Threat or Opportunity?
Open Access Publishing, Threat or Opportunity?Open Access Publishing, Threat or Opportunity?
Open Access Publishing, Threat or Opportunity?
 
How can repositories support the text-mining of their content and why?
How can repositories support the text-mining of their content and why? How can repositories support the text-mining of their content and why?
How can repositories support the text-mining of their content and why?
 
The PLUTo project @iEvoBio 2014
The PLUTo project @iEvoBio 2014The PLUTo project @iEvoBio 2014
The PLUTo project @iEvoBio 2014
 
Leveraging the power of the web - Rocky Mountain Advanced Computing Conference
Leveraging the power of the web - Rocky Mountain Advanced Computing Conference Leveraging the power of the web - Rocky Mountain Advanced Computing Conference
Leveraging the power of the web - Rocky Mountain Advanced Computing Conference
 
The OpenCon Intro to Open Data
The OpenCon Intro to Open DataThe OpenCon Intro to Open Data
The OpenCon Intro to Open Data
 
Subscription costs versus open access costs, & Dissolving journals' boundaries
Subscription costs versus open access costs, & Dissolving journals' boundariesSubscription costs versus open access costs, & Dissolving journals' boundaries
Subscription costs versus open access costs, & Dissolving journals' boundaries
 
SocialCite makes its debut at the HighWire Press meeting
SocialCite makes its debut at the HighWire Press meetingSocialCite makes its debut at the HighWire Press meeting
SocialCite makes its debut at the HighWire Press meeting
 
Research publication support for scholars in Brazil: Rising to the challenge
Research publication support for scholars in Brazil: Rising to the challengeResearch publication support for scholars in Brazil: Rising to the challenge
Research publication support for scholars in Brazil: Rising to the challenge
 
Open Access: Which Side Are You On
Open Access: Which Side Are You OnOpen Access: Which Side Are You On
Open Access: Which Side Are You On
 
Fifty shades of green and gold: open access to scholarly information
Fifty shades of green and gold: open access to scholarly informationFifty shades of green and gold: open access to scholarly information
Fifty shades of green and gold: open access to scholarly information
 

Ähnlich wie Open Research Data: Licensing | Standards | Future

HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8Scott Edmunds
 
Open Data and Institutional Repositories
Open Data and Institutional RepositoriesOpen Data and Institutional Repositories
Open Data and Institutional RepositoriesRobin Rice
 
Data curation issues for repositories
Data curation issues for repositoriesData curation issues for repositories
Data curation issues for repositoriesChris Rusbridge
 
Benefits and practice of open science
Benefits and practice of open scienceBenefits and practice of open science
Benefits and practice of open scienceSarah Jones
 
re3data.org – Registry of Research Data Repositories
re3data.org – Registry of Research Data Repositoriesre3data.org – Registry of Research Data Repositories
re3data.org – Registry of Research Data RepositoriesHeinz Pampel
 
W3C Library Linked Data Incubator Group
W3C Library Linked Data Incubator GroupW3C Library Linked Data Incubator Group
W3C Library Linked Data Incubator GroupAntoine Isaac
 
An Open Context for Archaeology
An Open Context for ArchaeologyAn Open Context for Archaeology
An Open Context for Archaeologyguest756e05
 
Beyond Meta-Data: Nano-Publications Recording Scientific Endeavour
Beyond Meta-Data: Nano-Publications Recording Scientific EndeavourBeyond Meta-Data: Nano-Publications Recording Scientific Endeavour
Beyond Meta-Data: Nano-Publications Recording Scientific EndeavourKNOWeSCAPE2014
 
Edinburgh DataShare: Tackling research data in a DSpace institutional repository
Edinburgh DataShare: Tackling research data in a DSpace institutional repositoryEdinburgh DataShare: Tackling research data in a DSpace institutional repository
Edinburgh DataShare: Tackling research data in a DSpace institutional repositoryRobin Rice
 
How to overcome obstacles to data publication: Issues, requirements, and good...
How to overcome obstacles to data publication: Issues, requirements, and good...How to overcome obstacles to data publication: Issues, requirements, and good...
How to overcome obstacles to data publication: Issues, requirements, and good...ariadnenetwork
 
2 Discovery and Acquisition of Data1.pptx
2 Discovery and Acquisition of Data1.pptx2 Discovery and Acquisition of Data1.pptx
2 Discovery and Acquisition of Data1.pptxvijayapraba1
 
Open Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon HodsonOpen Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon HodsonAfrican Open Science Platform
 
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...Jonathan Tedds
 
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...GigaScience, BGI Hong Kong
 
Scholarship in a connected world: New ways to know, new ways to show
Scholarship in a connected world: New ways to know, new ways to showScholarship in a connected world: New ways to know, new ways to show
Scholarship in a connected world: New ways to know, new ways to showDerek Keats
 

Open Research Data: Licensing | Standards | Future

  • 1. Open Research Data: Licensing | Standards | Future Ross Mounce (@RMounce) Natural History Museum, London British Ecological Society Open Data & Reproducibility Workshop, London, 2015-04-21 XKCD 1179 on ISO 8601
  • 2. bit.ly/opendataintro These slides are a re-spin of my longer OpenCon 2014 deck, on Slideshare here: All my textual content is licensed under the Creative Commons Attribution License 4.0 (CC BY), unless otherwise indicated
  • 3. Outline ● What is open data? ● A short history of data sharing ● Supplementary data needs to die ● FAIR data as a 1st class research output
  • 4.
  • 5. By sharing data we can see further Data (& code) are the building blocks of science Shared, re-used data allow us to more rigorously test hypotheses; “to see further” ...and to do it all more quickly and easily.
  • 6. What exactly is open data? From http://opendefinition.org/, see http://opendefinition.org/od/ for more detail Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness)
  • 7. Legally, what is open data? There are many open knowledge definition (OKD) conformant licences, including (but not limited to): See here for the comprehensive list: http://opendefinition.org/licenses/ CC0 waiver http://creativecommons.org/ publicdomain/zero/1.0/ CC BY (Attribution only) https://creativecommons.org/ licenses/by/4.0/ CC BY-SA (Attribution-ShareAlike) https://creativecommons.org/licenses/by -sa/4.0/
  • 8. CC0 should be the default for data CC0 is the default for data at: Hrynaszkiewicz & Cockerill (2012) BMC Research Notes Open by default: a proposed copyright license and waiver agreement for open access research and data in peer-reviewed journals Strongly recommended for data by:
  • 9. Not all Creative Commons licences are 'open' NC -- You “may not use this work for commercial purposes”. Work under this licence cannot be used for every purpose, therefore it is not open. Can have a significant, often unexpected negative impact on potential re-use. ND -- “No Derivative Works”. Work under this licence cannot be adapted if it is re-used. Not very helpful for research! NC & ND -- An extremely restrictive re-use licence: neither commercial purposes nor adaptations are allowed. KEY PAPER: Hagedorn et al (2011) ZooKeys Creative commons licenses and the non-commercial condition: Implications for the re-use of biodiversity information
  • 10. Non-open licensing causes real problems for research & education The Creative Commons non-commercial (-NC) restriction is poorly defined in most jurisdictions, and even more poorly understood by many of its users. “non-commercial” != “non-profit” A) Non-commercial actually excludes many teaching purposes: In the UK, university students typically pay expensive tuition fees to attend. Thus university teaching is often a commercial activity, and -NC restricted materials cannot be used to teach students in these circumstances. B) Licence incompatibility: NC licences are not compatible with licences used on major collaboration platforms like Wikipedia or Wikimedia Commons. C) Non-commercial organizations (e.g. Deutschlandradio) have been successfully sued for re-using CC BY-NC content without permission. Klimpel (2012) Consequences, risks and side-effects of the license module “non-commercial use only - NC”
  • 11. Real problems of non-open data: GBIF & biodiversity data Desmet, P. (2013) Showing you this map of aggregated bullfrog occurrences would be illegal http://peterdesmet.com/posts/illegal-bullfrogs.html
  • 12. Open data in scholarship, and beyond The open data movement is much broader than just academia/research It's been successful & popular in areas like open government data: For transparency, detecting & discouraging corruption For releasing social & commercial value (governments collect a lot of data already, why not make wider use of it, at little or no extra cost?) For participatory governance – citizens can be more informed, a “read/write” society Some text adapted from http://opengovernmentdata.org/ Each of these has clear parallels with open research data: transparency & fraud detection, extra value through research data re-use, participatory citizen science
  • 13. Open data in scholarship, and beyond Similarly, and with some overlap to open research data, there's the open GLAM movement (GLAM = Galleries, Libraries, Archives & Museums) In this case, their data is typically collections metadata but also digital images of their collections See http://openglam.org/ for more
  • 14. Technical aspects of open data So, you understand the importance of licensing... What next? How best can we make our data openly available? Where should I upload to? What format(s) should I make the data available in?
  • 15. Data Standards & Data File Formats Adhere to existing standards, if possible! xkcd 927 on standards
  • 16. Data Standards & Data File Formats Take note of community standards: e.g. the Bermuda Principles for sharing DNA seq. data ● Automatic release of sequence assemblies larger than 1 kb (preferably within 24 hours). ● Immediate publication of finished annotated sequences. ● Aim to make the entire sequence freely available in the public domain
  • 17. Data Standards & Data File Formats If there are no formally agreed community standards, canvass the community to create/formalise a standard e.g. Best Practices for Data Sharing in Phylogenetic Research Cranston et al (2014) PLOS Currents Tree of Life e.g. The 1st Open Economics International Workshop (Cambridge, 2013) bringing together academic economists from around the world to discuss data sharing in economics research.
  • 18. Data Standards & Data File Formats If there are multiple, competing file formats: Opt for file formats based on open standards https://en.wikipedia.org/wiki/Open_standard e.g. Avoid proprietary formats https://en.wikipedia.org/wiki/Proprietary_format e.g.
  • 19.
  • 20. Data Standards & Data File Formats A real example: recent creation of a new data standard for exchange of 3-dimensional reconstruction of objects from tomographic imaging data SPIERS software + VAXML data standard Sutton et al (2012) SPIERS and VAXML: A software toolkit for tomographic visualisation and a format for virtual specimen interchange. Palaeontologia Electronica
  • 21. A super brief, eclectic history of scientific data sharing
  • 22. Centralised Data Centres for specific data types The Cambridge Crystallographic Data Centre, est. 1965 It maintains the Cambridge Structural Database ** ** Not open data sensu stricto …some types of users/uses are charged
  • 23. Data Sharing (by snail mail) e.g. “The full profile listings are on floppy disks which are available upon request” Fernholz et al (1989) A survey of measurements and measuring techniques in rapidly distorted compressible turbulent boundary layers.
  • 24. Bilofsky & Burks (1988) Nucleic Acids Research v16 n5 “The author will provide the accession number to the PROCEEDINGS [PNAS] office to be included in a footnote to the published paper.” 1989
  • 25. Supplementary Data (Online) [ journal-hosted ] Chen et al (1999) Fluorescence Polarization in Homogeneous Nucleic Acid Analysis. Genome Research “Numerical values for the data are available as online supplementary material at http://www.genome.org.”
  • 26. http://treebase.org/, est. 1994 Not all databases succeed. Build it, and they may not come... Of phylogenetic analyses published in 2010, only ~4% of them have data available for re-use Stoltzfus et al 2012. Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis. BMC Research Notes
  • 27. “Each custodian of data on plant traits will retain the right to be informed of any TRY activity that may involve his/her data, and will have the opportunity to negotiate whether his/her data can be used, and whether general guidelines of authorship need to be modified in that particular case Custodians retain the rights to withdraw their data at any time.” Not all databases provide open data https://www.try-db.org/TryWeb/Submission.php http://danielfalster.com/blog/2013/08/23/making-a-case-for-a-fully-open-trait-database/ Recommended reading:
  • 28. Supp. Data Needs to Die From the 1990s to 2010s, online supplementary data was used as a way of dumping data online in an ad hoc manner... It was available *shrugs* Traditional, journal-hosted supplementary files bury data. Additional files are bunged online with little or no additional metadata describing them. Thus, typically, supplementary information (SI) isn't searchable. That's a huge problem. Data should be FAIR: Findable, Accessible, Interoperable, Re-usable. It should be findable independent of the research article. https://www.force11.org/group/fairgroup
  • 29. Supp. Data Often Neglected Publisher-neglect (Wiley) meant this paper was online for a week without the crucial spreadsheet file the entire article was describing !!! Deeply embarrassing. N.B. This has happened at many other journals too.
  • 30. Where to upload FAIR open data? GenBank, SRA, 1000s more! http://www.crystallography.net/
  • 32. Intelligent data papers allow databases to automatically pull in your data Many publishers (e.g. Pensoft) intelligently mark up data papers so that the data can be automatically ingested into appropriate databases on the day of publication!
  • 33. Data sharing benefits authors & re-users Piwowar HA, Vision TJ. (2013) Data reuse and the open data citation advantage. PeerJ 1:e175 “...open data citation benefit for this sample to be 9%” relative to papers providing no public data, for gene expression microarray data 10.7717/peerj.175/fig-2 See also previous work by Piwowar: 10.1371/journal.pone.0000308
  • 34. Those who share data, do better science Wicherts, J. M., Bakker, M. & Molenaar, D. (2011) Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results. PLoS ONE 6, e26828+ URL http://dx.doi.org/10.1371/journal.pone.0026828 The authors examined psychological papers for the quality of statistical reporting & asked the authors of those papers for the full data underlying the reported results. Generally, those who shared, had more statistically robust, reproducible results.
  • 35. “Email the author for data” - doesn't work Wicherts JM, Borsboom D, Kats J, Molenaar D (2006) The poor availability of psychological research data for reanalysis. American Psychologist 61: 726–728 A well-known problem, which I myself have also faced many times!!! Many legacy journals unfortunately still pretend that “email the author” is acceptable.
  • 36. Best practice open data is time consuming (but still worth the extra effort!) Emilio M. Bruna recently provided an estimate of the amount of time it took him to prepare & upload open data related to a publication to figshare & Dryad. http://brunalab.org/blog/2014/09/04/the-opportunity-cost-of-my-openscience-was-35-hours-690/ 11 Hours & $90 (for Dryad) Providing open-source code was the most time consuming part (25.5 hours), and Open Access publication the most expensive ($600).
  • 37. Not all data should be open! Intelligent openness is required – Royal Society report Obviously, there are some types of data which should NOT be made mandatorily open, e.g. sensitive medical data. However, with informed consent, if patients really want to, they should be allowed to publish their own medical data.
  • 38. Other exceptions to the open default Sensitive species conservation data e.g. exact geocoordinates of home range Certain species of wild orchids, cacti & carnivorous plants are highly endangered by illegal harvesting. Publishing the exact geolocation data of the remaining populations of commercially-desirable, endangered species is a really dumb thing to do. Such data is typically held privately in databases (not publicly available).
  • 39. The 5 stars of open data Most research data would get ZERO (not available online) Or just ONE star http://5stardata.info/
  • 40. 3-star open research data is achievable This is what research data publication should be aiming for in the short term. Publishing .csv / non-proprietary open data is NOT actually that hard! http://5stardata.info/
  • 41. Further Reading 1. Editor’s Introduction - Samuel A. Moore 2. Open Content Mining - Peter Murray-Rust, Jennifer C. Molloy, Diane Cabell 3. The Need to Humanize Open Science - Eric C. Kansa 4. Data Sharing in a Humanitarian Organization: The Experience of Médecins Sans Frontières - Unni Karunakara 5. Why Open Drug Discovery Needs Four Simple Rules for Licensing Data and Models - Antony J. Williams, John Wilbanks, Sean Ekins 6. Open Data in the Earth and Climate Sciences - Sarah Callaghan 7. Open Minded Psychology - Wouter van den Bos, Mirjam Jenny, Dirk Wulff 8. Open Data in Health Sciences - Tom Pollard 9. Open Research Data in Economics - Velichka Dimitrova 10. Open Data and Palaeontology - Ross Mounce Open Access Book, CC BY. Published by Ubiquity Press
  • 42. Confirmed speakers include: Michael Eisen & Patrick Brown (2 out of 3 of the co-founders of PLOS) opencon2015.org @open_con #opencon2015 Last year: Washington DC, Day 3 with advocacy at NIH and US Senate. This year: Brussels, Day 3 with advocacy at European Commission.
  • 43. Further Reading ● The Open Data Handbook - http://opendatahandbook.org/ ● 5 star Open Data - http://5stardata.info/ ● Science as an open enterprise (2012) A Royal Society report ● Caetano, D. S. & Aisenberg, A. 2014 Forgotten treasures: the fate of data in animal behaviour studies. Animal Behaviour. Data sharing in phylogenetics: ● Magee et al 2014 The Dawn of Open Access to Phylogenetic Data. PLOS ONE ● Drew et al 2013 Lost Branches on the Tree of Life. PLOS Biology ● Stoltzfus et al 2012 Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis. BMC Research Notes. On licensing & legal issues with re-use: ● Hagedorn et al 2011 Creative commons licenses and the non-commercial condition: Implications for the re-use of biodiversity information. ZooKeys ● Mounce 2012. Life as a palaeontologist: Academia, the Internet and Creative Commons. Palaeontology Online ● Klimpel, P. 2012 Consequences, Risks, and side-effects of the license module Non-Commercial – NC
  • 44. Further Reading ● Murray-Rust, P. Open data in science. Serials Review 34, 52-64 (2008). URL http://dx.doi.org/10.1016/j.serrev.2008.01.001 ● Leonelli, S., Smirnoff, N., Moore, J., Cook, C. & Bastow, R. Making open data work for plant scientists. Journal of Experimental Botany 64, 4109-4117 (2013). URL http://dx.doi.org/10.1093/jxb/ert273 ● Hrynaszkiewicz, I. & Cockerill, M. Open by default: a proposed copyright license and waiver agreement for open access research and data in peer-reviewed journals. BMC Research Notes 5, 494+ (2012). URL http://dx.doi.org/10.1186/1756-0500-5-494 ● Boulton, G., Rawlins, M., Vallance, P. & Walport, M. Science as a public enterprise: the case for open data. The Lancet 377, 1633-1635 (2011). URL http://dx.doi.org/10.1016/s0140-6736(11)60647-8 ● Parr, C. S. Open sourcing ecological data. BioScience 57, 309-310 (2007). URL http://dx.doi.org/10.1641/b570402 ● Poisot, T., Mounce, R. & Gravel, D. Moving toward a sustainable ecological science: don't let data go to waste! Ideas in Ecology and Evolution 6 (2013). URL http://dx.doi.org/10.4033/iee.2013.6b.14.f
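
Slide 40 says that publishing .csv / non-proprietary open data is not actually that hard, and slide 1 nods to ISO 8601. As a rough illustration only, here is a minimal Python sketch of exporting a small table to plain CSV with ISO 8601 dates. The records, field names and the `to_csv_text` helper are all hypothetical and are not taken from the slides.

```python
import csv
import io
from datetime import date

# Hypothetical example records; the field names are illustrative only.
RECORDS = [
    {"specimen_id": "S001", "collected_on": date(2015, 4, 21), "length_mm": 12.5},
    {"specimen_id": "S002", "collected_on": date(2014, 11, 3), "length_mm": 9.8},
]


def to_csv_text(records):
    """Serialise records as CSV text with ISO 8601 (YYYY-MM-DD) dates."""
    fieldnames = ["specimen_id", "collected_on", "length_mm"]
    buffer = io.StringIO()
    # A fixed "\n" line terminator keeps the output identical across platforms.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = dict(record)
        # date.isoformat() gives the unambiguous ISO 8601 calendar date.
        row["collected_on"] = record["collected_on"].isoformat()
        writer.writerow(row)
    return buffer.getvalue()
```

The resulting text can be written to a file opened with `encoding="utf-8"` and `newline=""`, then deposited in a repository alongside a licence statement such as CC0.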