Why should researchers care about data curation?
Varsha Khodiyar
WHY SHARE DATA
Expenditure on data generation
• 16.8% of NIH grant applications funded*
◦ Hours spent writing grants?
◦ Hours spent reviewing grants?
• Resources are finite and expensive
◦ Modified animals
◦ Specialized reagents
• Time and effort to generate good, valid data
* For fiscal year 2013 (http://report.nih.gov/success_rates/Success_ByIC.cfm)
Reproducibility is a cornerstone of science
“[W]e evaluated the replication of data analyses in 18 articles on microarray-based gene expression profiling published in Nature Genetics in 2005–2006...We reproduced two analyses in principle and six partially or with some discrepancies; ten could not be reproduced. The main reason for failure to reproduce was data unavailability.”
Ioannidis JPA. et al. Repeatability of published microarray gene expression analyses. Nature Genetics 41, 149–55 (2009)
HOW TO SHARE DATA
Data needs to be…
• Discoverable
◦ Need to know it’s there
• Accessible
◦ Must be able to get to the data
• Usable
◦ Requires sufficient information about how the data was generated
• Persistent
◦ Historical data access as part of the scientific record, as well as for new research
• Reliable
◦ Data provenance informs data reuse decisions
Traditional publishing 
• Data in a PDF is discoverable and accessible to readers of the paper
• But it is not usable: data in a PDF table cannot be manipulated
I’ll send my data when someone asks for it
• “We examined the availability of data from 516 studies between 2 and 22 years old
• The odds of a data set being reported as extant fell by 17% per year
• Broken e-mails and obsolete storage devices were the main obstacles to data sharing”
Vines TH. et al. The availability of research data declines rapidly with article age. Curr Biol 24, 94–7 (2014)
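To make the 17% per year figure concrete, the sketch below compounds it over time. It assumes the decline acts as a constant multiplicative factor on the odds each year (0.83 per year), which is a simplification. The function names and the baseline odds used in the demo are hypothetical and are not values from the study.

```python
def relative_odds(years: float, annual_decline: float = 0.17) -> float:
    """Odds of a data set being extant after `years`, relative to year 0.

    Assumes the odds shrink by the same fraction every year.
    """
    return (1.0 - annual_decline) ** years


def odds_to_probability(odds: float) -> float:
    """Convert odds (p / (1 - p)) back to a probability p."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical baseline: even odds (1:1) that the data exist at year 0.
    baseline_odds = 1.0
    for years in (2, 10, 22):
        odds = baseline_odds * relative_odds(years)
        print(f"{years:>2} years: relative odds {relative_odds(years):.3f}, "
              f"probability {odds_to_probability(odds):.3f}")
```

Under this assumption the odds fall to roughly 16% of their starting value after ten years. Turning odds into a probability requires a baseline, which is why the demo labels its starting value as hypothetical.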
I’ll make my data available in a 
repository 
• Data is discoverable, accessible and persistent 
• But data may not be usable, as an unstructured repository offers limited space for data-specific description
I’ll write a data paper 
Materials and Methods 
Animal surgery 
Behavioural testing 
Data collection and cell-type classification
Data description 
Data file organization 
Metadata organization 
• Data is discoverable, accessible and persistent 
• Sufficient space for methodological detail
BUT ARE WE MISSING SOMETHING?
Human vs. machine
• Is your data truly discoverable by researchers outside your own domain?
• There are too many papers to read, even in each person’s own field.
• Could increasing the machine readability of your data result in increased use of your data?
• Is making an entire dataset machine readable feasible?
Metadata
• Fully describe the experiments that generated the data
◦ Takes time to ensure full metadata capture
• Structure the metadata to ensure machine readability
◦ Structure needs to be decided prospectively
• Metadata can be discovered in an automated way
◦ Requires relevant infrastructure
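As a rough illustration of structured, machine readable metadata, the sketch below stores a record as key-value pairs, checks it against a list of required fields decided in advance, and serialises it to JSON. The field names, the required-field list, and every value in the record are hypothetical; a real project would follow whatever schema its community or repository has agreed on.

```python
import json

# Hypothetical required fields; a real schema would be agreed on in advance.
REQUIRED_FIELDS = ("title", "creator", "organism", "method", "data_location")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in `record`."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Illustrative record; every value here is made up.
record = {
    "title": "Example recordings data set",
    "creator": "A. Researcher",
    "organism": "Mus musculus",
    "method": "electrophysiology",
    "data_location": "https://repository.example.org/dataset/123",
}

if __name__ == "__main__":
    # An empty list means the record satisfies the (hypothetical) schema.
    print(missing_fields(record))
    # JSON output can be parsed by software without reading free text.
    print(json.dumps(record, indent=2, sort_keys=True))
```

Deciding the field list before data collection is what lets a check like `missing_fields` run automatically, rather than relying on someone reading a free-text description.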
Curation is a specialised task
• Researchers are not data management professionals
• Learning how to curate data takes time
• Article publication is carried out by specialists (journals).
• It follows that data publication should also be carried out by specialists.
Benefits of curated metadata
• Users of data
◦ Data is findable
◦ Data provenance is clear
◦ Increased data usability
◦ Reduced unnecessary duplication of data
• Data generators
◦ Data more likely to be used, so data citation rates will increase
◦ Contribute to novel research that data generators would not have carried out
Metadata as an integral part of a data paper
FUTURE POSSIBILITIES
Machine readable research metadata could lead to...
Linked Data: a way to publish data so that data from different sources can be connected and queried
Infrastructure for linked research data is being developed
"Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/"
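The idea of connecting and querying data from different sources can be sketched with plain subject-predicate-object triples. This is a toy model in standard Python, not a real Linked Data stack; all identifiers, predicates, and helper names below are hypothetical placeholders.

```python
# Each statement is a (subject, predicate, object) triple.
# All identifiers below are hypothetical placeholders.
source_a = [
    ("dataset:1", "describes", "protein:P1"),
    ("dataset:1", "createdBy", "lab:A"),
]
source_b = [
    ("antibody:7", "targets", "protein:P1"),
    ("antibody:7", "validatedIn", "assay:western-blot"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def linked_to(triples, entity):
    """Return the sorted subjects, from any source, that point at `entity`."""
    return sorted({s for (s, _, o) in triples if o == entity})


# Because both sources use the same identifier for the protein,
# merging them is just concatenation.
merged = source_a + source_b

if __name__ == "__main__":
    # Finds records from both sources that refer to the same protein.
    print(linked_to(merged, "protein:P1"))
```

The connection only works because both sources share one identifier for the protein, which is the part that needs agreed infrastructure in practice.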
The beginnings of linked research data
An open-access database of publicly available antibodies against human protein targets, with user and provider data on antibody efficacy in a range of assays.
“We show that Antibodypedia may be used to track the development of available and validated antibodies to the individual chromosomes, and thus the database is an attractive tool to identify proteins with no or few antibodies yet generated.”
Summary
• Reusing previously generated data is economical
• Data reuse is dependent on discoverable, accessible and usable shared datasets
• Descriptive metadata enhances (re)usability of data
• Capture of structured metadata is a specialist skill
• The future: machine readable metadata will be important
Thanks for listening...

Data Publishing and Institutional Repositories
 
Workflows for Publishing Data; Scientific Data's experience as an early adopter
Workflows for Publishing Data; Scientific Data's experience as an early adopterWorkflows for Publishing Data; Scientific Data's experience as an early adopter
Workflows for Publishing Data; Scientific Data's experience as an early adopter
 
Clinical Data Publishing at Scientific Data
Clinical Data Publishing at Scientific DataClinical Data Publishing at Scientific Data
Clinical Data Publishing at Scientific Data
 

Why should researchers care about data curation?

  • 1. Why should researchers care about data curation? Varsha Khodiyar
  • 3. Expenditure on data generation
    • 16.8% of NIH grant applications funded*
      ◦ Hours spent writing grants?
      ◦ Hours spent reviewing grants?
    • Resources are finite/expensive
      ◦ Modified animals
      ◦ Specialized reagents
    • Time and effort to generate good, valid data
    * For fiscal year 2013 (http://report.nih.gov/success_rates/Success_ByIC.cfm)
  • 4. Reproducibility is a cornerstone of science
    “[W]e evaluated the replication of data analyses in 18 articles on microarray-based gene expression profiling published in Nature Genetics in 2005–2006...We reproduced two analyses in principle and six partially or with some discrepancies; ten could not be reproduced. The main reason for failure to reproduce was data unavailability.”
    Ioannidis JPA. et al. Repeatability of published microarray gene expression analyses. Nature Genetics 41, 149–55 (2009)
  • 6. Data needs to be…
    • Discoverable
      ◦ Need to know it’s there
    • Accessible
      ◦ Must be able to get to the data
    • Usable
      ◦ Require sufficient information about how the data was generated
    • Persistent
      ◦ Historical data access as part of the scientific record, as well as for new research
    • Reliable
      ◦ Data provenance informs data reuse decisions
  • 7. Traditional publishing
    • Data in a PDF is discoverable and accessible, by readers of the paper
    • But is not usable - can't manipulate data in a PDF table
  • 8. I’ll send my data when someone asks for it
    • “We examined the availability of data from 516 studies between 2 and 22 years old
    • The odds of a data set being reported as extant fell by 17% per year
    • Broken e-mails and obsolete storage devices were the main obstacles to data sharing”
    Vines TH. et al. The availability of research data declines rapidly with article age. Curr Biol 24, 94–7 (2014)
  • 9. I’ll make my data available in a repository
    • Data is discoverable, accessible and persistent
    • But data may not be usable, as limited space for data-specific description in an unstructured repository
  • 10. I’ll write a data paper
    Section headings shown: Materials and Methods; Animal surgery; Behavioural testing; Data collection and cell-type classification; Data description; Data file organization; Metadata organization
    • Data is discoverable, accessible and persistent
    • Sufficient space for methodological detail
  • 11. BUT ARE WE MISSING SOMETHING?
  • 12. Human vs. machine
    • Is your data truly discoverable by researchers outside your own domain?
    • Too many papers to read in each person’s own field.
    • Could increasing the machine readability of your data result in increased use of your data?
    • Is making an entire dataset machine readable feasible?
  • 13. Metadata
    • Fully describe the experiments that generated the data
      ◦ Takes time to ensure full metadata capture
    • Structure the metadata to ensure machine readability
      ◦ Structure needs to be decided prospectively
    • Metadata can be discovered in an automated way
      ◦ Requires relevant infrastructure
  • 14. Curation is a specialised task
    • Researchers are not data management professionals
    • Learning how to curate data takes time
    • Article publication is carried out by specialists (journals).
    • It follows that data publication should also be carried out by specialists.
  • 15. Benefits of curated metadata
    • Users of data
      ◦ Data is findable
      ◦ Data provenance is clear
      ◦ Increased data usability
      ◦ Reduce unnecessary duplication of data
    • Data generators
      ◦ Data more likely to be used, so data citation rates will increase
      ◦ Contribute to novel research that data generators would not have carried out
  • 16. Metadata as an integral part of a data paper
  • 18. Machine readable research metadata could lead to...
    • Linked Data: a way to publish data so that data from different sources can be connected and queried
    • Infrastructure for linked research data is being developed
    Figure credit: "Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/"
  • 19. The beginnings of linked research data
    An open-access database of publicly available antibodies against human protein targets, with user and provider data on antibody efficacy in a range of assays.
    “We show that Antibodypedia may be used to track the development of available and validated antibodies to the individual chromosomes, and thus the database is an attractive tool to identify proteins with no or few antibodies yet generated.”
  • 20. Summary
    • Reusing previously generated data is economical
    • Data reuse is dependent on discoverable, accessible and usable shared datasets
    • Descriptive metadata enhances (re)usability of data
    • Capture of structured metadata is a specialist skill
    • The future: machine readable metadata will be important
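
The point on slide 13 about structuring metadata prospectively so that it can be discovered automatically could be sketched roughly as follows. This is a minimal illustration only, not a format used by any particular journal or repository: the field names in REQUIRED_FIELDS, the helper functions and the example record are all hypothetical.

```python
import json

# Hypothetical set of fields agreed on before data collection starts
# ("structure needs to be decided prospectively").
REQUIRED_FIELDS = ("title", "creators", "methods", "data_files", "keywords")


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def find_by_keyword(records, keyword):
    """Return records tagged with the keyword (case-insensitive match)."""
    wanted = keyword.lower()
    return [
        record
        for record in records
        if wanted in (k.lower() for k in record.get("keywords", []))
    ]


# Illustrative record; every value here is made up for the example.
record = {
    "title": "Hypothetical recordings dataset",
    "creators": ["A. Researcher"],
    "methods": ["Animal surgery", "Behavioural testing"],
    "data_files": ["session_01.csv"],
    "keywords": ["electrophysiology", "behaviour"],
}

# Serialising to JSON gives a form that software, not only people, can read.
serialized = json.dumps(record, indent=2, sort_keys=True)
restored = json.loads(serialized)
```

Because the structure is fixed in advance, a script can check completeness with missing_fields(record) and search across many records with find_by_keyword(records, "electrophysiology"), which is the kind of automated discovery that free-text description alone does not support.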