Leonore Reiser and Lisa Harper
UC Berkeley
February 14, 2018
Good Data Stewardship
• Publish data with the paper
• Describe data to your fullest ability
• Use the right words to identify Data
• Deposit data in the right Data Repository
• Budget time for Data Management
• Don’t think of it as YOUR data
What’s in it for YOU?
We all benefit from data sharing.
More citations of YOUR work, increasing
your visibility in the research community.
Easily comply with journal and
funding requirements
Less time spent fulfilling requests for data.
Publications are increasing exponentially
http://bar.utoronto.ca/50YearsOfArabidopsis/
Idea
Funding
Experiments
Analysis
Publication
Reuse
Data
Lifecycle
Don’t THROW it away!
Recycle!
Data re-use leads to new insights
Data Processing
Quality Control
Validation
503 datasets 314 datasets
Statistical Analysis
Additional Experiments
Yu Zhang et al. PNAS doi:10.1073/pnas.1716300115
NOVEL DISCOVERY
MET1 and CMT3 are independently required for the maintenance of
asymmetric CHH methylation at CMT2 target sites
Credit: Melissa Haendel
Wilkinson, et al., (2016) The FAIR Guiding Principles for scientific data management and stewardship
10.1038/sdata.2016.18. https://www.nature.com/articles/sdata201618
• Findable means data is human and machine readable
and attached to persistent identifiers
• Accessible means data can be found and retrieved by
humans and machines using standard formats
• Interoperable means data can be exchanged and used
between systems.
• Reusable means data can be used by others
How to Make Your Published Data FAIR
• Use standard formats
• Supply complete metadata
• Embrace Ontologies
• Use persistent and unambiguous identifiers
• Put your data in a long term stable repository
• Cite, share freely and encourage others
CHROM  POS    REF  ALT  Line 1  Line 2
1      12345  A    C    A       A
3      67891  C    T    H       C
10     23456  G    T    T       U

CHROM  POS    REF  ALT  Line 1  Line 2
Gm01   12345  A    C    0/0     0/0
Gm03   67891  C    T    0/1     0/0
Gm10   23456  G    T    1/1     ./.

CHROM  POS    REF  ALT  Line 1  Line 2
Chr01  12345  A    C    AA      AA
Chr03  67891  C    T    C/T     CC
Chr10  23456  G    T    TT      NN
ALL MEAN THE SAME!
BUT ARE NOT THE SAME
Use Standard formats: SNP example
SNP (Single Nucleotide Polymorphism): A base, a chromosome
number and genome position, and a reference to the genome
assembly used, and the genotypes of lines tested.
VCF: Variant Call Format
Is the STANDARD
Use the File format
STANDARD
for your data type
DOI: 10.3389/fpls.2017.01812
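To illustrate the point about VCF, here is a minimal sketch that writes the genotype calls from the second SNP table above as VCF text. The assembly name `hypothetical_assembly_v1` and the sample names `Line1` and `Line2` are placeholders, not values from the talk, and a real submission would need a fuller header.

```python
# Minimal sketch: serialize genotype calls as VCF text.
# The reference name and sample names below are hypothetical placeholders.

def to_vcf(records, samples, reference):
    """Return VCF text for (chrom, pos, ref, alt, genotypes) records."""
    fixed_columns = ["#CHROM", "POS", "ID", "REF", "ALT",
                     "QUAL", "FILTER", "INFO", "FORMAT"]
    lines = [
        "##fileformat=VCFv4.2",
        f"##reference={reference}",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "\t".join(fixed_columns + list(samples)),
    ]
    for chrom, pos, ref, alt, genotypes in records:
        if len(genotypes) != len(samples):
            raise ValueError(f"{chrom}:{pos} needs one genotype per sample")
        # "." marks a missing value for ID, QUAL, FILTER and INFO.
        fields = [chrom, str(pos), ".", ref, alt, ".", ".", ".", "GT"]
        lines.append("\t".join(fields + list(genotypes)))
    return "\n".join(lines) + "\n"


records = [
    ("Gm01", 12345, "A", "C", ["0/0", "0/0"]),
    ("Gm03", 67891, "C", "T", ["0/1", "0/0"]),
    ("Gm10", 23456, "G", "T", ["1/1", "./."]),
]
vcf_text = to_vcf(records, samples=["Line1", "Line2"],
                  reference="hypothetical_assembly_v1")
print(vcf_text)
```

Naming the assembly in the header is what removes the ambiguity between the three look-alike tables: chromosome names and positions only mean something relative to a stated reference.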
Use Standard formats: Data in
images is NOT accessible
Data in PDF (image) format
is not findable or
accessible.
Leave tabular data in tables
If you use EXCEL, look out for data corruption and hidden
Microsoft characters that impede parsing
Ziemann et al., 2016
10.1186/s13059-016-1044-7
Use Standard formats: Beware of Excel
Fig. 1: Prevalence of gene name errors in Supplementary Excel files
Panels: percentage of papers with gene lists affected; increase in supplementary files with gene name errors per year
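The Excel problem can be screened for before a gene list is published. The sketch below flags entries that look like dates (for example `2-Sep`, which is what a spreadsheet can make of SEPT2) or like numbers in scientific notation. The two regular expressions are a rough heuristic chosen for this illustration, not the method used in the cited paper.

```python
import re

# Heuristic patterns for values a spreadsheet may have silently rewritten.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$",
    re.IGNORECASE,
)
SCIENTIFIC_LIKE = re.compile(r"^\d+(\.\d+)?E\+?\d+$", re.IGNORECASE)


def suspicious_gene_names(names):
    """Return the entries that look like dates or floating-point numbers."""
    flagged = []
    for name in names:
        cleaned = name.strip()
        if DATE_LIKE.match(cleaned) or SCIENTIFIC_LIKE.match(cleaned):
            flagged.append(name)
    return flagged


gene_column = ["SEPT2", "2-Sep", "1-Mar", "2.31E+13", "AT1G01010"]
print(suspicious_gene_names(gene_column))
```

Running a check like this on every exported gene column is cheap, and any hit means the original identifier has to be recovered from the source file, since the conversion is not reversible.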
Metadata:
Species = xxx
Germplasm = xxx
Field location = xxx
Environment = xxx
Measurement method = xxx
Phenotype (Data): Plant is 170 cm tall
Metadata is data about the data,
and allows understanding of the data
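One way to keep a measurement and its metadata together is to store them in a single record and check that nothing is blank before sharing it. In this sketch the field names follow the slide, while the values, the `missing_metadata` helper and the JSON layout are assumptions made for illustration.

```python
import json

# Field names taken from the slide; everything else here is illustrative.
REQUIRED_FIELDS = ("species", "germplasm", "field_location",
                   "environment", "measurement_method")


def missing_metadata(metadata):
    """Return the required fields that are absent or blank."""
    return [field for field in REQUIRED_FIELDS
            if not str(metadata.get(field) or "").strip()]


observation = {
    "trait": "plant height",
    "value": 170,
    "unit": "cm",
    "metadata": {
        # Hypothetical values; the slide leaves them as "xxx".
        "species": "Zea mays",
        "germplasm": "hypothetical line 1",
        "field_location": "hypothetical field A",
        "environment": "",
        "measurement_method": "ruler, soil surface to top of plant",
    },
}

print(json.dumps(observation, indent=2))
print("Missing:", missing_metadata(observation["metadata"]))
```

Here the blank `environment` field is reported, which is the kind of gap that makes "Plant is 170 cm tall" hard to interpret later.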
Supply Complete Metadata
• Write your Materials and Methods as if you wanted
someone else to be able to reproduce your work.
• Be accurate and complete about your bench and field
work; include samples/stocks/lines used, accession
numbers, sources of materials, exact measuring
techniques etc.
• Be AS accurate and complete about your computational
pipelines. Include the raw data files you created and their
versions. If you use reference data (e.g., a sequence
assembly), include the version number, download date,
and download source.
• Include names of software applications, versions,
platforms and sources. If you use CyVerse, use its
metadata reporting tools.
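Software names, versions and platform can be captured automatically rather than reconstructed at writing time. The following is a minimal sketch for a Python-based pipeline using only the standard library; the `environment_report` function and the package list passed to it are assumptions for illustration, and other toolchains would need their own equivalent.

```python
import json
import platform
import sys
from importlib import metadata


def environment_report(packages):
    """Collect interpreter, platform and package versions for a methods section."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly instead of failing.
            versions[name] = None
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


# The package names are examples; list whatever the pipeline actually imports.
report = environment_report(["numpy", "pandas"])
print(json.dumps(report, indent=2))
```

Saving this report next to the results gives the version numbers the bullet points above ask for, at the moment the analysis was actually run.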
Supply Complete Metadata
[Example metadata records shown on the slides: two rated "Pretty Good!", one rated "Not so good"]
Supply Complete Metadata
597 Possible Attributes
At least 50 Attributes
Genome Sequence Assembly At least 100 Attributes
Budget TIME
to provide Metadata
The metadata in public databases is often confusing; a test case
with Zea mays mRNA seq data reveals a high proportion of
missing, misleading or incomplete metadata. 2018.
https://doi.org/10.1016/j.plantsci.2017.10.014
Supply Complete Metadata
• Established: Genomic Standards Consortium
(http://gensc.org)
• Minimum Information about any (x) Sequence (MIxS)
• Emerging
• Minimum Information About a Plant Phenotyping Experiment
(MIAPPE)
Metadata Standards for Various Data Types
Supply Complete Metadata
Ask For Help from Database People
Cell
Same word,
different meanings
Different words,
same concept
Eggplant
Aubergine
Melongene
Embrace Ontologies
An ontology is:
A set of precisely defined terms
in a logical hierarchy, with
relationships between terms that can be
understood by computers
Example term: ligule (PO:0020105)
Ontologies: Hierarchy of terms and
explicit relationships among terms
Plant Ontology (PO)
Ligule: PO:0020105
Vascular leaf: PO:0009025
Leaf sheath: PO:0020104
Flag leaf: PO:0020103
Adult vascular leaf: PO:0020103
Leaf: PO:0025034
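What "understood by computers" means in practice can be sketched with a tiny term graph. The term IDs below are the ones listed on the slide, but the parent links are simplified assumptions for illustration and are not copied from the Plant Ontology itself.

```python
# Term labels and IDs as listed on the slide.
LABELS = {
    "PO:0020105": "ligule",
    "PO:0020104": "leaf sheath",
    "PO:0009025": "vascular leaf",
    "PO:0025034": "leaf",
}

# Assumed, simplified parent links (child -> parents); consult PO for the real graph.
PARENTS = {
    "PO:0020105": ["PO:0009025"],
    "PO:0020104": ["PO:0009025"],
    "PO:0009025": ["PO:0025034"],
    "PO:0025034": [],
}


def ancestors(term_id):
    """Return every term reachable by following parent links upward."""
    found = set()
    stack = list(PARENTS.get(term_id, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(PARENTS.get(current, []))
    return found


for term in sorted(ancestors("PO:0020105")):
    print(term, LABELS[term])
```

Because the links are explicit, a query for data annotated to "leaf" can also return data annotated to more specific terms such as ligule, without anyone listing those terms by hand.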
Data from diverse types of experiments and organisms
can be compared
Franssen, H. J., et al. (2015)
doi: 10.1242/dev.120774
(Medicago)
Li, S., et al. (2016)
doi: 10.1016/j.devcel.2016.10.012
(Arabidopsis)
Zhou, X.-F., et al. (2014) doi: 10.1104/pp.114.243808
Embracing ontologies
• Ontologies provide a POWERFUL, MACHINE READABLE utility
for data
• Find and use existing ontologies
(http://www.obofoundry.org/, Planteome)
• Gene Function = Gene Ontology (GO)
• Sequences = Sequence Ontology (SO)
• Plant Anatomy and Development = Plant Ontology (PO)
• Phenotypes = Phenotype and Trait Ontology (PATO)
• ...and many others
• Apply them consistently
• To datasets (e.g. in metadata)
• In publications (e.g. TAIR GO/PO submission)
• Ask Questions!
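Applying terms consistently to a dataset can be as simple as replacing free-text labels with term IDs and listing whatever could not be matched. This is a sketch only: the lookup table is a hand-made assumption holding four PO terms named in this talk, whereas a real workflow would load labels and synonyms from the ontology file.

```python
# Hand-made lookup for illustration; real labels and synonyms come from the ontology.
TERM_IDS = {
    "ligule": "PO:0020105",
    "leaf sheath": "PO:0020104",
    "vascular leaf": "PO:0009025",
    "leaf": "PO:0025034",
}


def annotate(labels):
    """Map free-text labels to term IDs; return (matched, unmatched)."""
    matched = {}
    unmatched = []
    for label in labels:
        # Normalize case and collapse runs of whitespace before lookup.
        key = " ".join(label.lower().split())
        if key in TERM_IDS:
            matched[label] = TERM_IDS[key]
        else:
            unmatched.append(label)
    return matched, unmatched


matched, unmatched = annotate(["Ligule", "leaf  sheath", "blade"])
print(matched)
print("Needs a curator:", unmatched)
```

The unmatched list is the useful part: it is the set of labels to take to a database curator, as the last bullet suggests.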
Use persistent, unambiguous
identifiers
Example: Gene names
GOOD!
Identifiers also resolve confusion over
species
Is this Arabidopsis? Maize? Tomato?
DOI: 10.1104/pp.17.00021
One gene, many names
GOOD
OK
(history)
One name, many genes
Solution: Community Standards and
Nomenclature Resources
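The "one name, many genes" problem is easy to detect once names are stored against identifiers. In this sketch the identifiers (`GENE0001` and so on) and gene names are invented placeholders, and `ambiguous_names` is a hypothetical helper, not part of any nomenclature resource.

```python
from collections import defaultdict


def ambiguous_names(names_by_identifier):
    """Return names that point to more than one identifier."""
    identifiers_by_name = defaultdict(set)
    for identifier, names in names_by_identifier.items():
        for name in names:
            # Compare case-insensitively, since capitalization varies in papers.
            identifiers_by_name[name.upper()].add(identifier)
    return {name: sorted(ids)
            for name, ids in identifiers_by_name.items()
            if len(ids) > 1}


# Placeholder identifiers and names, for illustration only.
names_by_identifier = {
    "GENE0001": ["ABC1", "XYZ2"],   # one gene, many names
    "GENE0002": ["abc1"],           # same name reused for a different gene
    "GENE0003": ["DEF3"],
}
print(ambiguous_names(names_by_identifier))
```

Any name reported here cannot be used alone in a paper; citing the stable identifier alongside it resolves the ambiguity.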
Problem: Data is not findable because
it is not available
Piwowar HA, Vision TJ. (2013) Data reuse and the open data
citation advantage. PeerJ 1:e175. https://doi.org/10.7717/peerj.175
Gibney and Van Noorden
doi:10.1038/nature.2013.14416
Put your data in a stable public
repository
Large International Repositories for many data
types for all species. ALL sequence data goes here
Large but specialized databases serving many species
Soybase
Specialized databases serving specific communities
Submitting to a repository: SNP example
As of 9/2017, all non-human SNPs are
processed through EMBL-EBI's European
Variation Archive (EVA,
https://www.ebi.ac.uk/eva/).
NCBI’s dbSNP will only process human SNPs.
EVA will require:
• Data in the standard Variant Call Format
(VCF), including allele frequencies
• A SUBMITTED genome or transcriptome
assembly
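Before submitting, it can help to run a quick structural check on the file. The sketch below tests only three basic VCF properties (the `##fileformat` line, the fixed header columns and a consistent column count). It is an illustrative pre-check written for this example and is not the validation that EVA itself performs.

```python
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def vcf_problems(text):
    """Return a list of basic structural problems found in VCF text."""
    problems = []
    lines = text.splitlines()
    if not lines or not lines[0].startswith("##fileformat=VCF"):
        problems.append("first line must declare ##fileformat=VCF...")
    headers = [line for line in lines if line.startswith("#CHROM")]
    if len(headers) != 1:
        problems.append("expected exactly one #CHROM header line")
        return problems
    columns = headers[0].split("\t")
    if columns[:8] != FIXED_COLUMNS:
        problems.append("header must start with the eight fixed VCF columns")
    for number, line in enumerate(lines, start=1):
        is_data = line and not line.startswith("#")
        if is_data and len(line.split("\t")) != len(columns):
            problems.append(f"line {number} has the wrong number of columns")
    return problems


good = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tLine1\n"
    "Gm01\t12345\t.\tA\tC\t.\t.\t.\tGT\t0/0\n"
)
bad = "CHROM POS REF ALT Line1\n1 12345 A C A\n"
print(vcf_problems(good))
print(vcf_problems(bad))
```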
What if there is no specialized database?
Or no recommendations from journals?
You should get a Digital Object Identifier (DOI)
http://datadryad.org
** Curated, metadata
https://zenodo.org/
https://figshare.com/
https://datashare.ucsf.edu/stash
And just for you folks at UC……
But.. please, don’t forget to actually complete
your submission*...
*And you never have to spend time fielding requests
or transferring huge data files again
https://xkcd.com/1909/
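Once a dataset has a DOI, citing it as a resolvable link makes it easier to find. The sketch below pulls a DOI out of the loosely formatted strings that appear in papers and slides and turns it into a `https://doi.org/` link. The regular expression is a common approximation of DOI syntax, not an official grammar, and `doi_to_url` is a hypothetical helper name.

```python
import re

# Approximate DOI shape: "10.", a registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def doi_to_url(text):
    """Extract the first DOI in `text` and return its resolver URL."""
    match = DOI_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no DOI found in {text!r}")
    # Drop punctuation that often trails a DOI at the end of a sentence.
    doi = match.group(0).rstrip(".,;)")
    return "https://doi.org/" + doi


for raw in ["doi: 10.1242/dev.120774",
            "DOI:/10.3389/fpls.2017.01812",
            "https://doi.org/10.7717/peerj.175"]:
    print(doi_to_url(raw))
```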
Cite, share freely and encourage others to be FAIR
Include searchable and citable identifiers for your data in
your papers
Release your data with clearly defined terms of use
e.g. Creative Commons (CC) CC-0, CC-BY
If you do not specify terms, restrictions are implied, limiting reuse
Cite all of your data sources
Enhances reproducibility….. and also shows value to funders!
When reviewing papers check them for FAIRness
Good data practices benefit everyone
(and help you get funded)
A few simple things to remember when
preparing your paper
• Include unambiguous identifiers
• Format data according to defined standards
• Keep data in (parseable) tables or text
• Include meaningful metadata
• Deposit data in a long term stable public repository and get a
DOI
• It is never too early to think about (meta)data; the best time to
start is BEFORE you are writing
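For the "parseable tables" item in the list above, a plain tab-delimited text file is often enough. This is a minimal sketch using Python's standard `csv` module; the column names and rows reuse the SNP example from earlier in the talk, and the helper names are made up for this illustration.

```python
import csv
import io


def to_tsv(rows, columns):
    """Write dict rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_tsv(text):
    """Read tab-delimited text back into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


rows = [
    {"CHROM": "Gm01", "POS": 12345, "REF": "A", "ALT": "C"},
    {"CHROM": "Gm03", "POS": 67891, "REF": "C", "ALT": "T"},
]
tsv_text = to_tsv(rows, ["CHROM", "POS", "REF", "ALT"])
print(tsv_text)
print(from_tsv(tsv_text)[0])
```

A file like this can be read by any language or spreadsheet without screen-scraping a PDF, which is the whole point of keeping tabular data in tables.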
You can get help structuring,
organizing and managing your data
● Contact your Community Database
● Don’t have one? Contact a curator
(Leonore, Lisa… we live amongst you)
● UCB Research Data Management Librarians
(http://researchdata.berkeley.edu/)
Thank you!
AgBioData
What YOU can do right now to
support FAIR data
Ask your funders for increased access to FAIR data
When you review papers, look at the data, and be
sure it is well described (Metadata is great)
Change your attitude a little: Your data will be more
cited and more important if you make it FAIR
Deposit your Data and get a DOI
Ask your institution to value good data submission,
and good data recycling

Weitere ähnliche Inhalte

Was ist angesagt?

FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...Carole Goble
 
Connecting the dots: drug information and Linked Data
Connecting the dots: drug information and Linked DataConnecting the dots: drug information and Linked Data
Connecting the dots: drug information and Linked DataTomasz Adamusiak
 
Measuring electronic resource availability final version
Measuring electronic resource availability final versionMeasuring electronic resource availability final version
Measuring electronic resource availability final versionSanjeet Mann
 
Changing Data: Implementing Primo for the Tri University Group of Libraries (...
Changing Data: Implementing Primo for the Tri University Group of Libraries (...Changing Data: Implementing Primo for the Tri University Group of Libraries (...
Changing Data: Implementing Primo for the Tri University Group of Libraries (...Alison Hitchens
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)Carole Goble
 
Facilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-juppFacilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-juppSimon Jupp
 
Bio ontologies and semantic technologies
Bio ontologies and semantic technologiesBio ontologies and semantic technologies
Bio ontologies and semantic technologiesProf. Wim Van Criekinge
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsCarole Goble
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceCarole Goble
 
2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_uploadProf. Wim Van Criekinge
 
Semantic approaches for biomedical knowledge discovery - Discovery Science 20...
Semantic approaches for biomedical knowledge discovery - Discovery Science 20...Semantic approaches for biomedical knowledge discovery - Discovery Science 20...
Semantic approaches for biomedical knowledge discovery - Discovery Science 20...Michel Dumontier
 
schema.org and biomedical ontologies
schema.org and biomedical ontologies schema.org and biomedical ontologies
schema.org and biomedical ontologies Simon Jupp
 
Model Organism Linked Data
Model Organism Linked DataModel Organism Linked Data
Model Organism Linked DataMichel Dumontier
 
Data retriveal ,srg and dbget
Data retriveal ,srg and dbgetData retriveal ,srg and dbget
Data retriveal ,srg and dbgetSurendraKumar338
 
Linked Data for Biopharma
Linked Data for BiopharmaLinked Data for Biopharma
Linked Data for BiopharmaTom Plasterer
 

Was ist angesagt? (20)

Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
 
An Open Repository Model for Acquiring Knowledge About Scientific Experiments
An Open Repository Model for Acquiring Knowledge About Scientific ExperimentsAn Open Repository Model for Acquiring Knowledge About Scientific Experiments
An Open Repository Model for Acquiring Knowledge About Scientific Experiments
 
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
 
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
 
Connecting the dots: drug information and Linked Data
Connecting the dots: drug information and Linked DataConnecting the dots: drug information and Linked Data
Connecting the dots: drug information and Linked Data
 
Measuring electronic resource availability final version
Measuring electronic resource availability final versionMeasuring electronic resource availability final version
Measuring electronic resource availability final version
 
Changing Data: Implementing Primo for the Tri University Group of Libraries (...
Changing Data: Implementing Primo for the Tri University Group of Libraries (...Changing Data: Implementing Primo for the Tri University Group of Libraries (...
Changing Data: Implementing Primo for the Tri University Group of Libraries (...
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
 
Facilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-juppFacilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-jupp
 
Bio ontologies and semantic technologies
Bio ontologies and semantic technologiesBio ontologies and semantic technologies
Bio ontologies and semantic technologies
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
 
Reuse of Repository Data
Reuse of Repository DataReuse of Repository Data
Reuse of Repository Data
 
2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload
 
Semantic approaches for biomedical knowledge discovery - Discovery Science 20...
Semantic approaches for biomedical knowledge discovery - Discovery Science 20...Semantic approaches for biomedical knowledge discovery - Discovery Science 20...
Semantic approaches for biomedical knowledge discovery - Discovery Science 20...
 
schema.org and biomedical ontologies
schema.org and biomedical ontologies schema.org and biomedical ontologies
schema.org and biomedical ontologies
 
Model Organism Linked Data
Model Organism Linked DataModel Organism Linked Data
Model Organism Linked Data
 
Martin Rasmussen: Ensuring availability and quality of research data through ...
Martin Rasmussen: Ensuring availability and quality of research data through ...Martin Rasmussen: Ensuring availability and quality of research data through ...
Martin Rasmussen: Ensuring availability and quality of research data through ...
 
Data retriveal ,srg and dbget
Data retriveal ,srg and dbgetData retriveal ,srg and dbget
Data retriveal ,srg and dbget
 
Linked Data for Biopharma
Linked Data for BiopharmaLinked Data for Biopharma
Linked Data for Biopharma
 

Ähnlich wie How to make your published data findable, accessible, interoperable and reusable

NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better ScienceNC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better ScienceSusanna-Assunta Sansone
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
 
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014Susanna-Assunta Sansone
 
Data Archiving and Sharing
Data Archiving and SharingData Archiving and Sharing
Data Archiving and SharingC. Tobin Magle
 
Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...
Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...
Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...SC CTSI at USC and CHLA
 
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014Susanna-Assunta Sansone
 
Reproducible research: theory
Reproducible research: theoryReproducible research: theory
Reproducible research: theoryC. Tobin Magle
 
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...GigaScience, BGI Hong Kong
 
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014Susanna-Assunta Sansone
 
The challenge of sharing data well, how publishers can help
The challenge of sharing data well, how publishers can helpThe challenge of sharing data well, how publishers can help
The challenge of sharing data well, how publishers can helpVarsha Khodiyar
 
Coping with Data for WHOI JP Students
Coping with Data for WHOI JP StudentsCoping with Data for WHOI JP Students
Coping with Data for WHOI JP StudentsCarly Strasser
 
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataA Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataStuart Chalk
 
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...Susanna-Assunta Sansone
 
A Data Citation Roadmap for Scholarly Data Repositories
A Data Citation Roadmap for Scholarly Data RepositoriesA Data Citation Roadmap for Scholarly Data Repositories
A Data Citation Roadmap for Scholarly Data RepositoriesLIBER Europe
 
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...sesrdm
 
GARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant ScienceGARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant ScienceDavid Johnson
 
FAIR BioData Management
FAIR BioData ManagementFAIR BioData Management
FAIR BioData ManagementUlrike Wittig
 
Data sharing as part of the research workflow
Data sharing as part of the research workflowData sharing as part of the research workflow
Data sharing as part of the research workflowVarsha Khodiyar
 
Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...FAIRDOM
 

Ähnlich wie How to make your published data findable, accessible, interoperable and reusable (20)

NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better ScienceNC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014
 
Data Archiving and Sharing
Data Archiving and SharingData Archiving and Sharing
Data Archiving and Sharing
 
Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...
Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...
Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...
 
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
 
Reproducible research: theory
Reproducible research: theoryReproducible research: theory
Reproducible research: theory
 
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
 
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
 
The challenge of sharing data well, how publishers can help
The challenge of sharing data well, how publishers can helpThe challenge of sharing data well, how publishers can help
The challenge of sharing data well, how publishers can help
 
Coping with Data for WHOI JP Students
Coping with Data for WHOI JP StudentsCoping with Data for WHOI JP Students
Coping with Data for WHOI JP Students
 
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataA Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
 
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
 
A Data Citation Roadmap for Scholarly Data Repositories
A Data Citation Roadmap for Scholarly Data RepositoriesA Data Citation Roadmap for Scholarly Data Repositories
A Data Citation Roadmap for Scholarly Data Repositories
 
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
 
GARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant ScienceGARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant Science
 
TAIR ICAR 2010 Presentation
TAIR ICAR 2010 PresentationTAIR ICAR 2010 Presentation
TAIR ICAR 2010 Presentation
 
FAIR BioData Management
FAIR BioData ManagementFAIR BioData Management
FAIR BioData Management
 
Data sharing as part of the research workflow
Data sharing as part of the research workflowData sharing as part of the research workflow
Data sharing as part of the research workflow
 
Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...
 

Mehr von Phoenix Bioinformatics

PhoenixBio 2020 Stanford Workshop on PhyloGenes
PhoenixBio 2020 Stanford Workshop on PhyloGenesPhoenixBio 2020 Stanford Workshop on PhyloGenes
PhoenixBio 2020 Stanford Workshop on PhyloGenesPhoenix Bioinformatics
 
2014 International Conference on Arabidopsis Research (ICAR) presentation
2014 International Conference on Arabidopsis Research (ICAR) presentation2014 International Conference on Arabidopsis Research (ICAR) presentation
2014 International Conference on Arabidopsis Research (ICAR) presentationPhoenix Bioinformatics
 
2014 Plant and Animal Genome Conference- Huala
2014 Plant and Animal Genome Conference- Huala2014 Plant and Animal Genome Conference- Huala
2014 Plant and Animal Genome Conference- HualaPhoenix Bioinformatics
 
TAIR -Using biological ontologies to accelerate progress in plant biology res...
TAIR -Using biological ontologies to accelerate progress in plant biology res...TAIR -Using biological ontologies to accelerate progress in plant biology res...
TAIR -Using biological ontologies to accelerate progress in plant biology res...Phoenix Bioinformatics
 
A Few Simple Things Authors Can Do to Make Their Data More Discoverable and R...
A Few Simple Things Authors Can Do to Make Their Data More Discoverable and R...A Few Simple Things Authors Can Do to Make Their Data More Discoverable and R...
A Few Simple Things Authors Can Do to Make Their Data More Discoverable and R...Phoenix Bioinformatics
 

Mehr von Phoenix Bioinformatics (13)



How to make your published data findable, accessible, interoperable and reusable

  • 1. Leonore Reiser and Lisa Harper UC Berkeley February 14, 2018
  • 2. Good Data Stewardship • Publish data with the paper • Describe data to your fullest ability • Use the right words to identify Data • Deposit data in the right Data Repository • Budget time for Data Management • Don’t think of it as YOUR data
  • 3. What’s in it for YOU? We all benefit from data sharing. More citations of YOUR work, increasing your visibility in the research community. Easily comply with journal and funding requirements. Less time spent fulfilling requests for data.
  • 4. Publications are increasing exponentially http://bar.utoronto.ca/50YearsOfArabidopsis/
  • 7. Data re-use leads to new insights. Workflow shown: data processing, quality control, validation (503 datasets, 314 datasets); statistical analysis; additional experiments. NOVEL DISCOVERY: MET1 and CMT3 are independently required for the maintenance of asymmetric CHH methylation at CMT2 target sites. Yu Zhang et al., PNAS, doi:10.1073/pnas.1716300115
  • 8. Credit: Melissa Haendel. Wilkinson et al. (2016), The FAIR Guiding Principles for scientific data management and stewardship, doi:10.1038/sdata.2016.18, https://www.nature.com/articles/sdata201618 • Findable means data is human and machine readable and attached to persistent identifiers • Accessible means data can be found and retrieved by humans and machines using standard formats • Interoperable means data can be exchanged and used between systems • Reusable means data can be used by others
  • 9. How to Make Your Published Data FAIR • Use standard formats • Supply complete metadata • Embrace Ontologies • Use persistent and unambiguous identifiers • Put your data in a long term stable repository • Cite, share freely and encourage others
  • 10. Use Standard formats: SNP example. SNP (Single Nucleotide Polymorphism): a base, a chromosome number and genome position, a reference to the genome assembly used, and the genotypes of the lines tested. The three tables below ALL MEAN THE SAME, BUT ARE NOT THE SAME:

      CHROM  POS    REF  ALT  Line 1  Line 2
      1      12345  A    C    A       A
      3      67891  C    T    H       C
      10     23456  G    T    T       U

      CHROM  POS    REF  ALT  Line 1  Line 2
      Gm01   12345  A    C    0/0     0/0
      Gm03   67891  C    T    0/1     0/0
      Gm10   23456  G    T    1/1     ./.

      CHROM  POS    REF  ALT  Line 1  Line 2
      Chr01  12345  A    C    AA      AA
      Chr03  67891  C    T    C/T     CC
      Chr10  23456  G    T    TT      NN

    VCF (Variant Call Format) is the STANDARD. Use the file format STANDARD for your data type.
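To make the point of the SNP slide concrete, here is a minimal Python sketch that maps the three ad hoc genotype encodings onto one VCF-style GT string. The function name `to_vcf_gt` is hypothetical, and the readings of the codes are assumptions: "H" is taken as heterozygous, "U", "N" and "NN" as missing, and a single base as a homozygous call.

```python
def to_vcf_gt(call, ref, alt):
    """Convert one ad hoc genotype call to a VCF-style GT string (biallelic sites only)."""
    call = call.strip().upper()
    # Missing-data markers seen in the example tables (their meaning is assumed).
    if call in {"U", "N", "NN", "./.", ""}:
        return "./."
    # Already VCF-style, e.g. "0/1": pass through unchanged.
    if "/" in call and all(part in {"0", "1", "."} for part in call.split("/")):
        return call
    # "H" is assumed to mean a heterozygous REF/ALT call.
    if call == "H":
        return "0/1"
    bases = call.replace("/", "")
    if len(bases) == 1:
        # A single base is assumed to be a homozygous call.
        bases = bases * 2
    index = {ref.upper(): "0", alt.upper(): "1"}
    if len(bases) != 2 or any(base not in index for base in bases):
        raise ValueError(f"cannot interpret genotype {call!r} for REF={ref} ALT={alt}")
    return "/".join(sorted(index[base] for base in bases))


# The second site (REF=C, ALT=T) for Line 1, as written in each of the three tables.
for line1 in ("H", "0/1", "C/T"):
    print(to_vcf_gt(line1, "C", "T"))  # each prints 0/1
```

Every guess this converter has to make is a guess a re-user of the data would otherwise make silently, which is the argument for publishing the VCF in the first place.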
  • 11. Use Standard formats: Data in images is NOT accessible. Data in PDF (image) format is not findable or accessible. Leave tabular data in tables. Example: DOI: 10.3389/fpls.2017.01812
  • 12. Use Standard formats: Beware of Excel. If you use EXCEL, look out for data corruption and hidden Microsoft characters that impede parsing. Ziemann, 2016, doi:10.1186/s13059-016-1044-7, Fig. 1: Prevalence of gene name errors in supplementary Excel files (percentage of papers with gene lists affected; increase in supplementary files with gene name errors per year)
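The kind of corruption behind the Excel warning, gene symbols silently turned into dates or floating-point numbers, can be screened for before a supplementary file is submitted. The sketch below is a rough heuristic, not a complete validator: the two patterns and the function name `suspicious_gene_names` are illustrative assumptions.

```python
import re

# Values that look like spreadsheet dates, e.g. "2-Sep" or "1-Mar".
DATE_LIKE = re.compile(
    r"^\d{1,2}-(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$", re.IGNORECASE
)
# Values that look like numbers in scientific notation, e.g. "2.31E+13".
SCI_NOTATION = re.compile(r"^\d+(\.\d+)?E\+?\d+$", re.IGNORECASE)


def suspicious_gene_names(names):
    """Return the entries that look like dates or numbers rather than gene identifiers."""
    flagged = []
    for name in names:
        value = name.strip()
        if DATE_LIKE.match(value) or SCI_NOTATION.match(value):
            flagged.append(name)
    return flagged


gene_column = ["AT1G01010", "2-Sep", "MET1", "1-Mar", "2.31E+13"]
print(suspicious_gene_names(gene_column))
```

Running a check like this on the exported text file, rather than inside the spreadsheet, avoids re-triggering the conversion it is meant to catch.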
  • 13. How to Make Your Published Data FAIR • Use standard formats • Supply complete metadata • Embrace Ontologies • Use persistent and unambiguous identifiers • Put your data in a long term stable repository • Cite, share freely and encourage others
  • 14. Supply Complete Metadata. Metadata is data about the data, and allows understanding of the data. Phenotype (data): plant is 170 cm tall. Metadata: Species = xxx; Germplasm = xxx; Field location = xxx; Environment = xxx; Measurement = xxx method.
  • 15. Supply Complete Metadata • Write your Materials and Methods as if you wanted someone else to be able to reproduce your work. • Be accurate and complete about your bench and field work; include samples/stocks/lines used, accession numbers, sources of materials, exact measuring techniques, etc. • Be AS accurate and complete about your computational pipelines. Include the raw data files you created and their versions. If you use reference data (e.g., a sequence assembly), include the version number, download dates, and download source. • Include names of software applications, versions, platforms and source. If you use CyVerse, use their metadata reporting tools.
  • 19. Supply Complete Metadata 597 Possible Attributes At least 50 Attributes Genome Sequence Assembly At least 100 Attributes
  • 20. Budget TIME to provide Metadata The metadata in public databases is often confusing; a test case with Zea mays mRNA seq data reveals a high proportion of missing, misleading or incomplete metadata. 2018. https://doi.org/10.1016/j.plantsci.2017.10.014
  • 22. Supply Complete Metadata: Metadata Standards for Various Data Types • Established: Genomic Standards Consortium (http://gensc.org), Minimal Information about Any Sequence • Emerging: Minimal Information about a Plant Phenotyping Experiment (MIAPPE) • Ask for help from database people
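As a small illustration of the metadata slides, the sketch below treats the fields listed alongside the plant-height phenotype (species, germplasm, field location, environment, measurement method) as a required set and reports which ones a record leaves empty. The field spellings, the function name `missing_metadata`, and all example values are hypothetical; a real submission should follow the checklist of the target repository or standard.

```python
# Required fields, taken from the plant-height example; the exact key names are assumed.
REQUIRED_FIELDS = (
    "species",
    "germplasm",
    "field_location",
    "environment",
    "measurement_method",
)


def missing_metadata(record):
    """Return the required fields that are absent or blank in a metadata record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Hypothetical record: the phenotype is present, but the field location was left blank.
record = {
    "species": "Zea mays",
    "germplasm": "B73",
    "field_location": "",
    "environment": "field",
    "measurement_method": "ruler, soil surface to tassel tip",
    "plant_height_cm": 170,
}
print(missing_metadata(record))  # ['field_location']
```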
  • 23. How to Make Your Published Data FAIR • Use standard formats • Supply complete and deep metadata • Embrace Ontologies • Use persistent and unambiguous identifiers • Put your data in a long term stable repository • Cite, share freely and encourage others
  • 24. Cell Same word, different meanings Different words, same concept Eggplant Aubergine Melongene
  • 25. Embrace Ontologies. An ontology is a set of precisely defined terms in a logical hierarchy, where the relationships between terms can be understood by computers.
  • 26. Ontologies: Hierarchy of terms and explicit relationship among terms. Plant Ontology (PO): Ligule PO:0020105; Vascular leaf PO:0009025; Leaf sheath PO:0020104; Flag leaf PO:0020103; Adult vascular leaf PO:0020103; Leaf PO:0025034
  • 27. Data from diverse types of experiments and organisms can be compared. Franssen, H. J., et al. (2015) doi:10.1242/dev.120774 (Medicago); Li, S., et al. (2016) doi:10.1016/j.devcel.2016.10.012 (Arabidopsis); Zhou, X.-F., et al. (2014) doi:10.1104/pp.114.243808
  • 28. Embracing ontologies • Ontologies provide a POWERFUL, MACHINE READABLE utility for data • Find and use existing ontologies (http://www.obofoundry.org/, Planteome) • Gene Function = Gene Ontology (GO) • Sequences = Sequence Ontology (SO) • Plant Anatomy and Development = Plant Ontology (PO) • Phenotypes = Phenotype and Trait Ontology (PATO) • ... many, many others • Apply them consistently • To datasets (e.g. in metadata) • In publications (e.g. TAIR GO/PO submission) • Ask Questions!
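Why a machine-readable hierarchy matters can be shown with a toy fragment. The sketch below uses three term IDs from the Plant Ontology slide, but the `is_a` links between them, the sample names, and the helper functions are assumptions made for illustration; the real relationships should be taken from the ontology itself.

```python
# Toy fragment: IDs and names come from the slide; the is_a links are assumed.
TERMS = {
    "PO:0025034": {"name": "leaf", "is_a": []},
    "PO:0009025": {"name": "vascular leaf", "is_a": ["PO:0025034"]},
    "PO:0020103": {"name": "flag leaf", "is_a": ["PO:0009025"]},
}


def ancestors(term_id):
    """Return every term reachable from term_id by following is_a links."""
    seen = []
    stack = list(TERMS[term_id]["is_a"])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(TERMS[parent]["is_a"])
    return seen


def annotated_to(term_id, annotations):
    """Return samples annotated to term_id or to any more specific term."""
    return sorted(
        sample
        for sample, term in annotations.items()
        if term == term_id or term_id in ancestors(term)
    )


# Hypothetical sample annotations.
annotations = {
    "sample_A": "PO:0020103",  # flag leaf
    "sample_B": "PO:0009025",  # vascular leaf
    "sample_C": "PO:0025034",  # leaf
}
print(annotated_to("PO:0009025", annotations))  # ['sample_A', 'sample_B']
```

A query for "vascular leaf" finds the flag-leaf sample without anyone having typed the words "vascular leaf" next to it, which free-text labels cannot do.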
  • 29. How to Make Your Published Data FAIR • Use standard formats • Supply complete and deep metadata • Embrace Ontologies • Use persistent and unambiguous identifiers • Put your data in a long term stable repository • Cite, share freely and encourage others
  • 31. Identifiers also resolve confusion over species Is this Arabidopsis? Maize? Tomato?
  • 32. DOI:10/24/pp.17.00021 One gene- many names GOOD OK (history)
  • 33. One name- many genes
  • 34. Solution: Community Standards and Nomenclature Resources
  • 35. How to Make Your Published Data FAIR • Use standard formats • Supply complete and deep metadata • Embrace Ontologies • Use persistent and unambiguous identifiers • Put your data in a long term stable repository • Cite, share freely and encourage others
  • 36. Problem: Data is not findable because it is not available. Piwowar HA, Vision TJ. (2013) Data reuse and the open data citation advantage. PeerJ 1:e175, https://doi.org/10.7717/peerj.175; Gibney and Van Noorden, doi:10.1038/nature.2013.14416
  • 37. Put your data in a stable public repository Large International Repositories for many data types for all species. ALL sequence data goes here Large but specialized databases serving many species Soybase Specialized databases serving specific communities
  • 38. Submitting to a repository: SNP example. As of 9/2017, all NON-human SNPs are processed through EMBL in the European Variation Archive (EVA, https://www.ebi.ac.uk/eva/). NCBI’s dbSNP will only process human SNPs. EVA will require: • Data in (standard) Variant Call Format (VCF), including allele frequencies • SUBMITTED genome or transcriptome assembly
  • 39. What if there is no specialized database? Or no recommendations from journals? You should get a Digital Object Identifier (DOI): http://datadryad.org ** (curated, metadata); https://zenodo.org/; https://figshare.com/. And just for you folks at UC: https://datashare.ucsf.edu/stash
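Since the EVA slide mentions allele frequencies alongside the VCF, here is a minimal sketch of how an ALT allele frequency can be derived from VCF-style GT calls at a biallelic site. The function name `alt_allele_frequency` is hypothetical, and the repository's current submission guidelines remain the authority on what exactly must be supplied.

```python
def alt_allele_frequency(genotypes):
    """ALT allele frequency among called alleles at a biallelic site, or None if nothing was called."""
    alleles = []
    for gt in genotypes:
        # Treat phased ("|") and unphased ("/") calls alike and skip missing alleles (".").
        alleles.extend(a for a in gt.replace("|", "/").split("/") if a != ".")
    if not alleles:
        return None
    return sum(a == "1" for a in alleles) / len(alleles)


# Line 1 and Line 2 genotypes for the three sites in the SNP example.
print(alt_allele_frequency(["0/0", "0/0"]))  # 0.0
print(alt_allele_frequency(["0/1", "0/0"]))  # 0.25
print(alt_allele_frequency(["1/1", "./."]))  # 1.0
```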
  • 40. But.. please, don’t forget to actually complete your submission*... *And you never have to spend time fielding requests or transferring huge data files again
  • 42. How to Make Your Published Data FAIR • Use standard formats • Supply complete and deep metadata • Embrace Ontologies • Use persistent and unambiguous identifiers • Put your data in a long term stable repository • Cite, share freely and encourage others
  • 43. Cite, share freely and encourage others to be FAIR • Include searchable and citable identifiers for your data in your papers. • Release your data with clearly defined terms of use, e.g. Creative Commons (CC) CC-0, CC-BY. If you do not specify terms, restrictions are implied, limiting reuse. • Cite all of your data sources. This enhances reproducibility, and also shows value to funders! • When reviewing papers, check them for FAIRness.
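One way to keep data identifiers "searchable and citable" is to write every DOI in the same resolvable form. The sketch below normalises a few common spellings to an https://doi.org/ link and rejects strings that do not look like a DOI. The pattern is a loose heuristic and the function name `doi_to_url` is hypothetical; it checks the shape of the string only, not whether the DOI is registered.

```python
import re

# Loose shape check: "10.", a 4 to 9 digit registrant code, "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
# Common prefixes people put in front of a DOI.
PREFIX = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def doi_to_url(text):
    """Normalise a DOI written in a common form to an https://doi.org/ URL."""
    doi = PREFIX.sub("", text.strip())
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognisable DOI: {text!r}")
    return "https://doi.org/" + doi


print(doi_to_url("doi:10.1038/sdata.2016.18"))
print(doi_to_url("https://doi.org/10.7717/peerj.175"))
```

A reviewer checking a manuscript for FAIRness could run the same check over the data availability statement to spot identifiers that were mistyped or truncated.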
  • 44. Good data practices benefit everyone (and help you get funded)
  • 45. A few simple things to remember when preparing your paper • Include unambiguous identifiers • Format data according to defined standards • Keep data in (parseable) tables or text • Include meaningful metadata • Deposit data in a long term stable public repository and get a DOI • It is never too early to think about (meta)data; the best time to start is BEFORE you are writing
  • 46. You can get help structuring, organizing and managing your data ● Contact your Community Database ● Don’t have one? Contact a curator (Leonore, Lisa… we live amongst you) ● UCB Research Data Management Librarians (http://researchdata.berkeley.edu/)
  • 48. What YOU can do right now to support FAIR data • Ask your funders for increased access to FAIR data. • When you review papers, look at the data, and be sure it is well described (metadata is great). • Change your attitude a little: your data will be more cited and more important if you make it FAIR. • Deposit your data and get a DOI. • Ask your institution to value good data submission and good data recycling.