SlideShare ist ein Scribd-Unternehmen logo
1 von 24
Making it Easier, Possibly Even Pleasant,
to Author Rich Experimental Metadata
Michel Dumontier
Associate Professor of Medicine (Biomedical Informatics)
Stanford University
Presented at the BD2K California Centers Meeting, Palm Springs, Oct 9, 2015
High Quality Metadata are
Essential
for Large-Scale Reuse
and Biomedical Discovery
What is this colored picture about?
Many Minimum Information Standards
68 168
@micheldumontier::Bio-Ontologies 2015 7
http://www.w3.org/TR/hcls-dataset/
FAIR – Findable, Accessible, Interoperable, Reuseable
–> Data Sharing Plans
To be Findable:
F1. (meta)data are assigned a globally unique and eternally persistent identifier
F2. data are described with rich metadata
F3. (meta)data are registered or indexed in a searchable resource
F4. metadata specify the data identifier
To be Accessible:
A1 (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary
A2 metadata are eternally accessible, even when the data are no longer available
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data
To be Re-usable:
R1. meta(data) have a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with their provenance
R1.3. (meta)data meet domain-relevant community standards
The Good News: Minimal
information checklists,
such as MIAME, are being
advanced from all sectors
of the biomedical
community
The Bad News:
Investigators view
requests for even
“minimal” information as
burdensome
Unguided Data Entry is Problematic
Research Questions
• What are the most effective approaches to maximize
reuse of pre-defined metadata elements?
• How can we get metadata authors to generate more
metadata in shorter time periods
– To what extent can we successfully predict metadata
values from existing metadata?
– Can we use manuscripts, published work, user history,
social media to improve prediction?
• To what extent can experts and crowds find, verify, and
fix metadata?
• To what degree does ontology-based metadata
improve the precision and recall of dataset search?
CEDAR technology will give us
• Tools
– To author metadata templates that define a community
standard while reusing previously defined elements
– To facilitate the capture of experimental metadata that
conform to one or more community standards
– To discover datasets that meet a plurality of dataset and
experimental requirements
• Methods
– To learn metadata patterns from unstructured, semi-structured,
and structured metadata
– To efficiently guide predictive entry of new metadata
– To index, query, and reason about ontology-based metadata
– To explore, verify, and collaboratively augment experimental
metadata even when the data are located elsewhere
CEDAR will empower users to meet and
exceed the minimal metadata standards
• Emphasis traditionally has been on development of
simple checklists of metadata elements
• Little practical consideration for
– Using shared value sets (search, browse, query)
– Common knowledge representations (interoperable
ecosystems of tools and data)
– Metadata validation against community standards
(richness and quality)
• We need a more expressive—and computable—
framework for describing metadata
CEDAR Team
Stanford University
• Mark A. Musen (PI)
• Michel Dumontier (Co-I)
• Olivier Gevaert (Co-I)
• Purvesh Khatri (Co-I)
• John Graybeal
• Martin O’Connor
• Marcos Martinez Romero
• Mary Panahiazar
• Attila Egyedi
• Ravi Shankar
Stanford Library
• Kim Durante
Oxford University
• Susanna-Assunta Sansone (Co-I)
• Phillippe Rocca-Serra
• Alejandra Gonzalez-Beltran
Yale University
• Steven H. Kleinstein (Co-I)
• Kei Cheung (Co-I)
Northrop Grumman (HIPC/Immport)
• Jeff Wiser (Co-I)
We also want to help users compose
templates and auto-magically suggest
metadata values
Can we use free text to predict
structured and semi-structured values?
Predicting structured metadata (accuracy)
Using multi label tree
Accuracy of 72% to predict 39 most-occurring keys
Average precision 79% (for all keys)
Average recall 82% (for all keys)
Predicting semi-structured metadata
Metadata: The Next Frontier
We welcome new partners to
– Learn about your metadata authoring workflow
– Evaluate CEDAR technology
– Incorporate this technology into your curation
workflows
2
3
metadatacenter.org
http://metadatacenter.org

Weitere ähnliche Inhalte

Was ist angesagt?

Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...FAIRDOM
 
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)Tom Plasterer
 
The DataTags System: Sharing Sensitive Data with Confidence
The DataTags System: Sharing Sensitive Data with ConfidenceThe DataTags System: Sharing Sensitive Data with Confidence
The DataTags System: Sharing Sensitive Data with ConfidenceMerce Crosas
 
dkNET Poster ENDO 2016
dkNET Poster ENDO 2016 dkNET Poster ENDO 2016
dkNET Poster ENDO 2016 dkNET
 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsTom Plasterer
 
dkNET ESP Meeting - February 2016
dkNET ESP Meeting - February 2016dkNET ESP Meeting - February 2016
dkNET ESP Meeting - February 2016dkNET
 
Publishing data and code openly
Publishing data and code openlyPublishing data and code openly
Publishing data and code openlyFAIRDOM
 
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)Gregor Hagedorn
 
Investigating plant systems using data integration and network analysis
Investigating plant systems using data integration and network analysisInvestigating plant systems using data integration and network analysis
Investigating plant systems using data integration and network analysisCatherine Canevet
 
FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeTom Plasterer
 
Better Software, Better Research
Better Software, Better ResearchBetter Software, Better Research
Better Software, Better ResearchCarole Goble
 

Was ist angesagt? (20)

Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...
 
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
 
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
 
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
 
Hosting a compound centric community resource for chemistry data
Hosting a compound centric community resource for chemistry dataHosting a compound centric community resource for chemistry data
Hosting a compound centric community resource for chemistry data
 
The DataTags System: Sharing Sensitive Data with Confidence
The DataTags System: Sharing Sensitive Data with ConfidenceThe DataTags System: Sharing Sensitive Data with Confidence
The DataTags System: Sharing Sensitive Data with Confidence
 
dkNET Poster ENDO 2016
dkNET Poster ENDO 2016 dkNET Poster ENDO 2016
dkNET Poster ENDO 2016
 
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge Graphs
 
dkNET ESP Meeting - February 2016
dkNET ESP Meeting - February 2016dkNET ESP Meeting - February 2016
dkNET ESP Meeting - February 2016
 
Publishing data and code openly
Publishing data and code openlyPublishing data and code openly
Publishing data and code openly
 
Canadian health census to lod
Canadian health census to lodCanadian health census to lod
Canadian health census to lod
 
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
 
MLA CE Course: Third-Party PubMed Tools
MLA CE Course: Third-Party PubMed ToolsMLA CE Course: Third-Party PubMed Tools
MLA CE Course: Third-Party PubMed Tools
 
Third-Party PubMed Tools
Third-Party PubMed ToolsThird-Party PubMed Tools
Third-Party PubMed Tools
 
An Open Repository Model for Acquiring Knowledge About Scientific Experiments
An Open Repository Model for Acquiring Knowledge About Scientific ExperimentsAn Open Repository Model for Acquiring Knowledge About Scientific Experiments
An Open Repository Model for Acquiring Knowledge About Scientific Experiments
 
Investigating plant systems using data integration and network analysis
Investigating plant systems using data integration and network analysisInvestigating plant systems using data integration and network analysis
Investigating plant systems using data integration and network analysis
 
FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to Practice
 
Payton Eliminating Conflicts in Ebook Metadata
Payton Eliminating Conflicts in Ebook MetadataPayton Eliminating Conflicts in Ebook Metadata
Payton Eliminating Conflicts in Ebook Metadata
 
Better Software, Better Research
Better Software, Better ResearchBetter Software, Better Research
Better Software, Better Research
 

Andere mochten auch

Towards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRnessTowards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRnessMichel Dumontier
 
New Directions In Russian Private Equity
New Directions In Russian Private EquityNew Directions In Russian Private Equity
New Directions In Russian Private EquityThomas Nastas
 
1st Network-of-BioThings Hackathon
1st Network-of-BioThings Hackathon1st Network-of-BioThings Hackathon
1st Network-of-BioThings HackathonMichel Dumontier
 
Social Media and Fundraisng - are you prepared?
Social Media and Fundraisng - are you prepared? Social Media and Fundraisng - are you prepared?
Social Media and Fundraisng - are you prepared? Noesium Consulting
 
Blerta Interpreter 1
Blerta Interpreter 1Blerta Interpreter 1
Blerta Interpreter 1Blerta
 
BigScrum - Scaling Teams to Programs
BigScrum - Scaling Teams to ProgramsBigScrum - Scaling Teams to Programs
BigScrum - Scaling Teams to ProgramsThinkLouder
 
Projetos de Salas Residenciais
Projetos de Salas ResidenciaisProjetos de Salas Residenciais
Projetos de Salas Residenciaismarthahuback
 
Uz big design talk may10
Uz big design talk may10Uz big design talk may10
Uz big design talk may10UserZoom
 
IVI Kirov 27 June 2007 Presentation For Distribution
IVI Kirov 27 June 2007 Presentation For DistributionIVI Kirov 27 June 2007 Presentation For Distribution
IVI Kirov 27 June 2007 Presentation For DistributionThomas Nastas
 
Social marketing that engages & empowers
Social marketing that engages & empowersSocial marketing that engages & empowers
Social marketing that engages & empowersJason Dunstone
 
How to write tech posts & talks
How to write tech posts & talksHow to write tech posts & talks
How to write tech posts & talksRyan Chartrand
 
Guerrilla Readers - výroční prezentace
Guerrilla Readers - výroční prezentaceGuerrilla Readers - výroční prezentace
Guerrilla Readers - výroční prezentaceGuerrilla Readers
 
Balangero asbestos mine dumps restoration a few years after, in the aftermath...
Balangero asbestos mine dumps restoration a few years after, in the aftermath...Balangero asbestos mine dumps restoration a few years after, in the aftermath...
Balangero asbestos mine dumps restoration a few years after, in the aftermath...Oboni Riskope Associates Inc.
 
Reputation snapshot for the banking industry, 2012, final
Reputation snapshot for the banking industry, 2012, finalReputation snapshot for the banking industry, 2012, final
Reputation snapshot for the banking industry, 2012, finalDamjana Kocjanc
 

Andere mochten auch (20)

Towards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRnessTowards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRness
 
New Directions In Russian Private Equity
New Directions In Russian Private EquityNew Directions In Russian Private Equity
New Directions In Russian Private Equity
 
1st Network-of-BioThings Hackathon
1st Network-of-BioThings Hackathon1st Network-of-BioThings Hackathon
1st Network-of-BioThings Hackathon
 
Social Media and Fundraisng - are you prepared?
Social Media and Fundraisng - are you prepared? Social Media and Fundraisng - are you prepared?
Social Media and Fundraisng - are you prepared?
 
Blerta Interpreter 1
Blerta Interpreter 1Blerta Interpreter 1
Blerta Interpreter 1
 
BigScrum - Scaling Teams to Programs
BigScrum - Scaling Teams to ProgramsBigScrum - Scaling Teams to Programs
BigScrum - Scaling Teams to Programs
 
Projetos de Salas Residenciais
Projetos de Salas ResidenciaisProjetos de Salas Residenciais
Projetos de Salas Residenciais
 
Uz big design talk may10
Uz big design talk may10Uz big design talk may10
Uz big design talk may10
 
IVI Kirov 27 June 2007 Presentation For Distribution
IVI Kirov 27 June 2007 Presentation For DistributionIVI Kirov 27 June 2007 Presentation For Distribution
IVI Kirov 27 June 2007 Presentation For Distribution
 
Social marketing that engages & empowers
Social marketing that engages & empowersSocial marketing that engages & empowers
Social marketing that engages & empowers
 
How to write tech posts & talks
How to write tech posts & talksHow to write tech posts & talks
How to write tech posts & talks
 
Terrificselfmotivating
TerrificselfmotivatingTerrificselfmotivating
Terrificselfmotivating
 
Drenajes en cirugia biliopancreatica
Drenajes en cirugia biliopancreaticaDrenajes en cirugia biliopancreatica
Drenajes en cirugia biliopancreatica
 
Guerrilla Readers - výroční prezentace
Guerrilla Readers - výroční prezentaceGuerrilla Readers - výroční prezentace
Guerrilla Readers - výroční prezentace
 
Squizz presentation
Squizz presentationSquizz presentation
Squizz presentation
 
Report z Vyškov
Report z VyškovReport z Vyškov
Report z Vyškov
 
Balangero asbestos mine dumps restoration a few years after, in the aftermath...
Balangero asbestos mine dumps restoration a few years after, in the aftermath...Balangero asbestos mine dumps restoration a few years after, in the aftermath...
Balangero asbestos mine dumps restoration a few years after, in the aftermath...
 
D4 I Intro Web
D4 I Intro WebD4 I Intro Web
D4 I Intro Web
 
Reputation snapshot for the banking industry, 2012, final
Reputation snapshot for the banking industry, 2012, finalReputation snapshot for the banking industry, 2012, final
Reputation snapshot for the banking industry, 2012, final
 
Rebirthofan Eagle
Rebirthofan EagleRebirthofan Eagle
Rebirthofan Eagle
 

Ähnlich wie Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata

David Van Enckevort - FAIR sample and data access
David Van Enckevort - FAIR sample and data access David Van Enckevort - FAIR sample and data access
David Van Enckevort - FAIR sample and data access DataSciSIG
 
NIH Data Science Special Interest Group
NIH Data Science Special Interest GroupNIH Data Science Special Interest Group
NIH Data Science Special Interest GroupYaffa Rubinstien
 
Fair sample and data access -David Van enckevort
Fair sample and data access -David Van enckevortFair sample and data access -David Van enckevort
Fair sample and data access -David Van enckevortData Science NIH
 
Data commons bonazzi bd2 k fundamentals of science feb 2017
Data commons bonazzi   bd2 k fundamentals of science feb 2017Data commons bonazzi   bd2 k fundamentals of science feb 2017
Data commons bonazzi bd2 k fundamentals of science feb 2017Vivien Bonazzi
 
Towards data FAIRness
Towards data FAIRnessTowards data FAIRness
Towards data FAIRnessCARARE
 
FAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data SharingFAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data SharingMerce Crosas
 
Findable, Accessible, Interoperable and Reusable (FAIR) data
Findable, Accessible, Interoperable and Reusable (FAIR) dataFindable, Accessible, Interoperable and Reusable (FAIR) data
Findable, Accessible, Interoperable and Reusable (FAIR) dataARDC
 
VODAN Africa IN.pptx
VODAN Africa IN.pptxVODAN Africa IN.pptx
VODAN Africa IN.pptxGetu Tadele
 
FORCE11: Creating a data and tools ecosystem
FORCE11:  Creating a data and tools ecosystemFORCE11:  Creating a data and tools ecosystem
FORCE11: Creating a data and tools ecosystemMaryann Martone
 
Doing research better: The role of meta‐data
Doing research better: The role of meta‐dataDoing research better: The role of meta‐data
Doing research better: The role of meta‐dataGarethKnight
 
Emerging Data Citation Infrastructure
Emerging Data Citation InfrastructureEmerging Data Citation Infrastructure
Emerging Data Citation InfrastructureMicah Altman
 
BigDataInPractice_EXLPHARMA_KOCH
BigDataInPractice_EXLPHARMA_KOCHBigDataInPractice_EXLPHARMA_KOCH
BigDataInPractice_EXLPHARMA_KOCHJohn Koch
 
CEDAR work bench for metadata management
CEDAR work bench for metadata managementCEDAR work bench for metadata management
CEDAR work bench for metadata managementPistoia Alliance
 
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation Research Data Alliance
 
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation Research Data Alliance
 
BioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative AdvantageBioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative AdvantageTom Plasterer
 
ODSC East 2017: Data Science Models For Good
ODSC East 2017: Data Science Models For GoodODSC East 2017: Data Science Models For Good
ODSC East 2017: Data Science Models For GoodKarry Lu
 

Ähnlich wie Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata (20)

David Van Enckevort - FAIR sample and data access
David Van Enckevort - FAIR sample and data access David Van Enckevort - FAIR sample and data access
David Van Enckevort - FAIR sample and data access
 
NIH Data Science Special Interest Group
NIH Data Science Special Interest GroupNIH Data Science Special Interest Group
NIH Data Science Special Interest Group
 
Fair sample and data access -David Van enckevort
Fair sample and data access -David Van enckevortFair sample and data access -David Van enckevort
Fair sample and data access -David Van enckevort
 
Data commons bonazzi bd2 k fundamentals of science feb 2017
Data commons bonazzi   bd2 k fundamentals of science feb 2017Data commons bonazzi   bd2 k fundamentals of science feb 2017
Data commons bonazzi bd2 k fundamentals of science feb 2017
 
Towards data FAIRness
Towards data FAIRnessTowards data FAIRness
Towards data FAIRness
 
FAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data SharingFAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data Sharing
 
Findable, Accessible, Interoperable and Reusable (FAIR) data
Findable, Accessible, Interoperable and Reusable (FAIR) dataFindable, Accessible, Interoperable and Reusable (FAIR) data
Findable, Accessible, Interoperable and Reusable (FAIR) data
 
VODAN Africa IN.pptx
VODAN Africa IN.pptxVODAN Africa IN.pptx
VODAN Africa IN.pptx
 
FAIR overview - MAQC Society, Feb 2018
FAIR overview - MAQC Society, Feb 2018FAIR overview - MAQC Society, Feb 2018
FAIR overview - MAQC Society, Feb 2018
 
FORCE11: Creating a data and tools ecosystem
FORCE11:  Creating a data and tools ecosystemFORCE11:  Creating a data and tools ecosystem
FORCE11: Creating a data and tools ecosystem
 
Doing research better: The role of meta‐data
Doing research better: The role of meta‐dataDoing research better: The role of meta‐data
Doing research better: The role of meta‐data
 
Emerging Data Citation Infrastructure
Emerging Data Citation InfrastructureEmerging Data Citation Infrastructure
Emerging Data Citation Infrastructure
 
BigDataInPractice_EXLPHARMA_KOCH
BigDataInPractice_EXLPHARMA_KOCHBigDataInPractice_EXLPHARMA_KOCH
BigDataInPractice_EXLPHARMA_KOCH
 
CEDAR work bench for metadata management
CEDAR work bench for metadata managementCEDAR work bench for metadata management
CEDAR work bench for metadata management
 
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
 
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
 
BioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative AdvantageBioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative Advantage
 
Martone grethe
Martone gretheMartone grethe
Martone grethe
 
ODSC East 2017: Data Science Models For Good
ODSC East 2017: Data Science Models For GoodODSC East 2017: Data Science Models For Good
ODSC East 2017: Data Science Models For Good
 
ROER4D Open Data Initiative
ROER4D Open Data InitiativeROER4D Open Data Initiative
ROER4D Open Data Initiative
 

Mehr von Michel Dumontier

A metadata standard for Knowledge Graphs
A metadata standard for Knowledge GraphsA metadata standard for Knowledge Graphs
A metadata standard for Knowledge GraphsMichel Dumontier
 
Data-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge GraphsData-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge GraphsMichel Dumontier
 
The Role of the FAIR Guiding Principles for an effective Learning Health System
The Role of the FAIR Guiding Principles for an effective Learning Health SystemThe Role of the FAIR Guiding Principles for an effective Learning Health System
The Role of the FAIR Guiding Principles for an effective Learning Health SystemMichel Dumontier
 
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...Michel Dumontier
 
The role of the FAIR Guiding Principles in a Learning Health System
The role of the FAIR Guiding Principles in a Learning Health SystemThe role of the FAIR Guiding Principles in a Learning Health System
The role of the FAIR Guiding Principles in a Learning Health SystemMichel Dumontier
 
Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...Michel Dumontier
 
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...Michel Dumontier
 
Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?Michel Dumontier
 
The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...Michel Dumontier
 
Keynote at the 2018 Maastricht University Dinner
Keynote at the 2018 Maastricht University DinnerKeynote at the 2018 Maastricht University Dinner
Keynote at the 2018 Maastricht University DinnerMichel Dumontier
 
The future of science and business - a UM Star Lecture
The future of science and business - a UM Star LectureThe future of science and business - a UM Star Lecture
The future of science and business - a UM Star LectureMichel Dumontier
 
Developing and assessing FAIR digital resources
Developing and assessing FAIR digital resourcesDeveloping and assessing FAIR digital resources
Developing and assessing FAIR digital resourcesMichel Dumontier
 
Advancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIRAdvancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIRMichel Dumontier
 
A Framework to develop the FAIR Metrics
A Framework to develop the FAIR MetricsA Framework to develop the FAIR Metrics
A Framework to develop the FAIR MetricsMichel Dumontier
 
FAIR principles and metrics for evaluation
FAIR principles and metrics for evaluationFAIR principles and metrics for evaluation
FAIR principles and metrics for evaluationMichel Dumontier
 
Making the most of phenotypes in ontology-based biomedical knowledge discovery
Making the most of phenotypes in ontology-based biomedical knowledge discoveryMaking the most of phenotypes in ontology-based biomedical knowledge discovery
Making the most of phenotypes in ontology-based biomedical knowledge discoveryMichel Dumontier
 
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)Michel Dumontier
 

Mehr von Michel Dumontier (20)

A metadata standard for Knowledge Graphs
A metadata standard for Knowledge GraphsA metadata standard for Knowledge Graphs
A metadata standard for Knowledge Graphs
 
Data-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge GraphsData-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge Graphs
 
Evaluating FAIRness
Evaluating FAIRnessEvaluating FAIRness
Evaluating FAIRness
 
The Role of the FAIR Guiding Principles for an effective Learning Health System
The Role of the FAIR Guiding Principles for an effective Learning Health SystemThe Role of the FAIR Guiding Principles for an effective Learning Health System
The Role of the FAIR Guiding Principles for an effective Learning Health System
 
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
 
The role of the FAIR Guiding Principles in a Learning Health System
The role of the FAIR Guiding Principles in a Learning Health SystemThe role of the FAIR Guiding Principles in a Learning Health System
The role of the FAIR Guiding Principles in a Learning Health System
 
Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...
 
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
 
Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?
 
The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...
 
Keynote at the 2018 Maastricht University Dinner
Keynote at the 2018 Maastricht University DinnerKeynote at the 2018 Maastricht University Dinner
Keynote at the 2018 Maastricht University Dinner
 
The future of science and business - a UM Star Lecture
The future of science and business - a UM Star LectureThe future of science and business - a UM Star Lecture
The future of science and business - a UM Star Lecture
 
Are we FAIR yet?
Are we FAIR yet?Are we FAIR yet?
Are we FAIR yet?
 
Developing and assessing FAIR digital resources
Developing and assessing FAIR digital resourcesDeveloping and assessing FAIR digital resources
Developing and assessing FAIR digital resources
 
Advancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIRAdvancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIR
 
A Framework to develop the FAIR Metrics
A Framework to develop the FAIR MetricsA Framework to develop the FAIR Metrics
A Framework to develop the FAIR Metrics
 
FAIR principles and metrics for evaluation
FAIR principles and metrics for evaluationFAIR principles and metrics for evaluation
FAIR principles and metrics for evaluation
 
Ontologies
OntologiesOntologies
Ontologies
 
Making the most of phenotypes in ontology-based biomedical knowledge discovery
Making the most of phenotypes in ontology-based biomedical knowledge discoveryMaking the most of phenotypes in ontology-based biomedical knowledge discovery
Making the most of phenotypes in ontology-based biomedical knowledge discovery
 
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
 

Kürzlich hochgeladen

Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSSLeenakshiTyagi
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyDrAnita Sharma
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 

Kürzlich hochgeladen (20)

Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 

Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata

  • 1. Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata Michel Dumontier Associate Professor of Medicine (Biomedical Informatics) Stanford University Presented at the BD2K California Centers Meeting, Palm Springs, Oct 9, 2015
  • 2. High Quality Metadata are Essential for Large-Scale Reuse and Biomedical Discovery
  • 3. What is this colored picture about?
  • 4.
  • 8. FAIR – Findable, Accessible, Interoperable, Reuseable –> Data Sharing Plans To be Findable: F1. (meta)data are assigned a globally unique and eternally persistent identifier F2. data are described with rich metadata F3. (meta)data are registered or indexed in a searchable resource F4. metadata specify the data identifier To be Accessible: A1 (meta)data are retrievable by their identifier using a standardized communications protocol A1.1 the protocol is open, free, and universally implementable A1.2 the protocol allows for an authentication and authorization procedure, where necessary A2 metadata are eternally accessible, even when the data are no longer available To be Interoperable: I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. I2. (meta)data use vocabularies that follow FAIR principles I3. (meta)data include qualified references to other (meta)data To be Re-usable: R1. meta(data) have a plurality of accurate and relevant attributes R1.1. (meta)data are released with a clear and accessible data usage license R1.2. (meta)data are associated with their provenance R1.3. (meta)data meet domain-relevant community standards
  • 9. The Good News: Minimal information checklists, such as MIAME, are being advanced from all sectors of the biomedical community The Bad News: Investigators view requests for even “minimal” information as burdensome
  • 10. Unguided Data Entry is Problematic
  • 11. Research Questions • What are the most effective approaches to maximize reuse of pre-defined metadata elements? • How can we get metadata authors to generate more metadata in shorter time periods – To what extent can we successfully predict metadata values from existing metadata? – Can we use manuscripts, published work, user history, social media to improve prediction? • To what extent can experts and crowds find, verify, and fix metadata? • To what degree does ontology-based metadata improve the precision and recall of dataset search?
  • 12. CEDAR technology will give us • Tools – To author metadata templates that define a community standard while reusing previously defined elements – To facilitate the capture of experimental metadata that conform to one or more community standards – To discover datasets that meet a plurality of dataset and experimental requirements • Methods – To learn metadata patterns from unstructured, semi-structured, and structured metadata – To efficiently guide predictive entry of new metadata – To index, query, and reason about ontology-based metadata – To explore, verify, and collaboratively augment experimental metadata even when the data are located elsewhere
  • 13. CEDAR will empower users to meet and exceed the minimal metadata standards • Emphasis traditionally has been on development of simple checklists of metadata elements • Little practical consideration for – Using shared value sets (search, browse, query) – Common knowledge representations (interoperable ecosystems of tools and data) – Metadata validation against community standards (richness and quality) • We need a more expressive—and computable— framework for describing metadata
  • 14. CEDAR Team Stanford University • Mark A. Musen (PI) • Michel Dumontier (Co-I) • Olivier Gevaert (Co-I) • Purvesh Khatri (Co-I) • John Graybeal • Martin O’Connor • Marcos Martinez Romero • Mary Panahiazar • Attila Egyedi • Ravi Shankar Stanford Library • Kim Durante Oxford University • Susanna-Assunta Sansone (Co-I) • Phillippe Rocca-Serra • Alejandra Gonzalez-Beltran Yale University • Steven H. Kleinstein (Co-I) • Kei Cheung (Co-I) Northrop Grumman (HIPC/Immport) • Jeff Wiser (Co-I)
  • 15.
  • 16.
  • 17.
  • 18.
  • 19. We also want to help users compose templates and auto-magically suggest metadata values
  • 20. Can we use free text to predict structured and semi-structured values?
  • 21. Predicting structured metadata (accuracy) Using multi label tree Accuracy of 72% to predict 39 most-occurring keys Average precision 79% (for all keys) Average recall 82% (for all keys) Predicting semi-structured metadata
  • 22. Metadata: The Next Frontier We welcome new partners to – Learn about your metadata authoring workflow – Evaluate CEDAR technology – Incorporate this technology into your curation workflows

Hinweis der Redaktion

  1. Biomedical researchers will remain stymied in their ability to take full advantage of the Big Data revolution if they can never find the datasets that they need to analyze, if there is lack of clarity about what particular datasets contain, and if data are insufficiently described.
  2. Omics is big data
  3. Metadata Templates Manager: Let’s create a new template element first
  4. The “Study Type” Template Element is satisfied by a Pick from a List of Controlled Terms.
  5. Now we can instantiate the template with actual data. But filling in these templates is a mess. Can’t the “study type” be inferred from the “description”? Does knowing who the “investigator” is suggest other elements of the study design?
  6. CEDAR Architecture. Note the connection to NCBO services, and the use of the NCBO ontology repository for CEDAR resources.